
Study on the Method for Cleaning and Repairing the Probe Vehicle Data

Zhaosheng Zhang, Diange Yang, Tao Zhang, Qiaochu He, and Xiaomin Lian

Abstract—Probe vehicle data are being increasingly applied in urban dynamic traffic data collection. However, the mobility and scale limit of probe vehicles may lead to incomplete or inaccurate data and thus influence the measurement of the state of traffic. At present, probe vehicle data are usually repaired by linear interpolation or a historical average method, but the repair accuracy is relatively low. To address the given problems, the multithreshold control repair method (MTCRM) was proposed to clean and repair the probe vehicle data. The MTCRM adopts threshold control and a rule based on the approximate normalization transform to clean abnormal traffic data and to fill in the missing data by a weighted average method and an exponential smoothing method. In this approach, we combine topological road network characteristics to fill in the missing data from data for neighboring road sections and repair noisy data by reconstructing the principal components. This paper mainly focuses on analyzing the component of the recurring pattern of probe vehicle data, which can provide guidelines for the subsequent traffic forecasts. The findings of data repair for different grades of road in Beijing, China, demonstrate that the mean repair error may meet the requirements of traffic-state measurement, demonstrating that MTCRM can effectively clean probe vehicle data.

Index Terms—Data cleaning, data repair, multithreshold control repair method (MTCRM), normalization transform, probe vehicle data, reconstruction of principal components.

Manuscript received June 18, 2012; revised August 19, 2012; accepted August 26, 2012. Date of publication September 20, 2012; date of current version February 25, 2013. This work was supported by the National High-tech R&D Program (863 Program) under project (2012AA111901). The Associate Editor for this paper was J. A. Miller.

Z. Zhang, D. Yang (Corresponding author), T. Zhang, and X. Lian (Corresponding author) are with the Department of Automotive Engineering, Tsinghua University, Beijing 100084, China (e-mail: zzs08@mails.tsinghua.edu.cn; ydg@mail.tsinghua.edu.cn; zhang-t@mails.tsinghua.edu.cn; lianxm@tsinghua.edu.cn).

Q. He is with the Department of Industrial Engineering and Operation Research, University of California, Berkeley, CA 94709 USA (e-mail: heqc0425@berkeley.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITS.2012.2217378

1524-9050/$31.00 © 2012 IEEE

420 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 14, NO. 1, MARCH 2013

I. INTRODUCTION

With the rapid development of vehicle navigation systems, dynamic traffic data are being more widely applied [1]–[3]. Probe vehicles are widely used to collect dynamic traffic data because of their wide coverage, high precision, and excellent real-time performance [4], [5]. However, not all road sections can be covered by enough probe vehicles because of their high mobility and limited number [6]. Wireless communication may also cause loss of data; therefore, probe vehicle data may be incorrect or incomplete and thus affect the accuracy of traffic-state measurement. Reference [7] points out that 50% of the collected traffic data have problems such as data error and data loss, which may increase the risk of instability when these data are used in transportation systems. Therefore, data cleaning and repair are very important tasks for obtaining accurate dynamic traffic information [8].

II. RELATED WORK

Probe vehicle data cleaning involves identifying errors in the raw data and then filtering or removing them. The removed data are regarded as missing information and are completed by data repair. At present, although methods for traffic flow data have been well developed, few studies have investigated the cleaning and repair of probe vehicle data. Jarema et al. [9] proposed treating missing data by a historical average method that replaces the incorrect or missing data with historical data from the same period. Then, Jiang et al. [10] improved the method by considering historical traffic data, average values of adjacent time periods, and data of the adjacent road sections to correct the missing data. Jacobson et al. [11] suggested the threshold control algorithm, which identifies abnormal data based on the idea that the values of traffic flow parameters for a certain time interval should be within a reasonable range. According to this principle, traffic parameters outside this range are recognized as incorrect data. However, in this method, it is difficult to accurately determine the threshold values for the data range. Coifman [12] developed a method for determining whether the data are reasonable using three parameters (vehicle speed, flow, and occupancy). However, the application of this method is limited because it requires traffic flow conditions to be determined and a valid range of average vehicle length to be given. Some improved methods, such as analyses of traffic flow data by clustering methods, have been suggested in [13] and [14].

All the given studies are based on traffic flow data, which include several types of information such as vehicle speed, flow, and occupancy. In cases of missing single-property data, the traffic data can be completed by the other data. However, probe vehicle data only include vehicle speed information and cannot be cleaned by methods developed based on traffic flow. With the wide use of probe vehicle data [15]–[23], processing these data is becoming increasingly important. Yu et al. [24] filled the missing data (less than four data points) by linear interpolation while deleting the source data for days missing more than four data points. Lv et al. [25] developed a nonparametric regression method for repairing missing data, although their approach requires a large amount of historical data under various traffic conditions and is thus unsatisfactory when sufficient data are unavailable. In summary, methods based on traffic flow cannot be applied to repairing probe vehicle data; therefore, probe vehicle data are usually repaired by linear interpolation or historical average methods. As a result, the repair precision is low, which may influence the accuracy of traffic-state measurement.

In this paper, based on the periodicity and spatial relationships of probe vehicle data, we develop a new method for cleaning incorrect data by thresholding and a 3σ rule method based on a pseudonormalization transformation. Missing data are completed by numerical analysis taking into account the topological characteristics of the road network. Additionally, noisy data are filtered by principal component reconstruction to further improve the accuracy of the data repair. It should be noted that this paper concentrates on processing single-property (speed) data because, in general, probe vehicle data that combine speed, occupancy, and flow cannot be directly provided by the traffic data provider.

Fig. 1. Changes of vehicle speed (z-axis, in km/h) with time (x-axis) and date (y-axis).

As a result of the regularity of urban traveling, vehicle speed

shows periodicities and trends. The data collected on a road

section over multiple days can be expressed as a matrix X in

$$
X = \begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
x_{M,1} & x_{M,2} & \cdots & x_{M,N}
\end{bmatrix} \tag{1}
$$

where $x_{i,j}$ represents the vehicle speed at moment $j$ on day $i$, $M$ is the number of days of data collection, and $N$ is the number of data collected per day. The row vector $X_i = (x_{i,1}, x_{i,j}, \ldots, x_{i,N})$ of $X$ represents the vehicle speeds recorded at different moments on day $i$ and is called the date vector; analogously, the column vector $X_j = (x_{1,j}, x_{i,j}, \ldots, x_{M,j})$ of $X$ represents the vehicle speeds recorded at the same time point but on different days and is called the moment vector.

If the collection interval is shorter, the vehicle speed variation and the noise are greater, which makes the data harder to process. Conversely, a longer acquisition interval offers a smoother curve but does not reflect the real-time conditions of traffic because of the lower acquisition frequency. The Highway Capacity Manual of the USA recommends an acquisition interval of 5 min. Therefore, in our probe vehicle system, a 5-min collection interval is used, which gives 288 data points per day.

Fig. 1 shows the changes of vehicle speed with time and date. The speed varies slightly from day to day, but the general trend remains similar. In this paper, missing or incorrect probe vehicle data are repaired based on the characteristics of time series and similarity analysis of adjacent roads.

A. Raw Data Screening

Before further careful screening, raw data are rough-screened by the following three steps to remove invalid data.

1) Filtering negative data. Negative vehicle speed data are obviously errors; therefore, these data are removed and replaced with 0.

2) Conditions of excessive missing data. When large amounts of data are missing, the characteristics of traffic operation cannot be accurately described by the established traffic model; therefore, these data should be deleted. If the missing data volume for one day reaches 10% of the total data or the continuously missing data exceeds 5%, the data for the entire day are discarded.

3) Identification and filtering of abnormal data. The collected vehicle speed data may deviate from the normal speed data owing to factors such as traffic accidents and weather. Although such deviations may be reasonable, because disturbances to road traffic are unavoidable, over a long historical trend they are occasional and uncertain. This paper mainly focuses on analyzing the component of the recurring pattern of the probe vehicle data, which can provide guides for the subsequent traffic forecasts; however, these uncertainties may interfere with the study of the characteristics of vehicle speed and traffic planning. Thus, these data were removed in this paper.

Fig. 2. Removal of abnormal traffic data. The symbol + shows historical average data, and × shows measurement data; separate markers show the upper threshold and the lower threshold. The ellipse indicates detected abnormal data.

Fig. 2 shows the calculated mean value $\bar{X}_j$ for each moment vector $X_j$, where the standard deviation of $X_j$ is $\sigma_j$. The 85% confidence interval of the moment vector $X_j$ is $[\bar{X}_j - 1.44\sigma_j,\ \bar{X}_j + 1.44\sigma_j]$. For the date vector $X_i$, if 5% of the data are consecutively outside of the confidence interval, then the vector contains abnormal data.

ZHANG et al.: STUDY ON THE METHOD FOR CLEANING AND REPAIRING THE PROBE VEHICLE DATA 421

Fig. 3. Q–Q plots showing normality of the data before (left) and after (right) pseudonormalization transformation.

Instantaneous data are considered abnormal if they greatly deviate from the center of the distribution. In this paper, a probability model is established for the moment vector $X_j$, and the data are then cleaned based on a 3σ rule.

The 3σ rule is applicable to data that follow a normal distribution. Therefore, the vector $X_j$ is first checked for normality with a quantile–quantile (Q–Q) plot. If it does not satisfy normality, the data in the vector should be transformed to a pseudonormal distribution using a modified power function, as shown in

$$
x^{(\gamma)} =
\begin{cases}
\dfrac{x^{\gamma} - 1}{\gamma}, & \gamma \neq 0 \\
\ln(x), & \gamma = 0.
\end{cases} \tag{2}
$$

For the measured data values $x_1, x_2, \ldots, x_M$, Box and Cox [26] give a method for calculating the optimal exponent $\gamma$ as the value that maximizes (3), as follows:

$$
l(\gamma) = \max\left[ -\frac{M}{2} \ln\left( \frac{1}{M} \sum_{i=1}^{M} \left( x_i^{(\gamma)} - \bar{x}^{(\gamma)} \right)^2 \right) + (\gamma - 1) \sum_{i=1}^{M} \ln(x_i) \right] \tag{3}
$$

where

$$
\bar{x}^{(\gamma)} = \frac{1}{M} \sum_{i=1}^{M} \frac{x_i^{\gamma} - 1}{\gamma}. \tag{4}
$$

To reduce the computational complexity, $\gamma$ should be within the interval (0, 5). Fig. 3 shows data before and after this transformation. The closer the sample data approach the straight line, the more the samples comply with a normal distribution. It is shown in Fig. 3 that the transformed data ($x^{(\gamma)}$) follow a pseudonormal distribution. Thus, the confidence interval of $x^{(\gamma)}$ can be determined using the 3σ rule: $[\bar{x}^{(\gamma)} - 3\sigma^{(\gamma)},\ \bar{x}^{(\gamma)} + 3\sigma^{(\gamma)}]$, where $\sigma^{(\gamma)}$ represents the standard deviation of $x^{(\gamma)}$, and abnormal data are deleted based on the confidence interval. After the data identified as abnormal are removed, the missing data are repaired by a weighted average method or an exponential smoothing method.

Fig. 4. Schematic showing spatial distribution of road sections in a local area. Black circles represent road junctions. Lines represent roads. Numbers represent IDs of roads. Arrows point in the direction of traffic.

IV. TRAFFIC DATA REPAIR

A. Missing Data Repair

Missing data are very common in data collection and may be either isolated or consecutive. In this paper, the missing data are processed as explained in the following.

For isolated missing data $x_t$, a weighted average method is used for data repair. Unlike the mean value repair method, a weighted average takes advantage of trends over time, thus reducing the influence of variations in the adjacent data, as shown in the following:

$$
\hat{x}_t = \frac{1}{W} \sum_{k=-T}^{T} w_k \cdot x_{t+k} \quad (k \neq 0). \tag{5}
$$

In (5), $\hat{x}_t$ is the repaired missing data, $w_k$ is the weighting coefficient, $W$ is the sum of all weighted coefficients, and $T$ is the maximum interval for repairing data. Note that $w_k$ decreases with distance from the missing measurement point. The maximum interval $T$ of the neighboring data is set to 3, and the corresponding weight coefficients $w_k$ are set to 0.7, 0.2, and 0.1, respectively. Considering that vehicle speed–time curves usually display clear trends of increase or decrease, continuous missing data are repaired by a secondary exponential smoothing method as follows:

$$
\hat{x}_{t+r} = a_t + b_t r \quad (r = 1, 2, \ldots) \tag{6}
$$

where $a_t$ and $b_t$ are intermediate variables, which are given in the following:

$$
\begin{aligned}
a_t &= 2Q_t^{(1)} - Q_t^{(2)} \\
b_t &= \frac{\alpha}{1 - \alpha} \left( Q_t^{(1)} - Q_t^{(2)} \right).
\end{aligned} \tag{7}
$$
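For the cleaning step that precedes these repairs, the pseudonormalization transform in (2) to (4) followed by the 3σ rule, a minimal Python sketch is given here. It is only an illustration under stated assumptions: the function names, the grid search over $\gamma$, and the use of the population standard deviation are choices made for this sketch and are not taken from the paper. Speeds must be strictly positive for the logarithms to exist.

```python
import math
from statistics import mean, pstdev

def boxcox(x, gamma):
    """Transform (2): (x**gamma - 1) / gamma for gamma != 0, ln(x) for gamma == 0."""
    return math.log(x) if gamma == 0 else (x ** gamma - 1.0) / gamma

def log_likelihood(xs, gamma):
    """Objective in (3) for strictly positive samples xs."""
    m = len(xs)
    ys = [boxcox(x, gamma) for x in xs]
    y_bar = mean(ys)                      # equation (4)
    variance = sum((y - y_bar) ** 2 for y in ys) / m
    return -0.5 * m * math.log(variance) + (gamma - 1.0) * sum(math.log(x) for x in xs)

def best_gamma(xs, lo=0.0, hi=5.0, steps=100):
    """Pick gamma inside the open interval (0, 5) by grid search.
    The grid resolution is an assumption of this sketch."""
    grid = [lo + (hi - lo) * k / steps for k in range(1, steps)]
    return max(grid, key=lambda g: log_likelihood(xs, g))

def three_sigma_filter(xs, gamma):
    """Replace samples outside mean +/- 3*sigma (in transformed space) by None."""
    ys = [boxcox(x, gamma) for x in xs]
    mu, sigma = mean(ys), pstdev(ys)
    return [x if abs(y - mu) <= 3 * sigma else None for x, y in zip(xs, ys)]
```

A typical use would apply `best_gamma` to one moment vector and then pass the result to `three_sigma_filter`; the samples returned as `None` are the ones handed to the missing-data repair.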


TABLE I

CORRELATION COEFFICIENTS OF NEIGHBORING ROAD SECTIONS

In (7), $Q_t^{(1)}$ is the primary exponential smoothing value, $Q_t^{(2)}$ is the secondary exponential smoothing value, and $\alpha \in (0, 1)$ is the smoothing coefficient, i.e.,

$$
\begin{aligned}
Q_t^{(1)} &= \alpha x_t + (1 - \alpha) Q_{t-1}^{(1)} \\
Q_t^{(2)} &= \alpha Q_t^{(1)} + (1 - \alpha) Q_{t-1}^{(2)}.
\end{aligned} \tag{8}
$$

Additionally, the quality of data repair can be improved by taking into account spatial similarities in traffic information, such as traffic data for the upstream and downstream road sections and nearby road sections. The experiment described in Fig. 4 provides the correlation between neighboring road sections at some time.

Consider segment 23010 in Table I. The correlation coefficient with the neighboring road sections is relatively large but decreases as the distance increases. Specifically, the correlation between the upstream and downstream road sections and segment 23010 is relatively large, whereas the correlation between segment 23010 and the parallel segments is relatively small; therefore, the correlation coefficients are calculated for the adjacent road sections.

The speed on road section 23010 can be calculated based on the similarities with the adjacent road sections, as expressed in the following:

$$
\hat{x}_t^{\omega} = \sum_{h=1}^{H^{\omega}} \beta_h^{\omega} \cdot x_h^{\omega}(t) \tag{9}
$$

where $\omega$ is the ID number of the road that is missing data, $H^{\omega}$ is the total number of adjacent segments of road $\omega$, $h$ is the sequence number of the adjacent road, $x_h^{\omega}(t)$ is the vehicle speed on adjacent road $h$ at time $t$, and $\beta_h^{\omega}$ is a weight factor given in

$$
\beta_h^{\omega} = \frac{r_h^{\omega}}{\sum_{\tau=1}^{H^{\omega}} r_{\tau}^{\omega}}. \tag{10}
$$

In (10), $r_h^{\omega}$ is the coefficient of correlation between the road section that is missing data and its neighboring road section. In general, the correlation coefficients differ between segments and from day to day; thus, for data without a correlation coefficient, $r_h^{\omega}$ was set to the average value of all the correlation coefficients on the days earlier than the appointed day.

B. Repair Method for Noisy Data

In traffic flow analyses, dimension reduction is frequently used to isolate the important information in the data. Principal component analysis (PCA) is a main tool for dimensionality reduction. Earlier research has suggested that traffic flow can be decomposed into an eigenflow plus noise [27]. These noise data refer to those data that cannot reflect the traffic characteristic. In this paper, reconstruction of principal components is used to remove high-frequency noisy data. Compared with other noise eliminating methods, reconstruction of principal components is able to process the data for several days, and the regularity and trends of time series are also used, thereby reducing the volume of data processing and improving the precision at the same time.

For a sampling data matrix $X$, each date vector corresponds to a variable, and each moment vector corresponds to a sample. The covariance matrix $S$ for recording samples is

$$
S = \frac{1}{N - 1} \sum_{j=1}^{N} (X_j - \bar{X}_j)(X_j - \bar{X}_j)^T \tag{11}
$$

where $T$ is the transpose of the matrix, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M$ are the eigenvalues of $S$, and $U = \{u_j \mid j = 1, 2, \ldots, M\}$ indicates the corresponding orthogonal unit eigenvector matrix. The principal component matrix $Y$ of the data matrix $X$ is given as

$$
Y = X^T U = \begin{bmatrix}
y_{1,1} & y_{1,2} & \cdots & y_{1,M} \\
y_{2,1} & y_{2,2} & \cdots & y_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
y_{N,1} & y_{N,2} & \cdots & y_{N,M}
\end{bmatrix}. \tag{12}
$$

The $m$th main component contribution rate $Z_m$ of the principal component matrix $Y$ is shown as

$$
Z_m = \frac{\lambda_m}{\sum_{i=1}^{M} \lambda_i} \quad (m = 1, 2, \ldots, M). \tag{13}
$$

Then, the contribution rate of the first $m$ principal components, $Z_m^s$, is

$$
Z_m^s = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{M} \lambda_i}. \tag{14}
$$

The principal components of matrix $Y$ are used to reconstruct the original data matrix $X$. Taking the first $p$ principal components with contribution rates $Z_m^s \geq 95\%$, the reconstructed matrix $\hat{X}$ is given as

$$
\hat{X} = \begin{bmatrix}
y_{1,1} & y_{1,2} & y_{1,p} & \cdots & 0 \\
y_{2,1} & y_{2,2} & y_{2,p} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
y_{N,1} & y_{N,2} & y_{N,p} & \cdots & 0
\end{bmatrix} U^{-1} \tag{15}
$$

where $U^{-1}$ represents the inverse of $U$.

The effect of noise reduction using this PCA-based method is shown in Fig. 5. After the high-frequency noise in the data is reduced, the curve is smoother, and the fluctuations are smaller, assuming that the transient characteristics of speed data remain normal.
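The choice of the number of retained components $p$ from the contribution rates in (13) and (14) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and return layout are hypothetical, and the eigenvalues are assumed to have been computed beforehand from the covariance matrix in (11) by any eigenvalue routine.

```python
def components_to_keep(eigenvalues, threshold=0.95):
    """Return (p, rates, cumulative).

    rates[m-1] is the contribution rate Z_m of (13), cumulative[m-1] is the
    cumulative rate of (14), and p is the smallest number of leading
    components whose cumulative rate reaches the threshold.
    """
    lam = sorted(eigenvalues, reverse=True)   # lambda_1 >= lambda_2 >= ...
    total = sum(lam)
    rates = [value / total for value in lam]
    cumulative, running = [], 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    p = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)
    return p, rates, cumulative
```

The reconstruction in (15) then keeps the first $p$ columns of $Y$, sets the remaining columns to zero, and multiplies by $U^{-1}$.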

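The missing-data repairs of Section IV-A, the weighted average in (5), the secondary exponential smoothing in (6) to (8), and the spatial estimate in (9) and (10), can likewise be sketched in Python. Several details are assumptions of this sketch rather than statements of the paper: the function names, applying the weights 0.7, 0.2, and 0.1 to neighbors at distance 1, 2, and 3 on both sides, skipping neighbors that are themselves missing, starting both smoothed series at the first observation, and updating the second-order value from its previous value as in the standard form of the method.

```python
def repair_isolated(series, t, weights=(0.7, 0.2, 0.1)):
    """Weighted average (5) around index t (None marks a missing sample).
    Assumption: W is the sum of the weights of the neighbors actually used."""
    total = norm = 0.0
    for dist, w in enumerate(weights, start=1):
        for idx in (t - dist, t + dist):
            if 0 <= idx < len(series) and series[idx] is not None:
                total += w * series[idx]
                norm += w
    return total / norm if norm else None

def forecast_gap(history, gap_len, alpha=0.5):
    """Secondary exponential smoothing (6)-(8): smooth the observed values
    before the gap, then extrapolate a_t + b_t * r for r = 1..gap_len."""
    q1 = q2 = history[0]                      # assumed initialization
    for x in history[1:]:
        q1 = alpha * x + (1 - alpha) * q1     # first line of (8)
        q2 = alpha * q1 + (1 - alpha) * q2    # second line of (8)
    a = 2 * q1 - q2                           # equation (7)
    b = alpha / (1 - alpha) * (q1 - q2)
    return [a + b * r for r in range(1, gap_len + 1)]

def spatial_estimate(neighbor_speeds, correlations):
    """Estimate (9) with weights beta_h = r_h / sum(r) from (10)."""
    total_r = sum(correlations)
    return sum(r / total_r * x for r, x in zip(correlations, neighbor_speeds))
```

For example, `repair_isolated([40, 42, 44, None, 48, 50, 52], 3)` averages the six neighbors symmetrically and gives a value of about 46, while `forecast_gap` continues a rising or falling trend across a run of missing points instead of flattening it.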

Fig. 5. Comparison of traffic data before and after PCA noise reduction. The symbol ∗ shows raw data before noise reduction. + shows PCA data after noise reduction. − shows noise.

Fig. 6. Standardized residual and autocorrelation function of traffic data after PCA noise-reduction treatment.

The standardized residuals and the autocorrelation function are shown in Fig. 6. Almost all (96.5%) of the standardized residuals fall in the range (−2, 2), and the autocorrelation approaches zero as the time lag increases, indicating that the residual is normally distributed white noise.

V. EXPERIMENTAL VALIDATION

Different types of roads are associated with different densities of probe vehicles and, therefore, different data qualities. We validated the above data cleaning/repair methods with data collected on four types of roads in Beijing, China: freeways, express roads, arterial roads, and minor arterial roads. The probe vehicle data were provided by traffic service providers; they were collected from 23 000 vehicles at an interval of 5 min between January 1, 2011 and March 30, 2011, and they covered 85% of the minor arterial roads and superior roads in Beijing. After the missing and incorrect data were removed, vehicle speed data for 81 days were left. Missing and incorrect data were artificially introduced to information recorded on […] methods described earlier.

The experimental results were evaluated using the following indicators.

relative error (rerr):

$$
\mathrm{rerr} = \frac{\hat{x} - x}{x}
$$

mean absolute relative error (mrerr):

$$
\mathrm{mrerr} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{x}_i - x_i}{x_i} \right|
$$

maximum absolute relative error (marerr):

$$
\mathrm{marerr} = \max \left| \frac{\hat{x}_i - x_i}{x_i} \right|
$$

coefficient of equivalence (EC):

$$
\mathrm{EC} = 1 - \frac{\sqrt{\sum_{i=1}^{n} (\hat{x}_i - x_i)^2}}{\sqrt{\sum_{i=1}^{n} (\hat{x}_i)^2} + \sqrt{\sum_{i=1}^{n} (x_i)^2}}.
$$

EC reflects the match error between the actual data and the repair data. A value of EC > 0.9 was considered to indicate an excellent repair.

When repairing the data with the weighted average method, the maximum interval $T$ of the neighboring data was set to 3, and the corresponding weight coefficients $w_k$ were set to 0.7, 0.2, and 0.1, respectively. For consecutive missing data, the exponential smoothing coefficient ($\alpha$) was empirically determined as 0.5, and the number of neighboring segments used in the speed estimation by spatial similarity was $H = 5$. Expressway data repair by the given method is shown in the following.

In Fig. 7, the data marked with stars are raw data, the data marked with round solid points are modified data, and the data marked with crosses are man-made abnormal traffic data. The right side of Fig. 7 shows a magnified portion of the left side of Fig. 7. Noise is removed from the modified data by reconstruction of principal components. Compared with the raw data, the processed vehicle speed not only maintains the transient characteristics but also eliminates the high-frequency noise in the data, thus smoothing the speed curve and laying a foundation for subsequent traffic-state data identification. Abnormal traffic data were artificially introduced at random times within one day and were processed by MTCRM, conventional linear interpolation, and historical average methods, respectively. The results are compared and shown in Fig. 8.

The errors of the repaired data are shown in Table II. As shown in Table II, compared with the conventional historical average and linear interpolation methods, MTCRM shows significant advantages in precision and optimality. For the expressway, 89% of the relative errors are within 3%, and the average absolute relative error is 2.3%. The repair error statistics for different types of abnormal data are shown in Table III, where more than two consecutive abnormal data points are called consecutive abnormal data.
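The evaluation indicators can be computed with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes that the actual speeds are nonzero and that EC uses the root-sum-of-squares form, with square roots over each of the three sums.

```python
import math

def relative_errors(repaired, actual):
    """rerr for each pair of repaired and actual values."""
    return [(xh - x) / x for xh, x in zip(repaired, actual)]

def mrerr(repaired, actual):
    """Mean absolute relative error."""
    errs = relative_errors(repaired, actual)
    return sum(abs(e) for e in errs) / len(errs)

def marerr(repaired, actual):
    """Maximum absolute relative error."""
    return max(abs(e) for e in relative_errors(repaired, actual))

def equal_coefficient(repaired, actual):
    """EC = 1 - ||x_hat - x|| / (||x_hat|| + ||x||), assumed form."""
    num = math.sqrt(sum((xh - x) ** 2 for xh, x in zip(repaired, actual)))
    den = math.sqrt(sum(xh ** 2 for xh in repaired)) + math.sqrt(sum(x ** 2 for x in actual))
    return 1.0 - num / den
```

A perfect repair gives an EC of exactly 1, and the value falls toward 0 as the repaired series departs from the actual one, which is why EC > 0.9 is used as the quality threshold.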


TABLE II

ERROR COMPARISON FOR DIFFERENT REPAIR METHODS

TABLE III

STATISTICS OF DIFFERENT TYPES OF ABNORMAL DATA ERRORS

As shown in the given table, discrete abnormal data were repaired relatively accurately, with a maximum relative error of 2.5%. In cases of multiple consecutive neighboring abnormal […] and the maximum error of consecutive abnormal data is 4.1%. The repair results for minor arterial roads, arterial roads, and freeways are shown in Fig. 9.


Fig. 9. Repair of vehicle speed data recorded on different types of roads. (a) Minor arterial road data repair. (b) Minor arterial road repair error. (c) Arterial road

data repair. (d) Arterial road repair error. (e) Freeway data repair. (f) Freeway repair error.

As shown in Fig. 9, the relative error of minor arterial road data repair is relatively large, whereas the error associated with the freeway is smaller, with a relative error within ±1.5% for 95% of the time, and within ±3% even for time periods with obvious speed fluctuation. The difference can be mainly attributed to different vehicle speeds and the number of probe vehicles on the two types of roads. Compared with freeways, there are fewer probe vehicles, and the vehicle speed is lower on minor arterial roads.

Figs. 8 and 9 also clearly indicate morning and evening rush hours on the four types of roads. The vehicle speeds are substantially lower during rush hours and higher during the night. Additionally, the rush hour patterns were consistent on the four types of roads, and such periodicity and consistency can facilitate the identification and repair of traffic data. Table IV lists indicators of repair error other than errors resulting from historical averaging.


TABLE IV

ERROR INDEXES FOR DIFFERENT ROAD TYPES

The errors calculated for different road types are shown in Table IV. For the repair of data from express roads, for which there is smaller data fluctuation, the mean absolute error was only 0.76%, and the EC was 0.96. The quality of probe vehicle data is significantly improved without requiring more probe vehicles or additional data processing equipment, indicating that MTCRM can effectively clean probe vehicle data.

VI. CONCLUSION

The precision of the current repair and cleaning method for probe vehicle data is low, which may influence the accuracy of traffic-state prediction. In this paper, we have developed new numerical analysis methods for cleaning probe vehicle data. A new method is presented for data repair and cleaning using principal component reconstruction based on the traffic information and topological characteristics of the road network.

Experimental validation shows that abnormal data can be effectively identified by a 3σ rule based on an approximate normalization transform. Principal component reconstruction can effectively exploit the periodicity and trends in traffic data to reduce the influence of noise on probe vehicle data and improve the data accuracy. This paper has focused on processing single-property (speed) data because probe vehicle data that combine speed, occupancy, and flow cannot be directly provided by the traffic data provider; nevertheless, we believe our newly developed method has great potential for future transportation research.

REFERENCES

[1] T. Zhang, D. G. Yang, T. Li, K. Q. Li, and X. M. Lian, “An improved virtual intersection model for vehicle navigation at intersections,” Transp. Res. C, Emerging Technol., vol. 19, no. 3, pp. 413–423, Jun. 2011.

[2] J. W. Ding, C. F. Wang, F. H. Meng, and T. Y. Wu, “Real-time vehicle route guidance using vehicle-to-vehicle communication,” IET Commun., vol. 4, no. 7, pp. 870–883, Apr. 2010.

[3] F. Dion, J. S. Oh, and R. Robinson, “Virtual testbed for assessing probe vehicle data in IntelliDrive systems,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 635–644, Sep. 2011.

[4] J. E. Naranjo, F. Jiménez, F. J. Serradilla, and J. G. Zato, “Comparison between floating car data and infrastructure sensors for traffic speed estimation,” in Proc. 13th Int. IEEE Conf. Intell. Transp. Syst., Workshop Emergent Coop. Technol. Intell. Transp. Syst., 2010.

[5] Z. Yang, “Research and implementation of large-scale FCD processing,” M.S. thesis, Dept. Pattern Recogn. Intell. Syst., Univ. Sci. Technol. China, He Fei, China, 2010.

[6] J. E. Naranjo, F. Jiménez, F. J. Serradilla, and J. G. Zato, “Floating car data augmentation based on infrastructure sensors and neural networks,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 1, pp. 107–114, Mar. 2012.

[7] M. Zhong, P. Lingras, and S. Sharma, “Estimation of missing traffic counts using factor, genetic, neural, and regression techniques,” Transp. Res. C, Emerging Technol., vol. 12, no. 2, pp. 139–166, Apr. 2004.

[8] X. Y. Wang, J. L. Zhang, and X. Y. Yang, The Theoretical Approaches of Traffic Flow Data Cleaning and State Identify as well as Optimize Control. Beijing, China: Science, 2011, pp. 13–18.

[9] F. Jarema, C. Dahlin, and R. Gillmann, “FHWA study tour for european traffic monitoring programs and technologies,” Federal Highway Admin., U.S. Dept. Transp., Washington, DC, 1997.

[10] G. Y. Jiang, L. H. Gang, X. D. Zhang, and J. F. Wang, “Malfunction identifying and modifying of dynamic traffic data,” J. Traffic Transp. Eng., vol. 4, no. 1, pp. 121–125, Jan. 2004.

[11] L. N. Jacobson, N. L. Nihan, and J. D. Bender, “Detecting erroneous loop detector data in a freeway traffic management system,” Transp. Res. Rec., vol. 1287, pp. 151–166, Mar. 1990.

[12] B. Coifman, “Improved velocity estimation using single loop detectors,” Transp. Res. A, Policy Pract., vol. 35, no. 10, pp. 863–880, Dec. 2001.

[13] X. Y. Gong, “Traffic flow data filtering algorithms based on data mining,” in Proc. Nat. ITS Syst. Traffic Inf. Collect. Integr. Tech., Hangzhou, China, 2003, pp. 163–173.

[14] L. Sun and J. Zhou, “Development of multi regime speed-density relationships by cluster analysis,” Transp. Res. Rec., vol. 1934, pp. 64–71, Feb. 2005.

[15] A. Simroth and H. Zähle, “Travel time prediction using floating car data applied to logistics planning,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 1, pp. 243–253, Mar. 2011.

[16] J. F. Ehmke, S. Meisel, and D. C. Mattfeld, “Floating car based travel times for city logistics,” Transp. Res. C, Emerging Technol., vol. 21, no. 1, pp. 228–352, Apr. 2012.

[17] B. Mehran, M. Kuwahara, and F. Naznin, “Implementing kinematic wave theory to reconstruct vehicle trajectories from fixed and probe sensor data,” Transp. Res. C, Emerging Technol., vol. 20, no. 1, pp. 144–163, Feb. 2012.

[18] Q. Ou, R. L. Bertini, J. W. C. van Lint, and S. P. Hoogendoorn, “A theoretical framework for traffic speed estimation by fusing low-resolution probe vehicle data,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 747–756, Sep. 2011.

[19] J. J. V. Díaz, D. F. Llorca, A. B. R. González, R. Q. Mínguez, Á. L. Llamazares, and M. Á. Sotelo, “Extended floating car data system: Experimental results and application for a hybrid route level of service,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 1, pp. 25–35, Mar. 2012.

[20] S. Breitenberger, K. Bogenberger, M. Hauschild, and K. Laffkas, “Extended floating car data—An overview,” in Proc. World Congr. Intell. Transp. Syst., Madrid, Spain, Nov. 2003.

[21] L. Lin, T. Osafune, and M. Lenardi, “Floating car data system enforcement through vehicle to vehicle communications,” in Proc. 6th Int. Conf. ITS Telecommun., Jun. 2006, pp. 122–126.

[22] S. Maerivoet and S. Logghe, “Validation of travel times based on cellular floating vehicle data,” in Proc. Eur. Congr. Intell. Transp. Syst., Aalborg, Denmark, Jun. 2007.

[23] S. Messelodi, M. Modena, M. Zanin, F. G. B. De Natale, F. Granelli, E. Betterle, and A. Guarise, “Intelligent extended floating car data collection,” Expert Syst. Appl. Int. J. Arch., vol. 36, no. 3, pp. 4213–4227, Apr. 2009.

[24] L. Yu, L. Yu, Y. Qi, and H. Wen, “Traffic incident detection algorithm for urban expressways based on probe vehicle data,” J. Trans. Syst. Eng. Inf. Technol., vol. 8, no. 4, pp. 36–41, Aug. 2008.

[25] W. F. Lv, Y. Liang, T. Y. Zhu, and D. D. Wu, “An FCD compensation model based on traffic condition trends matching,” in Proc. 4th ICCIT, 2009, pp. 1201–1206.

[26] G. E. P. Box and D. R. Cox, “An analysis of transformations,” J. R. Stat. Soc. B, Methodol., vol. 26, no. 2, pp. 211–252, Apr. 1964.

[27] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft, “Structural analysis of network traffic flows,” in Proc. SIGMETRICS, 2004, pp. 61–72.


Zhaosheng Zhang was born in Hezhe, China, in 1984. He received the B.S. degree in automotive engineering from Hunan University, Changsha, China, in 2008. He is currently working toward the Ph.D. degree in automotive engineering with the Department of Automotive Engineering, Tsinghua University, Beijing, China.

His research interests include data processing, vehicle navigation, and path planning.

Qiaochu He received the B.S. degree in automotive engineering from Tsinghua University, Beijing, China, in 2011. He is currently working toward the Ph.D. degree in operation research with the Department of Industrial Engineering and Operation Research, University of California, Berkeley.

His research interests include convex optimization, stochastic processes, and their applications in service operation management.

Diange Yang received the B.S. and Ph.D. degrees in automotive engineering from Tsinghua University, Beijing, China, in 1996 and 2001, respectively.

He is currently an Associate Professor with the Department of Automotive Engineering, Tsinghua University. His research interests include intelligent transport systems, vehicle electronics, and vehicle noise measurement.

Dr. Yang received the Second Prize from the National Technology Invention Rewards of China in 2010 and the Award for Distinguished Young Science and Technology Talent of the China Automobile Industry in 2011.

Xiaomin Lian received the B.S., M.S., and Ph.D. degrees in automotive engineering from Tsinghua University, Beijing, China, in 1982, 1986, and 1997, respectively.

He is currently a Professor with the Department of Automotive Engineering, Tsinghua University. His research interests include vehicle Global Positioning System navigation, vehicle electronics, and vibration control.

Tao Zhang [...] automotive engineering from Tsinghua University, Beijing, China, in 2005 and 2010, respectively.

He is currently a Postdoctoral Researcher with the Department of Automotive Engineering, Tsinghua University. His research interests include vehicle navigation and electronic control.
