
2016 International Conference on Intelligent Transportation, Big Data & Smart City

Early Warning of Traffic Accident in Shanghai Based on Large Data set Mining
Yang Yanbin, Zhou Lijuan, Leng Mengjun, Sun Ling
Shanghai Maritime University, College of Transport & Communications, Shanghai, 201306, China
429773746@qq.com

Abstract—Through classification and regression analysis of traffic accident statistics in Shanghai from July 2014 to April 2015, this paper puts forward a forecasting model of traffic accident incidence and derives an index system of traffic accidents comprising month, week, weather and wind speed. The model is used to calculate the range of the traffic accident rate. Finally, based on an analysis of safety levels, we make recommendations for controlling traffic accidents and for the related rescue work, which have important guiding significance for traffic accident prevention and traffic safety management in our country.

Keywords- data mining; traffic accident; regression analysis; incidence; safety levels

I. INTRODUCTION
According to statistics from traffic and police departments worldwide, traffic accidents killed about 500 thousand people last year. About 104 thousand of these deaths occurred in China, accounting for 1/5 of the worldwide total and ranking first in the world. Many of these accidents happened because of the unreasonable layout of the road itself, so there is an urgent need to change the status quo and reduce the incidence of accidents.
At present, road traffic accident analysis and decision making are basically still at the manual processing stage, and manual processing is the main cause of the low efficiency and poor accuracy of decision analysis on the large amount of traffic accident data. Therefore, it is imperative to carry out scientific research on, and effectively improve, the analysis and decision making for road traffic accidents. However, existing navigation systems only give voice prompts for speeding, for monitoring cameras and for road sections with a high incidence of accidents, which is a shortcoming. Warning drivers of the accident risk on the road ahead would raise their vigilance and thus reduce the probability of road accidents.
This paper analyzes whether various factors influence traffic accidents in Shanghai. A large set of initial accident records is collated, and the influencing factors are screened by significance analysis to form a new accident record. The accident rate model is then fitted with Lingo, and the factors that influence the traffic accident rate are derived.

II. ACCIDENT DATA MINING
Data mining is the process of extracting knowledge from specific forms of data. For specific data and specific issues, one or more algorithms are chosen to find the hidden rules of the data, that is, implicit and meaningful knowledge, in order to provide scientific support for decision making. The basic process of data mining is as follows:

A. Data preparation
Select the data applicable to the data mining application, examine the quality of the data in preparation for further analysis, and determine the analytical methods to be carried out. Our main data source is the record of traffic accidents in Shanghai in recent years. To make the data mining more effective, we also include a number of related data, such as time, temperature and weather information for Shanghai.

B. Data reorganization and conversion
The data are drawn from the open data of the Shanghai municipal government, together with SODA data, public data and private data. Because the accident data come from government statistics and manual sorting and are mainly used for accident statistics, they are incomplete, redundant and ambiguous, and cannot be fed to a data mining algorithm directly; they first need to be processed and classified.

C. Data mining
After cleaning and conversion, the original accident data become a data set suitable for mining. Data mining is carried out on this data set to extract knowledge and to find an appropriate knowledge model for decision analysis. For specific data and specific issues, one or more data mining algorithms are chosen to find the hidden rules and patterns and to provide a solution to the problem.

D. Result analysis
Interpret and evaluate the results of data mining, remove the meaningless parts, analyze the meaningful rules or patterns again, and finally present them to decision makers in a form that is easy to understand and identify.

III. ROAD TRAFFIC ACCIDENTS DATA MINING IN SHANGHAI
The goal of data mining is to discover hidden and meaningful knowledge from databases. There are many data mining algorithms, and they apply to broad functional areas, including classification, estimation and prediction, clustering, association, sequence discovery and characterization. Regression analysis, time series analysis and cluster analysis are among the general methods.

978-1-5090-6061-0/17 $31.00 © 2017 IEEE


DOI 10.1109/ICITBS.2016.149
For the analysis of the Shanghai traffic accident data, this paper explores the correlation between traffic accidents in Shanghai and various influencing factors, obtains the probability of road accidents under all circumstances, and points out specific measures. We therefore develop the analysis in the following aspects.

A. Classification
In order to establish a reasonable index system of traffic accidents, nine possible influencing factors are selected: month, week, time, temperature, weather, wind direction, wind speed, whether there is a camera, and whether the road is smooth. We first classify all factors, numbering the months from 1 to 12 and the days of the week from 1 to 7. Time, temperature, weather, wind direction and wind speed are classified according to the following categories, respectively.

Table 1 Time categories
Time            Reference values
0:00-1:59       1
2:00-3:59       2
4:00-5:59       3
6:00-7:59       4
8:00-9:59       5
10:00-11:59     6
12:00-13:59     7
14:00-15:59     8
16:00-17:59     9
18:00-19:59     10
20:00-21:59     11
22:00-23:59     12

Table 2 Temperature categories
Temperature     Reference values
-10-0℃          1
0-5℃            2
5-10℃           3
10-15℃          4
15-20℃          5
20-25℃          6
25-30℃          7

Table 3 Weather categories
Weather         Reference values
heavy rain      1
thundershower   2
moderate rain   3
rainstorm       4
clear           5
shower          6
overcast        7
sleet           8
light rain      9
cloudy          10

Table 4 Wind direction categories
Wind direction  Reference values
east wind       1
west wind       2
north wind      3
northeaster     4
northwester     5
southwester     6
southeaster     7
south wind      8

Table 5 Wind speed categories
Wind speed      Reference values
grade 3         1
grade 3-4       2
grade 3-5       3
grade 4-5       4
grade 4-6       5

Through significance testing of the correlation between the accident frequency and the influencing factors, we screen out the factors that have a strong influence on accidents. Based on the correlation analysis results, influencing factors are retained or removed. Finally, seven influencing factors are ascertained: month, week, time, temperature, weather, wind direction and wind speed.

B. Regression analysis
First, we process the data on traffic accident frequency by month and obtain the relation between month and traffic accident frequency by fitting, as follows.

Figure 1. Relation fitting on month and traffic accidents frequency

From the chart above, the number of accidents in Shanghai is lowest in February and higher in September, October and November. In February, most people go home for the New Year, so the traffic volume in Shanghai falls to its lowest and the number of accidents is at its minimum. In September, October and November, the number of vehicles on the road increases, on the one hand because the school term begins and on the other hand because of the National Day holiday, so the number of accidents also increases, which is in line with reality.
Based on the analysis of the other influencing factors, we can draw the following conclusions:
1) Week

The number of accidents is highest on Monday, Thursday and Friday. On the first and the last two working days of the week, people generally become undisciplined and are prone to traffic accidents.
2) Time
As we know, there are more traffic accidents in the morning and evening peak hours than at other times; that is, more accidents occur in 6:00-8:00 and 16:00-19:00.
3) Temperature
The number of traffic accidents is relatively even across the temperature ranges, but it increases slowly as the temperature rises.
4) Weather
Traffic accidents occur most frequently in light rain and cloudy weather, which makes people listless and inattentive. In heavy rain, rainstorm and similar weather, people are more careful, so traffic accidents are less frequent.
5) Wind direction
Traffic accidents happen mostly with a southeaster, mainly because China lies in the east of the Eurasian continent, on the Pacific coast, and the southeast monsoon comes in summer; this also verifies the influence of temperature on the frequency of accidents.
6) Wind speed
Accidents happen more often at wind speed grade 3, and as the wind speed increases, the number of accidents decreases slowly.
According to the fitting mentioned above, we find the relationship between the number of accidents and each influencing factor, and the model of the number of accidents is obtained as follows:

Table 6 Regression equation of the influencing factors and the number of accidents
Influence factor   Regression equation                                                                                         R²       Formula
Month              $\ln y = \square\,0.0033x_1^3 \;\square\; 0.0642x_1^2 \;\square\; 0.3009x_1 \;\square\; 5.4948$                0.8767   (2-1)
Week               $\ln y = \square\,0.0039x_2^4 \;\square\; 0.0683x_2^3 \;\square\; 0.4083x_2^2 \;\square\; 0.9363x_2 \;\square\; 2.6051$   0.8538   (2-2)
Time               $\ln y = \square\,0.0008x_3^5 \;\square\; 0.0247x_3^4 \;\square\; 0.2828x_3^3 \;\square\; 1.3777x_3^2 \;\square\; 2.4163x_3 \;\square\; 5.527$   0.9461   (2-3)
Temperature        $\ln y = \square\,0.0285x_4^3 \;\square\; 0.43x_4^2 \;\square\; 2.0652x_4 \;\square\; 1.1673$                  0.8395   (2-4)
Weather            $\ln y = \square\,1.6416\ln(x_5) \;\square\; 0.5452$                                                          0.9809   (2-5)
Wind direction     $\ln y = \square\,0.0274x_6^2 \;\square\; 0.0879x_6 \;\square\; 1.5834$                                        0.8023   (2-6)
Wind speed         $\ln y = \square\,0.5349x_7^3 \;\square\; 4.8008x_7^2 \;\square\; 13.243x_7 \;\square\; 16.668$                0.9026   (2-7)
Note: $\square$ marks a sign (+ or −) that is illegible in the source.

From the table we can see that the R² of each regression equation is greater than 0.8, indicating that the fitting effect of the regression equations is very good; the regression equations are significant and the regression model is established.

C. Model of accident occurrence rate
According to the relationship between the number of accidents and the various influence factors, we first assume that the relationship between the incidence rate and the influence factors is as follows, where $x_1, \dots, x_7$ denote month, week, time, temperature, weather, wind direction and wind speed, as in Table 6:

$$
\begin{aligned}
Y ={}& k_1x_1^3 + k_2x_1^2 + k_3x_1 + k_4x_2^4 + k_5x_2^3 + k_6x_2^2 + k_7x_2 \\
&+ k_8x_3^5 + k_9x_3^4 + k_{10}x_3^3 + k_{11}x_3^2 + k_{12}x_3 + k_{13}x_4^3 + k_{14}x_4^2 + k_{15}x_4 \\
&+ k_{16}\ln(x_5) + k_{17}x_6^2 + k_{18}x_6 + k_{19}x_7^3 + k_{20}x_7^2 + k_{21}x_7
\end{aligned}
\tag{2-8}
$$

According to the values we set and the corresponding Y value, we give the constraint condition:

$$
k_1 + k_2 + \cdots + k_{21} = 1
\tag{2-9}
$$

After processing the data, the coefficients of the function are fitted with Lingo, and the results are as follows:

Figure 2. Lingo example solution

As a result, we get the relationship between the incidence of accidents in Shanghai and the influencing factors:

$$
\begin{aligned}
Y ={}& -0.4508\times10^{-4}x_1^3 + 0.1834\times10^{-2}x_1^2 - 0.02058x_1 \\
&- 0.8176\times10^{-2}x_2^4 + 0.1345x_2^3 - 0.7518x_2^2 + 1.6172x_2 \\
&+ 0.01799\ln(x_5) + 0.8932\times10^{-2}x_7
\end{aligned}
\tag{2-10}
$$

As we can see from the function, the number of accidents per day, namely the accident rate, is most strongly linked to month, week, weather and wind speed. Therefore, we select month, week, weather and wind speed as the four factors of the accident rate index system (Figure 3).
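As a rough illustration of how Equation (2-10) is evaluated, the following Python sketch computes Y from the category codes of Tables 3 and 5. The function name `accident_rate` and its parameter names are hypothetical. Two points are assumptions rather than statements from the paper: the signs of the coefficients are those for which the formula reproduces the extremes reported in Section IV (1.2397 and 0.9303), and the weekday code is taken as 1 for Monday through 7 for Sunday.

```python
import math


def accident_rate(month, weekday, weather, wind_speed):
    """Evaluate Y of Equation (2-10) for category codes.

    month: 1-12, weekday: 1-7 (assumed 1 = Monday),
    weather: 1-10 (Table 3), wind_speed: 1-5 (Table 5).
    Coefficient signs are an assumption, chosen so that the two
    scenarios of Section IV give the reported extremes.
    """
    month_term = (-0.4508e-4 * month**3
                  + 0.1834e-2 * month**2
                  - 0.02058 * month)
    week_term = (-0.8176e-2 * weekday**4
                 + 0.1345 * weekday**3
                 - 0.7518 * weekday**2
                 + 1.6172 * weekday)
    weather_term = 0.01799 * math.log(weather)
    wind_term = 0.8932e-2 * wind_speed
    return month_term + week_term + weather_term + wind_term


# Tuesday in January, cloudy (code 10), wind grade 4-6 (code 5).
high = accident_rate(month=1, weekday=2, weather=10, wind_speed=5)
# Monday in August, "rain" read as heavy rain (code 1), wind grade 3 (code 1).
low = accident_rate(month=8, weekday=1, weather=1, wind_speed=1)
print(f"{high:.4f} {low:.4f}")  # prints "1.2397 0.9303"
```

Time, temperature and wind direction do not appear as arguments because their terms are absent from the fitted Equation (2-10).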

Figure 3. Index system of accident occurrence rate

In this way, we can calculate the probability of occurrence of traffic accidents according to the month, week, weather and wind speed.

IV. MODEL APPLICATION
According to the function that we have obtained and the range of values of each variable, we find that the maximum value of the traffic accident rate is 1.2397 and the minimum value is 0.9303.
That is, on a Tuesday in January, when the weather is cloudy and the wind speed is at grade 4-6, the probability of traffic accidents reaches its maximum, and we should be especially watchful. On a Monday in August, when the weather is rainy and the wind speed is at grade 3, the probability of traffic accidents reaches its minimum instead. A possible reason is that people are more careful on a rainy day and less prone to traffic accidents, but we still need to remind people to be careful.
According to the range of the traffic accident rate, we give the safety levels shown in the following table:

Table 7 Safety level classification
Safety level    Range of accident rate
7               0.9303-0.9746
6               0.9747-1.0189
5               1.0190-1.0632
4               1.0633-1.1075
3               1.1076-1.1518
2               1.1519-1.1961
1               1.1962-1.2400

According to the safety levels, we can take appropriate measures to prevent avoidable traffic accidents. For example, at the higher safety levels, we can set up an electronic warning system to remind people to be careful at sharp turns or in large crowds; at the lower safety levels, not only the electronic warning is needed, but also traffic police and other personnel to maintain traffic order, in order to avoid traffic accidents.
We can analyze the incidence of traffic accidents in a certain area; then, in view of the "golden 5 minutes" of rescue time after a traffic accident, we can set up a volunteer aid station at a suitable place for every hospital. In this way we can address the problem that the public seriously lacks common knowledge of emergency response and therefore misses the most effective rescue time.

V. CONCLUSIONS AND RECOMMENDATIONS
(1) Due to the rapid growth in the number of motor vehicles, drivers and road mileage and the rapid development of the economy, traffic accidents in Shanghai, and the casualties and economic losses they cause, have also increased rapidly.
(2) Through the analysis of the traffic accident situation and the data on the influence factors from July 2014 to April 2015, we obtain a traffic accident rate index system with four parts: month, week, weather and wind speed. By applying the index system we can get the rate of traffic accidents.
(3) According to the range of the traffic accident incidence, we put forward safety levels, and propose corresponding measures and the concept of the volunteer aid station for the different safety levels.
(4) In order to develop the traffic accident rate model further, the classification of the current traffic data needs to be more reasonable. In addition to the current traffic accident data, other data that could influence traffic accidents, such as vehicle mileage, road information and lane number data, need to be collected, and the model improved as soon as possible.
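For illustration, the mapping from an accident rate to the safety levels of Table 7 can be sketched in Python as a lookup over the upper bounds of the ranges. The function name `safety_level` is hypothetical, and the handling of rates that fall in the small gaps between two printed ranges (for example 0.97465) is an assumption of this sketch, not something the paper specifies.

```python
from bisect import bisect_left

# Upper bounds of the accident-rate ranges in Table 7, lowest range first.
UPPER_BOUNDS = [0.9746, 1.0189, 1.0632, 1.1075, 1.1518, 1.1961, 1.2400]
# Safety level of each range: level 7 has the lowest rate, level 1 the highest.
LEVELS = [7, 6, 5, 4, 3, 2, 1]
LOWEST_RATE = 0.9303


def safety_level(rate):
    """Return the Table 7 safety level for an accident rate.

    Rates outside 0.9303-1.2400 raise ValueError. A rate in the gap
    between two printed ranges is assigned to the next higher range
    (an assumption of this sketch).
    """
    if not LOWEST_RATE <= rate <= UPPER_BOUNDS[-1]:
        raise ValueError(f"rate {rate} is outside the range covered by Table 7")
    # Index of the first range whose upper bound is >= rate.
    return LEVELS[bisect_left(UPPER_BOUNDS, rate)]


print(safety_level(1.2397), safety_level(0.9303), safety_level(1.05))  # prints "1 7 5"
```

With this lookup, the maximum rate of Section IV (1.2397) falls in level 1 and the minimum rate (0.9303) in level 7.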
