ISPRS, XXXVI-2/W25
KEY WORDS: spatial data mining, principles, cloud model, data fields, applications
ABSTRACT:
Growing attention has been paid to spatial data mining and knowledge discovery (SDMKD). This paper presents the principles
of SDMKD, proposes three new techniques, and gives their applicability and examples. First, the motivation of SDMKD is
briefed. Second, the intension and extension of the SDMKD concept are presented. Third, three new techniques are proposed:
SDMKD-based image classification, which integrates spatial inductive learning from a GIS database with Bayesian
classification; the cloud model, which integrates randomness and fuzziness; and the data field, which radiates the energy of
observed data to the universe of discourse. Fourth, applicability and examples are studied in three cases: the first is remote
sensing classification, the second is landslide-monitoring data mining, and the third is uncertain reasoning. Finally, the whole
paper is concluded and discussed.
ISSTM 2005, August, 27-29, 2005, Beijing, China
Exceptions: outliers that are isolated from common rules or deviate very much from other data observations, e.g. a monitoring point with a much bigger movement.

Knowledge is rule plus exception. A spatial rule is a pattern showing the intersection of two or more spatial objects or
space-depending attributes according to a particular spacing or set of arrangements (Ester, 2000). Besides the rules,
during the discovering process of description or prediction, there may be some exceptions (also named outliers) that
deviate very much from other data observations (Shekhar, Lu, Zhang, 2003). They identify and explain exceptions
(surprises). For example, spatial trend predictive modeling first discovers the centers that are local maxima of some
non-spatial attribute, then determines the (theoretical) trend of the non-spatial attribute when moving away from the
centers, and finally finds the few deviations where some data lie away from the theoretical trend. These deviations may
arouse the suspicion that they are noise, or that they are generated by a different mechanism. How should these outliers
be explained? Traditionally, outlier detection has been studied via statistics, and a number of discordancy tests have
been developed. Most of them treat outliers as "noise" and try to eliminate the effects of outliers by removing them or
by developing outlier-resistant methods (Hawkins, 1980). In fact, these outliers prove the rules. In the context of data
mining, they are meaningful input signals rather than noise. In some cases, outliers represent unique characteristics of
the objects that are important to an organization. Therefore, a piece of generic knowledge is virtually in the form of
rule plus exception.

3. TECHNIQUES FOR SDMKD

Because SDMKD is an interdisciplinary subject, there are various techniques associated with the abovementioned
kinds of knowledge (Li et al., 2002). They may include probability theory, evidence theory (Dempster-Shafer),
spatial statistics, fuzzy sets, cloud model, rough sets, neural networks, genetic algorithms, decision trees, exploratory
learning, inductive learning, visualization, spatial online analytical mining (SOLAM), outlier detection, etc.; the main
techniques are briefed in Table 3.

Some of the techniques (Table 3) have been further developed and applied; for example, the algorithms in spatial
inductive learning include AQ11 and AQ15 by Michalski, AE1 and AE9 by Hong, CLS by Hunt, ID3, C4.5 and C5.0
by Quinlan, and CN2 by Clark, etc. (Di, 2001). The implementation of data mining in spatial databases still needs
to be studied further. The following are our proposed techniques: SDMKD-based image classification, the cloud
model, and the data field.
3.1 SDMKD-based image classification

Based on the integration of remote sensing and GIS (Li, Guan, 2002), this subsection presents an approach that combines
spatial inductive learning with Bayesian image classification in a loose manner. It takes learning tuples of two mining
granularities, i.e. pixel granularity and polygon granularity, so that the learned knowledge subdivides classes into
subclasses, and it selects the class probability values of the Bayesian classification, shape features, locations and
elevations as the learning attributes. GIS data are used for training area selection for the Bayesian classification, for
generating learning data of the two granularities, and for testing area selection for classification accuracy evaluation;
the ground control points for image rectification are also chosen from GIS data. Inductive learning in spatial data
mining is implemented via the C5.0 algorithm on the basis of the learning granularities. Figure 1 shows the principle
of the method.
Figure 1. Flow diagram of remote sensing image classification with inductive learning
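As an illustration of the deductive post-processing in this flow, the four rule-activation strategies described in this subsection can be sketched in Python. The rule representation below (attribute-value conditions with a confidence and a learning-sample coverage) is a hypothetical simplification for illustration, not the actual C5.0 output format, and the sample rules are invented:

```python
def classify(attrs, rules, default_class):
    """Resolve the final class of one object from discovered rules,
    following the four strategies of section 3.1."""
    hits = [r for r in rules if all(attrs.get(k) == v for k, v in r["if"].items())]
    if not hits:                 # (4) no rule activated -> default class
        return default_class
    if len(hits) == 1:           # (1) exactly one rule activated
        return hits[0]["then"]
    # (2) maximum confidence; (3) ties broken by coverage of learning samples
    best = max(hits, key=lambda r: (r["conf"], r["coverage"]))
    return best["then"]

rules = [
    {"if": {"class": "water", "compact": "low"},  "then": "river",     "conf": 0.9, "coverage": 120},
    {"if": {"class": "water", "compact": "high"}, "then": "pond",      "conf": 0.8, "coverage": 300},
    {"if": {"class": "water"},                    "then": "reservoir", "conf": 0.8, "coverage": 50},
]
print(classify({"class": "water", "compact": "low"}, rules, "water"))  # river
```

With these invented rules, a low-compactness water polygon activates two rules and the higher-confidence one wins; an object matching no rule falls back to the default class.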
In Figure 1, first, the remote sensing images are classified initially by the Bayesian method before using the
knowledge, and the probabilities of each pixel belonging to every class are retained. Second, inductive learning is
conducted on the learning attributes. Learning with probability simultaneously makes use of the spectral
information of a pixel and the statistical information of a class, since the probability values are derived from both of
them. Third, the knowledge on the attributes of general geometric features, spatial distribution patterns and spatial
relationships is further discovered from the GIS database, e.g. the polygons of different classes. For example, the
water areas in the classification image are converted from pixels to polygons by raster-to-vector conversion, and
then the location and shape features of these polygons are calculated. Finally, the polygons are subdivided into
subclasses by deductive reasoning based on the knowledge, e.g. the class water is subdivided into subclasses such as
river, lake, reservoir and pond. In Figure 1, the final classification results are obtained by post-processing the
initial classification results with deductive reasoning. Except for the class label attribute, the attributes for deductive
reasoning are the same as those in inductive learning. The knowledge discovered by the C5.0 algorithm is a group of
classification rules and a default class, and each rule carries a confidence value between 0 and 1. According to how a
rule is activated, i.e. whether the attribute values match the conditions of the rule, the deductive reasoning adopts four
strategies: (1) If only one rule is activated, then let the final class be the same as this rule; (2) If several rules are
activated, then let the final class be the same as the rule with the maximum confidence; (3) If several rules are
activated and the confidence values are the same, then let the final class be the same as the rule with the maximum
coverage of learning samples; and (4) If no rule is activated, then let the final class be the default class.

3.2 Cloud model

The cloud model is a model of the uncertainty transition between qualitative and quantitative analysis, i.e. a
mathematical model of the uncertainty transition between the linguistic term of a qualitative concept and its numerical
representation data. A piece of cloud is made up of lots of cloud drops, visible in shape as a whole but fuzzy in detail,
which is similar to a natural cloud in the sky; so the terminology "cloud" is used to name the uncertainty transition
model proposed here. Any one of the cloud drops is a stochastic mapping in the discourse universe from the qualitative
concept, i.e. a specified realization with uncertain factors. With the cloud model, the mapping from the discourse
universe to the interval [0,1] is a one-point to multi-point transition, i.e. a piece of cloud rather than a membership
curve. As well, the degree to which any cloud drop represents the qualitative concept can be specified. The cloud
model may mine spatial data with both fuzzy and stochastic uncertainties, and the discovered knowledge is close to
human thinking. Now, in geo-spatial science, the cloud model has been further explored for spatial intelligent query,
image interpretation, land price discovery, factor selection, the mechanism of spatial
data mining, and landslide monitoring (Li, Du, 2005; Wang, 2002), etc.

The cloud model well integrates fuzziness and randomness in a unified way via three numerical characteristics:
Expected value (Ex), Entropy (En), and Hyper-Entropy (He). In the discourse universe, Ex is the position
corresponding to the center of gravity of the cloud, whose elements are fully compatible with the spatial linguistic
concept; En is a measure of the concept coverage, i.e. a measure of the spatial fuzziness, which indicates how many
elements could be accepted by the spatial linguistic concept; and He is a measure of the dispersion of the cloud
drops, which can also be considered as the entropy of En. In the extreme case {Ex, 0, 0}, where both the entropy
and the hyper-entropy equal zero, the concept denotes a deterministic datum. The greater the number of cloud drops,
the more deterministic the concept. Figure 2 shows the three numerical characteristics of the linguistic term
"displacement is 9 millimeters (mm) around". Given the three numerical characteristics Ex, En and He, the cloud
generator can produce as many drops of the cloud as you would like.

Figure 2. Three numerical characteristics

The above three visualization methods are all implemented with the forward cloud generator in the context of the
given {Ex, En, He}. Despite the uncertainty in the algorithm, the positions of the cloud drops produced each time are
deterministic: each cloud drop produced by the cloud generator is plotted deterministically according to its position.
On the other hand, it is an elementary issue in spatial data mining that a spatial concept is always constructed from
the given spatial data, and spatial data mining aims to discover spatial knowledge represented by a cloud from the
database. That is, the backward cloud generator is also necessary. It can be used to perform the transition from data
to linguistic terms, and may mine the integrity {Ex, En, He} of cloud drops specified by many precise data points.
Under the umbrella of mathematics, the normal cloud model is common, and the functional cloud model is more
interesting. Because it is common and useful to represent spatial linguistic atoms (Li et al., 2001), the normal
compatibility cloud will be taken as an example to study the forward and backward cloud generators in the following.

The input of the forward normal cloud generator is the three numerical characteristics of a linguistic term,
(Ex, En, He), and the number of cloud drops to be generated, N, while the output is the quantitative positions of the N
cloud drops in the data space and the certainty degree that each cloud drop can represent the linguistic term. The
algorithm in detail is:
[1] Produce a normally distributed random number En′ with mean En and standard deviation He;
[2] Produce a normally distributed random number x with mean Ex and standard deviation En′;
[3] Calculate y = e^(−(x − Ex)² / (2(En′)²));
[4] Let (xi, yi) be a cloud drop in the discourse universe;
[5] Repeat steps [1]-[4] until N cloud drops are generated.

Simultaneously, the input of the backward normal cloud generator is the quantitative positions of N cloud drops,
xi (i = 1, …, N), and the certainty degree that each cloud drop can represent a linguistic term, yi (i = 1, …, N), while
the output is the three numerical characteristics, Ex, En, He, of the linguistic term represented by the N cloud drops.
The algorithm in detail is:
[1] Calculate the mean value of the xi (i = 1, …, N): Ex = (1/N) Σ_{i=1}^{N} xi;
[2] For each pair (xi, yi), calculate Eni = |xi − Ex| / sqrt(−2 ln yi);
[3] Calculate the mean value of the Eni (i = 1, …, N): En = (1/N) Σ_{i=1}^{N} Eni;
[4] Calculate the standard deviation of the Eni: He = sqrt((1/N) Σ_{i=1}^{N} (Eni − En)²).

With the given algorithms of the forward and backward cloud generators, it is easy to build the inseparable and
interdependent mapping relationship between a qualitative concept and quantitative data. The cloud model overcomes
the weakness of rigid specification and excessive certainty, which comes into conflict with the human recognition
process, that appears in commonly used transition models. Moreover, it performs the interchangeable transition
between qualitative concept and quantitative data through the use of strict mathematical functions, and the
preservation of the uncertainty in the transition makes the cloud model meet the needs of real-life situations well.
Obviously, the cloud model is not a simple combination of probability methods and fuzzy methods.

3.3 Data field

The obtained spatial data are comparatively incomplete. Each datum in the concept space has its own contribution to
forming the concept and the concept hierarchy. So it is necessary for the observed data to radiate their data energies
from the sample space to their parent space. In order to describe this data radiation, the data field is proposed.

Spatial data radiate energies into the data field. The power of the data field may be measured by its potential with a
field function. This is similar to the way electric charges contribute to forming the electric field, where every electric
charge has an effect on the electric potential everywhere in the field. So the function of the data field can be derived
from the physical fields. The potential of a point in the number universe is the sum of all data potentials:

p = k · Σ_{i=1}^{N} e^(−ri² / ρi)

where k is a constant of the radiation gene, ri is the distance from the point to the position of the ith observed datum,
ρi is the certainty of the ith datum, and N is the amount of data. With a higher certainty, a datum may make a greater
contribution to the potential in the concept space. Besides these, the space between neighboring isopotentials, the
computerized grid
density of Cartesian coordinates, etc. may also make their contributions to the data field.

4. APPLICABILITY AND EXAMPLES

SDMKD may be applied in many spatially referenced fields. Here are our three studied cases: remote sensing
classification, landslide-monitoring data mining, and spatial uncertain reasoning.

4.1 Remote sensing image classification

The land use classification experiment is performed in the Beijing area using a SPOT multi-spectral image and a
1:100000 land use database. Discovering knowledge from the GIS database and the remote sensing image data can
improve land use classification. Two kinds of knowledge on land use and elevation will be discovered from the GIS
database, which are applied to subdivide water and green patches respectively: one is to discover rules to subdivide
waters in polygon granularity, and the other is to discover rules to reclassify dry land, garden and forest in pixel
granularity. The land use layer (polygon) and the contour layer (line) are selected for these purposes. Because there
are few contours and elevation points, it is difficult to interpolate a DEM accurately; instead, the contours are
converted to height zones, such as <50m, 50-100m, 100-200m and >200m, which are represented by polygons.

In the learning for subdividing waters, several attributes of the polygons in the land use layer are calculated as
condition attributes, such as area, location of the center, compactness (perimeter²/(4π·area)), height zone, etc. The
classes are river (code 71), lake (72), reservoir (73), pond (74) and forest shadow (99). 604 water polygons are
learned, and 10 rules are discovered (Table 5).
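The condition attributes used for the polygon-granularity learning can be computed directly; below is a minimal sketch of the compactness measure and the height-zone binning described above (the function names are ours):

```python
import math

def compactness(perimeter, area):
    """Compactness = perimeter^2 / (4*pi*area); equals 1.0 for a perfect circle
    and grows as the polygon becomes more elongated or ragged."""
    return perimeter ** 2 / (4 * math.pi * area)

def height_zone(elev_m):
    """Bin an elevation into the height zones used instead of an interpolated DEM."""
    for top, label in [(50, "<50m"), (100, "50-100m"), (200, "100-200m")]:
        if elev_m < top:
            return label
    return ">200m"

# a circular polygon of radius 100 has compactness 1.0
r = 100.0
print(round(compactness(2 * math.pi * r, math.pi * r ** 2), 6))  # 1.0
print(height_zone(75))  # 50-100m
```

A long narrow river polygon would score a much higher compactness than a round pond, which is exactly the kind of distinction the discovered rules can exploit.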
4.2 Landslide monitoring

The Baota landslide is located in Yunyang, Chongqing, China, and the landslide monitoring started in June 1997. Up
to now, the database on the displacements has amounted to 1 Gigabyte, and all the attributes are numerical
displacements, i.e. dx, dy, and dh. Respectively, the properties dx, dy, and dh are the measurements of displacements
in the X direction, Y direction and H direction of the landslide-monitoring points, and |dx|, |dy| and |dh| are their
absolute values. In the following, it is noted that all spatial knowledge is discovered from the databases with the
properties dx, dy, and dh, while |dx|, |dy| and |dh| are only used to visualize the results of spatial data mining, and the
properties of dx are the major examples. The linguistic terms of different displacements on dx, dy and dh may be
depicted by the pan-concept hierarchy tree (Figure 6) in the conceptual space, which is formed by cloud models
(Figure 11). It can be seen from Figure 6 and Figure 10 that the nodes "very small" and "small" both have the son
node "9 millimeters around", which indicates that the pan-concept hierarchy tree is a pan-tree structure.

From the observed landslide-monitoring values, the backward cloud generator can mine the Ex, En and He of the
linguistic term indicating the level of the landslide displacement, i.e. gain the concept. Then, with the three gained
characteristics, the forward cloud generator can reproduce as many cloud drops as you would like, i.e. produce
synthetic values.

BT31: the displacements are very big south, higher scattered and very instable;
BT32: the displacements are big south, low scattered and more instable;
BT33: the displacements are big south, high scattered and very instable; and
BT34: the displacements are big south, high scattered and more instable.

Figure 5 visualizes the above rules, where different rules are represented by ellipses with different colors, and the
number in an ellipse denotes the number of the rule. The generalized result at a higher hierarchy than that of
Figure 4 in the feature space is the displacement rule of the whole landslide, i.e. "the whole displacement of the Baota
landslide is bigger south (towards the Yangtze River), higher scattered and extremely instable". Because large
amounts of consecutive data are replaced by discrete linguistic terms, the efficiency of spatial data mining can be
improved. Meanwhile, the final mined result will be stable because the randomness and fuzziness of the concept are
indicated by the cloud model.

All the above landslide-monitoring points form the potential field and the isopotential lines spontaneously in the
feature space. Intuitively, these points can be grouped naturally into clusters. These clusters represent different kinds
of spatial objects recorded in the database, and naturally form the cluster graph. Figure 5(a) further shows all points'
potential.
(a) All points’ potential (b) Potential without exceptions (c) All points’ cluster graph (d) Cluster graph without exceptions
Figure 6. Landslide-monitoring points’ potential form the clusters and cluster graph
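The potential surface underlying these figures can be computed directly from the field function of section 3.3. Below is a minimal sketch with invented monitoring points in a 2-D feature space; the function name and the sample coordinates are ours:

```python
import numpy as np

def potential(point, data, rho, k=1.0):
    """Data field potential at `point` (section 3.3):
    p = k * sum_i exp(-r_i^2 / rho_i), with r_i the distance to datum i
    and rho_i the certainty of datum i."""
    r2 = ((data - point) ** 2).sum(axis=1)
    return k * np.exp(-r2 / rho).sum()

# two tight, well-separated clusters of monitoring points
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
rho = np.ones(len(pts))  # equal certainty for every datum

p_center = potential(np.array([0.05, 0.0]), pts, rho)   # inside a cluster
p_between = potential(np.array([2.5, 2.5]), pts, rho)   # between clusters
print(p_center > p_between)  # True
```

Evaluating the potential on a grid and tracing its isopotential lines reproduces the kind of cluster picture shown in the figure: the potential peaks inside clusters and falls off in the gaps between them.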
the rule "If A, then B". CGA is the X-conditional cloud generator for linguistic term A, and CGB is the Y-conditional
cloud generator for linguistic term B. Given a certain input x, CGA generates random values μi. These values are
considered as the activation degrees of the rule and are input to CGB. The final outputs are cloud drops, which form
a new cloud.

Combining the algorithms of the X- and Y-conditional cloud generators (Li, 2005), we present the following
algorithm for one-factor one-rule reasoning:
[1] Calculate En′A = G(EnA, HeA), i.e. produce a normally distributed random number with mean EnA and standard
deviation HeA;
[2] Calculate the activation degree μ = e^(−(x − ExA)² / (2(En′A)²));
[3] Calculate En′B = G(EnB, HeB);
[4] Calculate y = ExB ± sqrt(−2 ln(μ)) · En′B, and let (y, μ) be a cloud drop; if x < ExA, "−" is adopted, while if
x > ExA, "+" is adopted;
[5] Repeat steps [1] to [4] to generate as many cloud drops as you want.

Figure 7 is the output cloud of a one-factor one-rule generator with one input. We can see that the cloud model based
reasoning generates uncertain results. The uncertainty of the linguistic terms in the rule is propagated during the
reasoning process. Since the rule output is a cloud, we can give the final result in several forms: 1) one random
value; 2) several random values as sample results; 3) the expected value, which is the mean of many sample results;
4) a linguistic term, which is represented by a cloud model whose parameters are obtained by the inverse (backward)
cloud generator method.
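The one-factor one-rule algorithm above can be sketched directly in numpy; the function name and the concrete numerical characteristics in the example are ours, chosen only for illustration:

```python
import numpy as np

def one_rule(x, ExA, EnA, HeA, ExB, EnB, HeB, rng=None):
    """One-factor one-rule reasoning "If A, then B": the X-conditional
    cloud CGA yields the activation degree mu, which drives the
    Y-conditional cloud CGB to emit one output drop (y, mu)."""
    rng = rng or np.random.default_rng(0)
    EnA_ = rng.normal(EnA, HeA)                     # [1] En'_A = G(EnA, HeA)
    mu = np.exp(-(x - ExA) ** 2 / (2 * EnA_ ** 2))  # [2] activation degree
    EnB_ = rng.normal(EnB, HeB)                     # [3] En'_B = G(EnB, HeB)
    sign = -1.0 if x < ExA else 1.0                 # [4] side of ExA
    y = ExB + sign * np.sqrt(-2 * np.log(mu)) * EnB_
    return y, mu

# e.g. "If displacement is 9 mm around, then risk is 0.8 around" (invented)
y_drop, mu = one_rule(8.0, 9.0, 2.0, 0.2, 0.8, 0.1, 0.02)
```

Calling the function repeatedly for the same input x yields different drops (y, μ), which is exactly the cloud-shaped, uncertain output described above.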
9
ISPRS, XXXVI-2/W25
Figure 7. Output cloud of one-rule reasoning
Figure 8. Input-output response of one-rule reasoning
If we input a number of values to the one-factor rule and draw the inputs and outputs in a scatter plot, we can get the
input-output response graph of the one-factor one-rule reasoning (Figure 8). The graph looks like a cloud band, not a
line. Closer to the expected values, the band is more focused, while farther from the expected values, the band is
more dispersed. This is consistent with human intuition. The above two figures and the discussion show that cloud
model based uncertain reasoning is more flexible and powerful than the conventional fuzzy reasoning method.

If the rule antecedent has two or more factors, such as "If A1, A2, …, An, then B", we call the rule a multi-factor
rule. In this case, a multi-dimensional cloud model represents the rule antecedent. Figure 9 is a two-factor one-rule
generator, which combines a 2-dimensional X-conditional cloud generator and a 1-dimensional Y-conditional cloud
generator. It is easy to give the reasoning algorithm on the basis of the cloud generator algorithms stated in
section 3.2, and consequently multi-factor one-rule reasoning is conducted in a similar way.
Figure 9. Two-factor one-rule generator: the 2-dimensional X-conditional cloud generator CGA takes (ExA1, EnA1, HeA1), (ExA2, EnA2, HeA2) and the input (x, y), and outputs drops (x, y, μi); the Y-conditional cloud generator CGB takes (ExB, EnB, HeB) and μi, and outputs drops (zi, μi)
Figure 11. A situation of two activated rules

Figure 15 is the input-output response surface of the rules. The surface is an uncertain surface whose roughness is
uneven. Closer to the centers of the rules (the expected values of the clouds), the surface is smoother, showing small
uncertainty; while at the overlapped areas of the rules, the surface is rougher, showing large uncertainty. This proves
that multi-factor multi-rule reasoning also represents and propagates the uncertainty of the rules, as one-factor
one-rule reasoning does.
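For the two-factor antecedent, one plausible reading of the 2-dimensional X-conditional cloud generator is to combine the per-dimension certainty degrees multiplicatively, treating the dimensions as independent. This is our sketch of the idea, not the paper's exact algorithm, and all names and numbers are invented:

```python
import numpy as np

def two_factor_activation(x1, x2, A1, A2, rng=None):
    """Activation degree of a two-factor antecedent "If A1, A2" via a
    2-dimensional X-conditional cloud; each A is a tuple (Ex, En, He).
    Sketch: per-dimension degrees are combined multiplicatively."""
    rng = rng or np.random.default_rng(0)
    mu = 1.0
    for x, (Ex, En, He) in ((x1, A1), (x2, A2)):
        En_ = rng.normal(En, He)                      # per-dimension En'
        mu *= np.exp(-(x - Ex) ** 2 / (2 * En_ ** 2))
    return mu

# full activation when both inputs sit at the expected values (invented terms)
mu = two_factor_activation(9.0, 5.0, (9.0, 2.0, 0.1), (5.0, 1.0, 0.1))
```

The resulting μ would then drive the 1-dimensional Y-conditional cloud generator exactly as in the one-factor case.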
encouraged discovering knowledge from spatial databases and applying the knowledge in image interpretation for
spatial data updating. The application of inductive learning to other image data sources, e.g. TM and SAR, and the
application of other SDMKD techniques to remote sensing image classification, may be valuable directions of
further study.

The cloud model integrated fuzziness and randomness in a unified way via the algorithms of the forward and
backward cloud generators in the context of the three numerical characteristics {Ex, En, He}. It took advantage of
human natural language, and might search for the qualitative concept described by natural language to generalize a
given set of quantitative data with the same feature category. Moreover, the cloud model could act as an uncertainty
transition between a qualitative concept and its quantitative data. With this method, it was easy to build the
inseparable and interdependent mapping relationship between qualitative concept and quantitative data, and the final
discovered knowledge, with its hierarchy, could match the demands of users at different levels. The data field
radiated the energy of observed data to the universe of discourse, considering the function of each datum in SDMKD.
In the context of the data field, the conceptual space represented different concepts in the same characteristic
category, while the feature space depicted complicated spatial objects with multiple properties. The isopotentials of
all data automatically took the shape of a concept hierarchy; at the same time, the isopotentials of all objects
automatically took the shape of clusters and a clustering hierarchy. The clustering or the associations between
attributes at different cognitive levels might make many combinations.

On the basis of the cloud model and the data field, the experimental results on Baota landslide monitoring showed
that cloud model- and data field-based SDMKD can reduce the task complexity, improve the implementation
efficiency, and enhance the comprehension of the discovered spatial knowledge. The cloud model was a model for
concept representation and uncertainty handling. A series of uncertainty reasoning algorithms were presented based
on the algorithms of cloud generators and conditional cloud generators. The example case of spatial uncertainty
reasoning showed that knowledge representation and uncertainty reasoning based on the cloud model were more
flexible, effective and powerful than the conventional fuzzy theory based methods.

ACKNOWLEDGEMENTS

This study is supported by the National Basic Research Program of China (973 Program) "Transform mechanism
among observed data, spatial information, and geo-knowledge", the National Natural Science Foundation of China
(70231010), the China Postdoctoral Science Foundation (2004035360), and Wuhan University (216-276081).

REFERENCES

1. ARTHURS, A.M., 1965, Probability Theory (London: Dover Publications)
2. BUCKLESS, B.P., PETRY, F.E., 1994, Genetic Algorithms (Los Alamitos, California: IEEE Computer Press)
3. CRESSIE, N., 1991, Statistics for Spatial Data (New York: John Wiley and Sons)
4. DASU, T., 2003, Exploratory Data Mining and Data Cleaning (New York: John Wiley & Sons, Inc.)
5. DI, K.C., 2001, Spatial Data Mining and Knowledge Discovery (Wuhan: Wuhan University Press)
6. ESTER, M., et al., 2000, Spatial data mining: database primitives, algorithms and efficient DBMS support. Data Mining and Knowledge Discovery, 4, 193-216
7. GALLANT, S.I., 1993, Neural Network Learning and Expert Systems (Cambridge: MIT Press)
8. HAN, J., KAMBER, M., 2001, Data Mining: Concepts and Techniques (San Francisco: Academic Press)
9. HAN, J., 1998, Towards on-line analytical mining in large databases. ACM SIGMOD Record
10. HAINING, R., 2003, Spatial Data Analysis: Theory and Practice (Cambridge: Cambridge University Press)
11. HAWKINS, D., 1980, Identification of Outliers (London: Chapman and Hall)
12. HONG, J.R., 1997, Inductive Learning – Algorithm, Theory and Application (Beijing: Science Press)
13. LI, D.R., GUAN, Z.Q., 2002, Integration and Realization of Spatial Information System (Wuhan: Wuhan University Press)
14. LI, D.R., CHENG, T., 1994, KDG: Knowledge Discovery from GIS – Propositions on the Use of KDD in an Intelligent GIS. In Proc. ACTES, The Canadian Conf. on GIS
15. LI, D.R., et al., 2001, On spatial data mining and knowledge discovery (SDMKD). Geomatics and Information Science of Wuhan University, 26(6): 491-499
16. LI, D.R., et al., 2002, Theories and technologies of spatial data mining and knowledge discovery. Geomatics and Information Science of Wuhan University, 27(3): 221-233
17. LI, D.R., WANG, S.L., LI, D.Y., 2005, Theories and Applications of Spatial Data Mining (Beijing: Science Press)
18. LI, D.Y., DU, Y., 2005, Artificial Intelligence with Uncertainty (Beijing: National Defense Industry Press)
19. MILLER, H.J., HAN, J., 2001, Geographic Data Mining and Knowledge Discovery (London and New York: Taylor and Francis)
20. MUGGLETON, S., 1990, Inductive Acquisition of Expert Knowledge (Wokingham, England: Turing Institute Press in association with Addison-Wesley)
21. PAWLAK, Z., 1991, Rough Sets: Theoretical Aspects of Reasoning About Data (London: Kluwer Academic Publishers)
22. QUINLAN, J., 1986, Induction of decision trees. Machine Learning, 1(1), 81-106
23. SHAFER, G., 1976, A Mathematical Theory of Evidence (Princeton: Princeton University Press)
24. SHEKHAR, S., LU, C.T., ZHANG, P., 2003, A unified approach to detecting spatial outliers. GeoInformatica, 7(2): 139-166
25. SOUKUP, T., DAVIDSON, I., 2002, Visual Data Mining: Techniques and Tools for Data Visualization and Mining (New York: Wiley Publishing, Inc.)
26. WANG, S.L., 2002, Data Field and Cloud Model-Based Spatial Data Mining and Knowledge Discovery. Ph.D. Thesis (Wuhan: Wuhan University)
27. ZADEH, L.A., 1975, The concept of linguistic variable and its application to approximate reasoning. Information Sciences, 8, 199-249