
Fuzzy Dimension To Databases

Punam Bedi, Harmeet Kaur, Ankit Malhotra

Abstract
Traditional databases handle data that is crisp, deterministic and precise in nature. However, human
reasoning and decision-making are uncertain and vague in nature. This paper gives an insight into
the world of uncertainty. It explains the concept of fuzziness in databases and the ways of handling fuzzy queries
to crisp databases and to fuzzy databases. We propose two models by which
uncertainty can be handled in databases. The first model deals with fuzzy queries to a crisp database, while
the second model deals with the storage and retrieval of fuzzy information in a database. A prototype of both
models has been implemented in JAVA.

Keywords:
Fuzzy Logic, Fuzzy Sets, Fuzzy Relational Databases, Fuzzy SQL

1. Uncertainty: A Modern Outlook

Most of our traditional tools for formal modeling, reasoning and computing are
crisp, deterministic and precise in nature. Precision assumes that the parameters
of a model represent exactly either our perception of the phenomenon modeled
or the features of the real system that has been modeled. Certainty eventually
indicates that we assume the structures and parameters of the model to be
definitely known.

However, if a model or theory asserts factuality, then the modeling language
must be suited to modeling the characteristics of the situation under study
appropriately. This is where the difficulty lies. For factual models or modeling
languages, two major complications arise:

1. Real situations are very often not crisp and deterministic and cannot be
described precisely i.e. real situations are very often uncertain or vague
in a number of ways.

2. Complete description of a real system would require far more detailed
data than a human being could ever recognize and process
simultaneously.

Hence, among the various paradigmatic changes in science and mathematics in
the last century, one has concerned the concept of uncertainty. In
science this change is manifested by a gradual transition from a view which
held that uncertainty is undesirable to an alternative view that accepts
uncertainty as an integral part of the whole system and as essential to modeling the
real world.

Three basic types of uncertainty are discussed in the literature:

1. Fuzziness : Lack of definite or sharp distinctions. Alternate
terms used for it are
• Vagueness
• Cloudiness
• Haziness

2. Discord : Disagreement in choosing among several alternatives.
The synonyms for it are
• Dissonance
• Incongruity
• Discrepancy

3. Nonspecificity : Two or more alternatives are left unspecified. The
synonyms for it are
• Variety
• Generality
• Diversity

The last two types of uncertainties can be classified as a higher uncertainty type,
ambiguity, which means any situation in which it remains unclear which of
several alternatives should be accepted as the genuine one. In general, ambiguity
results from lack of certain distinctions characterizing an object, from conflicting
distinctions or from both of these.

Our paper deals with implementation of fuzziness in databases.

2. Fuzzy Sets : Basic Concepts


2.1 Introduction

An important point in the evolution of the modern concept of uncertainty was the
publication of a seminal paper by Lotfi A. Zadeh [8], in which Zadeh introduced
a theory whose objects, fuzzy sets, are sets with boundaries that are not precise.
Membership in a fuzzy set is not a matter of true or false, but rather a
matter of degree. This concept was called Fuzziness and the theory was called
Fuzzy Set Theory.

Fuzziness can be defined as vagueness concerning the semantic meaning of
events, phenomena or statements themselves. It is particularly frequent in all
areas in which human judgment, evaluation and decisions are important.

As an example, consider a student record database system. Suppose we want to
find the bright and young students in the whole batch. For a crisp system we would
specify the query as

PROJECT (Student_Name)
WHERE 19 ≤ AGE ≤ 23 and 3 ≤ GPA ≤ 4

But this system has a major flaw. Consider a student, Krishna, whose age is 24
and who has a perfect GPA of 4 out of 4. He should have been selected but is not,
because of the rigid boundary conditions imposed by crisp logic.

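In fuzzy logic we would do the same by specifying two fuzzy sets, YOUNG and
GPA:

[Figure: two membership functions with grades from 0 to 1; (a) Age, with axis ticks at 17, 19, 23 and 25; (b) GPA, with axis ticks at 3, 3.5 and 4]
Fig. 1 : (a) Age ; (b) GPA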

Each student will then have a membership grade in each of the two sets.
According to our definition, Krishna will have a non-zero membership grade,
although it will be lower than that of students in the age group 19-23.

Hence Krishna will also be included in the result set, since he
satisfies the query to some extent, which is represented by his membership
grade.
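
The Krishna example can be worked through numerically. The following Java sketch assumes, from the axis ticks of Fig. 1, that YOUNG rises linearly from 17 to 19, equals 1 between 19 and 23 and falls to 0 at 25, and that the GPA set rises from 3 to 3.5 and equals 1 up to 4. The class and method names are illustrative only, and the two grades are combined with min, the standard fuzzy intersection.

```java
// Illustrative sketch: breakpoints are read off the axis ticks of Fig. 1
// and are an assumption about the intended shapes.
class BrightYoungExample {

    // YOUNG: 0 up to 17, rising to 1 at 19, flat until 23, falling to 0 at 25.
    static double young(double age) {
        if (age <= 17 || age >= 25) return 0.0;
        if (age < 19) return (age - 17) / 2.0;
        if (age <= 23) return 1.0;
        return (25 - age) / 2.0;
    }

    // Good GPA: 0 up to 3, rising to 1 at 3.5, flat up to the maximum of 4.
    static double goodGpa(double gpa) {
        if (gpa <= 3.0) return 0.0;
        if (gpa < 3.5) return (gpa - 3.0) / 0.5;
        return 1.0;
    }

    // Degree to which a student is both young and bright (fuzzy AND as min).
    static double youngAndBright(double age, double gpa) {
        return Math.min(young(age), goodGpa(gpa));
    }

    public static void main(String[] args) {
        System.out.println(youngAndBright(24, 4.0)); // Krishna: prints 0.5
        System.out.println(youngAndBright(21, 4.0)); // age inside 19-23: prints 1.0
    }
}
```

Under these assumed shapes Krishna is no longer rejected outright: he is returned with grade 0.5, below the grade 1.0 of an equally bright student aged 21.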

Definition:

When A is a fuzzy set and x is a relevant object, the proposition “x is a member
of A” is not necessarily either true or false, as required by two-valued logic,
but may be true only to some degree. The degree to which x is actually a
member of A is a real number in the interval [0, 1].

Formally, if X is a collection of objects denoted generically by x, then a fuzzy
set F in X is a set of ordered pairs,

F = {(x, μ_F(x)) | x ∈ X},

where μ_F(x) is called the membership function (or grade of membership) of x in F, which
maps X to the membership space M. The range of the membership function is a
subset of the nonnegative real numbers whose supremum is finite [10].

Fuzzy Set Operators and Fuzzy Logic


For crisp sets, the basic operations are

• Union, OR
• Intersection, AND
• Complement, NOT

By analogy, for fuzzy sets we define fuzzy operators that allow us to
manipulate the fuzzy sets. We similarly have fuzzy complement, intersection
and union operators, but they are not uniquely defined: like membership
functions, they are context-dependent [5].

However, an important dissimilarity exists between traditional set theory / logic
and fuzzy set theory. Traditionally there is a distinction between the union
operation on sets and the OR of logic, and likewise between intersection and AND.
In fuzzy theory there is no such distinction between the logical and set
operators [10], i.e.

Fuzzy union ≡ Fuzzy OR
Fuzzy intersection ≡ Fuzzy AND
Fuzzy complement ≡ Fuzzy NOT

We define the standard fuzzy operations as:

• Fuzzy Complement,
~A(x) = 1 - A(x)

• Fuzzy Union,
(A∪B)(x) = max[A(x), B(x)]

• Fuzzy Intersection,
(A∩B)(x) = min[A(x), B(x)]

More information regarding fuzzy operators and their properties can be found in
[5], [10].
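
As a minimal sketch of the three standard operations, the Java class below applies them to discrete fuzzy sets stored as maps from an element to its membership grade. The representation and the names are assumptions made for illustration; an element missing from a map is treated as having grade 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the standard fuzzy operations on discrete fuzzy sets.
// A fuzzy set is a map element -> grade in [0, 1]; absent elements have grade 0.
class StandardFuzzyOps {

    // ~A(x) = 1 - A(x), evaluated for every element of the given universe.
    static Map<String, Double> complement(Map<String, Double> a, Set<String> universe) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String x : universe) {
            out.put(x, 1.0 - a.getOrDefault(x, 0.0));
        }
        return out;
    }

    // (A ∪ B)(x) = max[A(x), B(x)]
    static Map<String, Double> union(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> out = new LinkedHashMap<>(a);
        for (Map.Entry<String, Double> e : b.entrySet()) {
            out.merge(e.getKey(), e.getValue(), Double::max);
        }
        return out;
    }

    // (A ∩ B)(x) = min[A(x), B(x)]; elements only in B have grade 0 and are omitted.
    static Map<String, Double> intersection(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : a.entrySet()) {
            out.put(e.getKey(), Math.min(e.getValue(), b.getOrDefault(e.getKey(), 0.0)));
        }
        return out;
    }
}
```

Because fuzzy AND, OR and NOT coincide with these set operations, the same three methods can serve both as set operators and as logical connectives when conditions of a query are combined.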

3. Fuzzy Databases
3.1 Need For Fuzzy Databases

As the application of database technology moves outside the realm of a crisp
mathematical world into the real world, the need to handle imprecise
information becomes important. A database that can handle imprecise
information stores not only raw data but also related information that
allows us to interpret the data in a much deeper context. For example, the query “Which
student is young and has sufficiently good grades?” captures the user's real intention
better than a crisp query such as

SELECT * FROM STUDENT
WHERE AGE < 19 AND GPA > 3.5

Such a technology has wide applications in areas such as medical diagnosis,
employment and investment, because in such areas subjective and uncertain
information is not only common but also very important.

3.2 Techniques for implementation of Fuzziness in Databases

One of the major concerns in the design and implementation of fuzzy databases
is efficiency i.e. these systems must be fast enough to make interaction with the
human users feasible.

In general, we have two feasible ways to incorporate fuzziness in databases:

1. Making fuzzy queries to classical databases (discussed in section 4).
2. Adding fuzzy information to the system (discussed in section 5).

3.3 Classification of Data

Data can be classified as follows:

1. Crisp : There is no vagueness in the information.
e.g., X = 13
Temperature = 90°

2. Fuzzy : There is vagueness in the information. This can be further
divided into two types:

a. Approximate Value : The data is not totally vague; some
approximate value is known, and the data lies
near that value.
e.g., 10 ≤ X ≤ 15
Temperature ≈ 85°
These are considered to have a triangular possibility distribution,
as shown below.

[Figure: triangular possibility distribution for APPROX X, centered on X and extending from X - d to X + d]
Fig. 2: Possibility Distribution for an approximate value

The parameter d gives the range around the approximate value within which
the actual value lies.

b. Linguistic Variable: A linguistic variable is a variable that, apart from
representing a fuzzy number, also represents linguistic concepts
interpreted in a particular context. Each linguistic variable is defined in
terms of a variable which either has a physical interpretation (speed,
weight etc.) or is any other numerical variable (salary, absences, gpa etc.).

A linguistic variable is fully characterized by a quintuple <v, T, X, g, m>
where
v - is the name of the linguistic variable.
T - is the set of linguistic terms that apply to this variable.
X - is the universal set of the values of v.
g - is a grammar for generating the linguistic terms.
m - is a semantic rule that assigns to each term t ∈ T a fuzzy set on X.

The information in this case is totally vague and we associate a fuzzy set with the
information. A linguistic term is the name given to the fuzzy set.

e.g., X is SMALL
Temperature is HOT
These are considered to have a trapezoidal possibility distribution, as shown
below.

[Figure: trapezoidal possibility distribution for the term SMALL, with membership from 0 to 1 and breakpoints α, β, γ and δ on the horizontal axis]
Fig. 3: Possibility Distribution for a Linguistic Term
SMALL for the Linguistic Variable HEIGHT

Four parameters, α, β, γ and δ, are associated with a linguistic term, as
shown in Fig. 3. For the range [β, γ] the membership value is 1.0, while for
the ranges [α, β] and [γ, δ] the membership value lies in [0.0, 1.0].
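
The two possibility distributions described in this section can be written as small functions. The Java sketch below assumes linear slopes, as the triangular and trapezoidal shapes suggest; the class and method names are illustrative and not taken from the prototype.

```java
// Illustrative sketch of the possibility distributions of Fig. 2 and Fig. 3,
// assuming straight-line slopes between the breakpoints.
class PossibilityDistributions {

    // APPROX center with margin d: 1 at the center, falling linearly
    // to 0 at center - d and center + d.
    static double approx(double x, double center, double d) {
        if (d <= 0) return x == center ? 1.0 : 0.0; // degenerate case: a crisp value
        return Math.max(0.0, 1.0 - Math.abs(x - center) / d);
    }

    // Linguistic term with parameters alpha <= beta <= gamma <= delta:
    // 1 on [beta, gamma], linear on [alpha, beta] and [gamma, delta], 0 elsewhere.
    // alpha == beta or gamma == delta gives a vertical edge (a "shoulder").
    static double trapezoid(double x, double alpha, double beta, double gamma, double delta) {
        if (x < alpha || x > delta) return 0.0;
        if (x < beta) return (x - alpha) / (beta - alpha);
        if (x <= gamma) return 1.0;
        return (delta - x) / (delta - gamma);
    }
}
```

For instance, with a hypothetical margin d = 5, `approx(83, 85, 5)` grades the value 83 against "APPROX 85" at about 0.6, and `trapezoid(24, 17, 19, 23, 25)` returns 0.5.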

4. Fuzzy Querying to Relational Databases


4.1 The proposed model

The easiest way of introducing fuzziness in the database model is to use classical
relational databases and formulate a front end to it that shall allow fuzzy
querying to the database. A limitation imposed on the system is that because we
are not extending the database model nor are we defining a new model in any
way, the underlying database model is crisp and hence the fuzziness can only be
incorporated in the query.

To incorporate fuzziness we introduce fuzzy sets / linguistic terms on the
attribute domains / linguistic variables, e.g. on the attribute domain AGE we
may define the fuzzy sets YOUNG, MIDDLE and OLD. These are defined as
follows:

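[Figure: three overlapping trapezoidal membership functions, Young, Middle and Old, on the Age axis, with breakpoints αY, βY, γY = αM, δY = βM, γM = αO, δM = βO, γO and δO]
Fig. 4 : Age
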
For this we take the example of a student database which has a table STUDENTS
with the following attributes:

a. Name  b. Age  c. Course  d. Percentage  e. Absences

Name     Age  Course  Percentage  Absences
Ankit    19   12      83          13
Anuj     17   10      80          9
Sumit    18   11      83          6
Rahul    19   12      56          12
Bishop   19   12      65          32
Neha     18   11      77          23
Malini   17   10      69          10
Rocky    16   9       79          13
Sandeep  19   12      75          6
Nagesh   19   12      83          6

Fig. 5 : A snapshot of the data existing in the database.

4.2 Meta Knowledge

At the level of meta knowledge we need to add only a single table, LABELS, with
the following structure:

LABELS
Label   Column_Name   Alpha   Beta   Gamma   Delta

Fig. 6 : Meta Knowledge

This table is used to store the information of all the fuzzy sets defined on all the
attribute domains. A description of each column in this table is as follows:

• Label: This is the primary key of this table and stores the linguistic term
associated with the fuzzy set.
• Column_Name: Stores the linguistic variable associated with the given
linguistic term.
• Alpha, Beta, Gamma, Delta: Stores the range of the fuzzy set as shown in
Fig. 3 above.

4.3 Implementation:

The main issue in the implementation of this system is the parsing of the input
fuzzy query.
As the underlying database is crisp, i.e. no fuzzy data is stored in the database,
the INSERT query does not change and need not be parsed; it can therefore be
presented to the database as it is.

During parsing the query is divided into the following parts:

1. Query Type: Whether the query is a SELECT, DELETE or UPDATE.
2. Result Attributes: The attributes that are to be displayed. These are used only in
the case of the SELECT query.
3. Source Tables: The tables on which the query is to be applied.
4. Conditions: The conditions that must be satisfied before the
operation is performed. Each condition is further sub-divided into the Query Attribute
(i.e. the attribute on which the query is to be applied) and the
linguistic term. If the condition is not fuzzy, i.e. it does not contain a
linguistic term, then it need not be subdivided.
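
One possible way to handle the Conditions step is sketched below in Java. It is not the prototype's parser: it assumes that a fuzzy condition has the form "attribute IS|ARE term", looks the term up in an in-memory stand-in for the LABELS table, and rewrites the condition into a crisp range over [Alpha, Delta] that the crisp database can evaluate, leaving the membership grades to be computed afterwards. The sample label values are invented, and keying the map by column and label together is a simplification of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: rewrite "ATTRIBUTE IS|ARE TERM" into a crisp range predicate.
class FuzzyConditionRewriter {

    // Stand-in for the LABELS table: "COLUMN:LABEL" -> {alpha, beta, gamma, delta}.
    // The numbers are made-up sample values, not taken from the paper.
    static final Map<String, double[]> LABELS = new HashMap<>();
    static {
        LABELS.put("ABSENCES:LOW", new double[] {0, 0, 5, 10});
        LABELS.put("AGE:YOUNG", new double[] {15, 16, 18, 20});
    }

    // A fuzzy condition: one attribute name, IS or ARE, one linguistic term.
    static final Pattern FUZZY = Pattern.compile(
            "^\\s*(\\w+)\\s+(?:IS|ARE)\\s+(\\w+)\\s*$", Pattern.CASE_INSENSITIVE);

    static String rewrite(String condition) {
        Matcher m = FUZZY.matcher(condition);
        if (!m.matches()) {
            return condition.trim(); // crisp condition: passed through unchanged
        }
        String column = m.group(1).toUpperCase(Locale.ROOT);
        String term = m.group(2).toUpperCase(Locale.ROOT);
        double[] p = LABELS.get(column + ":" + term);
        if (p == null) {
            throw new IllegalArgumentException("No label " + term + " defined on " + column);
        }
        // Rows outside [alpha, delta] have membership 0 and can be dropped by the database.
        return column + " >= " + p[0] + " AND " + column + " <= " + p[3];
    }

    public static void main(String[] args) {
        // prints: ABSENCES >= 0.0 AND ABSENCES <= 10.0
        System.out.println(rewrite("ABSENCES ARE LOW"));
        // prints: PERCENTAGE > 85
        System.out.println(rewrite("PERCENTAGE > 85"));
    }
}
```

The design choice here is to let the crisp database discard rows with zero membership, so that the front end only has to grade the rows that remain.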

The implementation of the proposed system has been done in JAVA using a
MySQL database as the backend and the mm.mysql.jdbc-1.2c type 3 JDBC
driver.

5. Fuzzy Extension to Relational Databases


5.1 The proposed model

In the previous section, we discussed how vague queries can be used on
relational databases. We now present the design of a Fuzzy Relational Database
in which not only can fuzzy queries be applied, but fuzzy information can
also be stored.

We consider the same database as given in section 4.1, with the difference that
the attributes AGE, PERCENTAGE and ABSENCES can now hold fuzzy
information, while the remaining attributes are considered to be crisp.

Based on the classification of data in section 3.3, the attributes in the database are
defined to be of two types:
• Type 1 : The attribute can store only crisp values.
• Type 2 : The attribute is fuzzy and can take either a crisp value, an
approximate value or a linguistic term.

Name     Age        Course  Percentage  Absences
Ankit    OLD        12      GOOD        13
Anuj     MIDDLE     10      80          9
Sumit    18         11      83          LOW
Rahul    OLD        12      BAD         12
Bishop   19         12      65          HIGH
Neha     18         11      AVERAGE     23
Malini   17         10      69          10
Rocky    MIDDLE     9       79          13
Sandeep  APPROX 19  12      APPROX 75   APPROX 6
Nagesh   APPROX 19  12      83          APPROX 6

Fig. 7: Snapshot of the data existing in the database.
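
A Type 2 attribute therefore holds one of three kinds of value. The Java sketch below shows one way such a stored value could be classified when it is read back; it assumes that values are kept as text in the forms shown in Fig. 7 (a number, "APPROX" followed by a number, or a linguistic term). The class name and fields are illustrative and not taken from the prototype.

```java
import java.util.Locale;

// Illustrative sketch: classify the text stored in a Type 2 (fuzzy) attribute.
class FuzzyValue {

    enum Kind { CRISP, APPROXIMATE, LINGUISTIC }

    final Kind kind;
    final double number; // the value for CRISP and APPROXIMATE, NaN for LINGUISTIC
    final String term;   // the label for LINGUISTIC, null otherwise

    private FuzzyValue(Kind kind, double number, String term) {
        this.kind = kind;
        this.number = number;
        this.term = term;
    }

    static FuzzyValue parse(String raw) {
        String s = raw.trim();
        String upper = s.toUpperCase(Locale.ROOT);
        if (upper.startsWith("APPROX ")) {
            // "APPROX 19" -> approximate value around 19
            double center = Double.parseDouble(s.substring("APPROX ".length()).trim());
            return new FuzzyValue(Kind.APPROXIMATE, center, null);
        }
        try {
            return new FuzzyValue(Kind.CRISP, Double.parseDouble(s), null);
        } catch (NumberFormatException e) {
            // Anything that is not a number is taken to be a linguistic term.
            return new FuzzyValue(Kind.LINGUISTIC, Double.NaN, upper);
        }
    }
}
```

With this sketch, Sandeep's age "APPROX 19" parses as APPROXIMATE with number 19, Sumit's age "18" as CRISP, and Ankit's age "OLD" as LINGUISTIC.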

5.2 Metadata

In this case, at the level of meta knowledge we require three tables, as discussed
below.

1. COLUMNS_IN_DB

Column_Name   Type

Fig. 8 (a): Part of Meta Knowledge

This table stores the types (section 5.1) of all the attributes in the table. A
description of each column in this table is as follows:
• Column_Name: This is the primary key of this table and its tuples
correspond to the attributes in the table STUDENTS.
• Type: This stores the type of the corresponding attribute. It
can have two values, namely 1 and 2, for the two types
mentioned in section 5.1 (crisp and fuzzy).

2. APPROXIMATE_VALUES_TABLE

Column_Name   Margin

Fig. 8 (b): Part of Meta Knowledge

This table stores the parameter d, as shown in Fig. 2. A description of each
column in this table is as follows:
• Column_Name: This is a foreign key here and corresponds to
COLUMNS_IN_DB.
• Margin: This corresponds to the parameter d.

3. LABELS

Label   Column_Name   Alpha   Beta   Gamma   Delta

Fig. 8 (c): Part of Meta Knowledge

This table is used to store the information of all the fuzzy sets defined on
all the attribute domains, along with their parameters α, β, γ and δ, as
shown in Fig. 3. A description of each column in this table is as follows:
• Label: This is the primary key of this table and stores the linguistic
term associated with the fuzzy set.
• Column_Name: This is a foreign key here and corresponds to
COLUMNS_IN_DB.
• Alpha / Beta / Gamma / Delta: These correspond to the
parameters α, β, γ and δ.

5.3 Implementation

Here again the main issue in the implementation of this system is the parsing of
the input fuzzy query.

During parsing the query is divided into the following parts:

1. Query Type: Whether the query is an INSERT, SELECT, DELETE or
UPDATE.
2. Result Attributes: The attributes that are to be displayed. These are used only in
the case of the SELECT query.
3. Source Tables: The tables on which the query is to be applied.
4. Conditions: The conditions that must be satisfied before the
operation is performed. Each condition is further sub-divided into the Query Attribute
(i.e. the attribute on which the query is to be applied) and the linguistic
term or the approximate value. If the condition is not fuzzy, i.e. it
contains neither a linguistic term nor an approximate value, then it need not be subdivided.

The implementation of the proposed system has been done in JAVA using a
MySQL database as the backend and the mm.mysql.jdbc-1.2c type 3 JDBC
driver.

6. Query Language:

The syntax of the query language remains the same for both the models and is
defined as follows:

SELECT
The syntax of the SELECT statement is as follows:

SELECT <ATTRIBUTE1> [, <ATTRIBUTE2> …]
FROM <TABLE1> [, <TABLE2> …]
[WHERE <CONDITION1> [<CON> <CONDITION2> …]]
[THOLD relational operator x | #x]

where a CONDITION is defined as one of the following:

1. ATTRIBUTE relational operator CONSTANT
2. ATTRIBUTE1 relational operator ATTRIBUTE2 (both attributes
should be compatible).
3. ATTRIBUTE is|are TERM (where the attribute and the
linguistic term (defined in section 3.3) should be compatible).

CON is a connective that is used to combine two conditions, e.g. OR,
AND etc.

e.g. SELECT NAME
FROM STUDENTS
WHERE PERCENTAGE > 85 AND ABSENCES ARE LOW

THOLD specifies the alpha cut [5][10] that is to be applied to the result
set.

#x specifies the number of records that need to be returned. If x is equal
to 0, all entries are considered to be a part of the result and their
corresponding membership values are returned along with the entries.
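
The effect of the THOLD clause on a graded result set can be sketched as follows. This Java example covers only two cases, a threshold of the form "THOLD >= x" and the record count "#x", and it assumes that the rows have already been graded; sorting the rows by descending membership is a choice of this sketch, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of THOLD applied to rows that already carry membership grades.
class Threshold {

    // THOLD >= x : alpha cut, keeping only rows whose grade is at least x.
    static Map<String, Double> alphaCut(Map<String, Double> graded, double x) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : graded.entrySet()) {
            if (e.getValue() >= x) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // THOLD #x : the x best rows by grade; x == 0 returns every row with its grade.
    static List<Map.Entry<String, Double>> top(Map<String, Double> graded, int x) {
        List<Map.Entry<String, Double>> rows = new ArrayList<>(graded.entrySet());
        rows.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return x == 0 ? rows : rows.subList(0, Math.min(x, rows.size()));
    }
}
```

Given hypothetical grades Ankit 0.2, Sumit 1.0 and Anuj 0.7, a cut at 0.5 keeps Sumit and Anuj, while `top(graded, 1)` returns only Sumit.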

7. Data Manipulation Language


7.1 Data Manipulation Language for the first model

The operations that we allow on the database, and their syntax, are:

a. INSERT
This is the same as specified in SQL and has the following structure:

INSERT INTO <TABLE>
VALUES (<expression1>, …)

e.g. INSERT INTO STUDENTS
VALUES (“Ankit”, 19, 12, 85, 10)

b. DELETE
The structure of the DELETE statement is

DELETE
FROM <TABLE>
[WHERE <CONDITION1> [<CON> <CONDITION2> …]]

where CONDITION and CON are defined as in section 6 above.

e.g. DELETE
FROM STUDENTS
WHERE PERCENTAGE > 85 AND ABSENCES ARE LOW

c. UPDATE
The structure of the UPDATE statement is

UPDATE <TABLE>
SET VALUES <ATTRIBUTE1> = <expression1>
[, <ATTRIBUTE2> = <expression2> …]
[WHERE <CONDITION1> [<CON> <CONDITION2> …]]

where CONDITION and CON are defined as in section 6.

e.g. UPDATE STUDENTS
SET PERCENTAGE = 85
WHERE PERCENTAGE < 85 AND ABSENCES ARE LOW

This is a minimal set of operators and more can be added if the need arises.

7.2 Data Manipulation Language for the second model

The operations that we allow on the database, and their syntax, are:

a. INSERT
In the INSERT statement an approximate value can also be inserted into
the database. The syntax of the INSERT statement is given below:

INSERT INTO <TABLE>
VALUES (<expression1>, …)

where an expression can be an approximate value or a linguistic
term as well as a crisp value.

e.g. INSERT INTO STUDENTS
VALUES (“Ankit”, APPROX 19, 12, 85, APPROX 10)

b. DELETE
The structure of the DELETE statement is

DELETE
FROM <TABLE>
[WHERE <CONDITION1> [<CON> <CONDITION2> …]]

where CONDITION and CON are defined as in section 6.

e.g. DELETE
FROM STUDENTS
WHERE PERCENTAGE > APPROX 85 AND ABSENCES ARE LOW

c. UPDATE
The structure of the UPDATE statement is

UPDATE <TABLE>
SET VALUES <ATTRIBUTE1> = <expression1>
[, <ATTRIBUTE2> = <expression2> …]
[WHERE <CONDITION1> [<CON> <CONDITION2> …]]

where an expression can be an approximate value or a linguistic
term as well as a crisp value, and CONDITION and CON are defined as in section 6.

e.g. UPDATE STUDENTS
SET (PERCENTAGE = GOOD)
WHERE PERCENTAGE < 85 AND ABSENCES ARE LOW

This is a minimal set of operators and more can be added if the need arises.

8. Conclusion and Future Research


8.1 Conclusion

We have designed and implemented two models for incorporating fuzziness in
databases. Fuzzy querying of a database is discussed in the first model (section 4) and
storage of fuzzy information is discussed in the second model (section 5). The fuzziness takes
the form of approximate values or linguistic variables, which can be used only in
queries in the first model but can also be used as attribute values in the second model.
We have successfully implemented both models in JAVA. A few screenshots
are available at www.tsucorp.net/ankit.

Even though the second model is more flexible, fuzzy databases are still not
widely used, because people are reluctant to replace their crisp data with fuzzy data
before they are convinced that it is worthwhile or necessary to do so. From this
point of view, the first model scores over the second, as it can be used with
crisp data while still making use of the power of fuzzy theory.

8.2 Future Research

A limitation of the model proposed in section 5 is that the handling of
linguistic variables is not yet complete. We have yet to implement linguistic
hedges, i.e. modifiers such as ‘very’, which when applied to a linguistic term
change its semantic meaning. In future we will work to overcome this
limitation.

References

[1] Bosc P., Liétard I and Pivert O. “Evaluation of flexible queries : The
quantified statement case”, Technologies for Constructing Intelligent
Systems I, Physica-Verlag Heidelberg New York, pp 337-350 (2002)

[2] Chiang D., Chow L. R. and Hsien N. “Fuzzy information in extended
fuzzy relational databases”, Fuzzy Sets and Systems 92, pp. 1-10 (1997)

[3] Cao T. H., Rossiter J. M., Martin T. P. and Baldwin J. F. “On the
implementation of Fril++ for object-oriented logic programming with
uncertainty and fuzziness”, Technologies for Constructing Intelligent
Systems II, Physica-Verlag Heidelberg New York, pp 393-406 (2002)

[4] Kaushik S, Nanda H, “Web Based Access of Relational Databases Using
Fuzzy Natural Language Queries”, International Conference on Cognitive
Systems, Delhi, India (1999)

[5] Klir G. J. and Yuan B. [2001], “Fuzzy Sets and Fuzzy Logic : Theory and
Applications”, Prentice Hall, Inc. Englewood Cliffs, N. J., U.S.A.

[6] Medina J. M., Pons O., Vila M.A. “GEFRED, A Generalized Model of
Fuzzy Relational Databases”, Information Sciences, 76, 1-2, pp 87-109
(1994)

[7] Yang Q., Zhang W., Liu C., Wu J., Nakajima H. and Rishe N.D. “Efficient
Processing of Nested Fuzzy SQL Queries in a Fuzzy Database”, IEEE
Trans. On Knowledge and Data Eng., vol. 13, no. 6, pp. 884-901, Nov/Dec
2001

[8] Zadeh L. A. “Fuzzy Sets”, Information and Control, 8(3), pp. 338-353
(1965)

[9] Zadeh L. A. “A new direction in AI : Toward a computational theory of
perceptions”, Technologies for Constructing Intelligent Systems I,
Physica-Verlag Heidelberg New York, pp 3-20 (2002)

[10] Zimmermann H.-J. [2001], “Fuzzy Set Theory - And Its Applications”, Kluwer
Academic Publishers, Norwell, Massachusetts, U.S.A.

Profile of the authors

Name : Dr. (Ms.) Punam Bedi
Qualification : M.Tech.(Computer Science), Ph.D (Computer Science)
Age : 40 years
Gender : Female
Institution : Department of Computer Science,
Delhi University - 110 007.
Email : punam_bedi@hotmail.com
Address : J - 35, Kirti Nagar, New Delhi - 110 015
Contact Number : 011-7667591, 011-5446576 (R)

Name : Ms. Harmeet Kaur
Qualification : M.C.A.
Age : 29 years
Gender : Female
Institution : Department of Computer Science, Hans Raj College,
Delhi University, Delhi - 110 007
Email : harmeet_negi@rediffmail.com
Address : 19, M. S. Flats, Type - III, Timarpur, Delhi - 110 054
Contact Number : 011-7667545/46, 011-3817894 (R)

Name : Mr. Ankit Malhotra
Qualification : Student of Bachelor of Information Science
Age : 20 years
Gender : Male
Institution : Department of Computer Science,
Hans Raj College, Delhi University,
Delhi - 110 007
Email : ankitmalhotra@mail.com
Address : 102, Priya Enclave, Near Karkardooma Courts,
Delhi - 110 092
Contact Number : 011-2370840 (R), 011-2379871 (R)
