Section 1:
Q1. Explain different control measures of database security mechanisms [CO3]
A DBMS typically includes a database security and authorization subsystem that
is responsible for protecting portions of a database against unauthorized
access.
Access Control: The security mechanism of a DBMS must include provisions
for restricting access to the database as a whole. This function is called access
control and is handled by creating user accounts and passwords through which
the DBMS controls the login process.
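As a minimal sketch, account creation in SQL looks like the following (exact syntax varies by DBMS; this Oracle-style form and the account name are illustrative):

CREATE USER A1 IDENTIFIED BY str0ngpass;  -- create account A1 with a password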
Inference control: Controlling access to a statistical database (one that is
used to provide statistical information or summaries of values based on
various criteria) is a security issue: users must be prevented from inferring
information about individuals from aggregate queries.
The countermeasures to this statistical database security problem are called
inference control measures.
Flow control: Prevents information from flowing in such a way that it reaches
unauthorized users. A related concern is covert channels, pathways by which
information flows in violation of the security policy of an organization.
Data Encryption: The data is encoded using some coding algorithm. An
unauthorized user who accesses encoded data will have difficulty deciphering
it, but authorized users are given decoding or decrypting algorithms (or keys)
to decipher the data.
Mandatory access control is a security mechanism that classifies data and users
based on security classes. It is typically combined with the discretionary access
control mechanisms.
Typical security classes are Top secret (TS), secret (S), confidential (C), and
unclassified (U), where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥
U.
The commonly used model for multilevel security, known as the Bell-
LaPadula model, classifies each subject (user, account, program) and
each object (relation, tuple, column, view, operation) into one of the security
classifications TS, S, C, or U. We refer to the clearance (classification) of a
subject S as class(S) and to the classification of an object O as class(O).
Two restrictions are enforced on data access based on the subject/object
classifications:
1. A subject S is not allowed read access to an object O unless class(S) ≥
class(O). This is known as the simple security property.
2. A subject S is not allowed to write an object O unless class(S) ≤ class(O).
This is known as the star property (or *-property).
Each attribute value in a tuple is associated with a corresponding security
classification.
A multilevel relation schema R with n attributes would be represented as
R(A1,C1,A2,C2, …, An,Cn,TC)
where each Ci represents the classification attribute associated with
attribute Ai, and TC is the tuple classification attribute.
The value of the TC attribute in each tuple t is the highest of all the
attribute classification values within t.
A multilevel relation will appear to contain different data to subjects
(users) with different clearance levels.
It is possible to store a single tuple in the relation at a higher classification
level and produce the corresponding tuples at a lower-level classification
through a process known as filtering.
The apparent key of a multilevel relation is the set of attributes that would
have formed the primary key in a regular (single-level) relation.
In some cases, it is necessary to store two or more tuples at different
classification levels with the same value for the apparent key. This leads to
the concept of polyinstantiation where several tuples can have the same
apparent key value but have different attribute values for users at
different classification levels.
The entity integrity rule for multilevel relations states that:
All attributes that are members of the apparent key must not be null
and must have the same security classification within each individual
tuple.
All other attribute values in the tuple must have a security
classification greater than or equal to that of the apparent key. This
constraint ensures that a user can see the key if the user is permitted
to see any part of the tuple at all.
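As an illustration, a multilevel EMPLOYEE relation with apparent key Name could be stored as follows. This is only a hedged sketch: the table name, the Salary attribute, and the CHAR(2) encoding of the classes TS, S, C, and U are assumptions, and a real multilevel-secure DBMS would enforce filtering and polyinstantiation itself rather than through a plain table.

CREATE TABLE EMPLOYEE_ML (
    Name     VARCHAR(30),    -- apparent key attribute
    C_Name   CHAR(2),        -- classification of Name
    Salary   DECIMAL(10,2),
    C_Salary CHAR(2),        -- classification of Salary
    TC       CHAR(2)         -- tuple classification: highest of C_Name, C_Salary
);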
3. Explain discretionary access control based security mechanism. (05) [CO3]
The typical method of enforcing discretionary access control in a database
system is based on the granting and revoking of privileges.
Types of discretionary privileges:
The account level: At this level, the DBA specifies the particular privileges
that each account holds independently of the relations in the database.
The relation (or table) level: At this level, the DBA can control the privilege
to access each individual relation or view in the database.
The privileges at the account level apply to the capabilities provided to the
account itself and can include:
CREATE SCHEMA/TABLE
CREATE VIEW
ALTER
DROP
MODIFY
SELECT etc.
The second level of privileges applies to the relation level, whether they
are base relations or virtual (view) relations.
Access matrix model: The granting and revoking of privileges generally
follow this authorization model for discretionary privileges.
The rows of a matrix M represent subjects (users, accounts,
programs)
The columns represent objects (relations, records, columns, views,
operations).
Each position M(i,j) in the matrix represents the types of privileges
(read, write, update) that subject i holds on object j.
The owner account holder can pass privileges on any of the owned
relations to other users by granting privileges to their accounts.
Suppose that A1 wants to allow A3 to retrieve information from the EMPLOYEE
table and also to be able to propagate the SELECT privilege to other
accounts. A1 can issue the command:
GRANT SELECT ON EMPLOYEE TO A3 WITH GRANT OPTION;
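If A1 later withdraws the privilege, standard SQL provides REVOKE; a sketch (cascading behavior for privileges that A3 has propagated onward varies by DBMS):

REVOKE SELECT ON EMPLOYEE FROM A3;
-- some systems require an explicit CASCADE to also revoke
-- privileges that A3 granted to other accounts:
-- REVOKE SELECT ON EMPLOYEE FROM A3 CASCADE;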
6. What are temporal and bitemporal relations? Give an example of each. (05)
[CO4]
Temporal Relations:
A relation in which time-related data is incorporated for storing historical data is
known as a temporal relation.
This can be done using two methods:
1. Valid time relation: The associated time is the time during which the
information is valid in the real world.
We can convert the two relations EMPLOYEE and DEPARTMENT into valid
time relations by adding the attributes Vst (Valid Start Time) and Vet (Valid
End Time), whose data type is DATE in order to provide day granularity. In
EMP_VT, each tuple V represents a version of an employee's information
that is valid (in the real world) only during the period [V.Vst, V.Vet].
2. Transaction time relation: The associated time is the value of the
system time clock at which the information was recorded in the database.
In a transaction time database, whenever a change is applied to the
database, the actual timestamp of the transaction that applied the
change (insert, delete, or update) is recorded.
Such a database is most useful when changes are applied
simultaneously in the majority of cases—for example, real-time stock
trading or banking transactions.
The two relations EMPLOYEE and DEPARTMENT are converted into
transaction time relations by adding the attributes Tst (Transaction
Start Time) and Tet (Transaction End Time), whose data type is typically
TIMESTAMP.
Bitemporal relations: A bitemporal relation combines both approaches by
including all four attributes Vst, Vet, Tst, and Tet, so that each tuple records
both when its information is valid in the real world and when it was recorded
in the database.
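A minimal sketch of the corresponding schemas (the Salary and Dno attributes are illustrative; EMP_BT shows the bitemporal case):

CREATE TABLE EMP_VT (      -- valid time relation
    Name   VARCHAR(30),
    Salary DECIMAL(10,2),
    Dno    INT,
    Vst    DATE,           -- valid start time
    Vet    DATE            -- valid end time
);

CREATE TABLE EMP_BT (      -- bitemporal relation
    Name   VARCHAR(30),
    Salary DECIMAL(10,2),
    Dno    INT,
    Vst    DATE,           -- valid time period
    Vet    DATE,
    Tst    TIMESTAMP,      -- transaction time period
    Tet    TIMESTAMP
);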
Drill-down
Drill-down is performed by stepping down a concept hierarchy for a dimension such as time.
For example, if the concept hierarchy is "day < month < quarter < year," drilling down descends
the time dimension from the level of quarter to the level of month.
Drill-down can also be performed by adding one or more dimensions to the data cube.
It navigates from less detailed data to more detailed data, as the sketch below illustrates.
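A hedged SQL sketch of drill-down, assuming a hypothetical SALES fact table joined to a TIME_DIM dimension table:

-- summary at the quarter level
SELECT t.quarter, SUM(s.amount) AS total_sales
FROM SALES s JOIN TIME_DIM t ON s.time_id = t.time_id
GROUP BY t.quarter;

-- drill-down: descend from quarter to month
SELECT t.quarter, t.month, SUM(s.amount) AS total_sales
FROM SALES s JOIN TIME_DIM t ON s.time_id = t.time_id
GROUP BY t.quarter, t.month;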
Slice
The slice operation selects one particular dimension from a given cube and produces a new
sub-cube.
For example, slicing on the dimension "time" using the criterion time = "Q1" forms a new
sub-cube by fixing that single dimension, as the sketch below illustrates.
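Expressed against a hypothetical denormalized SALES_CUBE table, the slice fixes the time dimension:

-- slice: fix time = 'Q1'
SELECT location, item, SUM(amount) AS total_sales
FROM SALES_CUBE
WHERE time = 'Q1'
GROUP BY location, item;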
Dice
Dice selects two or more dimensions from a given cube and produces a new sub-cube.
For example, a dice operation on the cube involving three dimensions could use the
following selection criteria (see the sketch after this list):
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to
provide an alternative presentation of the data, for example by turning the rows of a 2-D
view into columns, as the sketch below illustrates.
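One common way to express a pivot in plain SQL is conditional aggregation; a sketch over the same hypothetical SALES_CUBE table, rotating item values into columns:

-- pivot: item values become columns of the result
SELECT location,
       SUM(CASE WHEN item = 'Mobile' THEN amount ELSE 0 END) AS mobile_sales,
       SUM(CASE WHEN item = 'Modem'  THEN amount ELSE 0 END) AS modem_sales
FROM SALES_CUBE
GROUP BY location;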
Operational data in the source system falls into two broad categories.
• Current value: The values are transient or transitory; as business
transactions happen, the values change (e.g., customer name and address).
• Periodic status: The value of the attribute is preserved as the
status every time a change occurs (e.g., data about an insurance policy).
• This distinction leads to two major types of data extraction from the source
operational systems: "as is" (static) data and data of revisions (incremental changes).
• Data storage
Data storage covers the process of loading the data from
the staging area into the data warehouse repository.
All functions for transforming and integrating the data are
completed in the data staging area.
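A minimal sketch of the load step, with hypothetical staging and warehouse table names:

-- move transformed rows from the staging area into the warehouse
INSERT INTO DW_SALES_FACT (location, time_id, item, amount)
SELECT location, time_id, item, amount
FROM STG_SALES_CLEAN;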
Architecture and Services
• Information delivery
The information delivery component makes it easy for the
users to access the information either directly from the
enterprise-wide data warehouse, from the dependent data
marts, or from the set of conformed data marts.
Most of the information access in a data warehouse is
through online queries and interactive analysis sessions.
Almost all modern data warehouses provide for online
analytical processing (OLAP); users perform complex
multidimensional analysis using the information cubes in
multidimensional databases (MDDBs).
Q5. Write a short note on challenges in ETL functions. [CO6]
ETL functions are challenging primarily because of the nature of the source
systems. Most of the challenges in ETL arise from the disparities among the
source operational systems.
Source systems are very diverse and disparate.
• There is usually a need to deal with source systems on multiple platforms
and different operating systems.
• Many source systems are older legacy applications running on obsolete
database technologies.
• Generally, historical data on changes in values are not preserved in source
operational systems. Historical information is critical in a data warehouse.
• Quality of data is dubious in many old source systems that have evolved
over time.
• Source system structures keep changing over time because of new
business conditions. ETL functions must also be modified accordingly.
• A gross lack of consistency among source systems is commonly prevalent:
the same data is likely to be represented differently in the various source
systems. For example, data on salary may be represented as monthly
salary, weekly salary, and bimonthly salary in different source payroll
systems.
• Even when inconsistent data is detected among disparate source systems,
lack of a means for resolving mismatches escalates the problem of
inconsistency.
• Most source systems do not represent data in types or formats that are
meaningful to the users. Many representations are cryptic and ambiguous.
• Designing ETL functions is time consuming and arduous.
Q6. List and explain basic tasks involved in data transformation. [CO6]
Selection:
This takes place at the beginning of the whole process of data
transformation.
Select either whole records or parts of several records from the
source systems.
The task of selection usually forms part of the extraction function
itself.
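A sketch of selection during extraction (source table and column names are illustrative):

-- select only the needed columns and rows from a source system
SELECT cust_id, cust_name, city
FROM SRC_CUSTOMER
WHERE status = 'ACTIVE';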
Splitting/joining:
This task includes the types of data manipulation you need to
perform on the selected parts of source records.
Sometimes, you will be splitting the selected parts even further
during data transformation.
Sometimes it involves joining together parts selected from many different
source systems, as in the sketch below.
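For instance, splitting a combined name field while joining parts drawn from two hypothetical source systems (string functions vary by DBMS; this uses standard-SQL SUBSTRING and POSITION):

-- split full_name and join customer data from two sources
SELECT a.cust_id,
       SUBSTRING(a.full_name FROM 1 FOR POSITION(' ' IN a.full_name) - 1) AS first_name,
       b.credit_limit
FROM SRC_CRM_CUSTOMER a
JOIN SRC_BILLING_ACCOUNT b ON a.cust_id = b.cust_id;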
Conversion:
Conversion is done for two primary reasons:
1. To standardize the data extracted from disparate
source systems.
2. To make the fields usable and understandable to
the users.
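A hedged sketch of conversion, standardizing the differing salary representations noted earlier (table and column names are hypothetical):

-- convert all salary figures to a standard monthly amount
SELECT emp_id,
       CASE pay_frequency
           WHEN 'WEEKLY'  THEN salary * 52.0 / 12.0
           WHEN 'MONTHLY' THEN salary
           ELSE salary    -- other frequencies handled analogously
       END AS monthly_salary
FROM SRC_PAYROLL;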
Summarization:
It may be that none of the users ever needs data at the lowest
level of granularity for analysis or querying.
For example, for a grocery chain, sales data at the lowest level
of detail, for every transaction at the checkout, may not be needed.
In this case, the data transformation function includes
summarization of daily sales by product and by store, which keeps the
data compact and easily understandable; see the sketch below.
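A sketch of summarization over a hypothetical point-of-sale transactions table:

-- summarize checkout transactions into daily sales by product and store
SELECT store_id, product_id, sale_date, SUM(amount) AS daily_sales
FROM SRC_POS_TRANSACTIONS
GROUP BY store_id, product_id, sale_date;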