
Created by Amit S

Assignment No.1
Title of Assignment: Implement a system using Multivalued Attributes and Inheritance in ORDBMS.

Relevant Theory / Literature Survey:

ORDBMS Definition
An object relational database is also called an object relational database management system (ORDBMS). This system simply puts an object oriented front end on a relational database (RDBMS). When applications interface to this type of database, it will normally interface as though the data is stored as objects. However the system will convert the object information into data tables with rows and columns and handle the data the same as a relational database. Likewise, when the data is retrieved, it must be reassembled from simple data into complex objects.

About Oracle Objects and Object Types

Oracle object types are user-defined data types that make it possible to model complex real-world entities such as customers and purchase orders as unitary entities ("objects") in the database. Oracle object technology is a layer of abstraction built on Oracle's relational technology. New object types can be created from any built-in database types and any previously created object types, object references, and collection types. Metadata for user-defined types is stored in a schema that is available to SQL, PL/SQL, Java, and other published interfaces. Object types and related object-oriented features such as variable-length arrays and nested tables provide higher-level ways to organize and access data in the database. Underneath the object layer, data is still stored in columns and tables, but you are able to work with the data in terms of the real-world entities (customers and purchase orders, for example) that make the data meaningful. Instead of thinking in terms of columns and tables when you query the database, you can simply select a customer.

Internally, statements about objects are still basically statements about relational tables and columns, and you can continue to work with relational data types and store data in relational tables as before. But now you have the option to take advantage of object-oriented features too. You can begin to use object-oriented features while continuing to work with most of your data relationally, or you can go over to an object-oriented approach entirely. For instance, you can define some object data types and store the objects in columns in relational tables. You can also create object views of existing relational data to represent and access this data according to an object model. Or you can store object data in object tables, where each row is an object.

Advantages of Objects
In general, the object-type model is similar to the class mechanism found in C++ and Java. Like classes, objects make it easier to model complex, real-world business entities and logic, and the reusability of objects makes it possible to develop database applications faster and more efficiently. By natively supporting object types in the database, Oracle enables application developers to directly access the data structures used by their applications. No mapping layer is required between client-side objects and the relational database columns and tables that contain the data. Object abstraction and the encapsulation of object behaviors also make applications easier to understand and maintain. Listed below are several other specific advantages that objects offer over a purely relational approach:

- Objects Can Encapsulate Operations Along with Data
- Objects Are Efficient
- Objects Can Represent Part-Whole Relationships

Basic Components of Oracle Objects Object-Relational Elements

Object-relational functionality introduces a number of new concepts and resources. These are briefly described in the following sections.

Object Types
An object type is a kind of data type. You can use it in the same ways that you use more familiar data types such as NUMBER or VARCHAR2. For example, you can specify an object type as the data type of a column in a relational table, and you can declare variables of an object type. You use a variable of an object type to contain a value of that object type. A value of an object type is an instance of that type. An object instance is also called an object. Object types also have some important differences from the more familiar data types that are native to a relational database:

- A set of object types does not come ready-made with the database. Instead, you define the object types you want.
- Object types are not unitary: they have parts, called attributes and methods.

You can think of an object type as a structural blueprint or template and an object as an actual thing built according to the template.

Type Inheritance
You can specialize an object type by creating subtypes that have some added, differentiating feature, such as an additional attribute or method. You create subtypes by deriving them from a parent object type, which is called a super type of the derived subtypes. Subtypes and super types are related by inheritance: as specialized versions of their parent, subtypes have all the parent's attributes and methods plus any specializations that are defined in the subtype itself. Subtypes and super types connected by inheritance make up a type hierarchy.

When you create a variable of an object type, you create an instance of the type: the result is an object. An object has the attributes and methods defined for its type. Because an object instance is a concrete thing, you can assign values to its attributes and call its methods.
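The supertype/subtype relationship described above can be sketched outside the database. The following is a Python analogy, not Oracle syntax: the class names `Person` and `Student`, the `college` attribute, and the `contact_line` method are all hypothetical and chosen only to show a subtype inheriting attributes and methods and then specializing them.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Plays the role of the supertype: two attributes and one method."""
    name: str
    phone: str

    def contact_line(self) -> str:
        return f"{self.name} <{self.phone}>"


@dataclass
class Student(Person):
    """Subtype: inherits name, phone and contact_line, adds one attribute."""
    college: str

    def contact_line(self) -> str:
        # Specializes the inherited method by reusing the parent's result.
        return f"{super().contact_line()} ({self.college})"


# Creating a variable of the type yields an instance (an object).
s = Student(name="John Smith", phone="1-800-555-1212", college="Example College")
print(s.contact_line())
print(isinstance(s, Person))  # a subtype instance is also a supertype instance
```

In this sketch, `Student` never redeclares `name` or `phone`; it receives them from `Person`, which mirrors how a subtype carries all of its parent's attributes and methods.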

Design Analysis / Implementation Logic:

Implementation:

Object Tables

An object table is a special kind of table in which each row represents an object. For example, the following statements create a person object type and define an object table for person objects:

    CREATE TYPE person AS OBJECT (
        name  VARCHAR2(30),
        phone VARCHAR2(20)
    );
    CREATE TABLE person_table OF person;

You can view this table in two ways:

- As a single-column table in which each row is a person object, allowing you to perform object-oriented operations
- As a multi-column table in which each attribute of the object type person, namely name and phone, occupies a column, allowing you to perform relational operations

For example, you can execute the following instructions:

    INSERT INTO person_table VALUES ('John Smith', '1-800-555-1212');
    SELECT VALUE(p) FROM person_table p WHERE p.name = 'John Smith';

The first statement inserts a person object into person_table, treating person_table as a multi-column table. The second selects from person_table as a single-column table, using the VALUE function to return rows as object instances. Note that string literals in Oracle SQL are written in single quotes.

An array is an ordered set of data elements. All elements of a given array are of the same data type. Each element has an index, which is a number corresponding to the element's position in the array. The number of elements in an array is the size of the array. Oracle allows arrays to be of variable size, which is why they are called varrays. You must specify a maximum size when you declare the array type. For example, the following statement declares an array type:

    CREATE TYPE prices AS VARRAY(10) OF NUMBER(12,2);

The VARRAYs of type PRICES have no more than ten elements, each of datatype NUMBER(12,2). Creating an array type does not allocate space. It defines a datatype, which you can use as:

- The datatype of a column of a relational table.
- An object type attribute.
- The type of a PL/SQL variable, parameter, or function return value.

A varray is normally stored in line, that is, in the same tablespace as the other data in its row. If it is sufficiently large, Oracle stores it as a BLOB. A varray cannot contain LOBs. This means that a varray also cannot contain elements of a user-defined type that has a LOB attribute.
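To make the "ordered, same type, bounded size" idea concrete, here is a minimal Python sketch of a container with a declared maximum size. It is an analogy only: the class name `VArray` and its methods are hypothetical, and Python indexes from 0 whereas Oracle varray subscripts start at 1.

```python
class VArray:
    """Ordered, index-addressable collection with a fixed maximum size."""

    def __init__(self, max_size, items=()):
        self.max_size = max_size
        self._items = []
        for item in items:
            self.append(item)

    def append(self, item):
        # The maximum is part of the type: growing past it is an error.
        if len(self._items) >= self.max_size:
            raise ValueError(f"varray limit of {self.max_size} elements exceeded")
        self._items.append(item)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


# Rough counterpart of VARRAY(10) OF NUMBER(12,2)
prices = VArray(max_size=10, items=[19.99, 24.50])
prices.append(5.00)
print(len(prices), prices[0])
```

The current size (3 here) can vary freely up to the declared maximum, which is the sense in which the array is "variable".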

Nested Tables

A nested table is an unordered set of data elements, all of the same datatype. It has a single column, and the type of that column is a built-in type or an object type. If the column in a nested table is an object type, the table can also be viewed as a multi-column table, with a column for each attribute of the object type. For example, in the purchase order example, the following statement declares the table type used for the nested tables of line items:

    CREATE TYPE lineitem_table AS TABLE OF lineitem;

A table type definition does not allocate space. It defines a type, which you can use as:

- The datatype of a column of a relational table.
- An object type attribute.
- A PL/SQL variable, parameter, or function return type.

When a column in a relational table is of nested table type, Oracle stores the nested table data for all rows of the relational table in the same storage table. Similarly, with an object table of a type that has a nested table attribute, Oracle stores nested table data for all object instances in a single storage table associated with the object table. For example, the following statement defines an object table for the object type PURCHASE_ORDER:

    CREATE TABLE purchase_order_table OF purchase_order
        NESTED TABLE lineitems STORE AS lineitems_table;

The second line specifies LINEITEMS_TABLE as the storage table for the LINEITEMS attributes of all of the PURCHASE_ORDER objects in PURCHASE_ORDER_TABLE. A convenient way to access the elements of a nested table individually is to use a nested cursor.
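The single shared storage table can be pictured with an ordinary parent/child layout. The sketch below uses Python's built-in `sqlite3` module purely as an illustration; it does not reproduce Oracle's internal storage format, and the columns `po_id`, `customer`, `item`, and `qty` are assumptions, since the source does not define the attributes of `purchase_order` or `lineitem`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_order_table (po_id INTEGER PRIMARY KEY, customer TEXT);
    -- One shared storage table holds the line items of every purchase order.
    CREATE TABLE lineitems_table (
        po_id INTEGER REFERENCES purchase_order_table(po_id),
        item  TEXT,
        qty   INTEGER
    );
""")
conn.execute("INSERT INTO purchase_order_table VALUES (1, 'Acme')")
conn.execute("INSERT INTO purchase_order_table VALUES (2, 'Globex')")
conn.executemany(
    "INSERT INTO lineitems_table VALUES (?, ?, ?)",
    [(1, "bolt", 100), (1, "nut", 100), (2, "gear", 5)],
)


def line_items(po_id):
    """Return the 'nested table' belonging to one purchase order."""
    rows = conn.execute(
        "SELECT item, qty FROM lineitems_table WHERE po_id = ? ORDER BY rowid",
        (po_id,),
    )
    return rows.fetchall()


print(line_items(1))
print(line_items(2))
```

Each purchase order appears to own its private collection of line items, yet all of them live in one table and are separated only by the owning row's identifier.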

Testing:

1. The object type person is created with the attributes name and phone.
2. The object table person_table is created from the object type person.
3. The object table purchase_order_table is created from purchase_order, with lineitems stored as a nested table.

Conclusion: Multivalued attributes and inheritance in ORDBMS are implemented.

Assignment No.2
Title of Assignment: Implement K-Means Data Mining Clustering Algorithm.

Relevant Theory / Literature Survey:

What is K-Means Clustering?

In simple words, it is an algorithm to classify or group your objects, based on their attributes/features, into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroids. Thus, the purpose of K-means clustering is to classify the data.
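The quantity being minimized can be written out explicitly. The symbols are assumptions introduced here for illustration: $x_i$ is a data point, $S_j$ is the set of points currently assigned to cluster $j$, and $c_j$ is the centroid of that cluster.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - c_j \rVert^{2},
\qquad
c_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i
```

For a fixed assignment, taking $c_j$ as the mean of its members minimizes the inner sum, which is why the algorithm alternates between reassigning points and recomputing means.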

Step by step k means clustering algorithm

Step 1. Begin with a decision on the value of k, the number of clusters.
Step 2. Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
   1. Take the first k training samples as single-element clusters.
   2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
Step 3. Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
Step 4. Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.

If the number of data points is less than the number of clusters, we assign each data point as the centroid of a cluster, and each centroid gets a cluster number. If the number of data points is larger than the number of clusters, then for each data point we calculate the distance to every centroid and take the minimum. The data point is said to belong to the cluster whose centroid is at the minimum distance from it.

Applications of K-mean clustering

There are many applications of K-means clustering, including unsupervised learning of neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, and machine vision. In principle, if you have several objects, each with several attributes, and you want to classify the objects based on those attributes, you can apply this algorithm.

Design Analysis / Implementation Logic:

Numerical Example of K-Means Clustering

The basic step of k-means clustering is simple. In the beginning we determine the number of clusters K and we assume the centroids (centers) of these clusters. We can take any random objects as the initial centroids, or the first K objects in sequence can also serve as the initial centroids. Then the K-means algorithm repeats the three steps below until convergence, that is, until it is stable and no object moves group:

1. Determine the centroid coordinates
2. Determine the distance of each object to the centroids
3. Group the objects based on minimum distance

Suppose we have several objects (4 types of medicine) and each object has two attributes or features, as shown in the table below. Our goal is to group these objects into K=2 groups of medicine based on the two features (weight index and pH).

Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4

Each medicine represents one point with two attributes (X, Y), which we can represent as a coordinate in attribute space as shown in the figure below.

1. Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids. Let $c_1$ and $c_2$ denote the coordinates of the centroids; then $c_1 = (1, 1)$ and $c_2 = (2, 1)$.

2. Objects-Centroids distance: We calculate the distance between each cluster centroid and each object. Using Euclidean distance, the distance matrix at iteration 0 (columns in the order A, B, C, D; rows in the order $c_1$, $c_2$) is

$$D^0 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 1 & 0 & 2.83 & 4.24 \end{bmatrix}$$

Each column in the distance matrix corresponds to an object. The first row of the distance matrix holds the distance of each object to the first centroid and the second row holds the distance of each object to the second centroid. For example, the distance from medicine C = (4, 3) to the first centroid $c_1 = (1, 1)$ is $\sqrt{(4-1)^2 + (3-1)^2} = 3.61$, and its distance to the second centroid $c_2 = (2, 1)$ is $\sqrt{(4-2)^2 + (3-1)^2} = 2.83$, etc.

3. Objects clustering: We assign each object based on the minimum distance. Thus, medicine A is assigned to group 1, medicine B to group 2, medicine C to group 2 and medicine D to group 2. An element of the Group matrix below is 1 if and only if the object is assigned to that group.

$$G^0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$

4. Iteration 1, determine centroids: Knowing the members of each group, we now compute the new centroid of each group based on these new memberships. Group 1 has only one member, so its centroid remains at $c_1 = (1, 1)$. Group 2 now has three members, so its centroid is the average coordinate of the three members: $c_2 = \left(\frac{2+4+5}{3}, \frac{1+3+4}{3}\right) = \left(\frac{11}{3}, \frac{8}{3}\right) \approx (3.67, 2.67)$.

5. Iteration 1, Objects-Centroids distances: The next step is to compute the distance of all objects to the new centroids $c_1 = (1, 1)$ and $c_2 = \left(\frac{11}{3}, \frac{8}{3}\right)$. Similar to step 2, the distance matrix at iteration 1 is

$$D^1 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 3.14 & 2.36 & 0.47 & 1.89 \end{bmatrix}$$

6. Iteration 1, Objects clustering: Similar to step 3, we assign each object based on the minimum distance. Based on the new distance matrix, we move medicine B to Group 1 while all the other objects remain where they are. The Group matrix is

$$G^1 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$

7. Iteration 2, determine centroids: Now we repeat step 4 to calculate the new centroid coordinates based on the clustering of the previous iteration. Group 1 and group 2 both have two members, thus the new centroids are $c_1 = \left(\frac{1+2}{2}, \frac{1+1}{2}\right) = (1.5, 1)$ and $c_2 = \left(\frac{4+5}{2}, \frac{3+4}{2}\right) = (4.5, 3.5)$.

8. Iteration 2, Objects-Centroids distances: Repeating step 2 with the centroids $c_1 = (1.5, 1)$ and $c_2 = (4.5, 3.5)$, the distance matrix at iteration 2 is

$$D^2 = \begin{bmatrix} 0.5 & 0.5 & 3.20 & 4.61 \\ 4.30 & 3.54 & 0.71 & 0.71 \end{bmatrix}$$

9. Iteration 2, Objects clustering: Again, we assign each object based on the minimum distance.

$$G^2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$

We obtain the result that $G^2 = G^1$. Comparing the grouping of the last iteration with this iteration reveals that the objects no longer move between groups. Thus, the computation of k-means clustering has reached stability and no more iterations are needed. We get the final grouping as the result:

Object        Feature 1 (X): weight index    Feature 2 (Y): pH    Group (result)
Medicine A    1                              1                    1
Medicine B    2                              1                    1
Medicine C    4                              3                    2
Medicine D    5                              4                    2
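The worked example above can be checked with a few lines of Python. This is a minimal sketch of the batch form of the algorithm (assign every object, then recompute every centroid), assuming Euclidean distance and the first two medicines as initial centroids; the function name `kmeans` and its parameters are illustrative and not taken from the assignment's program.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Batch k-means: assign to nearest centroid, recompute, repeat until stable."""
    centroids = list(centroids)  # work on a copy of the initial centroids
    assignment = None
    for _ in range(max_iter):
        # Distance of each object to every centroid; pick the closest one.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed group: converged
        assignment = new_assignment
        # Each centroid becomes the mean coordinate of its current members.
        for j in range(len(centroids)):
            members = [p for p, g in zip(points, assignment) if g == j]
            if members:  # an empty group keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
groups, centers = kmeans(list(medicines.values()), [(1, 1), (2, 1)])
for name, g in zip(medicines, groups):
    print(f"Medicine {name}: group {g + 1}")
print(centers)
```

With these inputs the sketch ends with A and B in group 1, C and D in group 2, and centroids (1.5, 1.0) and (4.5, 3.5), matching the final table. Note that `math.dist` requires Python 3.8 or later.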

Testing:

When the user clicks the picture box to input new data (X, Y), the program groups the data into clusters by minimizing the sum of squares of distances between the data and the corresponding cluster centroids. Each dot represents an object, and its coordinates (X, Y) represent the two attributes of the object. The colour of the dot and the label number indicate the cluster.

Thus, all the user data (X, Y) are grouped into three clusters by minimizing the sum of squares of distances between the data and the corresponding cluster centroids.

Assignment No.3 Title of Assignment:

Design a Web-based application using ASP involving a Database.

Relevant Theory / Literature Survey:

The need for ASP

Why bother with ASP at all, when HTML can serve your needs? If you want to display information, all you have to do is fire up your favorite text editor, type in a few HTML tags, and save it as an HTML file. But what if you want to display information that changes? Suppose you're writing a page that provides constantly changing information to your visitors, for example weather reports or stock quotes. Static HTML can no longer keep up with the pace. What you need is a system that can present dynamic information, and ASP fits the bill perfectly.

What is Active Server Pages?

Active Server Pages (ASPs) are Web pages that contain server-side scripts in addition to the usual mixture of text and HTML tags. Server-side scripts are special commands you put in Web pages that are processed before the pages are sent from the server to the web-browser of someone who's visiting your website. When you type a URL in the Address box or click a link on a webpage, you're asking a web-server on a computer somewhere to send a file to the web-browser (also called a "client") on your computer. If that file is a normal HTML file, it looks the same when your web-browser receives it as it did before the server sent it. After receiving the file, your web-browser displays its contents as a combination of text, images, and sounds. In the case of an Active Server Page, the process is similar, except there's an extra processing step that takes place just before the server sends the file. Before the server sends the Active Server Page to the browser, it runs all server-side scripts contained in the page. Some of these scripts display the current date, time, and other information. Others process information the user has just typed into a form, such as a page in the website's guestbook. And you can write your own code to put in whatever dynamic information you want. To distinguish Active Server Pages from normal HTML pages, Active Server Pages are given the ".asp" extension.
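The extra processing step can be illustrated with a toy renderer. The Python sketch below is not how IIS works internally; it only mimics the idea that markers inside the page are replaced on the server before the browser sees anything. The function `render`, the `context` dictionary, and the restriction to simple `<%= name %>` lookups are assumptions made for illustration (real ASP evaluates full script expressions).

```python
import re
from datetime import date


def render(page: str, context: dict) -> str:
    """Replace each <%= name %> marker with a value computed on the server."""
    def substitute(match):
        return str(context[match.group(1)])
    return re.sub(r"<%=\s*(\w+)\s*%>", substitute, page)


page = "<HTML><BODY>Today is <%= today %>. Hello, <%= visitor %>!</BODY></HTML>"
html = render(page, {"today": date(2000, 1, 31).isoformat(), "visitor": "guest"})
print(html)  # the browser receives plain HTML with no script markers left
```

The client never sees the markers, only their output, which is why viewing the source of an ASP page in a browser shows ordinary HTML.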


Requirements to run ASP

Since the server must do additional processing on the ASP scripts, it must have the ability to do so. The only servers which support this facility are Microsoft Internet Information Services and Microsoft Personal Web Server. Let us look at both in detail, so that you can decide which one is most suitable for you.

Internet Information Services
This is Microsoft's web server designed for the Windows NT platform. It can only run on Microsoft Windows NT 4.0, Windows 2000 Professional, and Windows 2000 Server. The current version is 5.0, and it ships as a part of the Windows 2000 operating system.

Personal Web Server
This is a stripped-down version of IIS and supports most of the features of ASP. It can run on all Windows platforms, including Windows 95, Windows 98 and Windows Me. Typically, ASP developers use PWS to develop their sites on their own machines and later upload their files to a server running IIS. If you are running Windows 9x or Me, your only option is to use Personal Web Server 4.0.

The Object Model

ASP is a scripting environment revolving around its Object Model. An Object Model is simply a hierarchy of objects that you may use to get services from. In the case of ASP, all commands are issued to certain inbuilt objects that correspond to the Client Request, the Client Response, the Server, the Session and the Application respectively. All of these are for global use:

- Request: To get information from the user
- Response: To send information to the user
- Server: To control the Internet Information Server
- Session: To store information about and change settings for the user's current Web-server session
- Application: To share application-level information and control settings for the lifetime of the application

The Request and Response objects contain collections (bits of information that are accessed in the same way). Objects use methods to do some type of procedure (if you know any object-oriented programming language, you already know what a method is) and properties to store any of the object's attributes (such as color, font, or size).

Design Analysis / Implementation Logic:

Implementation: Database Connectivity

    <HTML>
    <HEAD>
    </HEAD>
    <BODY>
    <%
    Dim DB
    Set DB = Server.CreateObject("ADODB.Connection")
    DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + "C:\Databases\Students.mdb")
    Dim RS
    Set RS = Server.CreateObject("ADODB.Recordset")
    RS.Open "SELECT * FROM Students", DB
    %>
    </BODY>
    </HTML>

The first few lines are the opening HTML tags for any page. There's no ASP code within them. The ASP block begins with the statement Dim DB, which declares the variable that we are going to use later on.

The second line, Set DB = Server.CreateObject("ADODB.Connection"), does two things. Firstly, the right-hand side, Server.CreateObject(), creates an instance of the COM object which has the ProgID ADODB.Connection. The Set statement then assigns this reference to our variable, DB.

Now we use the object just created to connect to the database using a Connection String. The string "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + "C:\Databases\Students.mdb" is a string expression that tells our object where to locate the database and, more importantly, what type of database it is: an Access database, a Sybase database, or Oracle. (Please note that this is a Connection String specific to Access 2000 databases. This example does not use ODBC.) If the DB.Open statement succeeds without an error, we have a valid connection to the database under consideration. Only after this can we begin to use the database.

The next lines, Dim RS and Set RS = Server.CreateObject("ADODB.Recordset"), serve the same purpose as the lines for the ADODB.Connection object, only now we're creating an ADODB.Recordset.

RS.Open "SELECT * FROM Students", DB is perhaps the most important line of this example. Given an SQL statement, this line executes the query and assigns the records returned to our Recordset object. The bare-minimum syntax, as you can see, is pretty straightforward. The Recordset.Open method takes a couple more arguments, but they are optional and would just complicate things at this juncture.

Inserting Data into a Table

    <HTML>
    <HEAD>
    <TITLE>Student Records</TITLE>
    </HEAD>
    <BODY>
    <%
    ' adModeReadWrite, adOpenStatic and adLockPessimistic are ADO constants.
    ' They are defined in adovbs.inc, which the page must include.
    Dim DB
    Set DB = Server.CreateObject("ADODB.Connection")
    DB.Mode = adModeReadWrite
    DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + "C:\Databases\Students.mdb")
    Dim RS
    Set RS = Server.CreateObject("ADODB.Recordset")
    RS.Open "Students", DB, adOpenStatic, adLockPessimistic
    RS.AddNew
    RS("FirstName") = "Kavitha"
    RS("LastName") = "Nair"
    RS("Email") = "..."
    RS("DateOfBirth") = CDate("4 Feb, 1980")
    RS.Update
    %>
    </BODY>
    </HTML>

Updating Records

    <HTML>
    <HEAD>
    <TITLE>Student Records</TITLE>
    </HEAD>
    <BODY>
    <%
    Dim DB
    Set DB = Server.CreateObject("ADODB.Connection")
    DB.Mode = adModeReadWrite
    DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + "C:\Databases\Students.mdb")
    Dim RS
    Set RS = Server.CreateObject("ADODB.Recordset")
    RS.Open "SELECT * FROM Students WHERE FirstName = 'Kavitha'", DB, adOpenStatic, adLockPessimistic
    RS("Email") = "..."
    RS("DateOfBirth") = CDate("4 Feb, 1980")
    RS.Update
    %>
    </BODY>
    </HTML>

Deleting Records

    <HTML>
    <HEAD>
    <TITLE>Student Records</TITLE>
    </HEAD>
    <BODY>
    <%
    Dim DB
    Set DB = Server.CreateObject("ADODB.Connection")
    DB.Mode = adModeReadWrite
    DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + "C:\Databases\Students.mdb")
    DB.Execute("DELETE * FROM Students WHERE FirstName = 'Kavitha'")
    %>
    </BODY>
    </HTML>

Retrieving Data

    <HTML>
    <HEAD>
    <TITLE>Student Records</TITLE>
    </HEAD>
    <BODY>
    <%
    Dim DB
    Set DB = Server.CreateObject("ADODB.Connection")
    DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + "C:\Databases\Students.mdb")
    Dim RS
    Set RS = Server.CreateObject("ADODB.Recordset")
    RS.Open "SELECT * FROM Students", DB
    If RS.EOF And RS.BOF Then
        ' Both flags are true only when the recordset is empty.
        Response.Write "There are 0 records."
    Else
        RS.MoveFirst
        While Not RS.EOF
            Response.Write RS.Fields("FirstName")
            Response.Write RS.Fields("LastName")
            Response.Write "<HR>"
            RS.MoveNext
        Wend
    End If
    %>
    </BODY>
    </HTML>
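For comparison, the same four operations (insert, update, retrieve, delete) can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not a replacement for the ASP pages: an in-memory SQLite database stands in for Students.mdb, the column types are guesses, and both e-mail addresses are hypothetical placeholders. The sketch also binds values through `?` placeholders rather than pasting them into the SQL text, which avoids quoting mistakes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Students.mdb
conn.execute(
    "CREATE TABLE Students (FirstName TEXT, LastName TEXT, Email TEXT, DateOfBirth TEXT)"
)

# Insert: values are bound to ? placeholders instead of being pasted into the SQL.
conn.execute(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    ("Kavitha", "Nair", "kavitha@example.com", "1980-02-04"),
)

# Update the e-mail address of the matching record.
conn.execute(
    "UPDATE Students SET Email = ? WHERE FirstName = ?",
    ("k.nair@example.com", "Kavitha"),
)

# Retrieve and print every record.
rows = conn.execute("SELECT FirstName, LastName, Email FROM Students").fetchall()
for first, last, email in rows:
    print(first, last, email)

# Delete the record and report how many rows were removed.
deleted = conn.execute(
    "DELETE FROM Students WHERE FirstName = ?", ("Kavitha",)
).rowcount
conn.commit()
print("deleted:", deleted)
```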

1. Insert data into the Student database.
2. Establish connectivity with the database.
3. Insert a record, delete a record, update a record, and retrieve data from the database.

A web-based application for student registration is implemented with ASP. The application also supports adding a new student, deleting a student, and modifying a student's record.

Assignment No.4
Title of Assignment: To create a simple multi-dimensional cube.

Relevant Theory / Literature Survey:

Installation of Analysis Services of MSSQL 2000 is the primary requirement. When MSSQL 2000 Analysis Services is installed, Analysis Manager is also installed as a tool.

What is a Cube?

Cubes are the main objects in online analytic processing (OLAP), a technology that provides fast access to data in a data warehouse. A cube is a set of data that is usually constructed from a subset of a data warehouse and is organized and summarized into a multidimensional structure defined by a set of dimensions and measures. A cube provides an easy-to-use mechanism for querying data with quick and uniform response times. Every cube has a schema, which is the set of joined tables in the data warehouse from which the cube draws its source data. The central table in the schema is the fact table, the source of the cube's measures. The other tables are dimension tables, the sources of the cube's dimensions.

A cube is defined by the measures and dimensions that it contains. For example, a cube for sales analysis includes the measures Item_Sale_Price and Item_Cost and the dimensions Store_Location, Product_Line, and Fiscal_Year. This cube enables end users to separate Item_Sale_Price and Item_Cost into various categories by Store_Location, Product_Line, and Fiscal_Year. Each cube dimension can contain a hierarchy of levels to specify the categorical breakdown available to end users. For example, the Store_Location dimension includes the level hierarchy: Continent, Country, Region, State_Province, City, Store_Number. Each level in a dimension is of finer granularity than its parent. For example, continents contain countries, and states or provinces contain cities. Similarly, the hierarchy of the Fiscal_Year dimension includes the levels Year, Quarter, Month, and Day.
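What a cube computes can be sketched without Analysis Services: sum the measures for every combination of the chosen dimension levels. The Python sketch below is an illustration only. The fact rows are invented sample data, only two levels of Store_Location (country, city) and two of Fiscal_Year (year, quarter) are modelled, and the helper name `roll_up` is hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, year, quarter, item_sale_price, item_cost)
facts = [
    ("India", "Pune",   2004, "Q1", 120.0, 80.0),
    ("India", "Pune",   2004, "Q2", 150.0, 90.0),
    ("India", "Mumbai", 2004, "Q1", 200.0, 130.0),
    ("USA",   "Boston", 2004, "Q1", 300.0, 210.0),
]
COLUMNS = ("country", "city", "year", "quarter")


def roll_up(rows, levels):
    """Sum both measures for every combination of the requested dimension levels."""
    idx = [COLUMNS.index(level) for level in levels]
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key][0] += row[4]  # Item_Sale_Price
        totals[key][1] += row[5]  # Item_Cost
    return {key: tuple(values) for key, values in totals.items()}


by_country = roll_up(facts, ["country"])       # coarse level
by_city = roll_up(facts, ["country", "city"])  # drill down one level
print(by_country)
print(by_city)
```

Asking for `["country"]` gives the high-level answer, and adding `"city"` expands the same hierarchy to a finer level. A real OLAP engine precomputes and stores many such aggregations so that queries return quickly.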

Dimension levels are a powerful data modeling tool because they allow end users to ask questions at a high level and then expand a dimension hierarchy to reveal more detail. Cubes are immediately subordinate to the database in the object hierarchy. A database is a container for related cubes and the objects they share. You must create a database before you create a cube.

Data Warehousing Objects

Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas. Fact tables are the large tables in your warehouse schema that store business measurements. Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data, usually numeric and additive, that can be analyzed and examined. Dimension tables, also known as lookup or reference tables, contain the relatively static data in the warehouse. Dimension tables store the information you normally use to constrain queries.

Star Schema

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables and the points of the star are the dimension tables.

Hierarchies

Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation.

Design Analysis / Implementation Logic:

The assignment includes

1. Prepare Analysis Services, as our environment, for the cube model we intend to design;
2. Create the basic cube model;
3. Perform dimension design and other steps as part of the cube creation process;
4. Save the model;
5. Design storage for the cube we have planned;
6. Process the cube; and
7. Review basic cube browse functionality.

Testing (Input / Output):


Conclusion: A simple multi-dimensional cube is created and studied

Assignment No.5
Title of Assignment: Study of LDAP (Lightweight Directory Access Protocol)

Relevant Theory / Literature Survey:

Directory Service

A Directory is like a database: you can put information in, and later retrieve it. But it is specialized. Some typical characteristics are: designed for reading more than writing, offers a static view of the data, simple updates without transactions. Directories are tuned to give quick response to high-volume lookup or search operations. A Directory Service offers all of the above, plus a network protocol used to access the directory, and perhaps also a replication scheme and a data distribution scheme.

The Lightweight Directory Access Protocol (LDAP) is a protocol for accessing online directory services. It runs directly over TCP, and can be used to access directory services back-ended by X.500, standalone LDAP directory services, or other kinds of directory servers.

X.500

LDAP was originally developed as a front end to X.500, the OSI directory service. X.500 defines the Directory Access Protocol (DAP) for clients to use when contacting directory servers. DAP is a heavyweight protocol that runs over a full OSI stack and requires a significant amount of computing resources to run. LDAP runs directly over TCP and provides most of the functionality of DAP at a much lower cost. This use of LDAP makes it easy to access the X.500 directory.

X.500 in more depth

In X.500, the namespace is explicitly stated and is hierarchical. Such namespaces require relatively complicated management schemes. The naming model defined in X.500 is concerned mainly with the structure of the entries in the namespace, not the way the information is presented to the user. Every entry in an X.500 Directory Information Tree, or DIT, is a collection of attributes, each attribute composed of a type element and one or more value elements. The X.500 standard defines 17 object classes for directories as a baseline. Being extensible, X.500 directories may include other objects defined by implementors. The 17 basic object classes include:

- Alias
- Country
- Locality
- Organization
- Organizational Unit
- Person

Objects in these object classes are defined by their attributes. Some of the basic 40 attribute types include:

- Common Name (CN)
- Organization Name (O)
- Organizational Unit Name (OU)
- Locality Name (L)
- Street Address (SA)
- State or Province Name (S)
- Country (C)

Putting this all together, an unambiguous entry for an addressee would be specified by its distinguished name, say {C=US, O=Acme, OU=Sales, CN=Fred} Sample X.500 hierarchy. Starting at the highest level, or Root, we can traverse the tree to successively lower levels, called Country, Organization, and Common Name, for instance. Applications and users access the directory via a directory user agent, or DUA. A DUA transfers the directory request to a DSA, or Directory System Agent, via DAP, the Directory Access Protocol. The directory itself is composed of one or more DSAs. The DSAs can either communicate among themselves to share directory information or may perform what is called a referral, i.e., direct the DUA to use a specific DSA. Referrals may occur when DSAs are not set up to exchange directory information, perhaps due to lack of interworking agreements between the administrators, or for security reasons. LDAP The LDAP standard defines

- a network protocol for accessing information in the directory. It defines the operations one may perform, e.g. search, add, delete, modify, change name. It also defines how operations and data are conveyed.
- an information model defining the form and character of the information
- a namespace defining how information is referenced and organized
- an emerging distributed operation model defining how data may be distributed and referenced (v3)

Both the protocol itself and the information model are extensible.

Data Types

Any type of data can be put into the directory: text, photos, URLs, pointers to whatever, binary data, public key certificates. Different types of data are held in attributes of different types. Each attribute type has a particular syntax. The LDAP standard describes a rich set of standard attribute types and syntaxes (based on X.500's set). In addition, you may define your own attributes, syntaxes, and even object classes, so you can tailor your directory to your own site's specific needs.

The information model and namespace

They are based on Entries. An entry is simply a place where one stores attributes. Each attribute has a type and one or more values. Entries themselves are "typed". This is accomplished by the objectClass attribute. The namespace is hierarchical, so it has the concept of fully-qualified names called Distinguished Names (DN).


Here, test Entry's dc=stanford, dc=edu"






Accessing an LDAP-based directory is accomplished by using a combination of DN, filter, and scope. A base DN indicates where in the hierarchy to begin the search. A filter specifies attribute types, assertion values, and matching criteria. A scope indicates what to search: the base DN itself, one level below the base DN, or the entire sub-tree rooted at the base DN.

How does LDAP work?

LDAP directory service is based on a client-server model. One or more LDAP servers contain the data making up the LDAP directory tree. An LDAP client connects to an LDAP server and asks it a question. The server responds with the answer, or with a pointer to where the client can get more information (typically, another LDAP server). No matter which LDAP server a client connects to, it sees the same view of the directory; a name presented to one LDAP server references the same entry it would at another LDAP server. This is an important feature of a global directory service, like LDAP.

Key Points of LDAP:

- LDAP is an extensible, vendor-independent, open network protocol standard, so data can be accessed transparently across a highly heterogeneous network (i.e. the Internet).
- An LDAP-based directory supports any type of data.
- An LDAP-based directory can be configured to play essentially any role.
- The LDAP protocol directly supports various forms of strong security technology (authentication, privacy, and integrity).
- General-purpose directory technology, such as LDAP, can be used to glue together disparate facets of cyberspace, e.g. email, security, white and yellow pages, directories, collaborative tools, MBone, etc.
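The combination of base DN, filter, and scope described above can be illustrated with a small in-memory sketch. This is not a real LDAP client: the directory contents, the `example.com` names, and the functions `depth_below` and `search` are hypothetical, the filter is reduced to a single case-insensitive equality test, and DNs are split naively on commas (escaped commas are not handled).

```python
# Hypothetical in-memory directory: DN -> attributes (illustration only).
DIRECTORY = {
    "dc=example,dc=com": {"objectclass": ["domain"]},
    "ou=people,dc=example,dc=com": {"objectclass": ["organizationalUnit"]},
    "uid=jdoe,ou=people,dc=example,dc=com": {
        "objectclass": ["person"],
        "cn": ["Jane Doe", "J. Doe"],
    },
}


def depth_below(dn, base_dn):
    """Return how many levels dn sits below base_dn, or None if it is outside."""
    dn_parts = [p.strip().lower() for p in dn.split(",")]
    base_parts = [p.strip().lower() for p in base_dn.split(",")]
    extra = len(dn_parts) - len(base_parts)
    if extra < 0 or dn_parts[extra:] != base_parts:
        return None
    return extra


def search(base_dn, scope, attr, value):
    """Search with scope 'base', 'one' or 'sub' and an equality filter attr=value."""
    hits = []
    for dn, attrs in DIRECTORY.items():
        depth = depth_below(dn, base_dn)
        if depth is None:
            continue  # entry is not under the base DN at all
        in_scope = (
            (scope == "base" and depth == 0)
            or (scope == "one" and depth == 1)
            or scope == "sub"
        )
        matches = any(v.lower() == value.lower() for v in attrs.get(attr, []))
        if in_scope and matches:
            hits.append(dn)
    return hits


print(search("dc=example,dc=com", "sub", "cn", "jane doe"))
print(search("dc=example,dc=com", "one", "cn", "jane doe"))
```

The first call searches the whole sub-tree and finds the person entry; the second looks only one level below the base DN, where the person entry (two levels down) is out of scope.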

Individual LDAP records


What's in a name? The DN of an LDAP entry

All entries stored in an LDAP directory have a unique "Distinguished Name," or DN. The DN for each LDAP entry is composed of two parts: the Relative Distinguished Name (RDN) and the location within the LDAP directory where the record resides. The RDN is the portion of your DN that is not related to the directory tree structure. Most items that you'll store in an LDAP directory will have a name, and the name is frequently stored in the cn (Common Name) attribute. Since nearly everything has a name, most objects you'll store in LDAP will use their cn value as the basis for their RDN. If I'm storing a record for my favorite oatmeal recipe, I'll be using cn=Oatmeal Deluxe as the RDN of my entry.

My directory's base DN is dc=foobar,dc=com I'm storing all the LDAP records for my recipes in

The RDN of my LDAP record is cn=Oatmeal Deluxe

Given all this, what's the full DN of the LDAP record for this oatmeal recipe? Remember, it reads backwards - just like a host name in DNS.
cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com
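As a minimal sketch of how the full DN is assembled from the RDN and its location in the tree, the snippet below joins the components from most specific to least specific. The helper names `build_dn` and `split_dn` are hypothetical, and the code assumes no component contains an escaped comma.

```python
def build_dn(rdn, *containers):
    """Join an RDN with its parent containers, most specific component first."""
    return ",".join([rdn, *containers])


def split_dn(dn):
    """Split a DN into (rdn, parent_dn). Assumes no escaped commas in values."""
    rdn, _, parent = dn.partition(",")
    return rdn, parent


dn = build_dn("cn=Oatmeal Deluxe", "ou=recipes", "dc=foobar,dc=com")
print(dn)            # cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com
print(split_dn(dn))  # ('cn=Oatmeal Deluxe', 'ou=recipes,dc=foobar,dc=com')
```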

Now it's time to tackle the DN of a company employee. For user accounts, you'll typically see a DN based either on the cn or on the uid (User ID). For example, the DN for FooBar's employee Fran Smith (login name: fsmith) might look like either of these two formats:

(login-based) LDAP (and X.500) use uid to mean "User ID", not to be confused with the UNIX uid number. Most companies try to give everyone a unique login name, so this approach makes good sense for storing information about employees. You don't have to worry about what you'll do when you hire the next Fran Smith, and if Fran changes her name (marriage? divorce? religious experience?), you won't have to change the DN of the LDAP entry.


(name-based) Here we see the Common Name (CN) entry used. In the case of an LDAP record for a person, think of the common name as their full name. One can easily see the downside to this approach: if the name changes, the LDAP record has to "move" from one DN to another. As indicated above, you want to avoid changing the DN of an entry whenever possible.

An example of an individual LDAP entry

Let's look at an example. We'll use the LDAP record of Fran Smith, an employee from Foobar, Inc. The format of this entry is LDIF, the format used when exporting and importing LDAP directory entries.

dn: uid=fsmith, ou=employees, dc=foobar, dc=com
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.
mailRoutingAddress:
mailhost:
userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash

To start with, attribute values are stored with case intact, but searches against them are case-insensitive by default. Certain attributes (like password) are case-sensitive when searching. Let's break this entry down and look at it piece by piece.

dn: uid=fsmith, ou=employees, dc=foobar, dc=com

This is the full DN of Fran's LDAP entry, including the whole path to the entry in the directory tree. LDAP (and X.500) use uid to mean "User ID," not to be confused with the UNIX uid number.

objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson

One can assign as many object classes as are applicable to any given type of object. The person object class requires that the cn (common name) and sn (surname) fields have values. The person object class also allows other optional fields, including givenname, telephonenumber, and so on. The object class organizationalPerson adds more options to the values from person, and inetOrgPerson adds still more options to that (including email information). Finally, foobarPerson is Foobar's customized object class that adds all the custom attributes they wish to track at their company.

uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.

As mentioned before, uid stands for User ID. Just translate it in your head to "login" whenever you see it. Note that there are multiple entries for the CN. As mentioned above, LDAP allows some attributes to have multiple values, with the number of values being arbitrary. When would you want this? Let's say you're searching the company LDAP directory for Fran's phone number. While you might know her as Fran (having heard her spill her guts over lunchtime margaritas on more than one occasion), the people in HR may refer to her (somewhat more formally) as Frances. Because both versions of her name are stored, either search will successfully look up Fran's telephone number, email, cube number, and so on.

mailRoutingAddress:
mailhost:

Like most companies on the Internet, Foobar uses Sendmail for internal mail delivery and routing. Foobar stores all users' mail routing information in LDAP, which is fully supported by recent versions of Sendmail.

userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
gecos: Frances Smith
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash

Note that Foobar's systems administrators store all the password map information in LDAP as well. At Foobar, the foobarPerson object class adds this capability. Note that the user password is stored in UNIX crypt format. The UNIX uid is stored here as uidnumber.
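The multi-valued cn attribute and the case-insensitive matching described above can be sketched with a few lines of Python. This is a simplified illustration, not a complete LDIF parser: the functions `parse_entry` and `matches` are hypothetical, only part of Fran's entry is used, and LDIF features such as line folding, base64 values, and comments are ignored.

```python
from collections import defaultdict

# Part of Fran's entry from the text above.
LDIF_ENTRY = """\
dn: uid=fsmith, ou=employees, dc=foobar, dc=com
objectclass: person
objectclass: organizationalPerson
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
"""


def parse_entry(text):
    """Parse one simple LDIF entry into {attribute: [values]}.

    Simplification: no line folding, no base64 ('::') values, no comments.
    """
    attrs = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        attrs[name.strip().lower()].append(value.strip())
    return dict(attrs)


def matches(entry, attr, value):
    """Case-insensitive equality match against any value of an attribute."""
    return any(v.lower() == value.lower() for v in entry.get(attr.lower(), []))


entry = parse_entry(LDIF_ENTRY)
print(entry["cn"])                            # ['Fran Smith', 'Frances Smith']
print(matches(entry, "cn", "frances smith"))  # True
print(matches(entry, "cn", "fran smith"))     # True
```

Because both names are stored under cn, a lookup by either "Fran Smith" or "Frances Smith" reaches the same entry and therefore the same telephone number.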

Conclusion: Thus, the Lightweight Directory Access Protocol is studied.

Assignment No 6 (a) Title: Case Study of SQL SERVER

What is SQL Server?

SQL Server 2000 is a family of products designed to meet the data storage requirements of large data processing systems and commercial Web sites, as well as meet the ease-of-use requirements of individuals and small businesses. At its core, SQL Server 2000 provides two fundamental services to the emerging Microsoft .NET platform, as well as in the traditional two-tier client/server environment. The first service is the SQL Server service, which is a high-performance, highly scalable relational database engine. The second service is SQL Server 2000 Analysis Services, which provides tools for analyzing the data stored in data warehouses and data marts for decision support.

Microsoft SQL Server is a complete database and analysis solution for rapidly delivering the next generation of scalable Web applications. SQL Server is a key component in supporting e-commerce, line-of-business, and data warehousing applications, while offering the scalability necessary to support growing, dynamic environments. SQL Server includes rich support for Extensible Markup Language (XML) and other Internet language formats; performance and availability features to ensure uptime; and advanced management and tuning functionality to automate routine tasks and lower the total cost of ownership.

The SQL Server 2000 Environment

The traditional client/server database environment consists of client applications and a relational database management system (RDBMS) that manages and stores the data. In this traditional environment, the client applications that provide the interface for users to access SQL Server 2000 are intelligent (or thick) clients, such as custom-written Microsoft Visual Basic programs that access the data on SQL Server 2000 directly using a local area network. The emerging Microsoft .NET platform consists of highly distributed, loosely connected, programmable Web services executing on multiple servers. In this distributed, decentralized environment, the client applications are thin clients, such as Internet browsers, which access the data on SQL Server 2000 through Web services such as Microsoft Internet Information Services (IIS).


SQL Server 2000 Components

SQL Server 2000 provides a number of different types of components. At the core are server components. These server components are generally implemented as 32-bit Windows services. SQL Server 2000 provides client-based graphical tools and command-prompt utilities for administration. These tools and utilities, as well as all other client applications, use client communication components provided by SQL Server 2000. The communication components provide various ways in which client applications can access data through communication with the server components. These communication components are implemented as providers, drivers, database interfaces, and Net-Libraries.

Server Components

The server components of SQL Server 2000 are normally implemented as 32-bit Windows services. The SQL Server and SQL Server Agent services may also be run as standalone applications on any supported Windows operating system platform. The following table lists the server components and briefly describes their function. It also specifies how the component is implemented when multiple instances are used.

Table: Server Components and Their Functions

Server Component: Description
SQL Server service: MSSQLServer service implements the SQL Server 2000 database engine. There is one service for each instance of SQL Server 2000.
Microsoft SQL Server 2000 Analysis Services service: MSSQLServerOLAPService implements SQL Server 2000 Analysis Services. There is only one service, regardless of the number of instances of SQL Server 2000.
SQL Server Agent service: SQLServerAgent service implements the agent that runs scheduled SQL Server 2000 administrative tasks. There is one service for each instance of SQL Server 2000.
Microsoft Search service: Microsoft Search implements the full-text search engine. There is only one service, regardless of the number of instances of SQL Server 2000.
Microsoft Distributed Transaction Coordinator (MS DTC) service: Distributed Transaction Coordinator manages distributed transactions between instances of SQL Server 2000. There is only one service, regardless of the number of instances of SQL Server 2000.

Client-Based Graphical Tools

The following table lists the 32-bit graphical tools provided by SQL Server 2000 and briefly describes their function.

Table: Graphical Tools in SQL Server 2000

Graphical Tool: Description
SQL Server Enterprise Manager: SQL Server Enterprise Manager is the primary administrative tool for SQL Server and provides a Microsoft Management Console (MMC) compliant user interface that helps you to perform a variety of administrative tasks:
- Defining groups of servers running SQL Server
- Registering individual servers in a group
- Configuring all SQL Server options for each registered server
- Creating and administering all SQL Server databases, objects, logins, users, and permissions in each registered server
- Defining and executing all SQL Server administrative tasks on each registered server
- Designing and testing SQL statements, batches, and scripts interactively by invoking SQL Query Analyzer
- Invoking the various wizards defined for SQL Server
SQL Query Analyzer: SQL Query Analyzer is a graphical tool that helps you to perform a variety of tasks:
- Creating queries and other SQL scripts and executing them against SQL Server databases
- Creating commonly used database objects from predefined scripts
- Copying existing database objects
- Executing stored procedures without knowing the parameters
SQL Profiler: SQL Profiler is a tool that captures SQL Server events from a server. The events are saved in a trace file that can later be analyzed or used to replay a specific series of steps when trying to diagnose a problem.
SQL Server Service Manager: A taskbar application used to start, stop, pause, or modify SQL Server 2000 services.
SQL Server Agent: SQL Server Agent runs on the server that is running instances of SQL Server. SQL Server Agent is responsible for the following tasks:
- Running SQL Server tasks that are scheduled to occur at specific times or intervals
- Detecting specific conditions for which administrators have defined an action, such as alerting someone through pages or e-mail, or issuing a task that will address the conditions
- Running replication tasks defined by administrators


The Relational Database Architecture

SQL Server 2000 data is stored in databases. Physically, a database consists of two or more files on one or more disks. This physical implementation is visible only to database administrators, and is transparent to users. The physical optimization of the database is primarily the responsibility of the database administrator. Logically, a database is structured into components that are visible to users, such as tables, views, and stored procedures. The logical optimization of the database (such as the design of tables and indexes) is primarily the responsibility of the database designer.

System and User Databases

Each instance of SQL Server 2000 has four system databases. Table 1.6 lists each of these system databases and briefly describes their function. In addition, each instance of SQL Server 2000 has one or more user databases. The pubs and Northwind user databases are sample databases that ship with SQL Server 2000. Given sufficient system resources, each instance of SQL Server 2000 can handle thousands of users working in multiple databases simultaneously.

Table: System Databases in SQL Server 2000

System Database: Description
master: Records all of the system-level information for a SQL Server 2000 system, including all other databases, login accounts, and system configuration settings.
tempdb: Stores all temporary tables and stored procedures created by users, as well as temporary worktables used by the relational database engine itself.
model: Serves as the template that is used whenever a new database is created.
msdb: SQL Server Agent uses this system database for scheduling alerts and jobs, and recording operators.

Physical Structure of a Database

Each database consists of at least one data file and one transaction log file. These files are not shared with any other database. To optimize performance and to provide fault tolerance, data and log files are typically spread across multiple drives and frequently use a redundant array of independent disks (RAID).

Extents and Pages

SQL Server 2000 allocates space from a data file for tables and indexes in 64-KB blocks called extents. Each extent consists of eight contiguous pages of 8 KB each. There are two types of extents: uniform extents that are owned by a single object, and mixed extents that are shared by up to eight objects. A page is the fundamental unit of data storage in SQL Server 2000, with the page size being 8 KB. In general, data pages store data in rows on each data page. The maximum amount of data contained in a single row is 8060 bytes. Data rows are either organized in some kind of order based on a key in a clustered index (such as zip code), or stored in no particular order if no clustered index exists. The beginning of each page contains a 96-byte header that is used to store system information, such as the amount of free space available on the page.
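The page and extent sizes quoted above can be checked with a little arithmetic. The sketch below uses only the figures from the text (8-KB pages, eight pages per extent, a 96-byte header, an 8060-byte row limit); the function `rows_per_page` is a hypothetical helper that gives an upper bound only, because it assumes fixed-size rows and ignores any per-row bookkeeping the engine keeps on the page.

```python
PAGE_SIZE = 8 * 1024     # 8-KB page, in bytes
PAGES_PER_EXTENT = 8     # eight contiguous pages per extent
PAGE_HEADER = 96         # bytes of system information at the start of a page
MAX_ROW = 8060           # maximum bytes of data in a single row

extent_size = PAGE_SIZE * PAGES_PER_EXTENT   # 65536 bytes = 64 KB
usable = PAGE_SIZE - PAGE_HEADER             # 8096 bytes left after the header


def rows_per_page(row_size):
    """Upper bound on fixed-size rows per page (per-row overhead is ignored)."""
    if not 0 < row_size <= MAX_ROW:
        raise ValueError("row size must be between 1 and 8060 bytes")
    return usable // row_size


print(extent_size // 1024)   # 64
print(rows_per_page(100))    # 80
print(rows_per_page(8060))   # 1
```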

Transaction Log Files

The transaction log file resides in one or more separate physical files from the data files and contains a series of log records, rather than pages allocated from extents. To optimize performance and aid in redundancy, transaction log files are typically placed on separate disks from data files, and are frequently mirrored using RAID.

Logical Structure of a Database

Data in SQL Server 2000 is organized into database objects that are visible to users when they connect to a database. The following table lists these objects and briefly describes their function.

Table: Database Objects in SQL Server 2000

Database Object: Description
Tables: A table generally consists of columns and rows of data in a format similar to that of a spreadsheet. Each row in the table represents a unique record, and each column represents a field within the record. A data type specifies what type of data can be stored in a column.
Views: Views can restrict the rows or the columns of a table that are visible, or can combine data from multiple tables to appear like a single table. A view can also aggregate columns.
Indexes: An index is a structure associated with a table or view that speeds retrieval of rows from the table or view. Table indexes are either clustered or nonclustered. Clustering means the data is physically ordered based on the index key.
Keys: A key is a column or group of columns that uniquely identifies a row (PRIMARY KEY), defines the relationship between two tables (FOREIGN KEY), or is used to build an index.
User-defined data types: A user-defined data type is a custom data type, based on a predefined SQL Server 2000 data type. It is used to make a table structure more meaningful to programmers and help ensure that columns holding similar classes of data have the same base data type.
Stored procedures: A stored procedure is a group of Transact-SQL statements compiled into a single execution plan. The procedure is used for performance optimization and to control access.
Constraints: Constraints define rules regarding the values allowed in columns and are the standard mechanism for enforcing data integrity.
Defaults: A default specifies what values are used in a column in the event that you do not specify a value for the column when you are inserting a row.
Triggers: A trigger is a special class of stored procedure defined to execute automatically when an UPDATE, INSERT, or DELETE statement is issued against a table or view.
User-defined functions: A user-defined function is a subroutine made up of one or more Transact-SQL statements used to encapsulate code for reuse. A function can have a maximum of 1024 input parameters. User-defined functions can be used in place of views and stored procedures.

The Security Architecture

Logins, users, roles, and groups are the foundation for the security mechanisms of SQL Server. Users who connect to SQL Server must identify themselves by using a Specific Login Identifier (ID). Users can then see only the tables and views that they are authorized to see and can execute only the stored procedures and administrative functions that they are authorized to execute. This system of security is based on the IDs used to identify users.


Allocating Space for Tables and Indexes

Before SQL Server 2000 can store information in a table or an index, free space must be allocated from within a data file and assigned to that object. Free space is allocated for tables and indexes in units called extents. An extent is 64 KB of space, consisting of eight contiguous pages, each 8 KB in size. There are two types of extents, mixed extents and uniform extents. SQL Server 2000 uses mixed extents to store small amounts of data for up to eight objects within a single extent, whereas it uses uniform extents to store data from a single object.

When a new table or index is created, SQL Server 2000 locates a mixed extent with a free page and allocates the free page to the newly created object. A page contains data for only one object. When an object requires additional space, SQL Server 2000 allocates free space from mixed extents until an object uses a total of eight pages. Thereafter, SQL Server 2000 allocates a uniform extent to that object. SQL Server 2000 will grow the data files in a round-robin algorithm if no free space exists in any data file and autogrow is enabled.

When SQL Server 2000 needs a mixed extent with at least one free page, a Secondary Global Allocation Map (SGAM) page is used to locate such an extent. Each SGAM page is a bitmap covering 64,000 extents (approximately 4 GB) that is used to identify allocated mixed extents with at least one free page. Each extent in the interval that SGAM covers is assigned a bit. The extent is identified as a mixed extent with free pages when the bit is set to 1. When the bit is set to 0, the extent is either a mixed extent with no free pages, or the extent is a uniform extent.

When SQL Server 2000 needs to allocate an extent from free space, a Global Allocation Map (GAM) page is used to locate an extent that has not previously been allocated to an object. Each GAM page is a bitmap that covers 64,000 extents, and each extent in the interval it covers is assigned a bit. When the bit is set to 1, the extent is free. When the bit is set to 0, the extent has already been allocated.
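The meaning of the GAM and SGAM bits can be summarized in a short sketch. The function names `extent_state` and `find_first` and the sample bitmaps are hypothetical; the bit meanings and the 64,000-extent coverage are taken from the description above, and the real on-disk layout of these pages is not modeled.

```python
EXTENT_KB = 64
EXTENTS_PER_MAP = 64_000


def extent_state(gam_bit, sgam_bit):
    """Interpret the GAM and SGAM bits of one extent as described in the text."""
    if gam_bit == 1 and sgam_bit == 0:
        return "free"
    if gam_bit == 0 and sgam_bit == 1:
        return "mixed extent with free pages"
    if gam_bit == 0 and sgam_bit == 0:
        return "uniform extent or full mixed extent"
    # A free extent cannot also be an allocated mixed extent.
    return "invalid"


def find_first(bits, wanted=1):
    """Return the index of the first extent whose bit equals wanted, else None."""
    for index, bit in enumerate(bits):
        if bit == wanted:
            return index
    return None


# Hypothetical bitmaps for six extents.
gam = [0, 0, 0, 1, 1, 0]
sgam = [0, 1, 0, 0, 0, 0]

print(find_first(sgam))               # 1: first mixed extent with a free page
print(find_first(gam))                # 3: first unallocated extent
print(extent_state(gam[1], sgam[1]))  # mixed extent with free pages

# One map page covers 64,000 extents of 64 KB each: a little under 4 GB.
coverage_gb = EXTENT_KB * EXTENTS_PER_MAP / (1024 * 1024)
print(coverage_gb)                    # 3.90625
```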

Storing Index and Data Pages

In the absence of a clustered index, SQL Server 2000 stores new data on any unfilled page in any available extent belonging to the table into which the data is being inserted. This disorganized collection of data pages is called a heap. In a heap, the data pages are stored in no specific order and are not linked together. In the absence of either a clustered or a nonclustered index, SQL Server 2000 has to search the entire table to locate a record within the table (using IAM pages to identify pages associated with the table). On a large table, this complete search is quite inefficient.

To speed this retrieval process, database designers create indexes for SQL Server 2000 to use to find data pages quickly. An index stores the value of an indexed column (or columns) from a table in a B-tree structure. A B-tree structure is a balanced hierarchical structure (or tree) consisting of a root node, possible intermediate nodes, and bottom-level leaf pages (nodes). All branches of the B-tree have the same number of levels. A B-tree physically organizes index records based on these key values. Each index page is linked to adjacent index pages.

SQL Server 2000 supports two types of indexes, clustered and nonclustered. A clustered index forces the physical ordering of data pages within the data file based on the key value used for the clustered index (such as last name or zip code). The leaf level of a clustered index is the data level. When a new data row is inserted into a table containing a clustered index, SQL Server 2000 traverses the B-tree structure and determines the location for the new data row based on the ordering within the B-tree (moving existing data and index rows as necessary to maintain the physical ordering). See Figure 5.1.

The leaf level of a nonclustered index contains a pointer telling SQL Server 2000 where to find the data row corresponding to the key value contained in the nonclustered index. When a new data row is inserted into a table containing only a nonclustered index, a new index row is entered into the B-tree structure, and the new data row is entered into any page in the heap that has been allocated to the table and contains sufficient free space. See Figure 5.2.
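The difference between a heap scan, a nonclustered index, and a clustered index can be illustrated with a small in-memory sketch. A sorted Python list searched with `bisect` stands in for the B-tree here, which is a deliberate simplification; the sample rows and all function names are hypothetical, and a list position plays the role of the pointer to a data row.

```python
import bisect

# Heap: rows in arrival order; the list position stands in for a row pointer.
heap = [("Smith", "Fran"), ("Adams", "Ann"), ("Jones", "Bo"), ("Baker", "Cy")]


def heap_scan(rows, last_name):
    """No index: every row has to be inspected."""
    return [row for row in rows if row[0] == last_name]


# Nonclustered index: sorted (key, position) pairs pointing into the heap.
nonclustered = sorted((row[0], pos) for pos, row in enumerate(heap))


def nonclustered_lookup(index, rows, last_name):
    """Binary-search the index, then follow each pointer into the heap."""
    i = bisect.bisect_left(index, (last_name, -1))
    found = []
    while i < len(index) and index[i][0] == last_name:
        found.append(rows[index[i][1]])
        i += 1
    return found


# Clustered index: the data rows themselves are kept in key order.
clustered = sorted(heap)


def clustered_insert(rows, row):
    """Insert a row at the position dictated by its key, shifting later rows."""
    bisect.insort(rows, row)


print(heap_scan(heap, "Jones"))
print(nonclustered_lookup(nonclustered, heap, "Jones"))
clustered_insert(clustered, ("Clark", "Di"))
print(clustered)
```

Both lookups return the same row, but the indexed lookup inspects only a few index entries, while the heap scan touches every row. The clustered insert shows why existing rows may have to move to keep the data in key order.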



Overview of MySQL AB

MySQL AB is the company of the MySQL founders and main developers. MySQL AB was originally established in Sweden by David Axmark, Allan Larsson, and Michael "Monty" Widenius. The MySQL Web site provides the latest information about MySQL and MySQL AB.

The AB part of the company name is the acronym for the Swedish aktiebolag, or stock company. It translates to MySQL, Inc.

Overview of the MySQL Database Management System

MySQL, the most popular Open Source SQL database management system, uses the standard SQL interface. It is developed, distributed, and supported by MySQL AB. MySQL is very popular as a back end for web applications. The MySQL engine can be accessed from most major programming and scripting languages, such as Perl and PHP, making it easy to develop applications.

- MySQL is a database management system.
- MySQL is a relational database management system.
- MySQL software is Open Source.
- The MySQL Database Server is very fast, reliable, and easy to use.
- MySQL Server works in client/server or embedded systems.
- A large amount of contributed MySQL software is available.

The Main Features of MySQL

Internals and Portability:
- Written in C and C++.
- Tested with a broad range of different compilers.
- Works on many different platforms.

- APIs for C, C++, Java, Perl, PHP, Python, etc. are available.
- Fully multi-threaded using kernel threads. It can easily use multiple CPUs if they are available.
- Provides transactional and non-transactional storage engines.
- Uses very fast B-tree disk tables (MyISAM) with index compression.
- Relatively easy to add other storage engines. This is useful if you want to add an SQL interface to an in-house database.
- A very fast thread-based memory allocation system.
- Very fast joins using an optimized one-sweep multi-join.
- In-memory hash tables, which are used as temporary tables.
- SQL functions are implemented using a highly optimized class library and should be as fast as possible. Usually there is no memory allocation at all after query initialization.
- The MySQL code is tested with Purify (a commercial memory leakage detector) as well as with Valgrind, a GPL tool.
- The server is available as a separate program for use in a client/server networked environment. It is also available as a library that can be embedded (linked) into standalone applications. Such applications can be used in isolation or in environments where no network is available.

Data Types:
- Many data types: signed/unsigned integers 1, 2, 3, 4, and 8 bytes long, FLOAT, DOUBLE, CHAR, VARCHAR, TEXT, BLOB, DATE, TIME, DATETIME, TIMESTAMP, YEAR, SET, ENUM, and OpenGIS spatial types.
- Fixed-length and variable-length records.

Statements and Functions:
- Full operator and function support in the SELECT and WHERE clauses of queries. For example:

mysql> SELECT CONCAT(first_name, ' ', last_name)
    -> FROM citizen
    -> WHERE income/dependents > 10000 AND age > 30;

- Full support for SQL GROUP BY and ORDER BY clauses. Support for group functions (COUNT(), COUNT(DISTINCT ...), AVG(), STD(), SUM(), MAX(), MIN(), and GROUP_CONCAT()).
- Support for LEFT OUTER JOIN and RIGHT OUTER JOIN with both standard SQL and ODBC syntax.
- Support for aliases on tables and columns as required by standard SQL.

- DELETE, INSERT, REPLACE, and UPDATE return the number of rows that were changed (affected). It is possible to return the number of rows matched instead by setting a flag when connecting to the server.
- The MySQL-specific SHOW statement can be used to retrieve information about databases, storage engines, tables, and indexes. The EXPLAIN statement can be used to determine how the optimizer resolves a query.
- Function names do not clash with table or column names. For example, ABS is a valid column name. The only restriction is that for a function call, no spaces are allowed between the function name and the ( that follows it.
- You can mix tables from different databases in the same query.

Security:
- A privilege and password system that is very flexible and secure, and that allows host-based verification. Passwords are secure because all password traffic is encrypted when you connect to a server.

Scalability and Limits:
- Handles large databases. We use MySQL Server with databases that contain 50 million records. We also know of users who use MySQL Server with 60,000 tables and about 5,000,000,000 rows.
- Up to 64 indexes per table are allowed (32 before MySQL 4.1.2). Each index may consist of 1 to 16 columns or parts of columns. The maximum index width is 1000 bytes (767 for InnoDB); before MySQL 4.1.2, the limit is 500 bytes. An index may use a prefix of a column for CHAR, VARCHAR, BLOB, or TEXT column types.

Connectivity:
- Clients can connect to the MySQL server using TCP/IP sockets on any platform. On Windows systems in the NT family (NT, 2000, XP, 2003, or Vista), clients can connect using named pipes. On Unix systems, clients can connect using Unix domain socket files.
- In MySQL 4.1 and higher, Windows servers also support shared-memory connections if started with the --shared-memory option. Clients can connect through shared memory by using the --protocol=memory option.
- The Connector/ODBC (MyODBC) interface provides MySQL support for client programs that use ODBC (Open Database Connectivity) connections. For example, you can use MS Access to connect to your MySQL server. Clients can be run on Windows or Unix. MyODBC source is available. All ODBC 2.5 functions are supported, as are many others.
- The Connector/J interface provides MySQL support for Java client programs that use JDBC connections. Clients can be run on Windows or Unix. Connector/J source is available.
- MySQL Connector/NET enables developers to easily create .NET applications that require secure, high-performance data connectivity with MySQL. It implements the required ADO.NET interfaces and integrates into ADO.NET aware tools. Developers can build applications using their choice of .NET languages. MySQL Connector/NET is a fully managed ADO.NET driver written in 100% pure C#.

Localization:
- The server can provide error messages to clients in many languages. See Section 5.11.2, Setting the Error Message Language.
- Full support for several different character sets, including latin1 (cp1252), german, big5, ujis, and more. For example, the Scandinavian characters å, ä and ö are allowed in table and column names. Unicode support is available as of MySQL 4.1.
- All data is saved in the chosen character set. All comparisons for normal string columns are case-insensitive.
- Sorting is done according to the chosen character set (using Swedish collation by default). It is possible to change this when the MySQL server is started. To see an example of very advanced sorting, look at the Czech sorting code. MySQL Server supports many different character sets that can be specified at compile time and runtime.

Clients and Tools:
- MySQL Server has built-in support for SQL statements to check, optimize, and repair tables. These statements are available from the command line through the mysqlcheck client. MySQL also includes myisamchk, a very fast command-line utility for performing these operations on MyISAM tables.
- All MySQL programs can be invoked with the --help or -? options to obtain online assistance.


System Architecture

Transaction Management
Transaction Overview

Transaction - A sequence of SQL statement executions that is treated as a single unit, so that all of its data changes are committed or cancelled as a whole.

Most database servers offer two transaction management modes:

Auto Commit On - Each SQL statement is a transaction. Data changes resulting from each statement are automatically committed.
Auto Commit Off - Transactions are explicitly started and ended by the client program. Data changes are not committed unless requested by the client program.

Most database servers support the following statements for transaction management:

Commit Statement - To commit all changes in the current transaction.
Rollback Statement - To roll back all changes in the current transaction.
Start Transaction Statement - To start a new transaction.

At the storage engine level, transactions are not explicitly started. They are instead implicitly started through calls to either start_stmt() or external_lock(). If one of these methods is called while a transaction already exists, the transaction is not replaced. The storage engine stores transaction information in per-connection memory and also registers the transaction in the MySQL server to allow the server to later issue COMMIT and ROLLBACK operations. As operations are performed, the storage engine has to implement some form of versioning or logging to permit a rollback of all operations executed within the transaction. After work is completed, the MySQL server calls either the commit() method or the rollback() method defined in the storage engine's handlerton.

MySQL Support of Transaction Management

MySQL supports the following rules:
Only two storage engines support transaction management: InnoDB and BDB. The default storage engine, MyISAM, does not support transaction management.
To force a table to use a non-default storage engine, you must specify the engine name in the CREATE TABLE statement.
Statements related to transaction management: SET AUTOCOMMIT = 0 | 1; START TRANSACTION; COMMIT; ROLLBACK;

Note that:

SET AUTOCOMMIT = 1 - Turns on the auto-commit option. It also commits and terminates the current transaction.
SET AUTOCOMMIT = 0 - Turns off the auto-commit option. It also starts a new transaction.
By default, the auto-commit option is turned on when a new session is established.
COMMIT - Commits the current transaction.
ROLLBACK - Rolls back the current transaction.
START TRANSACTION - Commits the current transaction and starts a new transaction.
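
The effect of COMMIT and ROLLBACK described above can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, so it shows the general behaviour rather than MySQL itself. The account table and the balance() helper are made up for illustration, and SQLite spells START TRANSACTION as BEGIN.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so
# transactions are controlled only by the explicit statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")  # committed automatically


def balance(connection):
    """Hypothetical helper: read the balance of account 1."""
    row = connection.execute("SELECT balance FROM account WHERE id = 1").fetchone()
    return row[0]


# An explicit transaction that is cancelled as a whole.
conn.execute("BEGIN")  # SQLite spelling; MySQL uses START TRANSACTION
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
inside = balance(conn)          # the change is visible inside the transaction
conn.execute("ROLLBACK")
after_rollback = balance(conn)  # the change has been undone

# An explicit transaction that is made permanent.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.execute("COMMIT")
after_commit = balance(conn)

print(inside, after_rollback, after_commit)
```

With a real MySQL server the same sequence would be sent through a client library, and the table would have to use a transactional engine such as InnoDB.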

Transaction Isolation Levels

The impact of a transaction in the current session is simple. However, concurrent transactions in multiple sessions may impact each other in many ways. Three phenomena have been observed in concurrent transactions:

Dirty Read - One transaction T1 reads uncommitted changes from another transaction T2. If T2 performs a rollback later, T1 may have used incorrect data from the uncommitted changes.
Non-Repeatable Read - One transaction T1 reads a row, which is later changed and committed by another transaction T2. If T1 now reads the same row again, the result will be different from the first read.
Phantom - One transaction T1 reads a set of rows that satisfy a condition. Another transaction T2 then inserts some new rows that satisfy the same condition. If T1 repeats the same read, it will receive some "phantom" rows.

To be able to control and avoid those phenomena, four transaction isolation levels have been defined by the SQL standard:

Read Uncommitted - This is the lowest isolation level. All three phenomena are possible.
Read Committed - Dirty Read is prevented, but Non-Repeatable Read and Phantom are possible.
Repeatable Read - Dirty Read and Non-Repeatable Read are prevented, but Phantom is still possible.
Serializable - This is the highest isolation level. All three phenomena are prevented.
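
The four levels and three phenomena above can be written down as a small lookup table. The sketch below is only a restatement of that list in Python. The names ALLOWED and weakest_level_preventing are made up for illustration and are not part of any database API.

```python
PHENOMENA = ("dirty read", "non-repeatable read", "phantom")

# For each isolation level, the phenomena that can still occur,
# listed from the lowest level to the highest.
ALLOWED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom"},
    "READ COMMITTED": {"non-repeatable read", "phantom"},
    "REPEATABLE READ": {"phantom"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(unwanted):
    """Return the lowest level under which none of `unwanted` can occur."""
    unwanted = set(unwanted)
    unknown = unwanted - set(PHENOMENA)
    if unknown:
        raise ValueError(f"unknown phenomena: {sorted(unknown)}")
    for level, allowed in ALLOWED.items():  # dicts keep insertion order
        if not (unwanted & allowed):
            return level


print(weakest_level_preventing(["non-repeatable read"]))  # REPEATABLE READ
```

A table like this is handy when choosing a level: pick the weakest one that rules out the phenomena the application cannot tolerate, since higher levels generally cost more concurrency.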

MySQL Support of Transaction Isolation Levels

Transaction isolation levels are supported by the InnoDB storage engine.
The default isolation level is "Repeatable Read".
The SET statement can be used to change the isolation level for the next transaction: "SET TRANSACTION ISOLATION LEVEL level_name".
The SET statement can be used to change the isolation level for the entire session, starting with the next transaction: "SET SESSION TRANSACTION ISOLATION LEVEL level_name".

Starting a Transaction

A transaction is started by the storage engine in response to a call to either the start_stmt() or external_lock() methods. If there is no active transaction, the storage engine must start a new transaction and register the transaction with the MySQL server so that ROLLBACK or COMMIT can later be called.

Implementing ROLLBACK

Of the two major transactional operations, ROLLBACK is the more complicated to implement. All operations that occurred during the transaction must be reversed so that all rows are unchanged from before the transaction began. To support ROLLBACK, create a method that matches this definition:

int (*rollback)(THD *thd, bool all);

The method name is then listed in the rollback (thirteenth) entry of the handlerton. The THD parameter is used to identify the transaction that needs to be rolled back, while the bool all parameter indicates whether the entire transaction should be rolled back or just the last statement. Details of implementing a ROLLBACK operation will vary by storage engine.

Implementing COMMIT

During a commit operation, all changes made during a transaction are made permanent, and a rollback operation is not possible after that. Depending on the transaction isolation level used, this may be the first time such changes are visible to other threads. To support COMMIT, create a method that matches this definition:

int (*commit)(THD *thd, bool all);

The method name is then listed in the commit (twelfth) entry of the handlerton. The THD parameter is used to identify the transaction that needs to be committed, while the bool all parameter indicates whether this is a full transaction commit or just the end of a statement that is part of the transaction. Details of implementing a COMMIT operation will vary by storage engine.

If the server is in auto-commit mode, the storage engine should automatically commit all read-only statements such as SELECT. In a storage engine, "autocommitting" works by counting locks. Increment the count for every call to external_lock(), and decrement it when external_lock() is called with an argument of F_UNLCK. When the count drops to zero, trigger a commit.

Adding Support for Savepoints

The storage engine first needs to know how many bytes are required to store its savepoint information. This should be a fixed size, preferably not large, as the MySQL server will allocate space to store the savepoint for all storage engines with each named savepoint. When a COMMIT or ROLLBACK operation occurs (with bool all set to true), all savepoints are assumed to be released. If the storage engine allocates resources for savepoints, it should free them.
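
The lock-counting rule for autocommit can be illustrated with a small simulation. The sketch below is written in Python, not against the real storage engine API, which is C/C++. The class ToyEngineConnection, its write() method and the string constants standing in for F_RDLCK, F_WRLCK and F_UNLCK are all assumptions made for illustration.

```python
# String stand-ins for the lock type constants; in a real engine these
# are integer constants supplied by the C headers.
F_RDLCK, F_WRLCK, F_UNLCK = "F_RDLCK", "F_WRLCK", "F_UNLCK"


class ToyEngineConnection:
    """Hypothetical per-connection state of a storage engine."""

    def __init__(self):
        self.lock_count = 0   # number of table locks currently held
        self.pending = []     # changes of the open transaction
        self.committed = []   # changes that have been made permanent
        self.commits = 0      # how many commits were triggered

    def external_lock(self, lock_type):
        if lock_type == F_UNLCK:
            self.lock_count -= 1
            if self.lock_count == 0:
                self.commit()          # last lock released: autocommit
        else:
            self.lock_count += 1       # a transaction starts implicitly

    def write(self, row):
        self.pending.append(row)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()
        self.commits += 1

    def rollback(self):
        self.pending.clear()           # discard uncommitted changes


conn = ToyEngineConnection()
conn.external_lock(F_WRLCK)   # statement locks a first table
conn.external_lock(F_RDLCK)   # ...and a second one
conn.write("row 1")
conn.external_lock(F_UNLCK)   # one lock still held: nothing committed yet
conn.external_lock(F_UNLCK)   # count reaches zero: commit is triggered
print(conn.commits, conn.committed)
```

Counting locks rather than statements means a statement that touches several tables is committed once, after the last table is unlocked.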

Indexing and Storage


Indexing

Indexes are a special structure that databases use to improve overall performance. By setting an index on a table, you are telling MySQL to pay particular attention to the indexed column (in layman's terms). In fact, MySQL creates extra files to store and track indexes efficiently. MySQL allows up to 32 indexes for each table, and each index can incorporate up to 16 columns. While the use of a multicolumn index may not seem obvious, it comes in handy for searches frequently performed on the same set of columns (e.g., first and last name, city and state, etc.).

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. If you need to access most of the rows, it is faster to read sequentially, because this minimizes disk seeks.

If you have a table with many columns but you always search on one or two of them, you can tell MySQL to index those columns. When you search (or sort) using an indexed column, the MySQL engine only has to process the much smaller index instead of the entire table to find the right rows. You can also specify that an index is unique, which is an even bigger performance benefit: once the engine finds the value it can stop, because there cannot be another one like it. You can add an index to a table with the command:

ALTER TABLE <table> ADD <INDEX|UNIQUE> <index> (<column>[, <column2>...])

Indexing is a must on large tables. Performance can be very poor without it.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial data types use R-trees, and that MEMORY tables also support hash indexes.
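
The difference between reading a table sequentially and seeking through an index can be sketched in a few lines of Python. This is only an illustration: a sorted list searched with the standard bisect module stands in for a B-tree, and the rows, scan() and index_lookup() are made up for the example.

```python
import bisect

# A toy table of 1,000 rows.
rows = [{"id": i, "last_name": f"name{i:04d}"} for i in range(1000)]


def scan(table, last_name):
    """Sequential scan: start at the first row and read the whole table."""
    examined = 0
    hits = []
    for row in table:
        examined += 1
        if row["last_name"] == last_name:
            hits.append(row["id"])
    return hits, examined


# The "index": sorted (key, row position) pairs kept apart from the table data.
index = sorted((row["last_name"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def index_lookup(last_name):
    """Binary search in the sorted keys, then fetch only the matching rows."""
    lo = bisect.bisect_left(keys, last_name)
    hi = bisect.bisect_right(keys, last_name)
    return [rows[pos]["id"] for _, pos in index[lo:hi]]


print(scan(rows, "name0742"))      # ([742], 1000): every row was examined
print(index_lookup("name0742"))    # [742]
```

The scan examines all 1,000 rows, while a binary search over 1,000 sorted keys needs only about ten comparisons to find the same row.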

MySQL uses indexes for these operations:

To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column key_col.
For LIKE comparisons, if the argument to LIKE is a constant string that does not start with a wildcard character.
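
Why a LIKE pattern must not start with a wildcard for the index to help can be seen with a sorted list of keys. The following Python sketch is an illustration under assumptions: the names list and both functions are made up, and a sorted list again stands in for a B-tree. The keys are lower-case throughout, so case sensitivity does not come into play.

```python
import bisect

# Sorted keys, as an index on a name column would hold them.
names = sorted(["patel", "patrick", "paul", "peter", "spatz", "zapata"])


def like_prefix(sorted_keys, prefix):
    """LIKE 'prefix%': the matches form one contiguous range of the index."""
    lo = bisect.bisect_left(sorted_keys, prefix)   # seek to the first candidate
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1                                    # walk to the end of the range
    return sorted_keys[lo:hi]


def like_contains(all_keys, fragment):
    """LIKE '%fragment%': no usable range, so every key has to be checked."""
    return [key for key in all_keys if fragment in key]


print(like_prefix(names, "pat"))     # ['patel', 'patrick']
print(like_contains(names, "pat"))   # ['patel', 'patrick', 'spatz', 'zapata']
print(names[0], names[-1])           # MIN() and MAX() are the two ends
```

The prefix search seeks once and stops at the first key that no longer matches. The pattern with a leading wildcard can match anywhere in the sort order, so the ordering of the index gives no shortcut.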

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.

Storage

Data in MySQL is stored in files (or memory) using a variety of different techniques. Each of these techniques employs different storage mechanisms, indexing facilities and locking levels, and ultimately provides a range of different functions and capabilities. By choosing a different technique you can gain additional speed or functionality benefits that will improve the overall functionality of your application. For example, if you work with a large amount of temporary data, you may want to make use of the MEMORY storage engine, which stores all of the table data in memory. Alternatively, you may want a database that supports transactions (to ensure data resilience).

Each of these different techniques and suites of functionality within the MySQL system is referred to as a storage engine (also known as a table type). By default, MySQL comes with a number of different storage engines preconfigured and enabled in the MySQL server. You can select the storage engine to use on a server, database and even table basis, providing you with the maximum amount of flexibility when it comes to choosing how your information is stored, how it is indexed and what combination of performance and functionality you want to use with your data.

This flexibility to choose how your data is stored and indexed is a major reason why MySQL is so popular; other database systems, including most of the commercial options, support only a single type of database storage. Unfortunately, the "one size fits all" approach in these other solutions means that you either sacrifice performance for functionality, or have to spend hours or even days finely tuning your database. With MySQL, we can just change the engine we are using.

Programmatically this is nothing special; it is normal practice to divide a program into modules and layers. But it is unique for a DBMS (Database Management System), because a developer and even a DBA (Database Administrator) is traditionally insulated from the physical storage methods that the database server may employ. How the data is stored really does not concern them, as the server just takes care of everything. That being the case, a developer or DBA could benefit from knowing a bit more about such things, as it may help them to optimize applications. This is an angle that may be applied to many aspects of database servers, but in this article we'll focus on the storage engines.

Why have storage engine layers? There are a number of interrelated reasons:

Technology evolves. As new features are developed, maintaining backward compatibility in the file format is not always possible. Users would need to run a conversion tool when they upgrade, or even dump and import their entire dataset. This is obviously very inconvenient. It would be much nicer if users could upgrade their server (for bugfixes and other new features) without also having to migrate all their data. This means that a single version of the server has to support multiple file formats.

For server developers, changes in the data storage code may require related changes elsewhere in the server, and as with all new code there is always the possibility of introducing bugs. This calls for abstraction: changes in the underlying code, to a large extent, should not affect the code at higher levels.

Different applications have different requirements with regard to data storage, and some of these requirements may even conflict. Think of a banking application that requires highly secure transaction processing, versus traffic logging on a website. Typically, there are differences in the number and balance of selects and updates, as well as the need for transactions and isolation levels. There are always trade-offs, and choices need to be made. With only one mechanism available, most applications would just have to make do with a solution that is probably not optimal for them. While accepting that there is no single tool suitable for every use, we think that there is something to be said for a moderate "Swiss army knife" style approach. It would be nice if a server can cater effectively to more than one type of application.

Fundamentally, different storage media call for a different approach. A hard disk has characteristics which differ wildly from RAM, for instance. In a nutshell, a hard disk can generally contain more data, but getting to it takes longer. RAM is very fast, but there is a limited supply of it. Some search algorithms are optimized for RAM, others are optimized for disk-based storage. And did you know that a Compact Flash card uses much more power when reading data? That is an issue that definitely needs to be considered for an embedded application. Who knows what other new technologies we will see in the future.

MySQL's storage engine architecture addresses all these aspects, and not by accident. It was a deliberate design choice by Michael "Monty" Widenius, MySQL AB's CTO. Let us look at a simplified high-level diagram of the MySQL server architecture:


The diagram shows four storage engines, each with different characteristics:

MyISAM is a disk-based storage engine. Aiming for very low overhead, it does not support transactions.
InnoDB is also disk-based, but offers versioned, fully ACID transactional capabilities. InnoDB requires more disk space than MyISAM to store its data, and this increased overhead is compensated by more aggressive use of memory caching, in order to attain high speeds.
Memory (formerly called "HEAP") is a storage engine that utilizes only RAM. Special algorithms are used that make optimal use of this environment. It is very fast.
NDB, the MySQL Cluster storage engine, connects to a cluster of nodes, offering high availability through redundancy, high performance through fragmentation (partitioning) of data across multiple node groups, and excellent scalability through the combination of these two. NDB uses main memory only, with logging to disk.

One of the things that differs per storage engine is the locking and isolation mechanism, but most of the server operates in the same way no matter what storage engine is used: all the usual SQL commands are independent of the storage engine. Naturally, the optimizer may need to make different choices depending on the storage engine, but this is all handled through a standardized interface (API) which each storage engine supports. So to a degree, the application does not need to know how its data is stored. And it may not matter either, when the demands are not very high. But for a larger dataset, or with more demanding access requirements, it does become increasingly important to make a conscious choice. And the best news is that an application can use multiple storage engines, as the selection can be made on a per-table basis. Also, the server can convert tables between the different formats using a simple ALTER TABLE command.
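
The idea of one standardized interface with interchangeable engines behind it can be sketched in Python. This is a toy model only: StorageEngine, MemoryEngine, FileEngine, Table and alter_engine() are hypothetical names and bear no relation to MySQL's actual handler API. The model shows that code above the interface does not change when the engine does, and that converting a table amounts to copying its rows into the new engine.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """The interface every engine must support (hypothetical)."""

    @abstractmethod
    def insert(self, row): ...

    @abstractmethod
    def scan(self): ...

    @abstractmethod
    def drop(self): ...


class MemoryEngine(StorageEngine):
    """Keeps all rows in RAM."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(dict(row))

    def scan(self):
        return [dict(row) for row in self._rows]

    def drop(self):
        self._rows.clear()


class FileEngine(StorageEngine):
    """Keeps rows on disk, one JSON document per line."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)

    def insert(self, row):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(row) + "\n")

    def scan(self):
        with open(self.path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle]

    def drop(self):
        os.remove(self.path)


ENGINES = {"MEMORY": MemoryEngine, "FILE": FileEngine}


class Table:
    """Server-side code: talks to the engine only through the interface."""

    def __init__(self, engine="FILE"):
        self.engine_name = engine
        self._engine = ENGINES[engine]()

    def insert(self, row):
        self._engine.insert(row)

    def select_all(self):
        return self._engine.scan()

    def alter_engine(self, engine):
        """Convert the table by copying every row into a new engine."""
        new_engine = ENGINES[engine]()
        for row in self._engine.scan():
            new_engine.insert(row)
        self._engine.drop()
        self.engine_name, self._engine = engine, new_engine


table = Table(engine="FILE")
table.insert({"id": 1, "name": "first"})
table.insert({"id": 2, "name": "second"})
table.alter_engine("MEMORY")
print(table.engine_name, table.select_all())
```

The calls to insert() and select_all() are the same before and after the conversion, which is the point of hiding the engines behind one interface.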

Default Storage Engine

If you use CREATE TABLE without specifying the ENGINE=... option, the server will use the default. The default storage engine is MyISAM. If you want to change the default to, say, InnoDB, you can use the configuration directive --default-storage-engine=InnoDB. Something to be aware of is that if you create a table specifying an engine type that is not enabled, MySQL will automatically fall back to the default. From MySQL 4.1, a warning is issued.

Query Processing
The query processing steps:

1. Parser (builds tree)
2. Preprocessor (checks syntax, columns)
3. Optimizer (generates query execution plan)
   o query transformation
   o search for optimal execution plan
   o plan is refined
4. Query sent to execution engine

A query has only a few pre-defined operations, which eases the task of processing a query:

access methods (whether table scan or index)
where conditions
joins
union, group, etc.

MySQL uses a left-deep linear plan for executing a query. All of the tables fall into a single line. Many other systems use the bushy plan, which is more tree-like.
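
The shape of the two kinds of plan is easiest to see when a join is written as a nested pair. The Python sketch below only illustrates the shapes. The functions left_deep() and bushy() and the table names t1 to t4 are made up, and nothing here reflects how the server represents plans internally.

```python
from functools import reduce


def left_deep(tables):
    """Join the running result with the next table, one table at a time."""
    return reduce(lambda left, right: (left, right), tables)


def bushy(tables):
    """Join two halves that may each be the result of earlier joins."""
    if len(tables) == 1:
        return tables[0]
    mid = len(tables) // 2
    return (bushy(tables[:mid]), bushy(tables[mid:]))


tables = ["t1", "t2", "t3", "t4"]
print(left_deep(tables))   # ((('t1', 't2'), 't3'), 't4')
print(bushy(tables))       # (('t1', 't2'), ('t3', 't4'))
```

In the left-deep form the right-hand side of every join is a plain table, so the plan is fully described by the order of the tables. In the bushy form both sides of a join can themselves be joins.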

Timour shows a large query with 5 or 6 WHERE conditions and steps through the process of how the query is parsed. Optimizing a SQL statement involves quite a bit of analysis of the cost of a query. The cost is calculated by looking at things like how many times the disk will need to be accessed, the number of pages per table, the length of the rows and keys, and the data schema (key uniqueness etc.). Several methods are used to estimate these costs mathematically. The type of storage engine isn't considered in the cost. MySQL 5.0 has greedy searching: it doesn't consider every possible plan, but gathers just enough information to find a good path and then moves on.
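
A greedy search of this kind can be sketched as follows. This is a simplified illustration and not MySQL's optimizer: the cost model (estimated row counts multiplied by join selectivities), the function greedy_join_order() and the sample figures are all assumptions. At each step the sketch picks the table that keeps the intermediate result smallest and never revisits that choice.

```python
def greedy_join_order(row_counts, selectivity):
    """Choose a left-deep join order one table at a time.

    row_counts:  {table: estimated number of rows}
    selectivity: {frozenset({a, b}): fraction of row pairs that join}
    """
    remaining = dict(row_counts)
    # Start with the smallest table.
    first = min(remaining, key=remaining.get)
    order = [first]
    rows = remaining.pop(first)
    total_cost = rows

    while remaining:
        def joined_rows(candidate):
            # Estimated size of the result after joining `candidate`.
            fraction = 1.0
            for table in order:
                fraction *= selectivity.get(frozenset((table, candidate)), 1.0)
            return rows * remaining[candidate] * fraction

        best = min(remaining, key=joined_rows)   # cheapest next step only
        rows = joined_rows(best)
        total_cost += rows
        order.append(best)
        del remaining[best]

    return order, total_cost


row_counts = {"orders": 1000, "customers": 100, "items": 5000}
selectivity = {
    frozenset(("orders", "customers")): 0.01,
    frozenset(("orders", "items")): 0.001,
}
order, cost = greedy_join_order(row_counts, selectivity)
print(order)   # ['customers', 'orders', 'items']
```

An exhaustive search would compare every ordering of the tables. The greedy version looks at each remaining table once per step, which is much cheaper for large joins but can miss the best plan.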

User Interfaces for MySQL - EMS MySQL Manager

Full support of MySQL versions from 3.23 to 5.06
New state-of-the-art graphical user interface
Rapid database management and navigation
Simple management of all MySQL objects
Advanced data manipulation tools
Powerful security management
Excellent visual and text tools for query building
Impressive data export and import capabilities
Easy-to-use wizards performing MySQL services



Billing Management
Compliance & Risk Management
Customer Relationship Management (CRM)
Demand Chain Management (DRM)
Education
Enterprise Content Management (ECM)
Enterprise Information Portal (EIP)
Enterprise Resources Planning (ERP)
Financials
Government
Healthcare
Human Resources Management (HRMS)
Inventory Management
Manufacturing
Messaging & Collaboration
Order Management
Payroll Management
Point of Sale (POS)
Project Management
Purchasing Management
Retail
Supply Chain Management (SCM)

MySQL Usage