
1. Explain the entity relationship model and all three levels of an E-R diagram.

An Entity Relationship model (ER model) is an abstract way to describe a database. It is a visual representation of different data using conventions that describe how these data are related to each other. There are three basic elements in ER models:

Entities are the things about which we seek information. Attributes are the data we collect about the entities. Relationships provide the structure needed to draw information from multiple entities.

Symbols used in E-R Diagram:

Entity - rectangle
Attribute - oval
Relationship - diamond
Link - line

Entities and Attributes

Entity Type: a set of similar objects, or a well-defined category of entities.

A rectangle represents an entity set. Ex: students, courses. We often just say "entity" and mean entity type.

Attribute: describes one aspect of an entity type; usually (and best when) single-valued and indivisible (atomic).

Represented by an oval on the E-R diagram. Ex: name, maximum enrollment.

Types of Attribute:

Simple and Composite Attributes
A simple attribute consists of a single atomic value and cannot be subdivided. Examples: salary, age, sex.
A composite attribute can be further subdivided. Examples: ADDRESS can be subdivided into street, city, state, and zip code; NAME into first name, middle name, and last name.

Single-Valued and Multi-Valued Attributes
A single-valued attribute can hold only one value per entity. For example, a person has only one date of birth and one age. A single-valued attribute may itself be simple or composite: date of birth is composite, age is simple, but both are single-valued. Examples: age, city, customer id.
A multi-valued attribute can hold multiple values. For instance, a person may have multiple phone numbers, email ids, or college degrees. Multi-valued attributes are shown by a double oval (double line) connected to the entity in the E-R diagram.

Stored and Derived Attributes
The value of a derived attribute is computed from a stored attribute. For example, Date of Birth is a stored attribute, and the value of AGE can be derived by subtracting the Date of Birth (DOB) from the current date.
Stored Attribute: an attribute that supplies a value to a related attribute. Example: Date of Birth.
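The stored/derived distinction can be sketched in a few lines; this is an illustrative snippet, not tied to any particular database:

```python
from datetime import date

def derive_age(dob: date, today: date) -> int:
    """Derive the AGE attribute from the stored Date of Birth attribute."""
    # Subtract one if this year's birthday has not happened yet.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

print(derive_age(date(1990, 6, 15), date(2024, 6, 14)))  # 33 (birthday not yet reached)
print(derive_age(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```

Because AGE can always be recomputed this way, it is stored nowhere; only DOB is.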

Derived Attribute: an attribute whose value is derived from a stored attribute. Example: age, derived from the stored attribute Date of Birth.

Keys
Super key: an attribute or set of attributes that uniquely identifies an entity; there can be many of these.
Composite key: a key requiring more than one attribute.
Candidate key: a superkey such that no proper subset of its attributes is also a superkey (a minimal superkey, with no unnecessary attributes).
Primary key: the candidate key chosen for identifying entities and accessing records. Unless otherwise noted, "key" means primary key.
Alternate key: a candidate key not used as the primary key.
Secondary key: an attribute or set of attributes commonly used for accessing records, but not necessarily unique.
Foreign key: an attribute that is the primary key of another table, used to establish a relationship with the table in which it also appears as an attribute.

Graphical Representation in E-R diagram
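The superkey/candidate-key definitions can be checked mechanically on sample data. The sketch below is illustrative (the `students` rows are made up):

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """A set of attributes is a superkey if no two rows share the same values for it."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A candidate key is a minimal superkey: no proper subset is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for r in range(1, len(attrs))
        for subset in combinations(attrs, r)
    )

students = [
    {"id": 1, "name": "Ann", "dept": "CS"},
    {"id": 2, "name": "Ben", "dept": "CS"},
    {"id": 3, "name": "Ann", "dept": "EE"},
]
print(is_superkey(students, ("id", "name")))       # True, but not minimal
print(is_candidate_key(students, ("id", "name")))  # False: {id} alone is a superkey
print(is_candidate_key(students, ("id",)))         # True
```

Here {id, name} is a superkey but not a candidate key, exactly because its proper subset {id} already identifies every row.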

Rectangle - entity
Ellipse - attribute (underlined attributes are [part of] the primary key)
Double ellipse - multi-valued attribute
Dashed ellipse - derived attribute, e.g. age is derivable from birthdate and the current date

Relationships
Relationship: connects two or more entities into an association

John majors in Computer Science

Relationship Type: set of similar relationships

Student (entity type) is related to Department (entity type) by MajorsIn (relationship type).

Relationship types may also have attributes in the E-R model. When they are mapped to the relational model, the attributes become part of the relation. A relationship is represented by a diamond on the E-R diagram.

Cardinality of Relationships
Cardinality is the number of entity instances to which another entity set can map under the relationship. It does not reflect a requirement that an entity participate in a relationship; participation is a separate concept.
One-to-one: X-Y is 1:1 when each entity in X is associated with at most one entity in Y, and each entity in Y is associated with at most one entity in X.
One-to-many: X-Y is 1:M when each entity in X can be associated with many entities in Y, but each entity in Y is associated with at most one entity in X.
Many-to-many: X-Y is M:M when each entity in X can be associated with many entities in Y, and each entity in Y is associated with many entities in X ("many" means one or more, and sometimes zero).
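A 1:M relationship such as Student-MajorsIn-Department can be sketched with a plain mapping; the names here are illustrative assumptions, not part of any real schema:

```python
# The "many" side (students) maps to at most one value on the "one" side
# (department), while a department may appear as the value for many students.
majors_in: dict[str, str] = {}  # student -> department

def declare_major(student: str, department: str) -> None:
    # Enforce the "at most one department per student" half of 1:M.
    if student in majors_in:
        raise ValueError(f"{student} already majors in {majors_in[student]}")
    majors_in[student] = department

declare_major("John", "Computer Science")
declare_major("Mary", "Computer Science")  # many students, one department: fine
```

Attempting `declare_major("John", "Math")` afterwards would raise, since that would give John two departments and break the 1:M cardinality.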

Relationship Participation Constraints

Total participation: every member of the entity set must participate in the relationship. Represented by a double line from the entity rectangle to the relationship diamond (you can set this double line in Dia). E.g., a Class entity cannot exist unless related to a Faculty member entity. In a relational model we will use the references clause.

Key constraint: if every entity participates in exactly one relationship, both a total participation constraint and a key constraint hold. E.g., if a class is taught by only one faculty member.

Partial participation: not every entity instance must participate. Represented by a single line from the entity rectangle to the relationship diamond. E.g., a Textbook entity can exist without being related to a Class, or vice versa.

Strong and Weak Entities
An entity set that does not have sufficient attributes to form a primary key is termed a weak entity set; an entity set that has a primary key is termed a strong entity set. A weak entity is existence dependent: its existence depends on the existence of an identifying entity set. The discriminator (or partial key) is used to distinguish the entities of a weak entity set from one another. The primary key of a weak entity set is formed by the primary key of the identifying entity set plus the discriminator of the weak entity set. A weak entity set is indicated by a double rectangle in the E-R diagram, and its discriminator is underlined with a dashed line.
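The composite primary key of a weak entity (owner's key plus discriminator) can be sketched as a tuple key; Course/Section and the field names are illustrative assumptions:

```python
# A weak entity (e.g., Section) has no key of its own; its primary key is the
# identifying strong entity's key (course_no) plus a discriminator (section_no).
sections: dict[tuple[str, int], dict] = {}

def add_section(course_no: str, section_no: int, room: str) -> None:
    key = (course_no, section_no)  # composite key: owner PK + discriminator
    if key in sections:
        raise ValueError(f"duplicate section {key}")
    sections[key] = {"room": room}

add_section("CS101", 1, "B12")
add_section("CS101", 2, "B14")  # same course, different discriminator: allowed
add_section("MA201", 1, "A03")  # same discriminator under a different owner: allowed
```

Note that section number 1 appears twice without conflict: the discriminator only needs to be unique *within* one owning course, which is exactly why it cannot serve as a primary key on its own.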

2. Make an E-R diagram of a Library Management System (all three levels).
A Library Management System (LMS) provides a simple GUI (graphical user interface) for library staff to manage the functions of the library effectively. Usually when a book is returned or issued, it is noted down in a register, after which data entry is done to update the status of the books. This process takes time, and accurate updates cannot be guaranteed; such anomalies in the update process can cause loss of books. A more user-friendly interface that updates the database instantly is therefore in great demand in libraries.
E-R Diagram for LMS:

3. Explain a decision table and its parts. Make a decision table for a report card.
A decision table is an excellent tool to use in both testing and requirements management. Essentially it is a structured exercise to formulate requirements when dealing with complex business rules. Decision tables are used to model complicated logic. They make it easy to see that all possible combinations of conditions have been considered, and when conditions are missed, it is easy to see this. A decision table is a good way to deal with combinations of things (e.g. inputs). This technique is sometimes also referred to as a cause-effect table, because there is an associated logic diagramming technique called cause-effect graphing that was sometimes used to help derive the decision table (Myers describes this as a combinatorial logic network). However, most people find it more useful just to use the table itself.

Decision tables provide a systematic way of stating complex business rules, which is useful for developers as well as for testers. Decision tables can be used in test design whether or not they are used in specifications, as they help testers explore the effects of combinations of different inputs and other software states that must correctly implement business rules.

Helping the developers do a better job can also lead to better relationships with them. Testing combinations can be a challenge, as the number of combinations can often be huge. Testing all combinations may be impractical if not impossible. We have to be satisfied with testing just a small subset of combinations, but the choice of which combinations to test and which to leave out is also important. If you do not have a systematic way of selecting combinations, an arbitrary subset will be used, and this may well result in an ineffective test effort.

A decision table has four quadrants:
Conditions | Condition alternatives
Actions | Action entries

Each decision corresponds to a variable, relation or predicate whose possible values are listed among the condition alternatives. Each action is a procedure or operation to perform, and the entries specify whether (or in what order) the action is to be performed for the set of condition alternatives the entry corresponds to. Many decision tables include in their condition alternatives the "don't care" symbol, a hyphen. Using don't cares can simplify decision tables, especially when a given condition has little influence on the actions to be performed. In some cases, entire conditions thought to be important initially are found to be irrelevant when none of the conditions influence which actions are performed.

Aside from the basic four-quadrant structure, decision tables vary widely in the way the condition alternatives and action entries are represented. Some decision tables use simple true/false values to represent the alternatives to a condition (akin to if-then-else), other tables may use numbered alternatives (akin to switch-case), and some tables even use fuzzy logic or probabilistic representations for condition alternatives. In a similar way, action entries can simply represent whether an action is to be performed (check the actions to perform), or in more advanced decision tables, the sequencing of actions to perform (number the actions to perform).
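The question asks for a report-card decision table, which the notes never actually supply. The sketch below is one illustrative possibility: the mark bands, the attendance condition, and the actions are all assumptions, and `None` plays the role of the "don't care" hyphen:

```python
# Conditions: (avg >= 80, avg >= 50, attendance_ok); actions: the grade/remark.
# A None entry means "don't care" for that condition.
RULES = [
    ((True,  None,  True),  "Grade A: promoted with distinction"),
    ((False, True,  True),  "Grade B: promoted"),
    ((None,  False, None),  "Grade F: detained"),
    ((None,  True,  False), "Result withheld: attendance shortfall"),
]

def report_card(avg: float, attendance_ok: bool) -> str:
    facts = (avg >= 80, avg >= 50, attendance_ok)
    for conditions, action in RULES:
        if all(c is None or c == f for c, f in zip(conditions, facts)):
            return action
    raise ValueError("no rule matched: the table is incomplete")

print(report_card(85, True))   # Grade A: promoted with distinction
print(report_card(60, True))   # Grade B: promoted
print(report_card(40, True))   # Grade F: detained
print(report_card(60, False))  # Result withheld: attendance shortfall
```

Each tuple is one column of the table: the condition alternatives on the left, the action entry on the right. Raising an error on a missed combination is exactly the completeness check the text describes.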


4. Explain the various types of cohesion and coupling, along with diagrams.
In software engineering, coupling or dependency is the degree to which each program module relies on the other modules. Coupling is usually contrasted with cohesion: low coupling often correlates with high cohesion, and vice versa. Low coupling is often a sign of a well-structured computer system and a good design, and when combined with high cohesion, supports the general goals of high readability and maintainability.

In computer programming, cohesion refers to the degree to which the elements of a module belong together. It is thus a measure of how strongly related the pieces of functionality expressed by the source code of a software module are. Cohesion is an ordinal type of measurement and is usually expressed as "high cohesion" or "low cohesion". Modules with high cohesion tend to be preferable because high cohesion is associated with several desirable traits of software, including robustness, reliability, reusability, and understandability, whereas low cohesion is associated with undesirable traits such as being difficult to maintain, difficult to test, difficult to reuse, and even difficult to understand.

The software quality metrics of coupling and cohesion were invented by Larry Constantine, an original developer of Structured Design (see also SSADM), based on characteristics of good programming practices that reduced maintenance and modification costs.

Types of coupling


Conceptual model of coupling

Coupling can be "low" (also "loose" and "weak") or "high" (also "tight" and "strong"). Some types of coupling, in order of highest to lowest, are as follows:

Procedural programming
A module here refers to a subroutine of any kind, i.e. a set of one or more statements having a name and preferably its own set of variable names.

Content coupling (high): Content coupling (also known as pathological coupling) occurs when one module modifies or relies on the internal workings of another module (e.g., accessing local data of another module). Changing the way the second module produces data (location, type, timing) will therefore force changes in the dependent module.

Common coupling: Common coupling (also known as global coupling) occurs when two modules share the same global data (e.g., a global variable). Changing the shared resource implies changing all the modules using it.

External coupling: External coupling occurs when two modules share an externally imposed data format, communication protocol, or device interface. This is basically related to communication with external tools and devices.

Control coupling: Control coupling is one module controlling the flow of another, by passing it information on what to do (e.g., passing a what-to-do flag).

Stamp coupling (data-structured coupling): Stamp coupling occurs when modules share a composite data structure and use only a part of it, possibly a different part (e.g., passing a whole record to a function that only needs one field of it). This may lead to changing the way a module reads a record because a field that the module does not need has been modified.

Data coupling: Data coupling occurs when modules share data through, for example, parameters. Each datum is an elementary piece, and these are the only data shared (e.g., passing an integer to a function that computes a square root).

Message coupling (low): This is the loosest type of coupling. It can be achieved by state decentralization (as in objects), with component communication done via parameters or message passing (see Message passing).

No coupling: Modules do not communicate at all with one another.

Object-oriented programming
Subclass coupling: Describes the relationship between a child and its parent. The child is connected to its parent, but the parent is not connected to the child.
Temporal coupling: When two actions are bundled together into one module just because they happen to occur at the same time.
In recent work, various other coupling concepts have been investigated and used as indicators for different modularization principles used in practice.

Disadvantages
Tightly coupled systems tend to exhibit the following developmental characteristics, which are often seen as disadvantages:
1. A change in one module usually forces a ripple effect of changes in other modules.
2. Assembly of modules might require more effort and/or time due to the increased inter-module dependency.
3. A particular module might be harder to reuse and/or test because dependent modules must be included.

Performance issues
Whether loosely or tightly coupled, a system's performance is often reduced by message and parameter creation, transmission, translation (e.g. marshaling), and message interpretation. A simple message (which might be a reference to a string, array or data structure) requires less overhead than a complicated message such as a SOAP message. Longer messages require more CPU and memory to produce. To optimize runtime performance, message length must be minimized and message meaning must be maximized.


Message transmission overhead and performance: Since a message must be transmitted in full to retain its complete meaning, message transmission must be optimized. Longer messages require more CPU and memory to transmit and receive. Also, when necessary, receivers must reassemble a message into its original state to completely receive it. Hence, to optimize runtime performance, message length must be minimized and message meaning must be maximized.

Message translation overhead and performance: Message protocols and messages themselves often contain extra information (i.e., packet, structure, definition and language information). Hence, the receiver often needs to translate a message into a more refined form by removing extra characters and structure information and/or by converting values from one type to another. Any sort of translation increases CPU and/or memory overhead. To optimize runtime performance, message form and content must be reduced and refined to maximize meaning and reduce translation.

Message interpretation overhead and performance: All messages must be interpreted by the receiver. Simple messages such as integers might not require additional processing to be interpreted. However, complex messages such as SOAP messages require a parser and a string transformer to exhibit their intended meanings. To optimize runtime performance, messages must be refined and reduced to minimize interpretation overhead.

Solutions
One approach to decreasing coupling is functional design, which seeks to limit the responsibilities of modules along functionality. Coupling increases between two classes A and B if:

1. A has an attribute that refers to (is of type) B.
2. A calls on services of an object B.
3. A has a method that references B (via return type or parameter).
4. A is a subclass of (or implements) class B.

Low coupling refers to a relationship in which one module interacts with another module through a simple and stable interface and does not need to be concerned with the other module's internal implementation (see Information Hiding).


Systems such as CORBA or COM allow objects to communicate with each other without having to know anything about the other object's implementation. Both of these systems even allow objects to communicate with objects written in other languages.

Coupling versus Cohesion
Coupling and cohesion are terms which occur together very frequently. Coupling refers to the interdependencies between modules, while cohesion describes how related the functions within a single module are. Low cohesion implies that a given module performs tasks which are not very related to each other and hence can create problems as the module becomes large.

Module coupling
Coupling in software engineering can be quantified with metrics. For data and control flow coupling:

di: number of input data parameters
ci: number of input control parameters
do: number of output data parameters
co: number of output control parameters
gd: number of global variables used as data
gc: number of global variables used as control
w: number of modules called (fan-out)
r: number of modules calling the module under consideration (fan-in)

For global coupling:

For environmental coupling:

Coupling(C) grows larger the more coupled the module is; its value ranges from approximately 0.67 (low coupling) to 1.0 (highly coupled). For example, a module with only a single input and a single output data parameter sits at the low end of this range.

If a module has 5 input and output data parameters, an equal number of control parameters, and accesses 10 items of global data, with a fan-in of 3 and a fan-out of 4, the value comes out close to 1.0.
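The formula images for the global and environmental coupling terms did not survive extraction. As a hedged reconstruction, the data/control-flow, global, and environmental counts are commonly combined in Dhama's module-coupling metric, which matches the 0.67 to 1.0 range quoted above:

```latex
C \;=\; 1 - \frac{1}{d_i + 2c_i + d_o + 2c_o + g_d + 2g_c + w + r}
```

Under this assumption, a module with a single input and a single output data parameter, called from one other module ($r = 1$), gives $C = 1 - \tfrac{1}{1 + 1 + 1} \approx 0.67$, and the second module described above gives $C = 1 - \tfrac{1}{5 + 2\cdot 5 + 5 + 2\cdot 5 + 10 + 0 + 4 + 3} = 1 - \tfrac{1}{47} \approx 0.98$.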


COUPLING
Coupling is an indication of the strength of interconnections between program units. Highly coupled systems have program units dependent on each other; loosely coupled systems are made up of units that are independent or almost independent. Modules are independent if they can function completely without the presence of the other. Obviously, modules cannot be completely independent of each other: they must interact so that they can produce the desired outputs. The more connections between modules, the more dependent they are, in the sense that more information about one module is required to understand the other.

Three factors matter: the number of interfaces, the complexity of interfaces, and the type of information flow along interfaces. We want to minimize the number of interfaces between modules, minimize the complexity of each interface, and control the type of information flow. An interface of a module is used to pass information to and from other modules. In general, modules are tightly coupled if they use shared variables or if they exchange control information. Coupling is loose if information is held within a unit and units interface with other units via parameter lists, and tight if global data is shared. If you need only one field of a record, don't pass the entire record. Keep the interface as simple and small as possible. There are two types of information flow: data and control.

Passing or receiving back control info means that the action of the module will depend on this control info, which makes it difficult to understand the module. Interfaces with only data communication result in lowest degree of coupling, followed by interfaces that only transfer control data. Highest if data is hybrid.

Ranked highest to lowest:
1. Content coupling: one module directly references the contents of the other, e.g. when one module modifies local data values or instructions in another module (this can happen in assembly language), refers to local data in another module, or branches into a local label of another.


2. Common coupling: access to global data; modules bound together by global data structures.
3. Control coupling: passing control flags (as parameters or globals) so that one module controls the sequence of processing steps in another module.
4. Stamp coupling: similar to common coupling except that global variables are shared selectively among routines that require the data (e.g., packages in Ada). More desirable than common coupling because fewer modules will have to be modified if a shared data structure is modified. An entire data structure is passed even though only parts of it are needed.
5. Data coupling: use of parameter lists to pass data items between routines.

COHESION
Cohesion is a measure of how well the parts of a module fit together. A component should implement a single logical function or a single logical entity, and all its parts should contribute to that implementation. There are many levels of cohesion:
1. Coincidental cohesion: the parts of a component are not related but simply bundled into a single component. Harder to understand and not reusable.
2. Logical association: similar functions, such as input or error handling, are put together because they fall in the same logical class. A flag may be passed to determine which one is executed, making the interface difficult to understand. Code for more than one function may be intertwined, leading to severe maintenance problems. Difficult to reuse.
3. Temporal cohesion: all statements activated at a single time, such as start-up or shut-down (initialization, clean-up), are brought together. The functions are weakly related to one another but more strongly related to functions in other modules, so many modules may need changing during maintenance.
4. Procedural cohesion: a single control sequence, e.g., a loop or a sequence of decision statements. Often cuts across functional lines; may contain only part of a complete function, or parts of several functions. The functions are still weakly connected, and again unlikely to be reusable in another product.


5. Communicational cohesion: the parts operate on the same input data or produce the same output data, and may be performing more than one function. Generally acceptable if alternative structures with higher cohesion cannot easily be identified, but there are still problems with reusability.
6. Sequential cohesion: the output from one part serves as input for another part. May contain several functions or parts of different functions.
7. Informational cohesion: the module performs a number of functions, each with its own entry point and independent code, all performed on the same data structure. Different from logical cohesion because the functions are not intertwined.
8. Functional cohesion: each part is necessary for the execution of a single function, e.g., compute a square root or sort an array. Usually reusable in other contexts, and maintenance is easier.
9. Type cohesion: modules that support a data abstraction.
This is not strictly a linear scale: functional cohesion is much stronger than the rest, while the first two levels are much weaker than the others. Often several levels may be applicable when considering two elements of a module. The cohesion of a module is taken as the highest level of cohesion that is applicable to all elements in the module.
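The contrast between the weakest and strongest levels can be made concrete. This is an illustrative sketch (the module contents are invented): the first module bundles unrelated tasks (coincidental cohesion), while the second contains only statements that serve one function (functional cohesion):

```python
# Coincidental cohesion (poor): unrelated tasks bundled into one module.
class Utilities:
    @staticmethod
    def square_root(x: float) -> float: ...
    @staticmethod
    def send_email(addr: str, body: str) -> None: ...
    @staticmethod
    def parse_date(s: str) -> tuple: ...

# Functional cohesion (good): every statement contributes to one function.
def square_root(x: float, eps: float = 1e-12) -> float:
    """Compute sqrt(x) by Newton's method."""
    if x < 0:
        raise ValueError("negative input")
    guess = x or 1.0
    while abs(guess * guess - x) > eps * max(x, 1.0):
        guess = (guess + x / guess) / 2
    return guess

print(round(square_root(2.0), 6))  # 1.414214
```

Changing how dates are parsed forces an edit to `Utilities` even though square roots are unaffected; the functionally cohesive `square_root` can be reused anywhere on its own.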


5. Explain project selection techniques and the data dictionary with the help of examples.
One of the biggest decisions that any organization has to make is which projects to undertake. Once a proposal has been received, there are numerous factors that need to be considered before an organization decides to take it up. The most viable option needs to be chosen, keeping in mind the goals and requirements of the organization. How, then, do you decide whether a project is viable and worth approving? This is where project selection methods come in.

Choosing a project using the right method is of utmost importance; this is what will ultimately define the way the project is carried out. But how do you go about finding the right methodology for your particular organization? You need careful guidance in the project selection criteria, as a small mistake could be detrimental to your project as a whole, and in the long run, to the organization as well.

Selection Methods
There are various project selection methods practised by modern business organizations. These methods have different features and characteristics, so each selection method suits different organizations. Although there are many differences between these project selection methods, the underlying concepts and principles are usually the same. Following is an illustration of two such method families (benefit measurement and constrained optimization methods):


As the value of one project would need to be compared against the other projects, you could use the benefit measurement methods. This could include various techniques, of which the following are the most common:

One common technique is a scoring model: you and your team come up with certain criteria that you want your ideal project objectives to meet, give each project a score based on how it rates against each criterion, and then choose the project with the highest score.

When it comes to the Discounted Cash flow method, the future value of a project is ascertained by considering the present value and the interest earned on the money. The higher the present value of the project, the better it would be for your organization.
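A minimal discounted-cash-flow comparison can be sketched as follows; the two projects and the 10% rate are purely illustrative figures:

```python
# Discounted cash flow: each future cash flow is divided by (1 + r)^t,
# so money earned later is worth less today.
def net_present_value(rate: float, cash_flows: list[float]) -> float:
    """cash_flows[0] is the initial outlay (negative), at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

project_a = [-1000, 400, 400, 400, 400]  # steady returns
project_b = [-1000, 0, 0, 0, 1700]       # one big payoff at the end
r = 0.10
print(round(net_present_value(r, project_a), 2))  # about 267.95
print(round(net_present_value(r, project_b), 2))  # about 161.12
```

Although project B returns more money in total (1700 vs 1600), discounting pushes its single late payoff down, so project A has the higher present value at this rate and would be preferred.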

The rate of return received from the money invested is known as the internal rate of return (IRR). Here again, you should look for a high rate of return from the project.

The mathematical approach is commonly used for larger projects. The constrained optimization methods require several calculations in order to decide whether or not a project should be rejected.

Cost-benefit analysis is used by several organizations to assist them in making their selections. Under this method, you consider all the positive aspects of the project (the benefits) and then deduct the negative aspects (the costs) from the benefits. Based on the results you receive for different projects, you can choose the option that is the most viable and financially rewarding. These benefits and costs need to be carefully considered and quantified in order to arrive at a proper conclusion. Questions that you may want to consider asking in the selection process are:

Would this decision help me to increase organizational value in the long run?
How long will the equipment last?
Would I be able to cut down on costs as I go along?

In addition to these methods, you could also consider choosing based on opportunity cost. When choosing any project, you need to keep in mind the profit you would make if you went ahead with it; profit optimization is the ultimate goal. Consider the difference between the profits of the project you are primarily interested in and those of the next best alternative.

Implementation of the Chosen Method: The methods mentioned above can be carried out in various combinations. It is best to try out different methods, as this way you can make the best decision for your organization, considering a wide range of factors rather than concentrating on just a few. Careful consideration therefore needs to be given to each project.

Conclusion: These methods are time-consuming, but absolutely essential for efficient business planning. It is always best to have a good plan from the inception, with a list of criteria to be considered and goals to be achieved. This will guide you through the entire selection process and ensure that you make the right choice.

Data Dictionary
A data dictionary is a collection of data about data. It maintains information about the definition, structure, and use of each data element that an organization uses. Many attributes may be stored about a data element. Typical attributes used in CASE (Computer Assisted Software Engineering) tools are:
Name
Aliases or synonyms
Default label
Description
Source(s)
Date of origin
Users
Programs in which used
Change authorizations
Access authorization
Data type
Length
Units (cm, degrees C, etc.)
Range of values
Frequency of use
Input/output/local
Conditional values
Parent structure

Subsidiary structures
Repetitive structures
Physical location: record, file, database

A data dictionary is invaluable for documentation purposes, for keeping control information on corporate data, for ensuring consistency of elements between organizational systems, and for use in developing databases. Data dictionary software packages are commercially available, often as part of a CASE package or DBMS. Data dictionary software allows for consistency checks and code generation, and is also used in DBMSs to generate reports.

The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as the DDL and DML compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. A data dictionary, on the other hand, is a data structure that stores metadata, i.e., (structured) data about data. The software package for a stand-alone data dictionary or data repository may interact with the software modules of the DBMS, but it is mainly used by the designers, users and administrators of a computer system for information resource management. Such systems are used to maintain information on system hardware and software configuration, documentation, applications and users, as well as other information relevant to system administration.

If a data dictionary system is used only by the designers, users, and administrators, and not by the DBMS software, it is called a passive data dictionary; otherwise, it is called an active data dictionary. A passive data dictionary is updated manually, independently of any changes to the DBMS (database) structure. With an active data dictionary, the dictionary is updated first, and the changes occur in the DBMS automatically as a result.
Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of the various tables (records or entities) and their contents (fields), plus additional details such as the type and length of each data element. Another important piece of information that a data dictionary can provide is the relationship between tables. This is sometimes shown in Entity-Relationship diagrams or, if using Set descriptors, by identifying the Sets in which database tables participate.

In an active data dictionary, constraints may be placed upon the underlying data. For instance, a range may be imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate in a set relationship with another record type. Additionally, a distributed DBMS may have certain location specifics described within its active data dictionary (e.g. where tables are physically located).

The data dictionary consists of record types (tables) created in the database by system-generated command files, tailored for each supported back-end DBMS. Command files contain SQL statements for CREATE TABLE, CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by that type of database. There is no universal standard as to the level of detail in such a document.
Middleware
In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. Extensions to the data dictionary can also assist in query optimization against distributed databases. Additionally, DBA functions are often automated using restructuring tools that are tightly coupled to an active data dictionary.

Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex joins. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types. Visual DataFlex provides the ability to use DataDictionaries as class files to form a middle layer between the user interface and the underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules throughout one or more related applications.

Platform-specific examples
In the context of an IBM System i, data description specifications (DDS) allow the developer to describe data attributes in file descriptions that are external to the application program that processes the data.

The table below is an example of a typical data dictionary entry. The IT staff uses this to develop and maintain the database.

Field Name   | Data Type  | Other information
CustomerID   | Autonumber | Primary key field
Title        | Text       | Lookup: Mr, Mrs, Miss, Ms; Field size 4
Surname      | Text       | Field size 15; Indexed
FirstName    | Text       | Field size 15
DateOfBirth  | Date/Time  | Format: Medium Date; Range check: >=01/01/1930
…            | …          | Field size: 12; Presence check
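The command-file idea described above, where system-generated SQL DDL is driven by dictionary entries, can be sketched briefly. The field list follows the example table; the type mapping, the table name and the index-naming scheme are assumptions for illustration, not the output of any real tool.

```python
# A sketch of data-dictionary-driven DDL generation. The TYPE_MAP and
# naming conventions are illustrative assumptions.

fields = [
    ("CustomerID",  "Autonumber", {"primary_key": True}),
    ("Title",       "Text",       {"size": 4}),
    ("Surname",     "Text",       {"size": 15, "indexed": True}),
    ("FirstName",   "Text",       {"size": 15}),
    ("DateOfBirth", "Date/Time",  {}),
]

TYPE_MAP = {"Autonumber": "INTEGER", "Text": "VARCHAR", "Date/Time": "DATE"}

def create_table_sql(table, fields):
    cols = []
    for name, dtype, opts in fields:
        sql_type = TYPE_MAP[dtype]
        if "size" in opts:
            sql_type += f"({opts['size']})"
        col = f"{name} {sql_type}"
        if opts.get("primary_key"):
            col += " PRIMARY KEY"
        cols.append(col)
    return f"CREATE TABLE {table} (" + ", ".join(cols) + ");"

def create_index_sql(table, fields):
    # Indexed fields become separate CREATE INDEX statements.
    return [f"CREATE INDEX idx_{table}_{n} ON {table} ({n});"
            for n, _, o in fields if o.get("indexed")]

print(create_table_sql("Customer", fields))
print(create_index_sql("Customer", fields))
```

A real command file would use the dialect-specific statements required by each back-end DBMS, as the text notes.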
6. Explain data flow diagram and pseudo codes with the difference between physical DFD and logical DFD any five points?

To understand the differences between a physical and a logical DFD, we first need to know what a DFD is. DFD stands for data flow diagram, and it represents graphically the flow of data in an organization, particularly its information system. A DFD enables a user to see where information comes in, where it goes inside the organization and how it finally leaves the organization. A DFD does not give information about whether the information is processed sequentially or in parallel. There are two types of DFD, known as physical and logical DFDs. Though both serve the same purpose of representing data flow, there are some differences between the two that are discussed below. Any DFD begins with an overview DFD that describes the system to be designed in a nutshell. A logical data flow diagram, as the name indicates, concentrates on the business: it describes the events that take place in a business and the data generated by each such event. A physical DFD, on the other hand, is more concerned with how the flow of information is to be implemented. It is usual practice to use DFDs for the representation of logical data flow and processing of data. However, it is prudent to evolve a logical DFD after first developing a physical DFD that reflects all the people in the organization performing various operations and how data flows between them.

What is the difference between a Physical DFD and a Logical DFD? While a logical DFD places no constraint on the developer to depict how the system is constructed, a physical DFD must show how the system has been constructed. There are certain features of a logical DFD that make it popular among organizations.
A logical DFD makes communication easier for the employees of an organization, leads to more stable systems, allows better understanding of the system by analysts, is flexible and easy to maintain, and allows the user to remove redundancies easily. A physical DFD, on the other hand, is clear on the division between manual and automated processes, gives a detailed description of processes, identifies temporary data stores, and adds more controls to make the system more efficient and simple. Data Flow Diagrams (DFDs) are used to show the flow of data through a system in terms of its inputs, processes, and outputs.


External Entities
Data either comes from or goes to External Entities. They are either the source or destination (sometimes called a source or sink) of data, which is considered to be external to the system. They can be people or groups that provide data to the system or who receive data from the system. An external entity is drawn as an oval (see below) and identified by a noun. External Entities are not part of the system but are needed to provide the sources of data used by the system. Fig 1 below shows an example of an External Entity.


Fig 1 External Entity

Processes and Data Flows
Data passed to, or from, an External Entity must be processed in some way. The passing of data (flow of data) is shown on the DFD as an arrow. The direction of the arrow defines the direction of the flow of data. All data flows between External Entities and Processes need to be named. Fig 2 below shows an example of a data flow:

Customer details

Fig 2 Data Flow

Process
A process acts on data that emanates from external entities or data stores. The process could be manual, mechanised, or automated/computerised. A data process will use or alter the data in some way. A process is identified from a scenario by a verb or action. Each process is given a unique number and a name. An example of a Process is shown in Fig 3 below:

1 Add New Customer

Fig 3 - Process


Data Stores
A Data Store is a point where data is held; it receives and provides data through data flows. Examples of data stores are transaction records, data files, reports, and documents. A data store could be a filing cabinet or magnetic media. Data stores are named in the singular and numbered. A manual store such as a filing cabinet is numbered with an M prefix; a D prefix is used for an electronic store such as a relational table. An example of an electronic data store (Customer) is shown in Fig 4 below.

Fig 4 Data Store

Rules
There are certain rules that must be applied when drawing DFDs. These are explained below:

An external entity cannot be connected to another external entity by a data flow
An external entity cannot be connected directly to a data store
An external entity must pass data to, or receive data from, a process using a data flow
A data store cannot be directly connected to another data store
A data store cannot be directly connected to an external entity
A data store can pass data to, or receive data from, a process
A process can pass data to, and receive data from, another process
Data must flow from an external entity to a process and then be passed on to another process or a data store

A matrix for the above rules is shown in Fig 5 below

          Entity   Process   Store
Entity      No       Yes       No
Process     Yes      Yes       Yes
Store       No       Yes       No

Fig 5 DFD Rules


There are different levels of DFD, depending on the level of detail shown.

Level 0 or context diagram
The context diagram shows the top-level process, i.e. the whole system, as a single process box. It shows all external entities and all data flows to and from the system. Analysts draw the context diagram first to show the high-level processing in a system. An example of a Context Diagram is shown in Fig 6 below:


[Fig 6 shows the context diagram: the external entities Customer and Management exchange data flows (customer details, updated customer details, customer order details, new car details, staff details, invoice details and monthly report details) with the system, shown as the single process Bilbos Car Sales.]
Fig 6 Context Diagram for a Car Sales System

Level 1 DFD
This level of DFD shows all the external entities that appear on the context diagram, all the high-level processes and all the data stores used in the system. Each high-level process may contain sub-processes. These are shown on lower level DFDs.


A Level 1 DFD for the Car Sales scenario is shown in Fig 7 below:
[Fig 7 shows the Level 1 DFD: the processes 1 Add New Customer, 2 Create Monthly Sales Report, 3 Add New Sale, 4 Add New Car Details, 5 Update Customer, 6 Create Customer Invoice and 7 Add Staff Details, connected to the external entities Customer and Management and to data stores including D1 Customer and D2 Car by flows such as customer details, customer order details, sales details, car details, staff details, invoice details and monthly report details.]
Fig 7 Level 1 DFD for a Car Sales System


Level 2 DFDs
Each Level 1 DFD process may contain further internal processes. These are shown on the Level 2 DFD. The numbering system used in the Level 1 DFD is continued: each process in the Level 2 DFD is prefixed by the Level 1 DFD number followed by a unique number for each sub-process, i.e. for process 1, the sub-processes are 1.1, 1.2, 1.3 etc. See Fig 8 below.

[Fig 8 shows the Level 2 DFD for process 3, Add New Sale: the Customer entity supplies customer order details to sub-process 3.1 Validate Order, which passes validated order details to 3.2 Generate New Sale; sub-process 3.3 Add staff to order draws staff details from data store D4 Staff; customer details, car details and sales details flow to and from the related data stores.]

Fig 8 Level 2 DFD for Level 1 Process Add New Sale

Each of the Level 2 DFDs could also have sub-processes and could be decomposed further into lower level DFDs, i.e. 1.1.1, 1.1.2, 1.1.3 etc. More than 3 levels for a DFD would become unmanageable.

Lowest Level DFDs and Process Specification
Once the DFD has been decomposed to its lowest level, each of the lowest level DFDs can be described using pseudo-code (structured English), a flow chart or a similar process specification method that can be used by a programmer to code each process or function. For example, the Level 2 DFD for the Add New Sale process could be described as a process that contains 3 sub-processes: Validate Order, Add Staff to Order and Generate New Sale. The structured English could be written thus:

Open Customer File
If existing customer
    Check Customer Details
Else
    Add customer details
End If
Open Car File
If car available then
    Open Sale File
    Add customer to sale
    Set car to unavailable
    Add car to sale
    Add staff details
    Calculate price
    Generate Invoice
    Close Sale File
    Close Customer File
    Close Car File
    Inform User of successful sale
    Exit process
Else
    Inform User of problem
    Exit process
    Close Customer File
    Close Car File
End If

The above example is not carved in stone, as the analyst may decide to write separate functions to validate the customer and car details, and the Generate New Sale process could include other sub-processes. All that matters is that the underlying processing logic solves the problem. For example, if you look at Fig 8 there is a process named Validate Order, which has the dual purpose of checking both the customer details (is the customer a current customer? if not, add them to the customer file) and the car details (is the car available? if not, stop the sale process). A separate process called Validate Order could be created, but I have written the structured English to show a logical sequence in which the transaction of creating the sale only begins if the car is available. I have also assumed that the staff dealing with the sale will know their own details, so there would not be a need for the process named Add Staff to Order. Like all analysis and design processes, the process of producing DFDs and writing structured English is an iterative one.
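The structured English above can be translated into a short program sketch. The in-memory dictionaries standing in for the Customer, Car and Sale files, and the record layouts, are assumptions; only the control flow mirrors the Add New Sale logic described in the text.

```python
# A sketch of the Add New Sale logic. "Files" are in-memory dicts and the
# record fields are illustrative assumptions.

customers = {1: {"name": "A. Buyer"}}
cars = {"CAR1": {"price": 5000, "available": True}}
sales = {}

def add_new_sale(customer_id, customer_details, car_id, staff_id):
    # If existing customer, use stored details; else add the customer.
    if customer_id not in customers:
        customers[customer_id] = customer_details
    car = cars.get(car_id)
    if car is None or not car["available"]:
        return None                      # inform user of problem, exit
    car["available"] = False             # set car to unavailable
    sale = {
        "customer": customer_id,
        "car": car_id,
        "staff": staff_id,
        "price": car["price"],           # calculate price
    }
    sales[len(sales) + 1] = sale         # add record to the Sale file
    return sale                          # inform user of successful sale

sale = add_new_sale(1, {}, "CAR1", "S01")
print(sale["price"])                     # 5000
print(cars["CAR1"]["available"])         # False: the car is now sold
print(add_new_sale(2, {"name": "B. Buyer"}, "CAR1", "S01"))  # None
```

As in the structured English, the customer check happens before the car check, so a new customer is recorded even when the sale itself fails.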

7. Explain coding techniques and types of codes?

Information must be encoded into signals before it can be transported across communication media. More precisely, the waveform pattern of voltage or current used to represent the 1s and 0s of a digital signal on a transmission link is called digital-to-digital line encoding. There are different encoding schemes available.

Digital-to-Digital Encoding
This is the representation of digital information by a digital signal.

There are basically the following types of digital-to-digital encoding:

Unipolar
Polar
Bipolar

Unipolar
Unipolar encoding uses only one voltage level: a 1 is transmitted as a positive value and a 0 remains idle. Since unipolar line encoding has one of its states at 0 volts, it is also called Return to Zero (RTZ), as shown in the figure. A common example of unipolar line encoding is the TTL logic levels used in computers and digital logic.

A unipolar-encoded signal has a DC (Direct Current) component and therefore cannot travel through media such as microwaves or transformers. It has a low noise margin and needs extra hardware for synchronization purposes. It is well suited where the signal path is short. For long distances, it produces stray capacitance in the transmission medium and therefore never returns to zero, as shown in the figure.


Polar
Polar encoding uses two voltage levels, one positive and one negative. For example, the RS-232D interface uses polar line encoding. The signal does not return to zero; it is either a positive voltage or a negative voltage. Polar encoding may be classified as non-return to zero (NRZ), return to zero (RZ) and biphase. NRZ may be further divided into NRZ-L and NRZ-I. Biphase also has two different categories: Manchester and Differential Manchester encoding. Polar line encoding is the simplest pattern that eliminates most of the residual DC problem. The figure shows polar line encoding. It has the same synchronization problem as unipolar encoding. The added benefit of polar encoding is that it reduces the power required to transmit the signal by one-half.

Non-Return to Zero (NRZ)
In NRZ-L, the level of the signal represents a 1 if the amplitude is positive and a 0 if the amplitude is negative. In NRZ-I, whenever a 1 bit appears, the signal level is inverted; a 0 bit leaves the level unchanged. The figure explains the concepts of NRZ-L and NRZ-I more precisely.
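The NRZ-L and NRZ-I rules above can be sketched in a few lines, using +1/-1 for the two signal levels (the actual voltages are implementation-specific assumptions):

```python
# NRZ-L: the level itself encodes the bit (1 -> positive, 0 -> negative).
def nrz_l(bits):
    return [+1 if b else -1 for b in bits]

# NRZ-I: invert the current level on every 1 bit; hold it on a 0 bit.
def nrz_i(bits, start=-1):
    level, out = start, []
    for b in bits:
        if b:
            level = -level
        out.append(level)
    return out

print(nrz_l([1, 0, 1, 1]))  # [1, -1, 1, 1]
print(nrz_i([1, 0, 1, 1]))  # [1, 1, -1, 1]
```

Note how a run of identical bits produces a constant level in both schemes, which is the synchronization weakness mentioned above.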


Return to Zero (RZ)
RZ uses three values to represent the signal: positive, negative, and zero. Bit 1 is represented when the signal changes from positive to zero. Bit 0 is represented when the signal changes from negative to zero. The figure explains the RZ concept.

Biphase
Biphase is implemented in two different ways: Manchester and Differential Manchester encoding. In Manchester encoding, a transition happens at the middle of each bit period. A low-to-high transition represents a 1 and a high-to-low transition represents a 0. In Differential Manchester encoding, a transition at the beginning of a bit time represents a zero. These encodings can detect errors during transmission because of the transition during every bit period; the absence of an expected transition indicates an error condition.


They have no DC component and there is always a transition available for synchronizing the receive and transmit clocks.

Bipolar
Bipolar encoding uses three voltage levels: positive, negative, and zero. Bit 0 is transmitted at the zero level. Successive 1 bits alternate between the positive and negative levels, which is why this scheme is also called Alternate Mark Inversion (AMI). There is no DC component because of the alternating polarity of the pulses for 1s. The figure describes bipolar encoding.
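The AMI rule above, zeros at the zero level and ones alternating in polarity, can be sketched as:

```python
# Bipolar AMI: 0 -> zero level; each 1 -> alternately +1 and -1,
# which cancels the DC component.
def ami(bits):
    out, last = [], -1   # the next 1 will be sent as +1
    for b in bits:
        if b:
            last = -last
            out.append(last)
        else:
            out.append(0)
    return out

print(ami([1, 0, 1, 1, 0, 1]))  # [1, 0, -1, 1, 0, -1]
```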

Analog to Digital
Analog-to-digital encoding is the representation of analog information by a digital signal. Techniques include PAM (Pulse Amplitude Modulation) and PCM (Pulse Code Modulation).

Digital to Analog
These include ASK (Amplitude Shift Keying), FSK (Frequency Shift Keying), PSK (Phase Shift Keying), QPSK (Quadrature Phase Shift Keying), and QAM (Quadrature Amplitude Modulation).

Analog to Analog
These are the amplitude modulation, frequency modulation and phase modulation techniques.

Codecs (Coders and Decoders)
Codec stands for coder/decoder in data communication. The conversion of analog to digital is necessary in situations where it is advantageous to send analog information across a digital circuit. Certainly, this is often the case in carrier networks, where huge volumes of analog voice are digitized and sent across high-capacity digital circuits. The device that accomplishes the analog-to-digital conversion is known as a codec. Codecs code an analog input into a digital format on the transmitting side of the connection and reverse the process, decoding the information, on the receiving side in order to reconstitute the analog signal. Codecs are widely used to convert analog voice and video to digital format, and to reverse the process on the receiving end.


8. Explain the error detection algorithm (modulus-eleven code) and modulus-n code with the help of algorithms and examples

In information theory and coding theory, with applications in computer science and telecommunication, error detection and correction (or error control) are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow such errors to be detected, while error correction enables reconstruction of the original data. Error correction may generally be realized in two different ways:

Automatic repeat request (ARQ) (sometimes also referred to as backward error correction): This is an error control technique whereby an error detection scheme is combined with requests for retransmission of erroneous data. Every block of data received is checked using the error detection code in use, and if the check fails, retransmission of the data is requested; this may be done repeatedly, until the data can be verified.

Forward error correction (FEC): The sender encodes the data using an error-correcting code (ECC) prior to transmission. The additional information (redundancy) added by the code is used by the receiver to recover the original data. In general, the reconstructed data is what is deemed the "most likely" original data.

ARQ and FEC may be combined, such that minor errors are corrected without retransmission, and major errors are corrected via a request for retransmission: this is called hybrid automatic repeat-request (HARQ).

Error detection is most commonly realized using a suitable hash function (or checksum algorithm). A hash function adds a fixed-length tag to a message, which enables receivers to verify the delivered message by recomputing the tag and comparing it with the one provided. There exists a vast variety of different hash function designs. However, some are of particularly widespread use because of either their simplicity or their suitability for detecting certain kinds of errors (e.g., the cyclic redundancy check's performance in detecting burst errors). Random-error-correcting codes based on minimum distance coding can provide a suitable alternative to hash functions when a strict guarantee on the minimum number of errors to be detected is desired. Repetition codes, described below, are special cases of error-correcting codes: although rather inefficient, they find applications for both error correction and detection due to their simplicity.

Repetition codes
A repetition code is a coding scheme that repeats the bits across a channel to achieve error-free communication. Given a stream of data to be transmitted, the data is divided into blocks of bits. Each block is transmitted some predetermined number of times. For example, to send the bit pattern "1011", the four-bit block can be repeated three times, producing "1011 1011 1011". However, if this twelve-bit pattern is received as "1010 1011 1011", where the first block is unlike the other two, it can be determined that an error has occurred. Repetition codes are very inefficient, and can be susceptible to problems if the error occurs in exactly the same place in each group (e.g., "1010 1010 1010" in the previous example would be accepted as correct). The advantage of repetition codes is that they are extremely simple, and they are in fact used in some transmissions of numbers stations.

Parity bits
A parity bit is a bit that is added to a group of source bits to ensure that the number of set bits (i.e., bits with value 1) in the outcome is even or odd. It is a very simple scheme that can be used to detect a single error, or any other odd number (i.e., three, five, etc.) of errors, in the output. An even number of flipped bits will make the parity bit appear correct even though the data is erroneous. Extensions and variations on the parity bit mechanism are horizontal redundancy checks, vertical redundancy checks, and "double", "dual", or "diagonal" parity (used in RAID-DP).

Checksums
A checksum of a message is a modular arithmetic sum of message code words of a fixed word length (e.g., byte values). The sum may be negated by means of a ones'-complement operation prior to transmission to detect errors resulting in all-zero messages. Checksum schemes include parity bits, check digits, and longitudinal redundancy checks.
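The parity-bit behaviour described above, including its blindness to an even number of flipped bits, can be demonstrated in a short sketch:

```python
# Even parity: the check bit makes the total count of 1s even.
def add_even_parity(bits):
    return bits + [sum(bits) % 2]

def check_even_parity(codeword):
    return sum(codeword) % 2 == 0

word = add_even_parity([1, 0, 1, 1])  # parity bit = 1 -> [1, 0, 1, 1, 1]
print(check_even_parity(word))        # True: no error
word[0] ^= 1                          # flip one bit
print(check_even_parity(word))        # False: single error detected
word[1] ^= 1                          # flip a second bit
print(check_even_parity(word))        # True: double error goes undetected
```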
Some checksum schemes, such as the Damm algorithm, the Luhn algorithm, and the Verhoeff algorithm, are specifically designed to detect errors commonly introduced by humans in writing down or remembering identification numbers.

Cyclic redundancy checks (CRCs)
A cyclic redundancy check (CRC) is a single-burst-error-detecting cyclic code and non-secure hash function designed to detect accidental changes to digital data in computer networks. It is not suitable for detecting maliciously introduced errors. It is characterized by the specification of a so-called generator polynomial, which is used as the divisor in a polynomial long division over a finite field, taking the input data as the dividend, and where the remainder becomes the result. Cyclic codes have favorable properties in that they are well suited for detecting burst errors. CRCs are particularly easy to implement in hardware, and are therefore commonly used in digital networks and storage devices such as hard disk drives. Even parity is a special case of a cyclic redundancy check, where the single-bit CRC is generated by the divisor x + 1.

Cryptographic hash functions
The output of a cryptographic hash function, also known as a message digest, can provide strong assurances about data integrity, whether changes of the data are accidental (e.g., due to transmission errors) or maliciously introduced. Any modification to the data will likely be detected through a mismatching hash value. Furthermore, given some hash value, it is infeasible to find some input data (other than the one given) that will yield the same hash value. If an attacker can change not only the message but also the hash value, then a keyed hash or message authentication code (MAC) can be used for additional security. Without knowing the key, it is infeasible for the attacker to calculate the correct keyed hash value for a modified message.

Error-correcting codes
Any error-correcting code can be used for error detection. A code with minimum Hamming distance d can detect up to d - 1 errors in a code word. Using minimum-distance-based error-correcting codes for error detection can be suitable if a strict limit on the minimum number of errors to be detected is desired. Codes with minimum Hamming distance d = 2 are degenerate cases of error-correcting codes, and can be used to detect single errors. The parity bit is an example of a single-error-detecting code.
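The CRC division described above (generator polynomial as divisor, modulo-2 long division, remainder as check value) can be sketched bit by bit. The 4-bit generator 1011, i.e. x^3 + x + 1, is chosen purely for illustration:

```python
# CRC generation by polynomial long division over GF(2) (XOR arithmetic).
def crc_remainder(data_bits, generator):
    # Append len(generator)-1 zero bits, then divide modulo 2.
    bits = data_bits + [0] * (len(generator) - 1)
    for i in range(len(data_bits)):
        if bits[i] == 1:
            for j, g in enumerate(generator):
                bits[i + j] ^= g
    return bits[-(len(generator) - 1):]

data = [1, 1, 0, 1]
gen = [1, 0, 1, 1]                    # generator polynomial x^3 + x + 1
crc = crc_remainder(data, gen)
codeword = data + crc
print(crc)                            # [0, 0, 1]
# The receiver divides the received codeword by the same generator;
# a zero remainder means no error was detected.
print(crc_remainder(codeword, gen))   # [0, 0, 0]
codeword[2] ^= 1                      # introduce a single-bit error
print(crc_remainder(codeword, gen) == [0, 0, 0])  # False: error detected
```

With the divisor reduced to x + 1 the same routine degenerates to the even-parity check mentioned in the text.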
In digital data transmission, errors occur due to noise. The probability of error, or bit error rate, depends on the signal-to-noise ratio, the modulation type and the method of demodulation.


The bit error rate p may be expressed as

p = (number of errors in N bits) / N,   for large N

For example, if p = 0.1 we would expect on average 1 error in every 10 bits; p = 0.1 actually states that every bit has a 1/10 probability of being in error. Depending on the type of system and many factors, error rates typically range from 10^-1 to 10^-5 or better.

Information transferred via a digital system is usually packaged into a structure (a block of bits) called a message block or frame. A typical message block contains the following:

A synchronization pattern to mark the start of the message block
Destination and sometimes source addresses
System control/commands
Information
Error control coding check bits

The total number of bits in the block may vary widely (from, say, 32 bits to several hundred bits) depending on the requirement. Clearly, if the bits are subject to an error rate p, there is some probability that a message block will be received with 1 or more bits in error. In order to counteract the effects of errors, error control coding techniques are used to either:

a) detect errors (error detection), or
b) correct errors (error detection and correction)

Broadly, there are two types of error control codes:

a) Block Codes: parity codes, array codes, repetition codes, cyclic codes, etc.
b) Convolutional Codes



A block code is a coding technique which generates C check bits for M message bits to give a stand-alone block of M + C = N bits.

The sync bits are usually not included in the error control coding, because message synchronization must be achieved before the message and check bits can be processed. The code rate is given by

Rate = M / (M + C) = M / N

where
M = number of message bits
C = number of check bits
N = M + C = total number of bits

The code rate is a measure of the proportion of user-assigned bits (M) to the total bits in the block (N). For example:

i) A single parity bit (C = 1) applied to a block of 7 bits gives a code rate
R = 7 / (7 + 1) = 7/8

ii) A (7,4) cyclic code has N = 7, M = 4, so its code rate is
R = 4/7

iii) A repetition-m code, in which each bit or message is transmitted m times and the receiver carries out a majority vote on each bit, has a code rate
Rate = M / (mM) = 1/m
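The three code-rate examples above can be checked numerically; the 8-bit message length chosen for the repetition-3 case is an arbitrary assumption:

```python
# Code rate: Rate = M / (M + C) = M / N.
def code_rate(M, C):
    """Code rate for M message bits and C check bits."""
    return M / (M + C)

print(code_rate(7, 1))            # single parity bit on 7 bits -> 0.875 (7/8)
print(code_rate(4, 3))            # (7,4) cyclic code -> 4/7
M, m = 8, 3                       # repetition-3 code on an 8-bit message
print(code_rate(M, (m - 1) * M))  # -> 1/3, i.e. 1/m
```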


Consider messages transferred from a Source to a Destination, and assume that the Destination is able to check the received messages and detect errors.

If no errors are detected, the Destination will accept the messages. If errors are detected, there are two forms of error correction.
a) Automatic Repeat Request (ARQ)

In an ARQ system, the destination sends an acknowledgment (ACK) message back to the source if errors are not detected, and a negative acknowledgment (NAK) message back to the source if errors are detected. If the source receives an ACK for a message, it sends the next message. If the source receives a NAK, it repeats the same message. This process repeats until all the messages are accepted by the destination.
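The ARQ exchange above can be sketched as a retransmission loop. The even-parity check standing in for the error detection code, and the channel model (independent bit flips with probability p), are assumptions for illustration; a real system would use a stronger code and explicit ACK/NAK messages over a return channel.

```python
import random

def parity_ok(codeword):
    # Destination's check: accept only if the count of 1s is even.
    return sum(codeword) % 2 == 0

def channel(codeword, p):
    # Flip each bit independently with probability p.
    return [b ^ (random.random() < p) for b in codeword]

def send_with_arq(message, p, max_tries=100):
    codeword = message + [sum(message) % 2]   # append even-parity check bit
    for attempt in range(1, max_tries + 1):
        received = channel(codeword, p)
        if parity_ok(received):               # destination replies ACK
            return received, attempt
        # otherwise the destination replies NAK and the source re-sends
    return None, max_tries

received, attempts = send_with_arq([1, 0, 1, 1], p=0.0)
print(attempts)    # 1: an error-free channel needs no retransmission
random.seed(42)
received, attempts = send_with_arq([1, 0, 1, 1], p=0.2)
print(received is None or parity_ok(received))  # True
```

Note that parity passes when an even number of flips occurs, so this sketch also exhibits the false-transfer outcome discussed later in the text.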


b) Forward Error Correction (FEC)

The error control code may be powerful enough to allow the destination to attempt to correct the errors by further processing. This is called Forward Error Correction; no ACKs or NAKs are required. Many systems are hybrid, in that they use both ARQ (ACK/NAK) and FEC strategies for error correction.
Successful, False & Lost Message Transfer

The process of checking the received messages for errors gives two possible outcomes:

a) Errors not detected: messages accepted
b) Errors detected: messages rejected

An error not being detected does not mean that errors are not present. Error control codes cannot detect every possible error or combination of errors. However, if errors are not detected, the destination has no alternative but to accept the message, true or false. That is, if errors are not detected, we may conclude either:

a) that there were no errors, i.e. the messages accepted are true; in other words, a successful message transfer, or
b) that there were undetected errors, i.e. the messages accepted were false; in other words, a false message transfer.

If errors are detected, the destination does not accept the message and may either request a re-transmission (ARQ system) or process the block further in an attempt to correct the errors (FEC). In processing the block for error correction, there are again two possible outcomes:

a) the processor may get it right, i.e. correct the errors, giving a successful message transfer, or
b) the processor may get it wrong, i.e. not correct the errors, in which case there is a false message transfer.

Some codes have a range of abilities to detect and correct errors. For example, a code may be able to detect and correct 1 error (a single bit error) and detect 2, 3 and 4 bits in error, but not correct them. Thus, even with FEC, some messages may still be rejected, and we think of these as lost messages. These ideas are illustrated below.



Consider message transfer between two computers, e.g. it is required to transfer the contents of Computer A to Computer B.



As discussed, of the messages transferred to Computer B, some may be rejected (lost) and some will be accepted; those accepted will be either true (successful transfer) or false. Obviously the requirement is for a high probability of successful transfer (ideally = 1), a low probability of false transfer (ideally = 0) and a low probability of lost messages. In particular, the false rate should be kept low, even at the expense of an increased lost message rate. Note that some messages have in-built redundancy; for example, the corrupted text message REPAUT FOR WEDLESDAY can still be read by a human as REPORT FOR WEDNESDAY. Other examples, where there is little or no redundancy, are car registration numbers, account numbers etc., which are generally numeric or unstructured alphanumeric information. There is thus a need for a low false rate appropriate to the function of the system, and it is important for the information in Computer B to be correct even if it takes a long time to transfer. Error control coding may be considered further in two main ways.
1. In terms of system performance, i.e. the probabilities of successful, false and lost message transfer. In this case we only need to know what the error detection/correction code can do in terms of its ability to detect and correct errors (this depends on the Hamming distance).

2. In terms of the error control code itself, i.e. the structure, operation, characteristics and implementation of various types of codes.


In order to determine system performance in terms of successful, false and lost message transfers it is necessary to know:
1) the probability of error, or b.e.r., p
2) the number of bits in the message block, N
3) the ability of the code to detect/correct errors, usually expressed as a minimum Hamming distance, dmin, for the code.
Given the b.e.r. p and the number of bits in the block N, we can apply the binomial expression below.
P(R) = [N! / ((N - R)! R!)] p^R (1 - p)^(N - R),   noting 0! = 1 and 1! = 1.

This gives the probability of R errors in an N-bit block subject to a bit error rate p. Hence, for an N-bit block we can determine:

the probability of no errors in the block (R = 0), i.e. an error-free block:
P(0) = [N! / ((N - 0)! 0!)] p^0 (1 - p)^(N - 0) = (1 - p)^N

the probability of 1 error in the block (R = 1):
P(1) = [N! / ((N - 1)! 1!)] p^1 (1 - p)^(N - 1) = N p (1 - p)^(N - 1)

the probability of 2 errors in the block (R = 2):
P(2) = [N! / ((N - 2)! 2!)] p^2 (1 - p)^(N - 2)

and similarly for R = 3, 4, ...: P(3), P(4), P(5), ..., P(N).


The minimum Hamming distance of an error control code is a parameter which indicates the worst-case ability of the code to detect/correct errors. In general, codes will perform better than indicated by the minimum Hamming distance.



dmin = minimum Hamming distance
l = number of bit errors detected
t = number of bit errors corrected

It may be shown that dmin = l + t + 1, with t <= l. For a given dmin there is therefore a range of (worst-case) options, from pure error detection through to combined error detection/correction. For example, suppose a code has dmin = 6. Since dmin = l + t + 1, the options are:
1) 6 = 5 + 0 + 1 {detect up to 5 errors, no correction}
2) 6 = 4 + 1 + 1 {detect up to 4 errors, correct 1 error}
3) 6 = 3 + 2 + 1 {detect up to 3 errors, correct 2 errors}

Beyond this t > l, i.e. we cannot go further, since we cannot correct more errors than can be detected. In option 1), up to 5 errors can be detected, i.e. 1, 2, 3, 4 or 5 errors, but there is no error correction. In option 2), up to 4 errors can be detected, i.e. 1, 2, 3 or 4 errors, and 1 error can be corrected. In option 3), up to 3 errors can be detected, i.e. 1, 2 or 3 errors, and 1 or 2 errors can be corrected. Hence a given code can give several decoding (error detection/correction) options at the receiver. In an ARQ system with no FEC, we would implement option 1, i.e. detect as many errors as possible. If FEC were to be used, we might choose option 3, which allows 1 or 2 errors in a block to be detected and corrected; 3 errors can be detected but not corrected, and these messages could be rejected and recovered by ARQ. With option 3, if 4 or more errors occurred they would not be detected, and these messages would be accepted but would be false messages. Fortunately, the higher the number of errors, the lower the probability they will occur, for reasonable values of p.
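The relation dmin = l + t + 1 with t <= l can be sketched as a short enumeration. The function name below is invented for illustration, not part of any standard library:

```python
def decoding_options(dmin):
    # All worst-case (l, t) pairs satisfying dmin = l + t + 1 with t <= l,
    # from pure detection (t = 0) up to maximum correction.
    return [(dmin - 1 - t, t) for t in range(dmin) if t <= dmin - 1 - t]

# For dmin = 6 this yields the (l, t) pairs (5, 0), (4, 1) and (3, 2),
# matching options 1) to 3) above.
options = decoding_options(6)
```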


From the above, we may conclude that:

Message transfers are successful if no errors occur, or if up to t errors occur and are corrected, i.e.
Probability of Success = P(0) + P(1) + ... + P(t) = sum of P(i) for i = 0 to t

Message transfers are lost if between t+1 and l errors are detected but not corrected, i.e.
Probability of Lost = P(t+1) + P(t+2) + ... + P(l) = sum of P(i) for i = t+1 to l

Message transfers are false if l+1 or more errors occur, i.e.
Probability of False = P(l+1) + P(l+2) + ... + P(N) = sum of P(i) for i = l+1 to N


Example: using dmin = 6 with option 2) above (t = 1, l = 4):
Probability of successful transfer = P(0) + P(1)
Probability of lost messages = P(2) + P(3) + P(4)
Probability of false messages = P(5) + P(6) + ... + P(N)
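The binomial expression and the success/lost/false sums above can be sketched in a few lines. The block size N = 100 and b.e.r. p = 0.001 are chosen purely for illustration:

```python
from math import comb

def P(N, R, p):
    # Probability of exactly R errors in an N-bit block with bit error rate p
    return comb(N, R) * (p ** R) * ((1 - p) ** (N - R))

def transfer_probabilities(N, p, t, l):
    # Success: 0 errors, or 1..t errors which are corrected
    success = sum(P(N, i, p) for i in range(0, t + 1))
    # Lost: t+1..l errors, detected but not corrected (block rejected)
    lost = sum(P(N, i, p) for i in range(t + 1, l + 1))
    # False: l+1..N errors, undetected (block accepted but wrong)
    false = sum(P(N, i, p) for i in range(l + 1, N + 1))
    return success, lost, false

# dmin = 6 decoded with t = 1, l = 4 (the worked example above)
s, lost, false = transfer_probabilities(100, 1e-3, 1, 4)
```

Since every block falls into exactly one of the three categories, the three probabilities sum to 1.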


9. Explain backup plans

In information technology, a backup, or the process of backing up, refers to the copying and archiving of computer data so it may be used to restore the original after a data loss event. The verb form is to back up, in two words, whereas the noun is backup.

Backups have two distinct purposes. The primary purpose is to recover data after its loss, be it by data deletion or corruption. Data loss can be a common experience of computer users; a 2008 survey found that 66% of respondents had lost files on their home PC. The secondary purpose of backups is to recover data from an earlier time, according to a user-defined data retention policy, typically configured within a backup application, which specifies how long copies of data are required.

Though backups popularly represent a simple form of disaster recovery, and should be part of a disaster recovery plan, backups by themselves should not be considered disaster recovery. One reason for this is that not all backup systems or backup applications are able to reconstitute a computer system or other complex configuration, such as a computer cluster, active directory servers, or a database server, by restoring only data from a backup.

Since a backup system contains at least one copy of all data worth saving, the data storage requirements can be significant. Organizing this storage space and managing the backup process can be a complicated undertaking. A data repository model can be used to provide structure to the storage. Nowadays there are many different types of data storage devices that are useful for making backups, and many different ways in which these devices can be arranged to provide geographic redundancy, data security, and portability.

Before data is sent to its storage location, it is selected, extracted, and manipulated. Many different techniques have been developed to optimize the backup procedure. These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others. Every backup scheme should include dry runs that validate the reliability of the data being backed up, and it is important to recognize the limitations and human factors involved in any backup scheme.

Because data is the heart of the enterprise, it's crucial for you to protect it. And to protect your organization's data, you need to implement a data backup and recovery plan. Backing up files can protect against accidental loss of user data, database corruption, hardware failures, and even natural disasters. It's your job as an administrator to make sure that backups are performed and that backup tapes are stored in a secure location.


Creating a Backup and Recovery Plan

Data backup is an insurance plan. Important files are accidentally deleted all the time. Mission-critical data can become corrupt. Natural disasters can leave your office in ruin. With a solid backup and recovery plan, you can recover from any of these. Without one, you're left with nothing to fall back on.
Figuring Out a Backup Plan

It takes time to create and implement a backup and recovery plan. You'll need to figure out what data needs to be backed up, how often the data should be backed up, and more. To help you create a plan, consider the following:

How important is the data on your systems? The importance of data can go a long way in helping you determine if you need to back it up, as well as when and how it should be backed up. For critical data, such as a database, you'll want to have redundant backup sets that extend back for several backup periods. For less important data, such as daily user files, you won't need such an elaborate backup plan, but you'll need to back up the data regularly and ensure that the data can be recovered easily.

What type of information does the data contain? Data that doesn't seem important to you may be very important to someone else. Thus, the type of information the data contains can help you determine if you need to back up the data, as well as when and how the data should be backed up.

How often does the data change? The frequency of change can affect your decision on how often the data should be backed up. For example, data that changes daily should be backed up daily.

How quickly do you need to recover the data? Time is an important factor in creating a backup plan. For critical systems, you may need to get back online swiftly. To do this, you may need to alter your backup plan.

Do you have the equipment to perform backups? You must have backup hardware to perform backups. To perform timely backups, you may need several backup devices and several sets of backup media. Backup hardware includes tape drives, optical drives, and removable disk drives. Generally, tape drives are less expensive but slower than other types of drives.

Who will be responsible for the backup and recovery plan? Ideally, someone should be a primary contact for the organization's backup and recovery plan. This person may also be responsible for performing the actual backup and recovery of data.

What is the best time to schedule backups? Scheduling backups when system use is as low as possible will speed the backup process. However, you can't always schedule backups for off-peak hours. So you'll need to carefully plan when key system data is backed up.

Do you need to store backups off-site? Storing copies of backup tapes off-site is essential to recovering your systems in the case of a natural disaster. In your off-site storage location, you should also include copies of the software you may need to install to reestablish operational systems.
The Basic Types of Backup

There are many techniques for backing up files. The techniques you use will depend on the type of data you're backing up, how convenient you want the recovery process to be, and more. If you view the properties of a file or directory in Windows Explorer, you'll note an attribute called Archive. This attribute often is used to determine whether a file or directory should be backed up. If the attribute is on, the file or directory may need to be backed up. The basic types of backups you can perform include

Normal/full backups: All files that have been selected are backed up, regardless of the setting of the archive attribute. When a file is backed up, the archive attribute is cleared. If the file is later modified, this attribute is set, which indicates that the file needs to be backed up.

Copy backups: All files that have been selected are backed up, regardless of the setting of the archive attribute. Unlike a normal backup, the archive attribute on files isn't modified. This allows you to perform other types of backups on the files at a later date.

Differential backups: Designed to create backup copies of files that have changed since the last normal backup. The presence of the archive attribute indicates that the file has been modified, and only files with this attribute are backed up. However, the archive attribute on files isn't modified. This allows you to perform other types of backups on the files at a later date.

Incremental backups: Designed to create backups of files that have changed since the most recent normal or incremental backup. The presence of the archive attribute indicates that the file has been modified, and only files with this attribute are backed up. When a file is backed up, the archive attribute is cleared. If the file is later modified, this attribute is set, which indicates that the file needs to be backed up.

Daily backups: Designed to back up files using the modification date on the file itself. If a file has been modified on the same day as the backup, the file will be backed up. This technique doesn't change the archive attributes of files.

In your backup plan you'll probably want to perform full backups on a weekly basis and supplement this with daily, differential, or incremental backups. You may also want to create an extended backup set for monthly and quarterly backups that includes additional files that aren't being backed up regularly.
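The archive-attribute rules above can be sketched as a small simulation. The file names and the `select_files` helper are invented for illustration; this is not a real backup API:

```python
def select_files(files, backup_type):
    # 'files' maps file name -> archive bit (True = modified since the bit
    # was last cleared). Returns the files this backup type would copy.
    if backup_type in ("normal", "copy"):
        selected = list(files)                    # everything, ignore the bit
    else:                                         # differential or incremental
        selected = [f for f, bit in files.items() if bit]
    if backup_type in ("normal", "incremental"):  # only these clear the bit
        for f in selected:
            files[f] = False
    return selected                               # copy/differential leave bits set

files = {"report.doc": True, "budget.xls": True}
select_files(files, "normal")                  # full backup clears both bits
files["report.doc"] = True                     # modified on Monday
monday = select_files(files, "differential")   # ['report.doc'], bit stays set
files["budget.xls"] = True                     # modified on Tuesday
tuesday = select_files(files, "differential")  # both files: the differential grows
```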
Tip: You'll often find that weeks or months can go by before anyone notices that a file or data source is missing. This doesn't mean the file isn't important. Although some types of data aren't used often, they're still needed. So don't forget that you may also want to create extra sets of backups for monthly or quarterly periods, or both, to ensure that you can recover historical data over time.
Differential and Incremental Backups

The difference between differential and incremental backups is extremely important. To understand the distinction between them, examine Table 1. As it shows, with differential backups you back up all the files that have changed since the last full backup (which means that the size of the differential backup grows over time). With incremental backups, you only back up files that have changed since the most recent full or incremental backup (which means the size of the incremental backup is usually much smaller than a full backup).
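The practical consequence of this distinction is what a restore needs. The sketch below illustrates it under an invented day numbering (day 0 = the Sunday full backup, days 1 to 6 = Monday to Saturday); the helper names are hypothetical:

```python
def restore_set_differential(day):
    # Full backup plus only the latest differential: a differential already
    # contains everything changed since the full backup.
    return ["full"] + (["diff_day%d" % day] if day > 0 else [])

def restore_set_incremental(day):
    # Full backup plus every incremental since it, applied in order.
    return ["full"] + ["incr_day%d" % d for d in range(1, day + 1)]

# By Saturday (day 6) the differential scheme restores from 2 backups,
# while the incremental scheme needs 7.
saturday_diff = restore_set_differential(6)
saturday_incr = restore_set_incremental(6)
```

This is the trade-off: differential backups grow larger each day but restore from at most two media; incrementals stay small but the restore chain lengthens.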
Table 1: Incremental and Differential Backup Techniques

Day of Week   Weekly Full Backup with Daily Differential       Weekly Full Backup with Daily Incremental
Sunday        A full backup is performed.                      A full backup is performed.
Monday        Differential backup: all changes since Sunday.   Incremental backup: changes since Sunday.
Tuesday       Differential backup: all changes since Sunday.   Incremental backup: changes since Monday.
Wednesday     Differential backup: all changes since Sunday.   Incremental backup: changes since Tuesday.
Thursday      Differential backup: all changes since Sunday.   Incremental backup: changes since Wednesday.
Friday        Differential backup: all changes since Sunday.   Incremental backup: changes since Thursday.
Saturday      Differential backup: all changes since Sunday.   Incremental backup: changes since Friday.
Once you determine what data you're going to back up and how often, you can select backup devices and media that support these choices. These are covered in the next section.
Selecting Backup Devices and Media

Many tools are available for backing up data. Some are fast and expensive. Others are slow but very reliable. The backup solution that's right for your organization depends on many factors, including

Capacity: The amount of data that you need to back up on a routine basis. Can the backup hardware support the required load given your time and resource constraints?

Reliability: The reliability of the backup hardware and media. Can you afford to sacrifice reliability to meet budget or time needs?

Extensibility: The extensibility of the backup solution. Will this solution meet your needs as the organization grows?

Speed: The speed with which data can be backed up and recovered. Can you afford to sacrifice speed to reduce costs?

Cost: The cost of the backup solution. Does it fit into your budget?

Common Backup Solutions

Capacity, reliability, extensibility, speed, and cost are the issues driving your backup plan. If you understand how these issues affect your organization, you'll be on track to select an appropriate backup solution. Some of the most commonly used backup solutions include

Tape drives: Tape drives are the most common backup devices. Tape drives use magnetic tape cartridges to store data. Magnetic tapes are relatively inexpensive but aren't highly reliable. Tapes can break or stretch. They can also lose information over time. The average capacity of tape cartridges ranges from 100 MB to 2 GB. Compared with other backup solutions, tape drives are fairly slow. Still, the selling point is the low cost.

Digital audio tape (DAT) drives: DAT drives are quickly replacing standard tape drives as the preferred backup devices. DAT drives use 4 mm and 8 mm tapes to store data. DAT drives and tapes are more expensive than standard tape drives and tapes, but they offer more speed and capacity. DAT drives that use 4 mm tapes can typically record over 30 MB per minute and have capacities of up to 16 GB. DAT drives that use 8 mm tapes can typically record more than 10 MB per minute and have capacities of up to 36 GB (with compression).

Auto-loader tape systems: Auto-loader tape systems use a magazine of tapes to create extended backup volumes capable of meeting the high-capacity needs of the enterprise. With an auto-loader system, tapes within the magazine are automatically changed as needed during the backup or recovery process. Most auto-loader tape systems use DAT tapes. The typical system uses magazines with between 4 and 12 tapes. The main drawback to these systems is the high cost.

Magnetic optical drives: Magnetic optical drives combine magnetic tape technology with optical lasers to create a more reliable backup solution than DAT. Magnetic optical drives use 3.5-inch and 5.25-inch disks that look similar to floppies but are much thicker. Typically, magnetic optical disks have capacities of between 1 GB and 4 GB.

Tape jukeboxes: Tape jukeboxes are similar to auto-loader tape systems. Jukeboxes use magnetic optical disks rather than DAT tapes to offer high-capacity solutions. These systems load and unload disks stored internally for backup and recovery operations. Their key drawback is the high cost.

Removable disks: Removable disks, such as Iomega Jaz, are increasingly being used as backup devices. Removable disks offer good speed and ease of use for a single drive or single system backup. However, the disk drives and the removable disks tend to be more expensive than standard tape or DAT drive solutions.

Disk drives: Disk drives provide the fastest way to back up and restore files. With disk drives, you can often accomplish in minutes what takes a tape drive hours. So when business needs mandate a speedy recovery, nothing beats a disk drive. The drawbacks to disk drives, however, are relatively high costs and less extensibility.

Before you can use a backup device, you must install it. When you install backup devices other than standard tape and DAT drives, you need to tell the operating system about the controller card and drivers that the backup device uses. For detailed information on installing devices and drivers, see the section of Chapter 2 entitled "Managing Hardware Devices and Drivers."
Buying and Using Tapes

Selecting a backup device is an important step toward implementing a backup and recovery plan. But you also need to purchase the tapes or disks, or both, that will allow you to implement your plan. The number of tapes you need depends on how much data you'll be backing up, how often you'll be backing up the data, and how long you'll need to keep additional data sets.

The typical way to use backup tapes is to set up a rotation schedule whereby you rotate through two or more sets of tapes. The idea is that you can increase tape longevity by reducing tape usage, and at the same time reduce the number of tapes you need to ensure that you have historic data on hand when necessary.

One of the most common tape rotation schedules is the 10-tape rotation. With this rotation schedule, you use 10 tapes divided into two sets of 5 (one for each weekday). As shown in Table 2, the first set of tapes is used one week and the second set of tapes is used the next week. On Fridays, full backups are scheduled. On Mondays through Thursdays, incremental backups are scheduled. If you add a third set of tapes, you can rotate one of the tape sets to an off-site storage location on a weekly basis.
Table-2 Using Incremental Backups Day of Week Tape Set 1 Tape Set 2

Friday Monday Tuesday Wednesday Thursday

Full backup on Tape 5

Full backup on Tape 5

Incremental backup on Tape 1 Incremental backup on Tape 1 Incremental backup on Tape 2 Incremental backup on Tape 2 Incremental backup on Tape 3 Incremental backup on Tape 3 Incremental backup on Tape 4 Incremental backup on Tape 4

Tip: The 10-tape rotation schedule is designed for the 9-to-5 workers of the world. If you're in a 24 x 7 environment, you'll definitely want extra tapes for Saturday and Sunday. In this case, use a 14-tape rotation with two sets of 7 tapes. On Sundays, schedule full backups. On Mondays through Saturdays, schedule incremental backups.
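The two-set weekly rotation of Table 2 can be sketched as a small schedule generator. Tape numbering follows the table (tape 5 for the Friday full backup, tapes 1 to 4 for the Monday-to-Thursday incrementals); the function name is invented:

```python
def ten_tape_rotation(weeks=2):
    # One backup week runs Friday (full) then Monday-Thursday (incremental),
    # alternating between tape set 1 and tape set 2 each week.
    days = [("Friday", "full", 5),
            ("Monday", "incremental", 1),
            ("Tuesday", "incremental", 2),
            ("Wednesday", "incremental", 3),
            ("Thursday", "incremental", 4)]
    schedule = []
    for week in range(weeks):
        tape_set = (week % 2) + 1          # set 1, set 2, set 1, ...
        for day, kind, tape in days:
            schedule.append((week + 1, tape_set, day, kind, tape))
    return schedule

schedule = ten_tape_rotation(2)   # two weeks covering all 10 tape slots
```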


10. Duties of system administration

A system administrator, or sysadmin, is a person who is responsible for the upkeep, configuration, and reliable operation of computer systems, especially multi-user computers such as servers. The system administrator seeks to ensure that the uptime, performance, resources, and security of the computers he or she manages meet the needs of the users, without exceeding the budget. To meet these needs, a system administrator may acquire, install, or upgrade computer components and software; automate routine tasks; write computer programs; troubleshoot; train and/or supervise staff; and provide technical support. The person who is responsible for setting up and maintaining the system or server is called the system administrator. System administrators may be members of an information technology department. Most of the following discussion also applies to network and Windows system admins. The duties of a system administrator are wide-ranging, and vary widely from one organization to another. Sysadmins are usually charged with installing, supporting, and maintaining servers or other computer systems, and planning for and responding to service outages and other problems. Other duties may include scripting or light programming, and project management for systems-related projects.
The system administrator is responsible for following things:

1. User administration (setting up and maintaining accounts)
2. Maintaining the system
3. Verify that peripherals are working properly
4. Quickly arrange repair for hardware in case of hardware failure
5. Monitor system performance
6. Create file systems
7. Install software
8. Create a backup and recovery policy
9. Monitor network communication
10. Update the system as soon as a new version of the OS or application software comes out
11. Implement the policies for the use of the computer system and network
12. Set up security policies for users. A sysadmin must have a strong grasp of computer security (e.g. firewalls and intrusion detection systems)
13. Documentation in the form of an internal wiki
14. Password and identity management
Cloud computing and sysadmin

Cloud computing is nothing but a large number of computers connected through the Internet/WAN. Cloud computing is now part of mainstream technology, and a sysadmin must learn:
1. Automation software such as Puppet, Chef, etc.
2. Cloud infrastructure such as AWS, OpenStack, etc.
3. Network services in the cloud, such as content delivery networks (Akamai, CloudFront, etc.) and DNS servers
4. Source control
5. Designing best practices for backups and the whole infrastructure
A system administrator's responsibilities might include:

- Analyzing system logs and identifying potential issues with computer systems
- Introducing and integrating new technologies into existing data center environments
- Performing routine audits of systems and software
- Performing backups
- Applying operating system updates, patches, and configuration changes
- Installing and configuring new hardware and software
- Adding, removing, or updating user account information, resetting passwords, etc.
- Answering technical queries and assisting users
- Responsibility for security
- Responsibility for documenting the configuration of the system
- Troubleshooting any reported problems
- System performance tuning
- Ensuring that the network infrastructure is up and running
- Configuring, adding, and deleting file systems
- Knowledge of volume management tools like Veritas (now Symantec), Solaris ZFS, LVM

In larger organizations, some of the tasks above may be divided among different system administrators or members of different organizational groups. For example, a dedicated individual(s) may apply all system upgrades, a Quality Assurance (QA) team may perform testing and validation, and one or more technical writers may be responsible for all


technical documentation written for a company. System administrators in larger organizations tend not to be systems architects, system engineers, or system designers. In smaller organizations, the system administrator might also act as technical support, database administrator, network administrator, storage (SAN) administrator or application analyst. The root account has full (unrestricted) access, so he/she can do anything with the system. For example, root can remove critical system files, and there is no way to recover the files except from tape backup or disk-based backup systems. Many tasks for system administration can be automated using Perl/Python or shell scripts. For example:

- Create new users
- Reset user passwords
- Lock/unlock user accounts
- Monitor server security
- Monitor special services, etc.
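As a hedged illustration of the kind of monitoring script meant here, a few lines of Python can scan auth-log lines for failed SSH logins. The sample log lines below are invented; a real script would read the system's auth log (e.g. /var/log/auth.log on many Linux distributions) instead:

```python
from collections import Counter
import re

def failed_logins(lines):
    # Count "Failed password" entries per source address in sshd log lines.
    pattern = re.compile(r"Failed password .* from (\S+)")
    hits = Counter()
    for line in lines:
        m = pattern.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits

# Invented sample lines in the usual sshd log format
sample = [
    "Jan 10 10:01:01 host sshd[100]: Failed password for root from 10.0.0.5 port 22 ssh2",
    "Jan 10 10:01:05 host sshd[101]: Accepted password for alice from 10.0.0.9 port 22 ssh2",
    "Jan 10 10:01:09 host sshd[102]: Failed password for invalid user bob from 10.0.0.5 port 22 ssh2",
]
counts = failed_logins(sample)
```

A cron job could run such a script periodically and alert when any address exceeds a threshold.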

The most important skill for a system administrator is problem solving, period. This can sometimes lead to all sorts of constraints and stress.

When a workstation or server goes down, you are called to solve the problem. You should be able to quickly and correctly diagnose the problem. You must figure out what is wrong and how best it can be fixed in a small amount of time.
System administrators are not...

Cookie-cutter software engineers or developers. It is not usually within your duties to design new application software. But you must understand the behavior of software in order to deploy it and to troubleshoot problems, and you generally should be good at several programming languages used for scripting or automation of routine tasks, such as shell, awk, Perl and Python.


11. Explain the concept of at least three SDLC models, along with diagrams, advantages and disadvantages.

The commonly used SDLC models are:

Waterfall Model: In the waterfall model, the software project is sliced into six different stages, namely (1) Project Planning, (2) Requirement Definition, (3) Design, (4) Development, (5) Integration and Test, and (6) Installation and Acceptance. The waterfall model is the base for all other SDLC models.

Spiral Model: The spiral model begins with an initial pass through the lifecycle of the waterfall model, considering only a subset of the total requirements. Based on this pass, a robust prototype is developed. This strategy is followed for different subsets of the requirements, and with each pass or iteration the prototype grows larger.

RAD Model or Prototyping Model: The RAD model offers an interesting approach in which the prototype is developed first. The user can inspect the prototype to get a feel for the actual product, and then the product development is carried out.

Agile Methodology: In recent days, the SDLC model that is popularly used is the Agile SDLC or Agile methodologies. Agile SDLC is a combination of various software development methodologies that are based on incremental and iterative development. In Agile methodology, solutions evolve through collaboration between teams that are self-organizing and cross-functional.


Waterfall Model Diagram of Waterfall-model:


The waterfall model is a linear sequential flow, in which progress is seen as flowing steadily downwards (like a waterfall) through the phases of software implementation. This means that any phase in the development process begins only if the previous phase is complete. The waterfall approach does not define a process to go back to a previous phase to handle changes in requirements. The waterfall approach is the earliest approach that was used for software development.
The usage

Used for projects that do not focus on changing requirements, for example responses to requests for proposals (RFPs).
Advantages and Disadvantages

Advantages:
- Easy to explain to the user
- Structured approach
- Stages and activities are well defined
- Helps to plan and schedule the project
- Verification at each stage ensures early detection of errors/misunderstandings
- Each phase has specific deliverables

Disadvantages:
- Assumes that the requirements of a system can be frozen
- Very difficult to go back to any stage after it is finished
- Little flexibility; adjusting scope is difficult and expensive
- Costly and requires more time, in addition to a detailed plan


V-Shaped Model Diagram of V-model:


It is an extension of the waterfall model. Instead of moving down in a linear way, the process steps are bent upwards after the coding phase to form the typical V shape. The major difference between the V-shaped model and the waterfall model is the early test planning in the V-shaped model.
The usage

Software requirements are clearly defined and known. Software development technologies and tools are well known.
Advantages and Disadvantages

Advantages:
- Simple and easy to use
- Each phase has specific deliverables
- Higher chance of success over the waterfall model due to the development of test plans early in the life cycle
- Works well where requirements are easily understood

Disadvantages:
- Very inflexible, like the waterfall model
- Little flexibility; adjusting scope is difficult and expensive
- Software is developed during the implementation phase, so no early prototypes of the software are produced
- The model doesn't provide a clear path for problems found during testing phases
- Costly and requires more time, in addition to a detailed plan


Evolutionary Prototyping Model Description

It refers to the activity of creating prototypes of software applications, i.e. incomplete versions of the software program being developed. It is an activity that can occur in software development. It is used to visualize some component of the software, to limit the gap of misunderstanding of the customer requirements by the development team. This also reduces the iterations that may occur in the waterfall approach, which are hard to implement due to the inflexibility of that approach. So, when the final prototype is developed, the requirement is considered to be frozen. It has some types, such as:

Throwaway prototyping: prototypes that are eventually discarded rather than becoming part of the finally delivered software.

Evolutionary prototyping: prototypes that evolve into the final system through iterative incorporation of user feedback.

Incremental prototyping: the final product is built as separate prototypes; at the end the separate prototypes are merged in an overall design.

Extreme prototyping: used mainly for web applications. Basically, it breaks web development down into three phases, each one based on the preceding one. The first phase is a static prototype that consists mainly of HTML pages. In the second phase, the screens are programmed and fully functional using a simulated services layer. In the third phase, the services are implemented.
The usage

This process can be used with any software development life cycle model, but it is best suited to systems that need a lot of user interaction. Systems without user interaction, such as a system that only performs calculations, should not use prototypes.
Advantages and Disadvantages

Advantages:
- Reduced time and costs (though this can be a disadvantage if the developer loses time developing the prototypes)
- Improved and increased user involvement

Disadvantages:
- Insufficient analysis
- User confusion of prototype and finished system
- Developer misunderstanding of user objectives
- Excessive development time of the prototype
- Expense of implementing prototyping


Spiral Method (SDM) Diagram of Spiral model:


It combines elements of both design and prototyping-in-stages, in an effort to combine the advantages of top-down and bottom-up concepts. This model of development combines the features of the prototyping model and the waterfall model. The spiral model is favored for large, expensive, and complicated projects. This model uses many of the same phases as the waterfall model, in essentially the same order, separated by planning, risk assessment, and the building of prototypes and simulations.
The usage

It is used for shrink-wrap applications and for large systems that are built in small phases or segments.
Advantages and Disadvantages

Advantages:
- Estimates (i.e. budget, schedule, etc.) become more realistic as work progresses, because important issues are discovered earlier
- Early involvement of developers
- Manages risks and develops the system in phases

Disadvantages:
- High cost and time to reach the final product
- Needs special skills to evaluate the risks and assumptions
- Highly customized, limiting re-usability


Iterative and Incremental Method Description

It was developed to overcome the weaknesses of the waterfall model. It starts with initial planning and ends with deployment, with cyclic interactions in between. The basic idea behind this method is to develop a system through repeated cycles (iterative) and in smaller portions at a time (incremental), allowing software developers to take advantage of what was learned during development of earlier parts or versions of the system.
The usage

It is used for shrink-wrap applications and for large systems built in small phases or segments. It can also be used for a system with separate components, for example an ERP system: we can start with the budget module as the first iteration, then move on to the inventory module, and so forth.
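The ERP example above can be sketched as a toy increment loop (the module names are illustrative assumptions, not part of the original text):

```python
# Toy sketch: incremental delivery of an ERP system, one module per
# iteration. Every release is a working partial system that extends
# the previous one, as in the budget-then-inventory example above.
def deliver_increments(modules):
    """Yield the cumulative system after each increment."""
    system = []
    for module in modules:
        system.append(module)   # build and integrate this increment
        yield list(system)      # a usable (partial) release each cycle

releases = list(deliver_increments(["budget", "inventory", "payroll"]))
```

Each yielded release is shippable on its own, which is what lets feedback from early increments shape the later ones.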
Advantages and Disadvantages

Advantages:
- Produces business value early in the development life cycle
- Better use of scarce resources through proper increment definition
- Can accommodate some change requests between increments
- More focused on customer value than the linear approaches
- Problems can be detected earlier

Disadvantages:
- Requires heavy documentation
- Follows a defined set of processes
- Defines increments based on function and feature dependencies
- Requires more customer involvement than the linear approaches
- Partitioning the functions and features might be problematic
- Integration between iterations can be an issue if it is not considered during development

Extreme Programming (Agile Development)

It is based on iterative and incremental development, where requirements and solutions evolve through collaboration between cross-functional teams. It can be used with any type of project, but it requires more involvement from the customer and needs to be interactive. It can also be used when the customer needs to have some functional requirements ready in less than three weeks.
Advantages and Disadvantages

Advantages:
- Decreases the time required to avail some system features
- Face-to-face communication and continuous input from the customer representative leave no space for guesswork
- The end result is high-quality software in the least possible time duration and a satisfied customer

Disadvantages:
- Scalability
- Skill of the software developers
- Ability of the customer to express user needs
- Documentation is done at later stages
- Reduces the usability of components
- Needs special skills for the team