DEPARTMENT OF INFORMATION TECHNOLOGY
SEMESTER VII

CURRICULUM - LIST OF SUBJECTS

THEORY
IT6701  Information Management
CS6701  Cryptography and Network Security
IT6702
CS6703
IT6004  Software Testing

PRACTICAL
IT6711
IT6712  Security Laboratory
IT6713
CS6701  CRYPTOGRAPHY AND NETWORK SECURITY          L T P C
                                                   3 0 0 3
UNIT I  INTRODUCTION & NUMBER THEORY  10
Services, mechanisms and attacks - The OSI security architecture - Network security model - Classical encryption techniques (symmetric cipher model, substitution techniques, transposition techniques, steganography). FINITE FIELDS AND NUMBER THEORY: Groups, Rings, Fields - Modular arithmetic - Euclid's algorithm - Finite fields - Polynomial arithmetic - Prime numbers - Fermat's and Euler's theorems - Testing for primality - The Chinese remainder theorem - Discrete logarithms.
UNIT II  BLOCK CIPHERS & PUBLIC KEY CRYPTOGRAPHY  10
Data Encryption Standard - Block cipher principles - Block cipher modes of operation - Advanced Encryption Standard (AES) - Triple DES - Blowfish - RC5 algorithm. Public key cryptography: Principles of public key cryptosystems - The RSA algorithm - Key management - Diffie-Hellman key exchange - Elliptic curve arithmetic - Elliptic curve cryptography.
UNIT III  HASH FUNCTIONS AND DIGITAL SIGNATURES  8
Authentication requirements - Authentication functions - MAC - Hash functions - Security of hash functions and MAC - MD5 - SHA - HMAC - CMAC - Digital signatures and authentication protocols - DSS - El Gamal - Schnorr.
UNIT IV  SECURITY PRACTICE & SYSTEM SECURITY  8
Authentication applications - Kerberos - X.509 authentication services - Internet firewalls for trusted systems: roles of firewalls - firewall-related terminology - types of firewalls - firewall designs - SET for e-commerce transactions. Intruders - Intrusion detection systems - Viruses and related threats - Countermeasures - Firewall design principles - Trusted systems - Practical implementation of cryptography and security.
UNIT V  E-MAIL, IP & WEB SECURITY  9
E-mail security: security services for e-mail - attacks possible through e-mail - establishing keys - privacy - authentication of the source - message integrity - non-repudiation - Pretty Good Privacy - S/MIME. IP security: overview of IPSec - IP and IPv6 - Authentication Header - Encapsulating Security Payload (ESP) - Internet Key Exchange (phases of IKE, ISAKMP/IKE encoding). Web security: SSL/TLS basic protocol - computing the keys - client authentication - PKI as deployed by SSL - attacks fixed in v3 - exportability - encoding - Secure Electronic Transaction (SET).
TOTAL: 45 PERIODS
TEXT BOOKS:
1. William Stallings, Cryptography and Network Security, 6th Edition, Pearson Education, March 2013. (Units I-IV)
2. Charlie Kaufman, Radia Perlman and Mike Speciner, Network Security, Prentice Hall of India, 2002. (Unit V)
REFERENCES:
1. Behrouz A. Forouzan, Cryptography & Network Security, Tata McGraw-Hill, 2007.
2. Man Young Rhee, Internet Security: Cryptographic Principles, Algorithms and Protocols, Wiley Publications, 2003.
3. Charles Pfleeger, Security in Computing, 4th Edition, Prentice Hall of India, 2006.
4. Uyless Black, Internet Security Protocols, Pearson Education Asia, 2000.
5. Charlie Kaufman, Radia Perlman and Mike Speciner, Network Security: Private Communication in a Public World, Second Edition, PHI, 2002.
6. Bruce Schneier and Niels Ferguson, Practical Cryptography, First Edition, Wiley Dreamtech India Pvt Ltd, 2003.
7. Douglas R. Stinson, Cryptography: Theory and Practice, First Edition, CRC Press, 1995.
8. http://nptel.ac.in/
Faculty Name : Prasath.R
Designation  : AP
Subject Name : Cryptography & Network Security
Code         : CS6701
Year         : IV
Branch       : B.Tech/IT
Semester     : 07
AIM:
To understand the OSI security architecture and classical encryption techniques; to acquire fundamental knowledge of the concepts of finite fields and number theory; to understand various block cipher and stream cipher models; and to describe the principles of public key cryptosystems, hash functions and digital signatures.
LESSON PLAN
(The original four-column table - Sl. No., Topics, No. of Periods Required, Text/Ref. Book - was flattened in extraction; the recoverable rows are summarised below.)

UNITS I-II (Text: T1): Modular arithmetic; Euclid's algorithm; Finite fields; Polynomial arithmetic; Prime numbers.

UNIT III  HASH FUNCTIONS AND DIGITAL SIGNATURES (Text: T1): CMAC; DSS; El Gamal; Schnorr.

UNIT IV (Text: T1): Authentication applications; Kerberos, X.509; Roles of firewalls, firewall terminology, types of firewalls; SET, IDS, firewall designs; Viruses and related threats; Countermeasures, firewall design principles; Trusted systems.

UNIT V (Text: T2): E-mail security; Message integrity, non-repudiation, Pretty Good Privacy, S/MIME; Cache basics; IP security; ESP, IKE; Web security; Secure Electronic Transaction.
UNIT I
PART A (TWO MARKS)
1. Specify the four categories of security threats.
Interruption, Interception, Modification, Fabrication.
Symmetric encryption: a form of cryptosystem in which encryption and decryption are performed using the same key. Eg: DES, AES.
Asymmetric encryption: a form of cryptosystem in which encryption and decryption are performed using two different keys. Eg: RSA, ECC.
5. Define cryptanalysis.
It is the process of attempting to discover the key or the plaintext or both.
6. Compare stream cipher with block cipher with example. (May 15)
Stream cipher: processes the input stream continuously, producing one element at a time. Example: Caesar cipher.
Block cipher: processes the input one block of elements at a time, producing an output block for each input block. Example: DES.
9. Define steganography.
Hiding the message into some cover media. It conceals the existence of a message.
10. Why do networks need security?
When systems are connected through a network, active attacks and passive attacks are possible during transmission from sender to receiver and vice versa, so the network needs security.
11. Define Encryption.
The process of converting from plaintext to cipher text is known as encryption.
12. Specify the components of encryption algorithm.
(a) Plaintext (b) Encryption algorithm (c) secret key (d) cipher text (e) Decryption
algorithm.
13. Define confidentiality and authentication
Confidentiality: It means how to maintain the secrecy of message. It ensures that the
information in a computer system and transmitted information are accessible only for
reading by authorized person.
Authentication: It helps to prove that only the claimed source entity was involved in the transaction.
14. Define cryptography.
It is the science of writing secret codes using mathematical techniques. The many schemes
used for enciphering constitute the area of study known as cryptography.
15. Compare Substitution and Transposition techniques. (Dec 14)
Substitution: letters of the plaintext are replaced by other letters, numbers or symbols.
Transposition: the positions of the plaintext letters are rearranged (permuted); the letters themselves are unchanged.
Data Integrity - assurance that data received is as sent by an authorized entity.
Non-repudiation - protection against denial by one of the parties in a communication.
Security Mechanisms
Specific security mechanisms: encipherment, digital signatures, access controls, data integrity, authentication exchange, traffic padding, routing control, notarization.
Pervasive security mechanisms: trusted functionality, security labels, event detection, security audit trails, security recovery.
Classification of Security Attacks
Passive attacks: eavesdropping on, or monitoring of, transmissions to obtain message contents or monitor traffic flows.
Active attacks: modification of the data stream, or creation of false messages, in order to masquerade as another entity, replay previous messages, modify messages in transit, or deny service. Active attacks fall into four categories: modification of messages, replay, masquerade, and denial of service. They are easy to detect but difficult to prevent; the defense is to detect attacks and recover from any damage.
Caesar Cipher
Replace each letter of the message by the letter a fixed distance away, e.g. use the 3rd letter on. I.e. the mapping is:
Plain:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: DEFGHIJKLMNOPQRSTUVWXYZABC
Mixed Alphabets
each plaintext letter is given a different random ciphertext letter, hence key is 26
letters long
Plain: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: DKVQFIBJWPESCXHTMYAUOLRGZN
Plaintext: IFWEWISHTOREPLACELETTERS
Cipher text: WIRFRWAJUHYFTSDVFSFUUFYA
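The substitution above can be reproduced with a short Python sketch using `str.translate`; the 26-letter key is the one from the notes:

```python
# Reproduce the monoalphabetic substitution example from the notes.
PLAIN  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CIPHER = "DKVQFIBJWPESCXHTMYAUOLRGZN"   # the 26-letter key above

def substitute(text, src=PLAIN, dst=CIPHER):
    """Map each letter of text through the substitution key."""
    return text.translate(str.maketrans(src, dst))

print(substitute("IFWEWISHTOREPLACELETTERS"))  # -> WIRFRWAJUHYFTSDVFSFUUFYA
# Swapping the alphabets decrypts:
print(substitute("WIRFRWAJUHYFTSDVFSFUUFYA", CIPHER, PLAIN))
```

Decryption is the same mapping run in reverse, which is why the key must stay secret.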
Cryptanalysis
A general monoalphabetic cipher is vulnerable to letter-frequency analysis, since each plaintext letter always maps to the same ciphertext letter.
Polyalphabetic Substitution
The same plaintext letter gets replaced by several different ciphertext letters, depending on which alphabet is used.
Vigenère Cipher
Use each alphabet in turn, repeating from the start after d letters in the message.
4. Explain Euclid's algorithm and Fermat's Little Theorem. (May 12 & May 15)
The Euclidean Algorithm is a technique for quickly finding the GCD of two integers.
The Algorithm
The Euclidean Algorithm for finding GCD(A, B) is as follows: repeatedly replace the pair (A, B) by (B, A mod B) until the second value is 0; the remaining first value is the GCD.
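A minimal sketch in Python; the sample calls reuse the gcd exercises that appear in this question bank:

```python
def gcd(a, b):
    """Euclid's algorithm: gcd(a, b) = gcd(b, a mod b) until b is 0."""
    while b:
        a, b = b, a % b
    return a

print(gcd(1970, 1066))    # -> 2
print(gcd(24140, 16762))  # -> 34
```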
remote user during an attempt to connect to an enclave. Active attacks result in the
disclosure or dissemination of data files, DoS, or modification of data.
DISTRIBUTED ATTACK
A distributed attack requires that the adversary introduce code, such as a Trojan horse or back-door program, to a trusted component or software that will later be distributed to many other companies and users. Distribution attacks focus on the malicious
modification of hardware or software at the factory or during distribution. These attacks
introduce malicious code such as a back door to a product to gain unauthorized access to
information or to a system function at a later date.
INSIDER ATTACK
An insider attack involves someone from the inside, such as a disgruntled employee, attacking the network. Insider attacks can be malicious or non-malicious. Malicious insiders intentionally eavesdrop, steal, or damage information; use information in a fraudulent manner; or deny access to other authorized users. Non-malicious attacks typically result from carelessness, lack of knowledge, or intentional circumvention of security for such reasons as performing a task.
CLOSE-IN ATTACK
A close-in attack involves someone attempting to get physically close to network components, data, and systems in order to learn more about a network. Close-in attacks consist of regular individuals attaining close physical proximity to networks, systems, or
facilities for the purpose of modifying, gathering, or denying access to information. Close
physical proximity is achieved through surreptitious entry into the network, open access,
or both.
PHISHING ATTACK
In a phishing attack, the hacker creates a fake web site that looks exactly like a popular site, such as an SBI Bank or PayPal site. The phishing part of the attack is that the
hacker then sends an e-mail message trying to trick the user into clicking a link that leads
to the fake site. When the user attempts to log on with their account information, the
hacker records the username and password and then tries that information on the real site.
HIJACK ATTACK
In a hijack attack, a hacker takes over a session between you and another individual and disconnects the other individual from the communication. You still
believe that you are talking to the original party and may send private information to the
hacker by accident.
SPOOF ATTACK
In a spoof attack, the hacker modifies the source address of the packets he or she is sending so that they appear to be coming from someone else. This
may be an attempt to bypass your firewall rules.
BUFFER OVERFLOW
A buffer overflow attack occurs when the attacker sends more data to an application than is expected. A buffer overflow attack usually results in the attacker gaining administrative access to the system in a command prompt or shell.
EXPLOIT ATTACK
In this type of attack, the attacker knows of a security problem
within an operating system or a piece of software and leverages that knowledge by
exploiting the vulnerability.
PASSWORD ATTACK
An attacker tries to crack the passwords stored in a network account
database or a password-protected file. There are three major types of password attacks: a
dictionary attack, a brute-force attack, and a hybrid attack. A dictionary attack uses a
word list file, which is a list of potential passwords. A brute-force attack is when the
attacker tries every possible combination of characters.
6. Explain Cipher Feedback and Output Feedback.
Cipher Feedback (CFB)
The message is treated as a stream of bits added to the output of the block cipher, with the result fed back for the next stage.
A limitation is the need to stall while doing a block encryption after every n bits; note that the block cipher is used in encryption mode at both ends, and errors propagate for several blocks after an error.
Primitive Roots
From Euler's theorem we have a^φ(n) mod n = 1. Consider a^m mod n = 1 with GCD(a, n) = 1: such an m must exist for m = φ(n), but it may be smaller; once the powers reach m, the cycle repeats.
If the smallest such m is φ(n), then a is called a primitive root.
If p is prime, then the successive powers of a primitive root a "generate" the group mod p. Primitive roots are useful but relatively hard to find.
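For small primes, primitive roots can be found by brute force; a sketch (the choice p = 7 is illustrative, not from the notes):

```python
# A primitive root a of a prime p generates every integer 1..p-1 as a power.
def is_primitive_root(a, p):
    return {pow(a, k, p) for k in range(1, p)} == set(range(1, p))

def primitive_roots(p):
    return [a for a in range(2, p) if is_primitive_root(a, p)]

print(primitive_roots(7))   # -> [3, 5]
```

For example, 2 is not a primitive root of 7, since its powers cycle through only {2, 4, 1}.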
with external lines or radio/satellite links. Consider attacks and placement in this scenario:
snooping from another workstation;
using dial-in to a LAN or server to snoop;
using an external router link to enter and snoop, or to monitor and/or modify traffic on external links.
There are two major placement alternatives:
Link encryption
Encryption occurs independently on every link. This implies traffic must be decrypted between links; it requires many devices, but only paired keys.
End-to-end encryption
Encryption occurs between the original source and the final destination. It needs devices at each end with shared keys.
Traffic Confidentiality
When using end-to-end encryption, the headers must be left in the clear so the network can correctly route the information. Hence, although the contents are protected, the traffic pattern flows are not. Ideally we want both at once: end-to-end encryption protects the data contents over the entire path and provides authentication, while link encryption protects the traffic flows from monitoring.
Placement of Encryption
The encryption function can be placed at various layers in the OSI Reference Model: link encryption occurs at layers 1 or 2, while end-to-end encryption can occur at layers 3, 4, 6 or 7. As we move higher, less information is encrypted but it is more secure, though more complex, with more entities and keys.
Traffic Analysis
Traffic analysis is the monitoring of communication flows between parties. It is useful in both military and commercial spheres, and can also be used to create a covert channel. Link encryption obscures header details, but the overall traffic volumes in networks and at end-points are still visible; traffic padding can further obscure the flows.
UNIT II
PART A (TWO MARKS)
2. It must be impossible, or at least impractical, to decipher a message if no other information is available.
The ingredients of a public-key scheme are: plaintext, encryption algorithm, public and private keys, cipher text, decryption algorithm.
8. List the four general schemes for the distribution of public keys. (May 11)
The four general schemes for the distribution of public keys are:
Public announcement
Publicly available directory
Public-key authority
Public-key certificate
the cipher text. If the change is small, this might provide a way to reduce the size of the plaintext or key space to be searched.
13. Give the five modes of operation of Block cipher. (Dec 14)
Electronic Codebook (ECB)
Cipher Block Chaining (CBC)
Cipher Feedback (CFB)
Output Feedback (OFB)
Counter (CTR)
14. State the advantages of counter mode.
Hardware efficiency, software efficiency, preprocessing, random access, provable security, simplicity.
15. Find gcd(1970, 1066) using Euclid's algorithm. (Dec 13)
gcd(1970, 1066) = gcd(1066, 1970 mod 1066) = gcd(1066, 904)
= gcd(904, 162) = gcd(162, 94) = gcd(94, 68) = gcd(68, 26)
= gcd(26, 16) = gcd(16, 10) = gcd(10, 6) = gcd(6, 4) = gcd(4, 2) = gcd(2, 0)
= 2
16. What is a primitive root of a number? (Dec 11)
We can define a primitive root of a number p as one whose powers generate all the integers from 1 to p - 1. That is, if a is a primitive root of the prime number p, then the numbers a, a^2, ..., a^(p-1) (all taken mod p) are distinct and consist of the integers 1 through p - 1 in some permutation.
17. Determine gcd(24140, 16762) using Euclid's algorithm.
Soln: We know gcd(a, b) = gcd(b, a mod b).
gcd(24140, 16762) = gcd(16762, 7378) = gcd(7378, 2006)
= gcd(2006, 1360) = gcd(1360, 646) = gcd(646, 68)
= gcd(68, 34) = gcd(34, 0) = 34
Therefore gcd(24140, 16762) = 34.
18. Perform encryption and decryption using the RSA algorithm for the following:
p = 7; q = 11; e = 17; M = 8.
Soln: n = pq = 7 * 11 = 77
φ(n) = (p - 1)(q - 1) = 6 * 10 = 60
e = 17, so d = e^(-1) mod 60 = 53 (check: 17 * 53 = 901 = 15 * 60 + 1)
C = M^e mod n = 8^17 mod 77 = 57
M = C^d mod n = 57^53 mod 77 = 8
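The arithmetic in this exercise can be double-checked with Python's built-in modular exponentiation (the modular inverse `pow(e, -1, m)` needs Python 3.8+):

```python
# Check the RSA exercise p=7, q=11, e=17, M=8.
p, q, e, M = 7, 11, 17, 8
n = p * q                  # modulus: 77
phi = (p - 1) * (q - 1)    # phi(n) = 60
d = pow(e, -1, phi)        # private exponent: inverse of 17 mod 60
C = pow(M, e, n)           # encryption: M^e mod n
assert pow(C, d, n) == M   # decryption recovers the plaintext
print(d, C)                # -> 53 57
```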
19. What is elliptic curve cryptography?
Elliptic curve cryptography (ECC) is an approach to public-key cryptography based on
the algebraic structure of elliptic curves over finite fields. ECC requires smaller keys
compared to non-ECC cryptography (based on plain Galois fields) to provide equivalent
security. Elliptic curves are applicable for encryption, digital signatures, pseudo-random
generators and other tasks.
20. What is Blowfish?
Blowfish is a symmetric-key block cipher, designed in 1993 by Bruce Schneier and included in a large number of cipher suites and encryption products. Schneier designed
Blowfish as a general-purpose algorithm, intended as an alternative to the aging DES and
free of the problems and constraints associated with other algorithms.
PART B (16 marks)
1. Explain Diffie Hellman key Exchange in detail with an example (May 11 & Dec
12)
Diffie-Hellman key exchange (DH) is a specific method of securely exchanging cryptographic keys over a public channel and was one of the first public-key protocols, originally conceptualized by Ralph Merkle and named after Whitfield Diffie and Martin Hellman. DH is one of the earliest practical examples of public key exchange implemented within the field of cryptography. Traditionally, secure encrypted
communication between two parties required that they first exchange keys by some
secure physical channel, such as paper key lists transported by a trusted courier. The
Diffie-Hellman key exchange method allows two parties that have no prior knowledge of
each other to jointly establish a shared secret key over an insecure channel. This key can
then be used to encrypt subsequent communications using a symmetric key cipher.
Diffie-Hellman is used to secure a variety of Internet services. However, research published in October 2015 suggests that the parameters in use for many DH Internet applications at that time were not strong enough to prevent compromise by very well-funded attackers, such as the security services of large governments.
The scheme was first published by Whitfield Diffie and Martin Hellman in 1976. By
1975, James H. Ellis, Clifford Cocks and Malcolm J. Williamson within GCHQ, the
British signals intelligence agency, had previously shown how public-key cryptography
could be achieved; however, their work was kept secret until 1997.
Although Diffie-Hellman key agreement itself is a non-authenticated key-agreement
protocol, it provides the basis for a variety of authenticated protocols, and is used to
provide forward secrecy in Transport Layer Security's ephemeral modes (referred to as
EDH or DHE depending on the cipher suite).
The method was followed shortly afterwards by RSA, an implementation of public-key
cryptography using asymmetric algorithms.
General overview
Diffie-Hellman key exchange establishes a shared secret between two parties that can be
used for secret communication for exchanging data over a public network. The following
conceptual diagram illustrates the general idea of the key exchange by using colors
instead of very large numbers.
The process begins by having the two parties, Alice and Bob, agree on an arbitrary
starting color that does not need to be kept secret (but should be different every time); in
this example the color is yellow. Each of them selects a secret color (red and aqua respectively) that they keep to themselves. The crucial part of the process is that Alice and
Bob now mix their secret color together with their mutually shared color, resulting in
orange and blue mixtures respectively, then publicly exchange the two mixed colors.
Finally, each of the two mixes together the color they received from the partner with their
own private color. The result is a final color mixture (brown) that is identical to the
partner's color mixture.
If another party (usually named Eve in cryptology publications, Eve being a third-party
who is considered to be an eavesdropper) had been listening in on the exchange, it would
be computationally difficult for that person to determine the common secret color; in fact,
when using large numbers rather than colors, this action is impossible for
modern supercomputers to do in a reasonable amount of time.
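With numbers rather than colors, the exchange is a few modular exponentiations. A toy sketch with illustrative small parameters (p = 23, g = 5 are made-up teaching values; real deployments use primes of 2048 bits or more):

```python
# Toy Diffie-Hellman exchange over a tiny prime field.
p, g = 23, 5              # public: prime modulus and generator
a, b = 6, 15              # private keys chosen by Alice and Bob
A = pow(g, a, p)          # Alice publishes A = g^a mod p
B = pow(g, b, p)          # Bob publishes B = g^b mod p
alice_key = pow(B, a, p)  # Alice computes (g^b)^a mod p
bob_key = pow(A, b, p)    # Bob computes (g^a)^b mod p
assert alice_key == bob_key   # both hold g^(ab) mod p
print(alice_key)          # -> 2
```

An eavesdropper sees p, g, A and B, but recovering a or b from them is the discrete logarithm problem, which is infeasible at real key sizes.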
Initial Permutation (IP)
The initial permutation of the key (PC1) selects 56 bits in two 28-bit halves.
16 stages consist of: selecting 24 bits from each half and permuting them by PC2 for use in function f, and rotating each half separately either 1 or 2 places depending on the key rotation schedule K.
DES Decryption
Avalanche Effect
A change of one input or key bit results in changing approximately half of the output bits.
3. Briefly explain block cipher design principles and modes of operation. (Dec 13)
Block Cipher Design Principles and Modes of Operation
Basic principles
The basic principles are still like Feistel's from the 1970s:
number of rounds - more is better, so that exhaustive search is the best attack;
function f - provides confusion, is nonlinear, gives avalanche;
key schedule - complex subkey creation, key avalanche.
Modes of Operation
Need way to use in practice, given usually have arbitrary amount of information
to encrypt
Four modes were defined for DES in the ANSI standard ANSI X3.106-1983 "Modes of Use"; there are now 5 modes for DES and AES, covering both block and stream modes.
(i) Electronic Codebook (ECB)
The message is broken into independent blocks which are encrypted.
Each block is a value which is substituted, like a codebook, hence the name.
Each block is encoded independently of the other blocks: Ci = EK(Pi).
Uses: secure transmission of single values.
Advantages and Limitations of ECB
Repetitions in the message may show in the cipher text if aligned with the message block, particularly with data such as graphics, or with messages that change very little, which become a code-book analysis problem.
The weakness is due to the encrypted message blocks being independent.
The main use is sending a few blocks of data.
(ii) Cipher Block Chaining (CBC)
The message is broken into blocks, but these are linked together in the encryption operation: each previous cipher block is chained with the current plaintext block.
Advantages and Limitations of CBC
each cipher text block depends on all message blocks
thus a change in the message affects all cipher text blocks after the change as well as
the original block
need Initial Value (IV) known to sender & receiver
However, if the IV is sent in the clear, an attacker can change bits of the first block and change the IV to compensate; hence either the IV must be a fixed value (as in EFTPOS) or it must be sent encrypted in ECB mode before the rest of the message.
At the end of the message, handle a possible last short block by padding either with a known non-data value (e.g. nulls), or by padding the last block with a count of the pad size, e.g. [b1 b2 b3 0 0 0 0 5] <- 3 data bytes, then 5 bytes of pad + count.
(iii) Cipher Feedback (CFB)
The message is treated as a stream of bits added to the output of the block cipher; the result is fed back for the next stage (hence the name).
The standard allows any number of bits (1, 8, 64 or whatever) to be fed back, denoted CFB-1, CFB-8, CFB-64 etc.; it is most efficient to use all 64 bits (CFB-64).
Uses: stream data encryption, authentication.
Advantages and Limitations of CFB
Appropriate when data arrives in bits/bytes; the most common stream mode.
A limitation is the need to stall while doing a block encryption after every n bits.
Note that the block cipher is used in encryption mode at both ends, and errors propagate for several blocks after an error.
(iv) Output Feedback (OFB)
The message is treated as a stream of bits, and the output of the cipher is added to the message; the output is then fed back (hence the name).
The feedback is independent of the message, so it can be computed in advance.
Uses: stream encryption over noisy channels.
(v) Counter (CTR)
A "new" mode, though proposed early on; similar to OFB, but it encrypts a counter value rather than any feedback value.
Must have a different key and counter value for every plaintext block (never reused).
Uses: high-speed network encryption.
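The chaining structure that distinguishes CBC from ECB can be sketched with a toy one-byte "block cipher" (a plain XOR with a fixed key; purely illustrative and not secure):

```python
# CBC chaining with a toy one-byte "block cipher" (XOR with a fixed key).
# This shows only the chaining structure, not a real cipher.
KEY = 0x5A

def enc_block(b): return b ^ KEY
def dec_block(b): return b ^ KEY

def cbc_encrypt(blocks, iv):
    out, prev = [], iv
    for m in blocks:
        c = enc_block(m ^ prev)   # XOR plaintext with previous ciphertext
        out.append(c)
        prev = c
    return out

def cbc_decrypt(blocks, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(dec_block(c) ^ prev)
        prev = c
    return out

msg = [0x10, 0x22, 0x10, 0x10]          # note the repeated 0x10 blocks
ct = cbc_encrypt(msg, iv=0x33)
assert cbc_decrypt(ct, iv=0x33) == msg  # round-trips correctly
print([hex(c) for c in ct])             # repeated plaintext blocks differ
```

In ECB mode the three 0x10 blocks would encrypt identically; under CBC each depends on everything before it, which is exactly the codebook-analysis weakness chaining removes.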
4. Explain RSA algorithm in detail with an example (May 11, May 12 & Dec 14)
RSA is one of the first practical public-key cryptosystems and is widely used for secure
data transmission. In such a cryptosystem, the encryption key is public and differs from
the decryption key which is kept secret. In RSA, this asymmetry is based on the practical
difficulty of factoring the product of two large prime numbers, the factoring problem.
RSA is made of the initial letters of the surnames of Ron Rivest, Adi Shamir,
and Leonard Adleman, who first publicly described the algorithm in 1977. Clifford
Cocks, an English mathematician working for the UK intelligence agency GCHQ, had
developed an equivalent system in 1973, but it was not declassified until 1997.
A user of RSA creates and then publishes a public key based on two large prime numbers,
along with an auxiliary value. The prime numbers must be kept secret. Anyone can use
the public key to encrypt a message, but with currently published methods, if the public
key is large enough, only someone with knowledge of the prime numbers can feasibly
decode the message. Breaking RSA encryption is known as the RSA problem; whether it is
as hard as the factoring problem remains an open question.
RSA is a relatively slow algorithm, and because of this it is less commonly used to
directly encrypt user data. More often, RSA passes encrypted shared keys for symmetric
key cryptography which in turn can perform bulk encryption-decryption operations at
much higher speed.
The RSA algorithm involves four steps: key generation, key distribution, encryption and
decryption.
RSA involves a public key and a private key. The public key can be known by everyone
and is used for encrypting messages. The intention is that messages encrypted with the
public key can only be decrypted in a reasonable amount of time using the private key.
The basic principle behind RSA is the observation that it is practical to find three very large positive integers e, d and n such that, with modular exponentiation, for all m:
(m^e)^d ≡ m (mod n)
and that even knowing e and n, or even m, it can be extremely difficult to find d. Additionally, for some operations it is convenient that the order of the two exponentiations can be changed, and this relation also implies:
(m^d)^e ≡ m (mod n)
1. Key distribution
To enable Bob to send his encrypted messages, Alice transmits her public key (n, e) to
Bob via a reliable, but not necessarily secret route. The private key is never distributed.
2. Encryption
Suppose that Bob would like to send message M to Alice. He first turns M into an integer m, such that 0 <= m < n and gcd(m, n) = 1, by using an agreed-upon reversible protocol known as a padding scheme. He then computes the cipher text c, using Alice's public key e, as c = m^e mod n. This can be done efficiently, even for 500-bit numbers, using modular exponentiation. Bob then transmits c to Alice.
3. Decryption
Alice can recover m from c by using her private key exponent d, computing m = c^d mod n. Given m, she can recover the original message M by reversing the padding scheme.
4. Key generation
The keys for the RSA algorithm are generated the following way:
Choose two distinct prime numbers p and q.
For security purposes, the integers p and q should be chosen at random, and should be similar in magnitude but differ in length by a few digits to make factoring harder. Prime integers can be efficiently found using a primality test.
Compute n = pq.
n is used as the modulus for both the public and private keys. Its length,
usually expressed in bits, is the key length.
Compute φ(n) = φ(p)φ(q) = (p - 1)(q - 1) = n - (p + q - 1), where φ is Euler's totient function. This value is kept private.
Choose an integer e such that 1 < e < φ(n) and gcd(e, φ(n)) = 1; i.e., e and φ(n) are coprime.
Determine d as d ≡ e^(-1) (mod φ(n)); i.e., d is the modular multiplicative inverse of e modulo φ(n).
This is more clearly stated as: solve for d given d·e ≡ 1 (mod φ(n)).
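The four steps can be run end to end in Python with small illustrative primes (p = 61, q = 53 are textbook toy values, not from this question bank; real keys use primes hundreds of digits long):

```python
# RSA key generation following the four steps above, at toy sizes.
# pow(e, -1, m) needs Python 3.8+.
from math import gcd

p, q = 61, 53                  # step: two distinct primes (illustrative)
n = p * q                      # step: modulus n = 3233
phi = (p - 1) * (q - 1)        # step: phi(n) = 3120
e = 17                         # public exponent, coprime to phi(n)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # step: private exponent d = e^-1 mod phi(n)

m = 65                         # a sample message, 0 <= m < n
c = pow(m, e, n)               # encryption
assert pow(c, d, n) == m       # decryption recovers m
print(n, d, c)
```

The public key is (n, e) and the private key is d; the security assumption is that factoring n back into p and q is hard at real sizes.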
While the RSA patent expired in 2000, there may be patents in force covering certain aspects of ECC technology, though some (including RSA Laboratories and Daniel J. Bernstein) argue that the Federal elliptic curve digital signature standard (ECDSA; NIST FIPS 186-3) and certain practical ECC-based key exchange schemes (including ECDH) can be implemented without infringing them.
6. Explain key management in detail. (16 marks)
Key Management
Distribution of Public Keys
ECC key exchange (analogous to Diffie-Hellman):
users select a suitable curve Ep(a, b);
select a base point G = (x1, y1) with large order n such that nG = O;
A and B select private keys nA < n, nB < n;
compute public keys PA = nA G, PB = nB G;
compute the shared key K = nA PB = nB PA (the same, since K = nA nB G).
ECC Encryption/Decryption
There are several alternatives; we consider the simplest. Any message M must first be encoded as a point on the elliptic curve, Pm. Select a suitable curve and point G as in Diffie-Hellman; each user chooses a private key nA < n and computes the public key PA = nA G.
Public-Key Certificates
A certificate binds an identity to a public key, usually with other info such as period of validity, rights of use etc., with all contents signed by a trusted Public-Key or Certificate Authority (CA); it can be verified by anyone who knows the authority's public key.
Secret Key Distribution with Public Keys
B generates a session key K and sends it to A encrypted using the supplied public key; the problem is that an opponent can intercept and impersonate both halves of the protocol.
ECC Security
Considerations include: computational cost, general security, implementation attacks.
AES
Designed by Rijmen and Daemen in Belgium; 128/192/256-bit keys, 128-bit data; an iterative rather than Feistel cipher; treats data in 4 groups of 4 bytes.
Triple DES
Would seem to need 3 distinct keys, but can use 2 keys with an E-D-E sequence:
C = EK1[DK2[EK1[P]]]
If K1 = K2, then it can work with single DES. Standardized in ANSI X9.17 & ISO 8732. No current known practical attacks.
Blowfish
Secure: the key length is variable and can be in the range of 32-448 bits (default 128-bit key length).
It is suitable for applications where the key does not change often, like a communication link or automatic file encryption.
Unpatented and royalty-free.
Description of the Algorithm:
The Blowfish symmetric block cipher algorithm encrypts block data 64 bits at a time. It follows the Feistel network, and the algorithm is divided into two parts:
1. Key-expansion
2. Data encryption
Key-expansion:
It converts a key of at most 448 bits into several subkey arrays totaling 4168 bytes. Blowfish uses a large number of subkeys. These keys are generated prior to any data encryption or decryption.
The P-array consists of 18 32-bit subkeys:
P1, P2, ..., P18
Four 32-bit S-boxes consist of 256 entries each:
S1,0, S1,1, ..., S1,255
S2,0, S2,1, ..., S2,255
1. Initialize first the P-array and then the four S-boxes, in order, with a fixed string. This string consists of the hexadecimal digits of pi (less the initial 3): P1 = 0x243f6a88, P2 = 0x85a308d3, P3 = 0x13198a2e, P4 = 0x03707344, etc.
2. XOR P1 with the first 32 bits of the key, XOR P2 with the second 32 bits of the key, and so on for all bits of the key (possibly up to P14). Repeatedly cycle through the key bits until the entire P-array has been XORed with key bits. (For every short key, there is at least one equivalent longer key; for example, if A is a 64-bit key, then AA, AAA, etc., are equivalent keys.)
3. Encrypt the all-zero string with the Blowfish algorithm, using the subkeys described in steps (1) and (2).
4. Replace P1 and P2 with the output of step (3).
5. Encrypt the output of step (3) using the Blowfish algorithm with the modified subkeys.
6. Replace P3 and P4 with the output of step (5).
7. Continue the process, replacing all entries of the P-array, and then all four S-boxes in order, with the output of the continuously changing Blowfish algorithm.
In total, 521 iterations are required to generate all required subkeys. Applications can store the subkeys rather than execute this derivation process multiple times.
UNIT III
PART A (TWO MARKS)
1. What is message authentication? (Dec 14)
It is a procedure that verifies whether a received message comes from the alleged source and has not been altered. It uses message authentication codes and hash algorithms to authenticate the message.
2. Define the classes of message authentication function.
i. Message encryption: the ciphertext of the entire message serves as its authenticator.
ii. MAC (Message Authentication Code): a function of the message and a secret key
shared by sender and receiver. The MAC is appended to the message at the source at a
time when the message is assumed or known to be correct.
iii. Hash function: the hash value is appended to the message at the source at a time
when the message is assumed or known to be correct. The hash function itself is not
considered to be secret.
6. List any three hash algorithms.
MD5 (Message Digest version 5) algorithm.
SHA-1 (Secure Hash Algorithm).
RIPEMD-160 algorithm.
7. What are the requirements of the hash function?
H can be applied to a block of data of any size.
H produces a fixed length output.
H(x) is relatively easy to compute for any given x, making both hardware and
software implementations practical.
8. What do you mean by MAC?
MAC stands for Message Authentication Code. It is a function of a message and a secret
key that produces a fixed-length value called the MAC:
MAC = CK(M)
where M = variable-length message, K = secret key shared by sender and receiver, and
CK(M) = fixed-length authenticator.
9. Differentiate internal and external error control.
Internal error control: an error-detecting code, also known as a frame check sequence
or checksum, is computed and appended to the message before encryption.
External error control: error-detecting codes are appended after encryption.
10. What is meant by a meet-in-the-middle attack?
This is a cryptanalytic attack that attempts to find a value in each of the range and
domain of the composition of two functions such that the forward mapping of one
through the first function is the same as the inverse image of the other through the
second function, quite literally meeting in the middle of the composed function.
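The idea can be illustrated with a toy meet-in-the-middle attack on double encryption, using a one-byte XOR "cipher". This is purely illustrative (not a real cipher such as DES): the attacker tabulates forward mappings of the plaintext under every first key, then looks up inverse mappings of the ciphertext under every second key.

```python
def enc(k, block):
    # Toy "cipher": XOR the block with the key byte (XOR is its own inverse).
    return block ^ k

def mitm(plain, cipher):
    # Forward table: E_k1(P) -> k1 for every candidate first key.
    forward = {enc(k1, plain): k1 for k1 in range(256)}
    hits = []
    for k2 in range(256):
        mid = enc(k2, cipher)        # D_k2(C); same as enc since XOR is self-inverse
        if mid in forward:           # forward mapping meets inverse image in the middle
            hits.append((forward[mid], k2))
    return hits
```

Note that with this toy cipher every key pair with k1 XOR k2 = P XOR C matches, so a second known plaintext/ciphertext pair would be needed to narrow down the true keys, just as in the classic attack on double DES.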
The effort required is proportional to 2^(n/2).
Comparison of MD5, SHA-1 and RIPEMD-160:

                         MD5             SHA-1           RIPEMD-160
Digest length            128 bits        160 bits        160 bits
Basic unit of
processing               512 bits        512 bits        512 bits
Number of steps          64              80              160
Maximum message size     Infinity        2^64 - 1 bits   2^64 - 1 bits
Primitive logical
functions                4               4               5
Additive constants
used                     64              4               9
Endianness               Little-endian   Big-endian      Little-endian
Direct digital signature:
1. The direct digital signature involves only the communicating parties (source and
destination).
2. It may be formed by encrypting the entire message with the sender's private key.
Arbitrated digital signature:
1. The arbiter plays a sensitive and crucial role.
2. Every signed message from a sender X to a receiver Y goes first to an arbiter A,
who subjects the message and its signature to a number of tests to check its origin
and content.
its own unique HMAC. The server compares the two HMACs, and, if they're
equal, the client is trusted and the request is executed. This process is often called
a secret handshake.
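The secret-handshake exchange described above can be sketched with Python's standard hmac module. The key name and request contents here are illustrative; the point is that the server recomputes its own HMAC over the request and compares it, in constant time, with the one the client sent.

```python
import hashlib
import hmac

SECRET = b"shared-secret"  # illustrative key shared by client and server

def sign_request(request: bytes) -> bytes:
    # Client side: compute the HMAC tag over the request.
    return hmac.new(SECRET, request, hashlib.sha256).digest()

def server_accepts(request: bytes, client_mac: bytes) -> bool:
    # Server side: recompute its own unique HMAC and compare the two tags.
    expected = hmac.new(SECRET, request, hashlib.sha256).digest()
    return hmac.compare_digest(expected, client_mac)
```

hmac.compare_digest is used instead of == so that the comparison does not leak timing information about how many leading bytes matched.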
19. What is a digital signature? (May 15)
A digital signature is a mathematical technique used to validate the authenticity
and integrity of a message, software or digital document. Digital signatures can
provide the added assurances of evidence of origin, identity and status of an
electronic document, transaction or message, as well as acknowledging informed
consent by the signer.
20. Give Elgamal Digital Signature Scheme. (May 13)
The ElGamal signature scheme is a digital signature scheme which is based on
the difficulty of computing discrete logarithms. It was described by Taher
ElGamal in 1984. The ElGamal signature scheme allows a third-party to confirm
the authenticity of a message sent over an insecure channel.
PART-B
1. Explain the classification of authentication function in detail. (May 11)
the MAC is generated via some algorithm that depends on both the message
and a secret key known only to the sender and receiver
the MAC may be of any length, but more often is some fixed size, requiring the
use of a hash function to condense the message to the required size if this is
not achieved by the authentication scheme
message authentication may also be done using the standard modes of use of a
block cipher
o can use either CBC or CFB modes and send the final block, since this will
depend on all previous bits of the message
Hashing Functions
hashing functions are used to condense an arbitrary length message to a fixed size,
usually for subsequent signature by a digital signature algorithm
should resist birthday attacks (finding any 2 messages with the same
hash value, perhaps by iterating through minor permutations of 2
messages )
it is usually assumed that the hash function is public and not keyed
length should be large enough to resist birthday attacks (64-bits is now regarded
as too small, 128-512 proposed)
Snefru
uses an algorithm H which hashes 512-bits to m-bits, taking the first m output bits
of H as the hash value
H is the last m-bits of the output of E XOR'd with the first m-bits of the
input of E
overview of algorithm:
o after the last block (0-padded to size as needed), the hash value is appended
to a message length value and H computed on this, the resulting value
being the MAC
Snefru has been broken by a birthday attack by Biham and Shamir for 128-bit
hashes, and possibly for 256-bit when 2 to 4 passes are used in E
2. Describe MD5 algorithm in detail. Compare its performance with SHA-1. (Dec 13
& May 12)
MD2, MD4 and MD5
MD4 produces a 128-bit hash of the message, using bit operations on 32-bit
operands for fast implementation
some progress at cryptanalyzing MD4 has been made, with a small number of
collisions having been found
MD5 was designed as a strengthened version, using four rounds, a little more
complex than in MD4 .
a little progress at cryptanalyzing MD5 has been made with a small number of
collisions having been found
both MD4 and MD5 are still in use and considered secure in most practical
applications
SHA was designed by NIST & NSA and is the US federal standard for use with
the DSA signature scheme (nb the algorithm is SHA, the standard is SHS)
it produces 160-bit hash values
SHA overview: the five 32-bit chaining variables are initialized to
(67452301, efcdab89, 98badcfe, 10325476, c3d2e1f0)
SHA is a close relative of MD5, sharing much common design, but each having
differences
SHA has very recently been subject to modification following NIST identification
of some concerns, the exact nature of which is not public
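The digest-length difference between MD5 and SHA-1 can be confirmed with Python's hashlib (the input "abc" is the classic test vector):

```python
import hashlib

# MD5 yields a 128-bit (16-byte) digest, SHA-1 a 160-bit (20-byte) digest.
md5 = hashlib.md5(b"abc")
sha1 = hashlib.sha1(b"abc")
print("md5 :", md5.digest_size * 8, "bits", md5.hexdigest())
print("sha1:", sha1.digest_size * 8, "bits", sha1.hexdigest())
```

The longer SHA-1 digest is what pushes the birthday-attack effort from about 2^64 hashes (MD5) to about 2^80.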
4. A → B: EKs[f(N2)]
o used to securely distribute a new session key for communications between A & B
o but is vulnerable to a replay attack: if an old session key has been compromised,
then message 3 can be resent, convincing B that it is communicating with A
o modifications to address this require:
timestamps (Denning 81)
using an extra nonce (Neuman 93)
6. Explain authentication protocols in detail.
Kerberos: a protocol that is used with either a password or a smart card for
interactive logon. It is also the default method of network authentication for
services.
SSL/TLS: a protocol that is used when a user attempts to access a secure Web
server.
NTLM: a protocol that is used when either the client or server uses a previous
version of Windows.
Digest authentication: transmits credentials across the network as an MD5 hash
or message digest.
Passport authentication: a user-authentication service which offers single
sign-in service.
Message encryption: the ciphertext of the entire message serves as its authenticator.
Message Authentication Code (MAC): a public function of the message and a secret
key that produces a fixed-length value that serves as the authenticator.
Hash function: a public function that maps a message of any length into a
fixed-length hash value, which serves as the authenticator.
8. Explain HMAC
Specified as Internet standard RFC 2104
HMACK(M) = Hash[(K+ XOR opad) || Hash[(K+ XOR ipad) || M]], where K+ is the
key padded out to the block size and opad, ipad are specified padding constants
overhead is just 3 more hash calculations than the message alone needs
any of MD5, SHA-1, RIPEMD-160 can be used
HMAC Security
the security of HMAC relates to that of the underlying hash algorithm
attacking HMAC requires either:
brute-force attack on the key used
birthday attack (but since HMAC is keyed, the attacker would need to observe a
very large number of messages)
choose the hash function used based on speed versus security constraints
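The RFC 2104 construction quoted above can be sketched directly with hashlib and cross-checked against Python's built-in hmac module. The key and message are illustrative; the block size of 64 bytes is the SHA-1 block size.

```python
import hashlib
import hmac  # standard-library HMAC, used only to cross-check the sketch

def hmac_sha1(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA1 per RFC 2104: Hash[(K+ ^ opad) || Hash[(K+ ^ ipad) || M]]."""
    block = 64  # SHA-1 block size in bytes
    if len(key) > block:
        key = hashlib.sha1(key).digest()   # overly long keys are hashed first
    k_plus = key.ljust(block, b"\x00")     # K+ : key zero-padded to block size
    ipad = bytes(b ^ 0x36 for b in k_plus)
    opad = bytes(b ^ 0x5C for b in k_plus)
    inner = hashlib.sha1(ipad + msg).digest()       # Hash[(K+ ^ ipad) || M]
    return hashlib.sha1(opad + inner).digest()      # Hash[(K+ ^ opad) || inner]

# Agrees with the standard-library implementation.
assert hmac_sha1(b"key", b"message") == hmac.new(b"key", b"message", hashlib.sha1).digest()
```

The two hash invocations over padded keys are exactly the "3 more hash calculations" of overhead mentioned above (the ipad/opad blocks can be precomputed per key).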
UNIT- IV
PART-A (2 MARKS)
1.
2.
3.
Reliable
Transparent
Scalable
4.
cannot protect from attacks that bypass it, e.g. sneaker net, utility modems,
trusted organizations, trusted services (e.g. SSL/SSH)
cannot protect against internal threats, e.g. disgruntled or colluding employees
cannot protect against access via WLAN if improperly secured against external
use
cannot protect against malware imported via laptop, PDA, or storage infected
outside
15. What is an intruder?
An Intruder is a person who attempts to gain unauthorized access to a system, to damage
that system, or to disturb data on that system. In summary, this person attempts to
violate Security by interfering with system Availability, data Integrity or data
Confidentiality.
16. What is IDS?
An intrusion detection system (IDS) is a device or software application that monitors
network or system activities for malicious activities or policy violations and produces
electronic reports to a management station.
17. What are the types of IDS?
Network Based IDS
Host Based IDS
Intrusion detection and prevention systems (IDPS)
18. Define virus
A computer virus is malware that, when executed, replicates by reproducing itself or
infecting other programs by modifying them. "Infecting" computer programs can include
data files, or the boot sector of the hard drive, as well. When this replication succeeds,
the affected areas are then said to be "infected".
19. Differentiate virus, worm and Trojan horse
Virus: a computer virus is malware that, when executed, replicates by reproducing
itself or infecting other programs by modifying them.
Worm: it uses a computer network to spread itself. Unlike a computer virus, it does
not need to attach itself to an existing program. Worms almost always cause at least
some harm to the network, even if only by consuming bandwidth.
Trojan horse: a program that appears to perform a useful function but contains
hidden code that performs unwanted or malicious actions.
environmental shortcomings:
o encryption algorithm, network protocol, byte order, ticket lifetime,
authentication forwarding, interrealm authentication
technical deficiencies:
o double encryption, non-standard mode of use, session keys, password attacks
o specified as Internet standard RFC 1510
Honeypots
o decoy systems to lure attackers
Firewall limitations:
o cannot protect against internal threats, e.g. a disgruntled employee
o cannot protect against transfer of all virus-infected programs or files
Packet-filtering firewalls:
o simplest of components
o foundation of any firewall system
o examine each IP packet (no context) and permit or deny according to rules
o hence restrict access to services (ports)
Stateful packet filters:
o check that each packet validly belongs to an established connection
o better able to detect bogus packets sent out of context
A firewall is a term used for a "barrier" between a network of machines and users that operate
under a common security policy and generally trust each other, and the outside world. In recent
years, firewalls have become enormously popular on the Internet. In large part, this is due to the
fact that most existing operating systems have essentially no security, and were designed under
the assumption that machines and users would trust each other.
There are two basic reasons for using a firewall at present: to save money by concentrating your
security on a small number of components, and to simplify the architecture of a system by
restricting access only to machines that trust each other. Firewalls are often regarded by some as
an irritation because they can be an impediment to accessing resources. This is not a fundamental
flaw of firewalls, but rather the result of failing to keep up with demands to improve the firewall.
There is a fairly large group of determined and capable individuals around the world who take
pleasure in breaking into systems. Other than the sense of insecurity that it has instilled in
society, the amount of actual damage that has been caused is relatively slight. It highlights the
fact that essentially any system can be compromised if an adversary is determined enough. It is
a tried and true method to improve security within DOD projects to have a "black hat"
organization that attempts to break into systems rather than have them found by your real
adversaries. By bringing the vulnerabilities of systems to the forefront, the Internet hackers
have essentially provided this service, and an impetus to improve existing systems. It is
probably a stretch to say that we should thank them, but I believe that it is better to raise these
issues early rather than later when our society will be almost 100% dependent on information
systems.
6. Explain types of firewalls.
Types of Firewalls The firewalls can be broadly categorized into the following three types:
Packet Filters
Application-level Gateways
Circuit-level Gateways
Packet Filters: A packet-filtering router applies a set of rules to each incoming IP packet and
then forwards or discards it. A packet filter is typically set up as a list of rules based on matches
of fields in the IP or TCP header, for example a table of filter rules for the telnet application.
The packet filter operates with positive filter rules: it is necessary to specify what should be
permitted, and everything that is not explicitly permitted is automatically forbidden.
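A minimal sketch of this positive-rule filtering, where everything not explicitly permitted is denied by default. The rule fields and the example telnet rule are illustrative, not a real firewall rule language.

```python
# Illustrative positive filter rules: permit telnet (port 23) only from
# the internal 192.168.1.0/24 network; everything else is denied.
RULES = [
    {"proto": "tcp", "dst_port": 23, "src_net": "192.168.1."},
]

def permitted(packet):
    # Forward the packet only if some rule explicitly matches it.
    for rule in RULES:
        if (packet["proto"] == rule["proto"]
                and packet["dst_port"] == rule["dst_port"]
                and packet["src_ip"].startswith(rule["src_net"])):
            return True
    return False  # default deny: not explicitly permitted means forbidden
```

The final `return False` is the whole point of positive rules: the safe failure mode is to drop the packet.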
Application-level Gateway: Application level gateway, also called a Proxy Server acts as a
relay of application level traffic. Users contact gateways using an application and the request is
successful after authentication. The application gateway is service specific such as FTP,
TELNET, SMTP or HTTP.
Circuit Level Gateway: Circuit-level gateway can be a standalone or a specialized system. It
does not allow end-to-end TCP connection; the gateway sets up two TCP connections. Once the
TCP connections are established, the gateway relays TCP segments from one connection to the
other without examining the contents. The security function determines which connections will
be allowed and which are to be disallowed.
7. Explain types of secure system.
Types of Secure Computing Systems
Dedicated (Single-Level) Systems
o handles subjects and objects with same classification
o relies on other security procedures (eg physical)
System-High
o only provides need-to-know protection between users
o entire system operates at highest classification level
o all users must be cleared for that level of information
Compartmented
o variation of System-High which can process two or more types of compartmented
information
o not all users are cleared for all compartments, but all must be cleared to the highest level of
information processed
Multi-Level Systems
o is validated for handling subjects and objects with different rights and levels of security
simultaneously
o major features of such systems include:
user identification and authentication
resource access control and object labeling
audit trails of all security relevant events
external validation of the systems security
8. Explain active firewall elements.
The structure of an active firewall element, which is integrated in the communication interface
between the insecure public network and the private network To provide necessary security
services, following components are required:
Integration Module: It integrates the active firewall element into the communication system
with the help of device drivers. In the case of packet filters, the integration is above the Network
Access Layer, whereas it is above the Transport layer ports in the case of an Application Gateway.
Analysis Module: Based on the capabilities of the firewall, the communication data is analysed
in the Analysis Module. The results of the analysis are passed on to the Decision Module.
Decision Module: The Decision Module evaluates and compares the results of the analysis
with the security policy definitions stored in the rule set, and the communication data is
allowed or prevented based on the outcome of the comparison.
Processing module for Security related Events: Based on rule set, configuration settings and
the message received from the decision module, it writes on the logbook and generates alarm
message to the Security Management System.
Authentication Module: This module is responsible for the identification and authentication of
the instances that are communicated through the firewall system.
Rule set: It contains all the information necessary to make a decision for or against the
transmission of communication data through the Firewall and it also defines the security related
events to be logged.
Logbook: All security-related events that occur during operation are recorded in the logbook
based on the existing rule set.
Security Management System: It provides an interface where the administrator enters and
maintains the rule set. It also analyses the data entered in the logbook.
UNIT V
PART A (Two marks)
1. Define Public-Key Infrastructure.
A public-key infrastructure (PKI) is the set of hardware, software, people, policies, and
procedures needed to create, manage, store, distribute, and revoke digital certificates based
on asymmetric cryptography.
2. Define PGP. (Dec 14)
Pretty Good Privacy is an open-source freely available software package for e-mail security. It
provides authentication through the use of digital signature; confidentiality through the use of
symmetric block encryption; compression using the ZIP algorithm; e-mail compatibility using
the radix-64 encoding scheme; and segmentation and reassembly to accommodate long e-mails.
3. Define S/MIME (May 15)
Secure/Multipurpose Internet Mail Extension is an Internet standard approach to e-mail security
that incorporates the same functionality as PGP.
4. Write short notes on IP Security.
IPsec provides the capability to secure communications across a LAN, across private and public
WANs, and across the Internet.
5. Write short notes on Web Security
Secure socket layer (SSL) provides security services between TCP and applications that use TCP.
The Internet standard version is called transport layer service (TLS).
6. Write short notes on Secure Electronic Transaction.
Secure Electronic Transaction (SET) is an open encryption and security specification designed to
protect credit card transactions on the Internet.
7. What are the features of SET?
Confidentiality of information
Integrity of data
Cardholder account authentication
Merchant authentication
8. Write short notes on Transport Layer Security (TLS)? (Dec 11)
Transport Layer Security is defined as a Proposed Internet Standard in RFC 2246. RFC 2246 is
very similar to SSLv3. The TLS Record Format is the same as that of the SSL Record Format,
and the fields in the header have the same meanings. The one difference is in version number.
9. What are the function areas of IP security?
Authentication
Confidentiality
Key management.
10. Differentiate Transport and Tunnel mode in IPsec?
Transport mode: provides protection for upper-layer protocols between two hosts.
Tunnel mode: provides protection for the entire IP packet.
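The difference can be sketched abstractly as follows. This is purely illustrative, not real IPsec: `protect` stands in for the AH/ESP processing, and the headers are just byte strings.

```python
def transport_mode(ip_header: bytes, payload: bytes, protect) -> bytes:
    # Transport mode: only the upper-layer payload is protected;
    # the original IP header travels in the clear.
    return ip_header + protect(payload)

def tunnel_mode(ip_header: bytes, payload: bytes, protect, new_header: bytes) -> bytes:
    # Tunnel mode: the entire original IP packet (header + payload) is
    # protected and wrapped in a new outer IP header.
    return new_header + protect(ip_header + payload)
```

Tunnel mode is what lets two security gateways protect traffic for hosts behind them, since the inner addresses are hidden inside the protected packet.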
Figure shows the interrelationship among the key elements of the PKIX model.
Initialization: the client system is initialized with the public key and other assured
information of the trusted CA(s), to be used in validating certificate paths.
Certification: This is the process in which a CA issues a certificate for a user's
public key, and returns that certificate to the user's client system and/or posts that
certificate in a repository.
Key pair recovery: Key pairs can be used to support digital signature creation
and verification, encryption and decryption, or both. When a key pair is used for
encryption/decryption, it is important to provide a mechanism to recover the
necessary decryption keys when normal access to the keying material is no longer
possible, otherwise it will not be possible to recover the encrypted data. Loss of
access to the decryption key can result from forgotten passwords/PINs, corrupted
disk drives, damage to hardware tokens, and so on. Key pair recovery allows end
entities to restore their encryption/decryption key pair from an authorized key
backup facility (typically, the CA that issued the End Entity's certificate).
Key pair update: All key pairs need to be updated regularly (i.e., replaced with a
new key pair) and new certificates issued. Update is required when the certificate
lifetime expires and as a result of certificate revocation.
Revocation request: An authorized person advises a CA of an abnormal situation
requiring certificate revocation. Reasons for revocation include private key
compromise, change in affiliation, and name change.
Cross certification: Two CAs exchange information used in establishing a
cross-certificate. A cross-certificate is a certificate issued by one CA to another CA that
contains a CA signature key used for issuing certificates.
PKIX Management Protocols
The PKIX working group has defined two alternative management protocols between
PKIX entities that support the management functions listed in the preceding subsection.
RFC 2510 defines the certificate management protocols (CMP). Within CMP, each of the
management functions is explicitly identified by specific protocol exchanges. CMP is
designed to be a flexible protocol able to accommodate a variety of technical,
operational, and business models.
2. Write briefly about the e-mail security-PGP (Pretty Good Privacy). (May 15)
PGP is an open-source freely available software package for e-mail security. It provides
authentication through the use of digital signature; confidentiality through the use of symmetric
block encryption; compression using the ZIP algorithm; e-mail compatibility using the radix-64
encoding scheme; and segmentation and reassembly to accommodate long e-mails.
There are five important services in PGP
Authentication (Sign/Verify)
Confidentiality (Encryption/Decryption)
Compression
Email compatibility
Segmentation and Reassembly
The last three are transparent to the user.
PGP: Authentication steps
Sender:
1. Creates a message
2. Hashes it to 160 bits using SHA-1
3. Encrypts the hash code using her private key, forming a signature
4. Attaches the signature to the message
Receiver:
1. Decrypts the attached signature using the sender's public key and recovers the hash code
2. Re-computes the hash code using the message and compares it with the received hash code
3. If they match, accepts the message
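The hash-and-compare core of these authentication steps can be sketched with hashlib. The signature step (encrypting the digest with the sender's private key) is omitted, and the function names are illustrative.

```python
import hashlib

def make_digest(message: bytes) -> bytes:
    # Sender side: PGP hashes the message to 160 bits with SHA-1 before signing.
    return hashlib.sha1(message).digest()

def verify_digest(message: bytes, received_digest: bytes) -> bool:
    # Receiver side: re-compute the hash over the message and compare it
    # with the digest recovered from the signature.
    return hashlib.sha1(message).digest() == received_digest
```

If the message was altered in transit, the recomputed digest no longer matches and the message is rejected.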
PGP: Confidentiality
Sender:
1. Generates message and a random number (session key) only for this message
2. Encrypts message with the session key using AES, 3DES, IDEA or CAST-128
3. Encrypts the session key itself with the recipient's public key using RSA
4. Attaches it to message
Receiver:
1. Recovers session key by decrypting using his private key
2. Decrypts message using the session key.
PGP Compression
PGP Segmentation/Reassembly:
PGP uses key rings to identify the key pairs that a user owns or trusts
Private-key ring contains public/private key pairs of keys he owns
Public-key ring contains public keys of others he trusts
AH in tunnel mode authenticates the entire inner IP packet and selected fields of the
outer IP header; tunnel mode is usually used between security gateways (routers,
firewalls).
SET participants
Cardholder: is an authorized holder of a payment card (e.g., MasterCard, Visa) that has
been issued by an issuer; in SET, the cardholder interacts with merchants over the Internet.
Merchant: is a person or organization that has goods or services to sell to the cardholder.
Issuer: is a financial institution, such as a bank, that provides the cardholder with the
payment card.
Acquirer: is a financial institution that establishes an account with a merchant and
processes payment card authorizations and payments.
Payment gateway: is a function operated by the acquirer or a designated third party that
processes merchant payment messages.
Certification authority (CA): is an entity that is trusted to issue X.509v3 public-key
certificates for cardholders, merchants, and payment gateways.
Cardholder registration
Merchant registration
Purchase request
Payment authorization
Payment capture
Certificate inquiry and status
Purchase inquiry
Authorization reversal
Capture reversal
Credit
Credit reversal
Payment gateway certificate request
Batch administration
Error message
Purchase Request
Message from customer to merchant containing OI(Order Information) for merchant and
PI(payment Information) for bank.
Consists of 4 messages
o Initiate Request
o Initiate Response
o Purchase Request
o Purchase Response
5. Explain in detail about Secure Socket Layer and Transport Layer Security.
SSL Architecture
SSL is designed to make use of TCP to provide a reliable end-to-end secure
service.
1. Connection: a connection is a transport that provides a suitable type of service. For
SSL, connections are peer-to-peer and transient, and each connection is associated with
one session; connection state includes parameters such as sequence numbers.
2. Session: An SSL session is an association between a client and a server. Session
state is defined by the following parameters:
Session identifier
Peer certificate
Compression method
Cipher spec
Master secret
Is resumable
SSL Record Protocol
i)Confidentiality
ii)Message Integrity
Content types
Change Cipher Specification Protocol
o This protocol consists of a single message which consists of a single byte with the
value 1.
o This is used to cause the pending state to be copied into the current state
Alert protocol
o The Alert Protocol is used to convey SSL-related alerts to the peer entity.
o Alerts that are fatal
Handshake protocol
o Is used for server and client to authenticate each other and protect data sent in SSL
record.
o Type (1 byte): Indicates one of 10 messages (a table lists the defined message types).
o Length (3 bytes): The length of the message in bytes.
o Content (>= 0 bytes): The parameters associated with this message (table).
Application Data protocol
o Contains Opaque content
TLS (Transport Layer Security)
o TLS is defined as a proposed Internet standard in RFC 2246; its record format is
similar to the SSL record format, differing in the version number.
6. Write brief notes on malicious software.
Viruses and Other Malicious Content
computer viruses have got a lot of publicity
one of a family of malicious software
effects usually obvious
have figured in news reports, fiction, and movies
(often exaggerated) getting more attention than they deserve
are a concern nonetheless
Trapdoors
secret entry point into a program
allows those who know of it to gain access, bypassing usual security procedures
have been commonly used by developers
a threat when left in production programs, allowing exploitation by attackers
very hard to block in the O/S
compromised systems are subsequently used for further attacks, esp. DoS; a major
issue is the lack of security of permanently connected systems
Worm Operation
worm phases, like those of viruses:
Dormant
Propagation
- search for other systems to infect
- establish connection to target remote system
- replicate self onto remote system
Triggering
Execution
Morris Worm
Best known classic worm
released by Robert Morris in 1988
targeted Unix systems using several propagation techniques:
simple password cracking of the local password file
exploiting a bug in the finger daemon
exploiting the debug trapdoor in the sendmail daemon
if any attack succeeded, it then replicated itself
Recent Worm Attacks
new spate of attacks from mid-2001
Code Red
Exploited bug in MS IIS to penetrate
spread by probing random IPs for systems running IIS
had a trigger time for a denial-of-service attack
the 2nd wave infected 360,000 servers in 14 hours
Code Red 2
had backdoor installed to allow remote control
Nimda used multiple infection mechanisms email, shares, web client, IIS, Code
Red 2 backdoor
8. Explain countermeasures of viruses .
Virus Countermeasures
Viral attacks exploit lack of integrity
control on systems to defend need to add such controls
typically by one or more of:
prevention - block the virus infection mechanism
detection - of viruses in an infected system
reaction - restoring the system to a clean state
Anti-Virus Software
First-generation
scanner uses virus signature to identify virus or change in length of programs
Second-generation
uses heuristic rules to spot viral infection or uses program checksums to spot changes
Third-generation
memory-resident programs identify virus by actions
Fourth-generation
packages with a variety of antivirus techniques, e.g. scanning & activity traps,
access controls
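The signature matching that first-generation scanners rely on can be sketched as follows. The signature table is illustrative (it uses the well-known EICAR test string rather than a real virus pattern):

```python
# Illustrative signature table mapping known byte patterns to names.
SIGNATURES = {
    b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE": "EICAR test",
}

def scan(data: bytes):
    # Report every known signature found anywhere in the file contents.
    return [name for sig, name in SIGNATURES.items() if sig in data]
```

This is also why signature scanners fail against polymorphic viruses: the byte pattern changes on each infection, which motivated the heuristic and behavior-based later generations.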
Advanced Anti-Virus Techniques
generic decryption
use CPU simulator to check program signature
Authentication
Confidentiality
Key management
applicable to use over LANs, across public & private WANs, and for the Internet
Benefits of IPSec:
in a firewall/router, provides strong security to all traffic crossing the perimeter
in a firewall, is resistant to bypass
is below the transport layer, hence transparent to applications
can be transparent to end users
can provide security for individual users if desired
IP Security Architecture
specification is quite complex
defined in numerous RFCs
incl. RFC 2401/2402/2406/2408 many others, grouped by category
mandatory in IPv6, optional in IPv4
IPSec Services
Access control
Connectionless integrity
Data origin authentication
Rejection of replayed packets (a form of partial sequence integrity)
Confidentiality (encryption)
Limited traffic flow confidentiality
IT6702 DATA WAREHOUSING AND DATA MINING LTPC 3003
UNIT I DATA WAREHOUSING
Data warehousing Components - Building a Data warehouse - Mapping the Data Warehouse to
a Multiprocessor Architecture - DBMS Schemas for Decision Support - Data Extraction,
Cleanup, and Transformation Tools - Metadata.
UNIT II
BUSINESS ANALYSIS
Reporting and Query tools and Applications - Tool Categories - The Need for Applications -
Cognos Impromptu - Online Analytical Processing (OLAP) - Need - Multidimensional Data
Model - OLAP Guidelines - Multidimensional versus Multirelational OLAP - Categories of
Tools - OLAP Tools and the Internet.
UNIT III
DATA MINING
Mining Frequent Patterns, Associations and Correlations - Mining Methods - Mining various
Kinds of Association Rules - Correlation Analysis - Constraint Based Association Mining -
Classification and Prediction - Basic Concepts - Decision Tree Induction - Bayesian
Classification - Rule Based Classification - Classification by Back propagation - Support Vector
Machines - Associative Classification - Lazy Learners - Other Classification Methods -
Prediction.
UNIT V
TOTAL: 45 PERIODS
OUTCOMES:
After completing this course, the student will be able to:
Apply data mining techniques and methods to large data sets.
Use data mining tools.
Compare and contrast the various classifiers.
TEXT BOOKS:
1. Alex Berson and Stephen J.Smith, Data Warehousing, Data Mining and OLAP, Tata
McGraw Hill Edition, Thirteenth Reprint 2008.
2. Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques, Third Edition,
Elsevier, 2012.
REFERENCES:
1. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Pearson
Education, 2007.
2. K.P. Soman, Shyam Diwakar and V. Aja, Insight into Data Mining Theory and Practice,
Eastern Economy Edition, Prentice Hall of India, 2006.
3. G. K. Gupta, Introduction to Data Mining with Case Studies, Eastern Economy Edition,
Prentice Hall of India, 2006.
4. Daniel T.Larose, Data Mining Methods and Models, Wiley-Interscience, 2006.
Unit-I
Part-A
1. Define data warehouse. [Dec 2013][May 2012]
Data warehousing is the process of constructing and using a data warehouse. A data
warehouse is constructed by integrating data from multiple heterogeneous sources to support
analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing
involves data cleaning, data integration, and data consolidation.
2. List the Components of Data ware House.
i. Data warehouse database
ii. Sourcing, acquisition, cleanup and transformation tools
iii. Metadata
iv. Access tools
v. Data marts
vi. Data warehouse administration and management
3. How is a data warehouse different from a database? How are they similar? [May 2012]
Data warehouse: Online Analytical Processing (OLAP); used for data analysis and
decision making; query-driven.
Database: Online Transaction Processing (OLTP); used for day-to-day operations such as
purchasing and payroll; update-driven (up-to-date data).
Both store data in tables and are accessed through queries.
4. Define Metadata
Metadata is simply defined as data about data. The data that are used to represent other data is
known as metadata
5. Define Data partitioning?
Partitioning is done to enhance performance and facilitate easy management of data.
Partitioning also helps in balancing the various requirements of the system. It optimizes the
hardware performance and simplifies the management of data warehouse by partitioning each
fact table into multiple separate partitions.
6. What are the five types of access tools?
i. Data query and reporting tools
ii. Application development tools
iii. Executive information system (EIS) tools
iv. OLAP tools
v. Data mining tools
7. Differentiate a data warehouse from a data mart.
Data warehouse: collects information about subjects that span the entire organization; scope is enterprise-wide; example: fact constellation schema.
Data mart: focuses on selected subjects; scope is department-wide; example: star or snowflake schema.
Part-B
1. Describe the data warehouse Architecture? [Nov-Dec-2014]
A data warehouse adopts a three-tier architecture. The three tiers of the data warehouse architecture are:
Bottom Tier - The bottom tier of the architecture is the data warehouse database server. It is the
relational database system. We use the back end tools and utilities to feed data into the bottom
tier.
Middle Tier - The middle tier is an OLAP server that can be implemented either by the Relational OLAP (ROLAP) model, an extended relational database management system that maps operations on multidimensional data to standard relational operations, or by the Multidimensional OLAP (MOLAP) model, which directly implements multidimensional data and operations.
Top-Tier - This tier is the front-end client layer. This layer holds the query tools and reporting
tools, analysis tools and data mining tools.
Business factors: business users want to make decisions quickly and correctly using all available data.
Technological factors:
To address the incompatibility of operational data stores.
IT infrastructure is changing rapidly; its capacity is increasing and its cost is decreasing, so that building a data warehouse is easy.
There are two main approaches to building a successful data warehouse:
i) Top-Down Approach (suggested by Bill Inmon)
In the top down approach suggested by Bill Inmon, we build a centralized repository to
house corporate wide business data. This repository is called Enterprise Data Warehouse
(EDW). The data in the EDW is stored in a normalized form in order to avoid redundancy.
The central repository for corporate wide data helps us maintain one version of truth of the
data. The data in the EDW is stored at the most detail level. The reason to build the EDW
on the most detail level is to leverage
1. Flexibility to be used by multiple departments.
2. Flexibility to cater for future requirements.
ii) Bottom-Up Approach
The bottom-up approach suggested by Ralph Kimball is an incremental approach
to build a data warehouse. Here we build the data marts separately at different points
of time as and when the specific subject area requirements are clear. The data marts
are integrated or combined together to form a data warehouse. Separate data marts are
combined through the use of conformed dimensions and conformed facts. A
conformed dimension and a conformed fact is one that can be shared across data
marts.
5. Mention the factors to be used to build successful data warehouse.
Data extraction, clean up, transformation and migration
Data warehouse tools have the following selection criteria that affect the ability to transform, consolidate, integrate and repair the data:
Timeliness of data delivery to the warehouse.
i. The tool must have the ability to identify the particular data that can be read by the conversion tool.
ii. The tool must support flat files, indexed files since corporate data is still in this
type.
iii. The tool must have the capability to merge data from multiple data stores.
iv. The tool should have specification interface to indicate the data to be extracted.
v. The tool should have the ability to read data from the data dictionary.
vi. The code generated by the tool should be completely maintainable.
vii. The tool should permit the user to extract the required data.
viii. The tool must have the facility to perform data type and character set translation.
ix. The tool must have the capability to create summarization, aggregation and
derivation of records.
x. The data warehouse database system must be able to load data directly from these tools.
6. Explain the concept of mapping the data warehouse architecture to Multiprocessor
architecture. [Nov-Dec-2014].
The functions of a data warehouse are based on relational database technology, which is implemented in a parallel manner. There are two advantages of having parallel relational database technology for a data warehouse:
Linear speed-up: refers to the ability to increase the number of processors to reduce response time.
Linear scale-up: refers to the ability to provide the same performance on the same request as the database size increases.
Types of parallelism
i) Inter query Parallelism:
In which different server threads or processes handle multiple requests at
the same time.
ii)Intra query Parallelism:
This form of parallelism decomposes the serial SQL query into the lower
level operations such as scan, join, sort etc. Then these lower level operations are
executed concurrently in parallel.
Intra query parallelism can be done in either of two ways:
Horizontal parallelism:
The data base is partitioned across multiple disks and parallel processing occurs
within a specific task that is performed concurrently on different processors against
different set of data.
Vertical parallelism:
This occurs among different tasks. All query components such as scan, join, sort
etc are executed in parallel in a pipelined fashion. In other words, an output from one task
becomes an input into another task.
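Horizontal parallelism can be sketched in a few lines of Python. The fact-table rows and partition layout below are hypothetical, and threads stand in for the separate processors working against different sets of data:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact-table rows (item, amount), horizontally partitioned
# across four "disks"; threads stand in for separate processors.
partitions = [
    [("tv", 100), ("radio", 50)],
    [("tv", 200), ("phone", 75)],
    [("radio", 25)],
    [("phone", 125), ("tv", 300)],
]

def scan_partition(rows):
    # The same scan task runs concurrently against each partition.
    return sum(amount for _, amount in rows)

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partial_sums = list(pool.map(scan_partition, partitions))

total_sales = sum(partial_sums)  # merge the partial results
print(partial_sums, total_sales)
```

Each partition is scanned by its own worker, and the partial sums are merged at the end, which is exactly the "same task on different sets of data" pattern described above.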
7. Brief the types of meta data [Nov-Dec 2013,2014]
Metadata is data about data. It is used for maintaining, managing and using the data warehouse. It is classified into two types:
a. Technical metadata:
It contains information about warehouse data used by the warehouse designer and administrator to carry out development and management tasks. It includes information about data stores, transformation rules, data extraction and cleansing rules, aggregation, and backup/archive rules.
b. Business metadata:
It contains information that gives users an easy-to-understand perspective of the information stored in the warehouse, such as subject areas, information object types, internal and external data sources, and data currency.
Star Schema
There is a fact table at the center. It contains the keys to each of four dimensions.
The fact table also contains the attributes, namely dollars sold and units sold.
Snowflake Schema
Some dimension tables in the Snowflake schema are normalized. The normalization splits up the
data into additional tables.
Unlike the star schema, the dimension tables in a snowflake schema are normalized. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely item and supplier.
i. Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier_key.
ii. The supplier_key is linked to the supplier dimension table, which contains the attributes supplier_key and supplier_type.
Unit-II
Part-A
i. Custom templates
ii. Exception reporting
iii. Interactive reporting
iv. Frames
4. List the categories of OLAP tools. [May 2011][Dec 2013]
MOLAP, ROLAP, HOLAP, and Web OLAP.
5. What are the OLAP guidelines? [Dec 2013]
Multidimensional conceptual view.
Transparency
Accessibility
Consistent reporting performance
Client/server architecture
6. Define Data cube.[June 2013]
A data cube is a three-dimensional (or higher) range of values generally used to explain the time sequence of an image's data. It is a data abstraction used to evaluate aggregated data from a variety of viewpoints.
7. Name some OLAP tools. [Dec 2013]
Arbor's Essbase, Oracle Express, Planning Sciences' Gentia, and Kenan Technologies' Acumate ES.
8. What is the need of tools for applications?
Easy to use.
Point-and-click tools accept SQL or generate SQL statements to query relational data stored in the warehouse.
Tools can format the retrieved data into easy-to-read reports.
9. What are production reporting tools? Give examples[June 2013]
Third-generation language: COBOL. Fourth-generation language: Information Builders' Focus. Client/server tools: MITI's SQR.
10. What is multidimensional database?[Dec 2011]
A multidimensional database (MDB) is a type of database that is optimized for data
warehouse and online analytical processing (OLAP) applications. Multidimensional
databases are frequently created using input from existing relational databases
11. Define OLTP systems.
The major task of online operational database system is to perform online transaction and
query processing. These systems are called On Line Transaction Processing (OLTP) systems.
They cover most of the day-to-day operations of an organization such as purchasing, inventory,
manufacturing and banking.
12. List the commercial tools used in data warehouse development?
Informatica
Cognos
Business objects
Data Storage
RapidMiner
Weka
R Language
13. State the components of Data Integrator.
Graphical Designer
Administrator
14. Define ETL.
ETL is short for extract, transform, and load, three database functions that are
combined into one tool to pull data out of one database and place it into another database.
15. Define data transformation. [May 2011]
Data transformation is the conversion of data from one format to another on the basis of possible differences between the source and the target platforms.
Ex: calculating age from the date of birth, or replacing a numeric gender code with the more meaningful values male and female.
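The two transformations in the example above can be sketched in Python. The record layout, field names, and the 0/1 gender-code mapping below are hypothetical:

```python
from datetime import date

def transform_record(rec, today):
    """Derive age from date of birth; decode a numeric gender code."""
    dob = rec["dob"]
    # Subtract one if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    gender = {0: "female", 1: "male"}[rec["gender_code"]]  # assumed coding
    return {"name": rec["name"], "age": age, "gender": gender}

row = {"name": "Asha", "dob": date(1990, 7, 15), "gender_code": 0}
print(transform_record(row, today=date(2015, 6, 1)))
```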
16. Write the categories of query tools.
It is used in data mining applications to find hidden trends and patterns in data.
Cognos Query - facilitates data navigation and speeds up ad hoc queries.
Cognos PowerPlay - used for carrying out multidimensional online analysis of data.
Part-B
3. Multidimensional versus multirelational OLAP - OLAP tools
The key role in multidimensional modelling is played by the concept of hierarchies while implementing OLAP.
Types of OLAP servers
There are mainly three types of OLAP servers: ROLAP, MOLAP and HOLAP. Let us discuss each type.
ROLAP (Relational OLAP)
ROLAP is the preferred technology when the database size is large, i.e., greater than 100 GB. Here the data will not be in summarized form, and response time is poor (minutes to hours, depending on the query type). As the name implies, ROLAP systems are based on the relational data model. There are ROLAP clients and a database server based on an RDBMS. The OLAP server sends requests to the database server, and the multidimensional cubes are generated dynamically as required by the user. ROLAP supports mapping between the relational model and business dimensions.
MOLAP (Multidimensional OLAP)
The system consists of OLAP Client that provides the front-end GUI for giving queries and
obtaining the reports. OLAP server is known as multidimensional DBMS. This is a proprietary
DBMS which stores the multidimensional data in multidimensional cubes and contains the data
in summarized form, based on the type of reports required.
A data-staging machine converts the data from RDBMS format to MDBMS format and sends the multidimensional cube data to the OLAP server.
HOLAP (Hybrid OLAP)
Hybrid OLAP is an amalgamation of ROLAP and MOLAP. It tries to accommodate the
advantages of both models. ROLAP has good database structure and simple queries can be
handled efficiently. On the other hand, MOLAP can handle complex aggregate queries faster.
However, MOLAP is computationally costlier. So we can take a middle path: in HOLAP, the relational database structure is preserved to handle simple, user-required queries. Instead of computing all the cubes and storing them in a MOLAP server, a HOLAP server stores only some important, partially computed cubes or aggregates, so that when higher scalability and faster computation are required, the needed aggregates can be computed efficiently. Thus HOLAP possesses the advantages of both ROLAP and MOLAP.
4. Write the difference between OLTP vs. OLAP (Nov/Dec-2014)
We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide the source data to data warehouses.
OLTP system: Online Transaction Processing (the operational system).
OLAP system: Online Analytical Processing (the data warehouse).
The most comprehensive premises in computing have been the Internet and data warehousing, so the integration of these two giant technologies is a necessity. The advantages of using the Web for access are inevitable:
1. The Internet provides connectivity between countries, acting as a free resource.
2. The Web eases the administrative tasks of managing scattered locations.
3. The Web allows users to store and manage data and applications on servers that can be managed, maintained and updated centrally.
These reasons indicate the importance of the Web in data storage and manipulation.
6. List the guidelines for OLAP
The primary key to the decision-making process is the amount of data collected and how well that data is interpreted. Nowadays managers are not satisfied with direct answers to their direct questions; due to market growth and the increase in clients, their questions have become more complicated.
8. Discuss the concept of Multidimensional data Model [Apr-May-2015]
The multidimensional data model is an integral part of On-Line Analytical Processing, or
OLAP. Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries
during interactive sessions, not in batch jobs that run overnight. And because OLAP is also
analytic, the queries are complex. The multidimensional data model is designed to solve complex
queries in real time.
A way to understand the multidimensional data model is to view it as a cube. The cube on the left contains detailed sales data by product, market and time. The cube on the right associates a sales number (units sold) with the dimensions product type, market and time, with the unit variables organized as cells in an array.
This cube can be extended to include another array, price, which can be associated with all or only some dimensions. As the number of dimensions increases, the number of cube cells increases exponentially.
Dimensions are hierarchical in nature; e.g., the time dimension may contain hierarchies for years, quarters, months, weeks and days. Geography may contain country, state, city, etc.
9. List the Operations in Multidimensional Data Model.
Aggregation (roll-up)
dimension reduction: e.g., total sales by city
summarization over aggregate hierarchy:
e.g., total sales by city and year -> total sales by region and by year
Selection (slice) defines a subcube
e.g., sales where city = Palo Alto and date = 1/15/96
Navigation to detailed data (drill-down)
e.g., (sales - expense) by city, top 3% of cities by average income
Visualization Operations (e.g., Pivot or dice).
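The roll-up and slice operations listed above can be sketched in plain Python over a hypothetical sales fact table (the cities, regions and figures are invented for illustration):

```python
from collections import defaultdict

# Hypothetical fact rows: (city, region, year, sales)
facts = [
    ("Chennai", "South", 2014, 100),
    ("Madurai", "South", 2014, 50),
    ("Chennai", "South", 2015, 150),
    ("Mumbai",  "West",  2014, 200),
    ("Mumbai",  "West",  2015, 250),
]

# Roll-up: aggregate away the city level -> total sales by region and year
rollup = defaultdict(int)
for city, region, year, sales in facts:
    rollup[(region, year)] += sales

# Slice: fix one dimension (year = 2015) to define a subcube
slice_2015 = [(city, sales) for city, _, year, sales in facts if year == 2015]

print(dict(rollup))
print(slice_2015)
```

Drill-down is the inverse of the roll-up step: navigating from the (region, year) totals back to the detailed (city, year) rows.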
InfoBurst - a web-based server tool that allows reports to be refreshed, scheduled and distributed.
It can be used to distribute reports and data to users or servers in various formats (e.g. Text,
Excel, PDF, HTML, etc.). For more information, see the documentation below:
o InfoBurst Usage Notes (PDF)
o InfoBurst User Guide (PDF).
Data Warehouse List Upload - a web-based tool that allows lists of data to be uploaded into the
data warehouse for use as input to queries.
o Data Warehouse List Upload Instructions (PDF) WSU has negotiated a contract with Business
Objects for purchasing these tools at a discount.
Selecting your Query Tool:
a. The query tools discussed in the next several slides represent the most commonly used query
tools at Penn State.
b. A Data Warehouse user is free to select any query tool, and is not limited to the ones
mentioned.
c. What is a Query Tool?
d. A query tool is a software package that allows you to connect to the data warehouse from your
PC and develop queries and reports
Unit-III
Part-A
4. What are the types of data? [Nov 2014]
i) Qualitative data
ii) Quantitative data
5. List the data attribute types.
i) Nominal
ii) Ordinal
iii) Interval
iv) Ratio.
11. Define data integration.
Data integration combines data from multiple sources into a coherent data store. These sources may include multiple databases, data cubes or flat files.
12. What are the data preprocessing techniques?
Data preprocessing techniques are
i) Data cleaning-removes noise and correct inconsistencies in the data.
ii) Data integration-merges data from multiple sources into a coherent data store such as data
warehouse or a data cube.
iii) Data transformations-such as normalization improve the accuracy and efficiency of mining
algorithms involving distance measurements.
iv) Data reduction-reduces the data size by aggregating, eliminating redundant features, or
clustering.
14. What kind of data can be mined?
Kinds of data are Database data, data warehouses, transactional data and other kinds of data like
time related data, data streams, spatial data, engineering design data, multimedia data and
web data.
15. Give the various data smoothing techniques.
i) Binning
ii) clustering
iii) regression
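Smoothing by binning can be sketched in Python. The price list below is a small hypothetical sample, the bins are equal-frequency (equi-depth), and each value is replaced by its bin mean:

```python
def smooth_by_bin_means(values, n_bins):
    """Equal-frequency (equi-depth) binning, then smoothing by bin means."""
    data = sorted(values)
    size = len(data) // n_bins          # assumes len(values) divides evenly
    smoothed = []
    for i in range(n_bins):
        bin_ = data[i * size:(i + 1) * size]
        mean = sum(bin_) / len(bin_)
        smoothed.extend([mean] * len(bin_))  # replace values by the bin mean
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # hypothetical price data
print(smooth_by_bin_means(prices, 3))
```

Smoothing by bin boundaries would instead replace each value by the nearest of the bin's minimum and maximum.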
16. List the attributes of data.
- Nominal
- Ordinal
- Interval
- Ratio.
17. Give an example for Nominal type attribute.
Nominal is divided into two parts:
Simple (e.g., designations such as Professor, Assistant Professor, Lecturer)
Binary (e.g., the values 0 and 1).
18. Give an example for Numeric type attribute.
Interval (e.g., temperature in degrees Celsius)
Ratio (e.g., years of experience, age).
19. State No-coupling Architecture.
The data mining system which does not work with any aspect of the database or data
warehouse system.
20. State Loose coupling architecture.
The data mining system which collaborates with the database or data warehouse.
Part-B
1. Explain the architecture of a typical data mining system.
The architecture of a typical data mining system may have the following major components
111
i. Database, data warehouse, or other information repository
ii. Database or data warehouse server
iii. Knowledge base
iv. Data mining engine
v. Pattern evaluation module
vi. Graphical user interface
iv. Data mining (an essential process where intelligent methods are applied in order to
extract data patterns).
v. Pattern evaluation (to identify the truly interesting patterns representing knowledge
based on some interestingness measures;)
vi. Knowledge presentation (where visualization and knowledge representation
techniques are used to present the mined knowledge to the user).
evolution analysis. For instance, if studying the buying habits of customers in Canada, you
may choose to mine associations between customer profiles and the items that these
customers like to buy
3. Background knowledge: Users can specify background knowledge, or knowledge about
the domain to be mined. This knowledge is useful for guiding the knowledge discovery
process, and for evaluating the patterns found. There are several kinds of background
knowledge.
4. Interestingness measures: These functions are used to separate uninteresting patterns from
knowledge. They may be used to guide the mining process, or after discovery, to evaluate the
discovered patterns. Different kinds of knowledge may have different interestingness
measures.
5. Presentation and visualization of discovered patterns: This refers to the form in which
discovered patterns are to be displayed. Users can choose from different forms for knowledge
presentation, such as rules, tables, charts, graphs, decision trees, and cubes.
8. Brief the major issues in data mining. [Apr-May-2015]
1. Mining methodology and user-interaction issues. These reflect the kinds of knowledge mined, the ability to mine knowledge at multiple granularities, the use of domain knowledge, ad-hoc mining, and knowledge visualization.
2. Mining different kinds of knowledge in databases. Since different users can be interested in
different kinds of knowledge, data mining should cover a wide spectrum of data analysis and
knowledge discovery tasks, including data characterization, discrimination, association,
classification, clustering, trend and deviation analysis, and similarity analysis. These tasks
may use the same database in different ways and require the development of numerous data
mining techniques.
3. Interactive mining of knowledge at multiple levels of abstraction. Since it is difficult to know
exactly what can be discovered within a database, the data mining process should be
interactive. For databases containing a huge amount of data, appropriate sampling technique
can first be applied to facilitate interactive data exploration. Interactive mining allows users to
focus the search for patterns, providing and refining data mining requests based on returned
results. Specifically, knowledge should be mined by drilling-down, rolling-up, and pivoting
through the data space and knowledge space interactively, similar to what OLAP can do on
data cubes. In this way, the user can interact with the data mining system to view data and
discovered patterns at multiple granularities and from different angles.
9. Explain the performance issues in data mining.[Apr-may-2011,2015]
Performance issues
These include efficiency, scalability, and parallelization of data mining algorithms.
Efficiency and scalability of data mining algorithms. To effectively extract information
from a huge amount of data in databases, data mining algorithms must be efficient and
scalable. That is, the running time of a data mining algorithm must be predictable and
acceptable in large databases. Algorithms with exponential or even medium-order polynomial complexity will not be of practical use.
(ii) z-score normalization (or zero-mean normalization): the values of an attribute A are normalized based on the mean and standard deviation of A. A value v of A is normalized to v' by computing
v' = (v - mean_A) / std_dev_A
where mean_A and std_dev_A are the mean and standard deviation, respectively, of attribute A. This method of normalization is useful when the actual minimum and maximum of attribute A are unknown, or when there are outliers that dominate the min-max normalization.
(iii) Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. The number of decimal places moved depends on the maximum absolute value of A. A value v of A is normalized to v' by computing
v' = v / 10^j
where j is the smallest integer such that max(|v'|) < 1.
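The three normalization methods can be sketched in Python. The value list is hypothetical, the sample standard deviation is used for z-scores, and the decimal-scaling helper assumes the maximum absolute value is at least 1:

```python
from statistics import mean, stdev

def min_max(v, lo, hi, new_lo=0.0, new_hi=1.0):
    # (i) min-max normalization onto [new_lo, new_hi]
    return (v - lo) / (hi - lo) * (new_hi - new_lo) + new_lo

def z_score(v, values):
    # (ii) zero-mean normalization: v' = (v - mean_A) / std_dev_A
    return (v - mean(values)) / stdev(values)

def decimal_scale(v, values):
    # (iii) v' = v / 10^j; counting the digits of the largest absolute
    # value gives the smallest j with max(|v'|) < 1 (assumes max >= 1)
    j = len(str(int(max(abs(x) for x in values))))
    return v / 10 ** j

values = [200, 300, 400, 600, 1000]   # hypothetical attribute values
print(min_max(600, min(values), max(values)))
print(z_score(600, values))
print(decimal_scale(600, values))
```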
Unit-IV
Part-A
1. Define Association Rule Mining.
Association rule mining searches for interesting relationships among items in a given data
set.
Post pruning removes branches from a Fully grown tree. A tree node is pruned by removing its
branches. Eg: Cost Complexity Algorithm.
11. What is the concept of prediction?
Prediction can be viewed as the construction and use of a model to assess the class of an
unlabeled sample or to assess the value or value ranges of an attribute that a given sample is
likely to have.
12. What is the purpose of Apriori Algorithm?
Apriori algorithm is an influential algorithm for mining frequent item sets for Boolean
association rules. The name of the algorithm is based on the fact that the algorithm uses prior
knowledge of frequent item set properties.
13. State Support Vector Machine.
It refers to an algorithm in which the training data is transformed into a higher dimension by using a nonlinear mapping.
14. What are the factors affecting the performance of the Apriori candidate generation technique?
Need to generate a huge number of candidate sets.
Need to repeatedly scan the database and check a large set of candidates by pattern matching.
15. What is CHAID?
CHAID = Chi-squared Automatic Interaction Detection. It refers to an approach that uses classification to handle nominal attributes, where for each input attribute ai there is a pair of values vi that are the least different from the target attribute.
16. How are association rules mined from large databases?
I step: Find all frequent item sets:
II step: Generate strong association rules from frequent item sets
17. Define anti-monotone property.
If a set cannot pass a test, all of its supersets will fail the same test as well.
18. What are the factors affecting the performance of the Apriori candidate generation technique?
Need to generate a huge number of candidate sets.
Need to repeatedly scan the database and check a large set of candidates by pattern matching.
19. What are multidimensional association rules?
Association rules that involve two or more dimensions or predicates
Inter dimension association rule: Multidimensional association rule with no repeated
predicate or dimension
Hybrid-dimension association rule: Multidimensional association rule with multiple
occurrences of some predicates or dimensions.
Rule support and confidence are two measures of rule interestingness; they reflect, respectively, the usefulness and certainty of discovered rules. A support of 2% for an association rule means that 2% of all the transactions under analysis show that computer and financial management software are purchased together. A confidence of 60% means that 60% of the customers who purchased a computer also bought the software. Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
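Support and confidence can be computed directly from a transaction database; the five transactions below are hypothetical:

```python
# Hypothetical transaction database
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "software"},
]

def support(itemset):
    # fraction of transactions containing every item in `itemset`
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # confidence of the rule lhs -> rhs = support(lhs u rhs) / support(lhs)
    return support(lhs | rhs) / support(lhs)

print(support({"computer", "software"}))       # support of {computer, software}
print(confidence({"computer"}, {"software"}))  # confidence of computer -> software
```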
3. Discuss in brief about the Decision tree Induction.
Classification by decision tree induction. A decision tree is:
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution.
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root.
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers.
Use of decision tree: Classifying an unknown sample
Test the attribute values of the sample against the decision tree.
Algorithm
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information
gain)
Conditions for stopping partitioning:
All samples for a given node belong to the same class.
There are no remaining attributes for further partitioning (majority voting is used to classify the leaf).
There are no samples left.
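Information gain, the attribute-selection measure mentioned above, can be sketched in Python; the four-row training set is a hypothetical toy example with a single attribute:

```python
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Expected reduction in entropy from splitting `rows` on one attribute."""
    total = len(rows)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr_index], []).append(label)
    # Weighted entropy of the partitions produced by the split
    remainder = sum(len(part) / total * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# Hypothetical toy training set: one (outlook,) attribute, a "play?" label
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, 0, labels))   # a perfect split gives gain 1.0
```

At each node the algorithm would pick the attribute with the highest gain as the test attribute.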
Find: all rules that correlate the presence of one set of items with that of another set of items
E.g., 98% of people who purchase tires and auto accessories also get automotive services
done
Applications.
Maintenance agreement (What should the store do to boost maintenance agreement sales? What other products should the store stock up on?)
Home Electronics
Attached mailing in direct marketing
Detecting ping-ponging of patients, faulty collisions
7. Short notes on Mining Frequent Patterns. [Nov-Dec 2014]
Mining frequent patterns - Apriori is the method that mines the complete set of frequent itemsets with candidate generation.
Apriori property
All nonempty subsets of a frequent itemset must also be frequent.
If an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent, i.e., support(I) < min_sup.
If an item A is added to the itemset I, then the resulting itemset (I U A) cannot occur more frequently than I.
Monotonic functions are functions that move in only one direction.
This property is called anti-monotone: if a set cannot pass a test, all its supersets will fail the same test as well.
This property is monotonic in failing the test.
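A minimal sketch of Apriori's level-wise mining, showing the join step and the anti-monotone prune step described above (the transactions are hypothetical):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise frequent-itemset mining with candidate generation."""
    def sup(c):
        return sum(c <= t for t in transactions) / len(transactions)

    singletons = {frozenset([i]) for t in transactions for i in t}
    current = {c for c in singletons if sup(c) >= min_sup}
    frequent, k = {}, 1
    while current:
        frequent.update({c: sup(c) for c in current})
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (anti-monotone property): every (k-1)-subset of a
        # candidate must itself be frequent; then check support.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and sup(c) >= min_sup}
    return frequent

tx = [frozenset(t) for t in
      [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]]
result = apriori(tx, min_sup=0.6)
print({tuple(sorted(k)): v for k, v in result.items()})
```

On this data {A, B, C} is generated as a candidate but pruned by the support check, illustrating the anti-monotone property.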
3. Convertible Constraint
Suppose all items in patterns are listed in a total order R
A constraint C is convertible anti-monotone iff a pattern S satisfying the constraint
implies that each suffix of S w.r.t. R also satisfies C
A constraint C is convertible monotone iff a pattern S satisfying the constraint implies
that each pattern of which S is a suffix w.r.t. R also satisfies C
10. Illustrate the Constraints in Data Mining.
Rule Constraints in Association Mining
Two kind of rule constraints:
Rule form constraints: meta-rule guided mining.
P(x, y) ^ Q(x, w) -> takes(x, "database systems").
Rule (content) constraint: constraint-based query optimization (Ng, et al., SIGMOD98).
sum(LHS) < 100 ^ min(LHS) > 20 ^ count(LHS) > 3 ^ sum(RHS) > 1000
1-variable vs. 2-variable constraints (Lakshmanan, et al. SIGMOD99):
1-var: A constraint confining only one side (L/R) of the rule, e.g., as shown above.
2-var: A constraint confining both sides (L and R). sum(LHS) < min(RHS) ^ max(RHS) < 5*
sum(LHS)
Constraint-Based Association Query
Database: (1) trans (TID, Itemset ), (2) itemInfo (Item, Type, Price)
A constrained asso. query (CAQ) is in the form of {(S1, S2 )|C },
where C is a set of constraints on S1, S2 including frequency constraint
A classification of (single-variable) constraints:
Class constraint: S ⊆ Item, e.g. S is a set of items for sale.
Domain constraint:
S θ v, where θ ∈ {=, ≠, <, ≤, >, ≥}, e.g. S.Price < 100
v θ S, where θ is ∈ or ∉, e.g. snacks ∉ S.Type
V θ S or S θ V, where θ ∈ {⊆, ⊂, ⊇, ⊃, =, ≠}, e.g. {snacks, sodas} ⊆ S.Type
Unit-V
Part-A
1. Define Clustering
Clustering is the task of discovering groups and structures in the data that are in some way
or another "similar", without using known structures in the data.
2. List the types of Data in clustering.
- Data Matrix
- Dissimilarity Matrix.
3. What is CRM?
CRM (Customer Relationship Management) is the practice of analysing customer data to manage and improve an organization's relationships with its customers. Data mining supports CRM through tasks such as customer segmentation, churn prediction and targeted marketing.
Part-B
1. Explain outlier analysis with an example [nov-dec-2013].
Outlier Analysis - What is outlier discovery?
What are outliers?
Outliers are objects that are considerably dissimilar from the remainder of the data. Example - sports: Michael Jordan, Wayne Gretzky.
Problem
Find top n outlier points
Applications:
Credit card fraud detection
Telecom fraud detection
Customer segmentation
Medical analysis Outlier Discovery: Statistical Approaches
Assume a model underlying distribution that generates data set (e.g. normal distribution).
Use discordance tests depending on
data distribution
distribution parameter (e.g., mean, variance)
number of expected outliers
Drawbacks
Most tests are for a single attribute.
In many cases, the data distribution may not be known.
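The statistical (discordance-test) approach can be sketched with a simple z-score rule; the data points and the two-standard-deviation threshold below are illustrative choices, not a prescribed test:

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

points = [10, 12, 11, 13, 12, 11, 95]   # 95 is the discordant observation
print(zscore_outliers(points))
```

Note the single-attribute limitation mentioned above: this test looks at one numeric attribute at a time and assumes an approximately normal distribution.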
1. Place each sample in its own cluster. Construct the list of inter-cluster distances for all
distinct unordered pairs of samples, and sort this list in ascending order.
2. Step through the sorted list of distances, forming for each distinct threshold value dk a graph of the samples where pairs of samples closer than dk are connected into a new cluster by a graph edge. If all the samples are members of a connected graph, stop. Otherwise, repeat this step.
3. The output of the algorithm is a nested hierarchy of graphs, which can be cut at the desired
dissimilarity level forming a partition (clusters) identified by simple connected components in
the corresponding sub graph.
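Steps 1-3 above amount to single-link clustering by connected components. A sketch for one fixed threshold, over hypothetical one-dimensional samples (a union-find forest tracks the components):

```python
from itertools import combinations

def single_link_clusters(points, threshold):
    """Connect pairs closer than `threshold`; clusters are the connected
    components of the resulting graph (steps 1-3, at one threshold)."""
    parent = list(range(len(points)))    # union-find forest

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) < threshold:
            parent[find(j)] = find(i)    # add a graph edge: merge components

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return sorted(clusters.values())

samples = [1.0, 1.5, 2.0, 8.0, 8.4, 9.0]   # hypothetical 1-D samples
print(single_link_clusters(samples, threshold=1.0))
```

Running this for each distinct threshold in ascending order would reproduce the nested hierarchy of graphs described in step 3.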
Retail industry.
Telecommunication industry
Biomedical Data Mining and DNA Analysis
DNA sequences: 4 basic building blocks (nucleotides): adenine (A), cytosine (C),
guanine (G), and thymine (T).
Gene: a sequence of hundreds of individual nucleotides arranged in a particular order.
Humans have around 100,000 genes.
Tremendous number of ways that the nucleotides can be ordered and sequenced to form
distinct genes.
Semantic integration of heterogeneous, distributed genome databases
Current: highly distributed, uncontrolled generation and use of a wide variety of DNA data. Data cleaning and data integration methods developed in data mining will help.
Data Mining for Financial Data Analysis
Financial data collected in banks and financial institutions are often relatively complete,
reliable, and of high quality.
Design and construction of data warehouses for multidimensional data analysis and data
mining
View the debt and revenue changes by month, by region, by sector, and by other
factors.
Access statistical information such as max, min, total, average, trend, etc.
Loan payment prediction/consumer credit policy analysis
Feature selection and attribute relevance ranking
Loan payment performance.
Consumer credit rating
Data Mining for Retail Industry
Retail industry: huge amounts of data on sales, customer shopping history, etc.
Applications of retail data mining
Identify customer buying behaviours
Discover customer shopping patterns and trends
Improve the quality of customer service
Achieve better customer retention and satisfaction
Enhance goods consumption ratios
Design more effective goods transportation and distribution policies
Data Mining for Telecommunication Industry
A rapidly expanding and highly competitive industry, with a great demand for data
mining to:
Understand the business involved
Identify telecommunication patterns
Catch fraudulent activities
Make better use of resources
Improve the quality of service
Attempt to optimize the fit between the data and some mathematical model
Statistical and AI approach
Conceptual clustering
A form of clustering in machine learning
Produces a classification scheme for a set of unlabeled objects.
Finds characteristic description for each concept (class)
COBWEB (Fisher87)
A popular and simple method of incremental conceptual learning
Creates a hierarchical clustering in the form of a classification tree
Each node refers to a concept and contains a probabilistic description of that concept.
Neural network approaches
Represent each cluster as an exemplar, acting as a prototype of the cluster
New objects are distributed to the cluster whose exemplar is the most similar according
to some distance measure.
Competitive Learning
Involves a hierarchical architecture of several units (neurons)
Neurons compete in a winner-takes-all fashion for the object currently being
presented.
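A minimal winner-takes-all sketch of the competitive learning idea (the data, exemplars, and learning rate are hypothetical): only the exemplar nearest to each presented object is moved toward it.

```python
import math

def nearest(exemplars, x):
    """Index of the exemplar most similar to x (Euclidean distance)."""
    d = [math.dist(e, x) for e in exemplars]
    return d.index(min(d))

def competitive_learning(data, exemplars, lr=0.5, epochs=10):
    """Winner-takes-all: for each object, only the winning exemplar
    is updated, moving a fraction lr of the way toward the object."""
    exemplars = [list(e) for e in exemplars]
    for _ in range(epochs):
        for x in data:
            w = nearest(exemplars, x)
            exemplars[w] = [e + lr * (xi - e) for e, xi in zip(exemplars[w], x)]
    return exemplars

data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
protos = competitive_learning(data, [(0.0, 1.0), (4.0, 4.0)])
```

After training, each exemplar has drifted to act as the prototype of one of the two groups of objects.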
PART B - (5 x 16 = 80 marks)
11. (a) (i) Explain the mapping of data warehouse to multiprocessor architecture. (10)
(ii) Discuss about data warehouse metadata. (6)
Or
(b) With a neat diagram, describe the various stages of building a data warehouse. (16)
12. (a) (i) Explain the data model which is suitable for a data warehouse, with an example. (8)
(ii) Write the differences between multidimensional OLAP and multirelational OLAP.
Or
(b) Explain the different types of OLAP tools. (16)
13. (a) What is the use of data mining tasks? What are the basic types of data mining tasks? Explain with examples. (16)
Or
(b) Explain various methods of data cleaning in detail.
14. (a) (i) Write and explain the algorithm for mining frequent itemsets without candidate generation. (8)
(ii) A database has nine transactions. Let min_sup = 30%.
TID    List of item IDs
1      a, b, e
2      b, d
3      b, c
4      a, b, d
5      a, c
6      b, c
7      a, c
8      a, b, c, e
9      a, b, c
Find all frequent itemsets using the above algorithm
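As a check on the expected answer, the frequent itemsets for this database can be enumerated brute-force (an Apriori-style sketch rather than FP-growth itself); with min_sup = 30% of 9 transactions, the support-count threshold is 3:

```python
from itertools import combinations

transactions = [
    {'a', 'b', 'e'}, {'b', 'd'}, {'b', 'c'}, {'a', 'b', 'd'}, {'a', 'c'},
    {'b', 'c'}, {'a', 'c'}, {'a', 'b', 'c', 'e'}, {'a', 'b', 'c'},
]
min_count = 3   # 30% of 9 transactions, rounded up

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    found = False
    for cand in combinations(items, k):
        # support count = number of transactions containing the candidate
        count = sum(set(cand) <= t for t in transactions)
        if count >= min_count:
            frequent[frozenset(cand)] = count
            found = True
    if not found:           # no frequent k-itemset, so no larger one exists
        break

print(sorted(sorted(s) for s in frequent))
# frequent 1-itemsets: a(6), b(7), c(6); 2-itemsets: ab(4), ac(4), bc(4)
```

{a, b, c} occurs in only 2 transactions, so no 3-itemset is frequent at this threshold.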
Or
(b) With an example, explain various attribute selection measures in classification.
15. (a) (i) Explain the different types of data used in cluster analysis. (10)
(ii) Discuss the use of outlier analysis. (6)
Or
(b) (i) Write the differences between CLARA and CLARANS.
References
1. Alex Berson and Stephen J. Smith, Data Warehousing, Data Mining and OLAP, Tata
McGraw Hill Edition, Thirteenth Reprint, 2008.
2. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Third
Edition, Elsevier, 2012.
3. http://www.tutorialspoint.com
4. https://anuradhasrinivas.files.wordpress.com
IT6004
SOFTWARE TESTING
L T P C 3 0 0 3
UNIT I
INTRODUCTION
9
Testing as an Engineering Activity - Testing as a Process - Testing axioms - Basic definitions -
Software Testing Principles - The Tester's Role in a Software Development Organization -
Origins of Defects - Cost of defects - Defect Classes - The Defect Repository and Test Design -
Defect Examples - Developer/Tester Support of Developing a Defect Repository - Defect
Prevention strategies.
UNIT II
TEST CASE DESIGN
9
Test case Design Strategies - Using Black Box Approach to Test Case Design - Random Testing -
Requirements based testing - Boundary Value Analysis - Equivalence Class Partitioning - State
based testing - Cause-effect graphing - Compatibility testing - user documentation testing -
domain testing - Using White Box Approach to Test design - Test Adequacy Criteria - static
testing vs. structural testing - code functional testing - Coverage and Control Flow Graphs -
Covering Code Logic - Paths - code complexity testing - Evaluating Test Adequacy Criteria.
UNIT III
LEVELS OF TESTING
9
The need for Levels of Testing - Unit Test - Unit Test Planning - Designing the Unit Tests - The
Test Harness - Running the Unit tests and Recording results - Integration tests - Designing
Integration Tests - Integration Test Planning - Scenario testing - Defect bash elimination - System
Testing - Acceptance testing - Performance testing - Regression Testing - Internationalization
testing - Ad-hoc testing - Alpha, Beta Tests - Testing OO systems - Usability and Accessibility
testing - Configuration testing - Compatibility testing - Testing the documentation - Website
testing.
UNIT IV
TEST MANAGEMENT
9
People and organizational issues in testing - Organization structures for testing teams - testing
services - Test Planning - Test Plan Components - Test Plan Attachments - Locating Test Items -
test management - test process - Reporting Test Results - The role of three groups in Test
Planning and Policy Development - Introducing the test specialist - Skills needed by a test
specialist - Building a Testing Group.
UNIT V
TEST AUTOMATION
9
Software test automation - skills needed for automation - scope of automation - design and
architecture for automation - requirements for a test tool - challenges in automation - Test
metrics and measurements - project, progress and productivity metrics.
TOTAL: 45 PERIODS
TEXT BOOKS:
1. Srinivasan Desikan and Gopalaswamy Ramesh, Software Testing: Principles and Practices, Pearson Education, 2006.
2. Ron Patton, Software Testing, Second Edition, Sams Publishing, Pearson Education, 2007.
REFERENCES:
1. Ilene Burnstein, Practical Software Testing, Springer International Edition, 2003.
2. Edward Kit, Software Testing in the Real World Improving the Process, Pearson
Education, 1995.
3. Boris Beizer, Software Testing Techniques, 2nd Edition, Van Nostrand Reinhold, New
York, 1990.
4. Aditya P. Mathur, Foundations of Software Testing: Fundamental Algorithms and
Techniques, Dorling Kindersley (India) Pvt. Ltd., Pearson Education, 2008.
ALPHA COLLEGE OF ENGINEERING
Thirumazhisai, Chennai 600124

LESSON PLAN

Faculty Name: Prema. S
Designation: Assistant Professor
Subject Name: Software Testing
Code: IT6004
Year: IV
Semester: 07
Branch: B.Tech/IT
AIM:
To understand the concepts of software testing technologies; to expose the criteria for test cases; to learn
the design of test cases; to be familiar with test management and test automation techniques; and to be exposed
to test metrics and measurements.
TEXT BOOKS:
1. Srinivasan Desikan and Gopalaswamy Ramesh, Software Testing Principles and Practices, Pearson
Education, 2006.
2. Ron Patton, Software Testing, Second Edition, Sams Publishing, Pearson Education, 2007.
REFERENCES:
1. Ilene Burnstein, Practical Software Testing, Springer International Edition, 2003.
2. Edward Kit, Software Testing in the Real World Improving the Process, Pearson Education, 1995.
3. Boris Beizer, Software Testing Techniques, 2nd Edition, Van Nostrand Reinhold, New York, 1990.
4. Aditya P. Mathur, Foundations of Software Testing: Fundamental Algorithms and Techniques, Dorling
Kindersley (India) Pvt. Ltd., Pearson Education, 2008.
Sl. No.  Topics                                                     Periods  Book

UNIT I INTRODUCTION
1        Introduction to software testing                           1        T1
2        Testing as an Engineering Activity                         1        T1
3        Testing as a Process; Testing axioms                       1        T1
4        Basic definitions                                          1        T1
5        The Tester's Role in a Software Development Organization   2        T1

UNIT II TEST CASE DESIGN (T1; 1-2 periods per topic)
Test case Design Strategies; Using Black Box Approach to Test Case Design; Random
Testing; Requirements based testing; Boundary Value Analysis; Equivalence Class
Partitioning; State based testing; Cause-effect graphing; Compatibility testing; user
documentation testing; domain testing; Using White Box Approach to Test design; Test
Adequacy Criteria; static testing vs. structural testing; code functional testing; Coverage
and Control Flow Graphs; Covering Code Logic; Paths; code complexity testing;
Evaluating Test Adequacy Criteria.

UNIT III LEVELS OF TESTING (T2; 1-2 periods per topic)
Introduction to levels of testing; The need for Levels of Testing; Unit Test; Unit Test
Planning; Designing the Unit Tests; The Test Harness; Running the Unit tests and
Recording results; Integration tests; Designing Integration Tests; Integration Test
Planning; Scenario testing; Defect bash elimination; System Testing; Acceptance testing;
Performance testing; Regression Testing; Internationalization testing; Ad-hoc testing;
Alpha, Beta Tests; Testing OO systems; Usability and Accessibility testing;
Configuration testing; Compatibility testing; Testing the documentation; Website testing.

UNIT IV TEST MANAGEMENT and UNIT V TEST AUTOMATION (T2; 1 period per topic, Sl. Nos. 36-52)
Topics in syllabus order.

TOTAL: 45 periods
UNIT: 1 (INTRODUCTION)
1) Define Software Engineering.
Software Engineering is a discipline that produces error-free software within time and
budget constraints.
2) Define software Testing.
Testing can be described as a process used for revealing defects in software, and
for establishing that the software has attained a specified degree of quality with respect to
selected attributes.
3) List the elements of the engineering disciplines.
Basic principles
Processes
Standards
Measurements
Tools
Methods
Best practices
Code of ethics
Body of knowledge
4) Differentiate between verification and validation. (Nov/Dec 2009/2012) [May/June 2006]
Verification is the process of evaluating a software system or component to determine
whether the products of a given development phase satisfy the conditions imposed at the
start of that phase.
Validation is the process of evaluating a software system or component during, or at the
end of, the development cycle to determine whether it satisfies specified requirements.

Failure: the inability of a software system or component to perform its required
functions within specified performance requirements.
14) Define Test Cases. [Nov/Dec-2009]
A test case, in a practical sense, is a test-related item which contains the following
information:
A set of test inputs. These are data items received from an external
source by the code under test. The external source can be hardware,
software, or human.
Execution conditions. These are conditions required for running the test,
for example, a certain state of a database, or a configuration of a
hardware device.
Expected outputs. These are the specified results to be produced by the
code under test.
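The three parts of a test case map directly onto an automated unit test. In this hypothetical sketch, the `discount` function stands in for the code under test; the setup establishes the execution conditions, and each test pairs inputs with the expected output:

```python
import unittest

def discount(price, is_member):
    """Hypothetical code under test: members get 10% off."""
    return round(price * (0.9 if is_member else 1.0), 2)

class TestDiscount(unittest.TestCase):
    def setUp(self):
        # Execution conditions: the state required before the test runs
        self.price = 100.0

    def test_member_discount(self):
        # Test inputs and the expected output, specified in advance
        self.assertEqual(discount(self.price, True), 90.0)

    def test_non_member(self):
        self.assertEqual(discount(self.price, False), 100.0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDiscount)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the expected outputs are not specified before execution, the check degrades into simply accepting whatever the code produces.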
15)Write short notes on Test, Test Set, and Test Suite.
A Test is a group of related test cases, or a group of related test cases and test
procedures.
A group of related tests is sometimes referred to as a test set.
A group of related tests that are associated with a database, and are usually run together,
is sometimes referred to as a Test Suite.
16) Define Test Oracle.
A test oracle is a document, or a piece of software, that allows a tester to determine whether a
test has been passed or failed.
17) Define Test Bed.
A test bed is an environment that contains all the hardware and software needed to test a
software component or a software system.
18) Define Software Quality.
Quality relates to the degree to which a system, system component, or process meets
specified requirements.
Quality relates to the degree to which a system, system component, or process meets
customer or user needs or expectations.
19) List the Quality Attributes.
Correctness
Reliability
Usability
Integrity
Portability
Maintainability
Interoperability
Sources of defects: Education, Communication, Oversight, Transcription, Process.
Testing is a process used to reveal defects and to evaluate quality attributes.
Debugging is the process of locating the fault or defect, repairing the code, and
retesting the code.
PART B
UNIT 1 (INTRODUCTION)
Test Strategy: Black box
Tester's View: Inputs and outputs
Knowledge Sources:
1. Requirements document
2. Specifications
3. Domain Knowledge
4. Defect analysis data

Test Strategy: White box
Tester's View: Internal structure
Knowledge Sources:
1. High level design
2. Detailed design
3. Control flow graphs
4. Cyclomatic complexity
6. Define State.
A state is an internal configuration of a system or component. It is defined in terms of the
values assumed at a particular time for the variables that characterize the system or component.
7. Define Finite-State machine.
A finite-state machine is an abstract machine that can be represented by a state graph
having a finite number of states and a finite number of transitions between states.
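The definition can be made concrete with a tiny state graph. The turnstile below is a standard illustration, not from the text: two states, two events, and a transition table that is the machine.

```python
# A minimal finite-state machine: states and transitions are finite and
# fully enumerated in a transition table keyed by (state, event).
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def run(events, state="locked"):
    """Feed a sequence of events through the state graph."""
    for e in events:
        state = TRANSITIONS[(state, e)]
    return state

print(run(["coin", "push", "push"]))   # -> locked
```

State-based testing then amounts to choosing event sequences that exercise every state and every transition in this table.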
8. Define Error Guessing.
Error guessing is a test design approach in which test cases are derived from the tester's
past experience with similar code and intuition about where defects are likely to lurk.
12. What are the factors affecting less than 100% degree of coverage?
13. What are the basic primes for all structured programs?
Sequential
Condition
Iteration

COTS
Certification
6. Types of white box testing[Nov/Dec-2009]
Coverage and control flow graph
Three basic primes
Sequential
Condition
Iteration
Covering code logic
Figure: Code sample with branch and loop.
Figure: A control flow graph representation for the code.
Table: A test case for the code that satisfies the decision coverage criterion.
Table: Test cases for simple decision coverage
Table: Test cases for condition coverage
Table: Test cases for decision condition coverage.
Path Testing
Path
Cyclomatic complexity formula.
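A hypothetical code sample with one branch and one loop illustrates decision coverage and the cyclomatic complexity formula V(G) = E - N + 2 (the function, test values, and the edge/node counts in the comment are illustrative):

```python
def classify(values, limit):
    """Sample code with a loop and a branch (illustrative)."""
    over = 0
    for v in values:              # loop decision
        if v > limit:             # branch decision
            over += 1
    return over

# Decision coverage: each decision takes both a True and a False outcome.
assert classify([], 10) == 0          # loop decision is False immediately
assert classify([5, 15], 10) == 1     # loop True; branch takes True and False

# Cyclomatic complexity from the control flow graph: V(G) = E - N + 2.
# For a flow graph of this function with, say, 7 edges and 6 nodes,
# V(G) = 7 - 6 + 2 = 3 independent paths to cover.
```

Two test cases suffice for decision coverage here, which is why decision coverage is a weaker (and cheaper) criterion than full path coverage.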
7. Additional white box test design approaches. [Nov/Dec-2012]
Dataflow and white box test design
Variable.
Figure: sample code with data flow information
Loop Testing
Mutation Testing
The competent programmer hypothesis
The coupling effect
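A toy mutation-testing sketch (the program, mutant, and tests are all illustrative): the mutant flips one relational operator, and the test set "kills" it because at least one test produces a different outcome on the mutant than the expected one.

```python
def max_of(a, b):
    """Original program under test."""
    return a if a >= b else b

def max_of_mutant(a, b):
    """Mutant: the >= operator has been changed to <=."""
    return a if a <= b else b

# A small test set: (inputs, expected output)
tests = [((3, 5), 5), ((5, 3), 5), ((4, 4), 4)]

def kills(program):
    """A test set kills a mutant if some test fails on it."""
    return any(program(*args) != expected for args, expected in tests)

print(kills(max_of_mutant))   # -> True: the mutant is killed
```

The original passes every test while the mutant fails one, so this test set is adequate with respect to this particular mutation.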
8. Evaluating Test adequacy Criteria
Axioms Set of assumptions
Applicability Property
Non exhaustive applicability property
Monotonicity Property
Inadequate Empty Set Property
Antiextensionality Property
General Multiple Change Property
Antidecomposition Property
Renaming Property
Complexity Property
Statement Coverage Property
UNIT: 3 (LEVELS OF TESTING )
1. List the levels of Testing or Phases of testing.
Unit Test
Integration Test
System Test
Acceptance Test
perform. We test for compliance of the requirement at the system level with the
functional based system test.
Quality Requirement: They are nonfunctional in nature but describe quality
levels expected for the software.
20. Define stress Testing.
Stress testing is testing a system with a load that causes it to allocate its resources in
maximum amounts. It is important because it can reveal defects in real-time and
other types of systems.
21. Define Breaking the System.
The goal of a stress test is to try to break the system, that is, to find the circumstances
under which it will crash. This is sometimes called breaking the system.
22. What are the steps for top down integration?
Main control module is used as a test driver and stubs are substituted for all
components directly subordinate to the main module.
Depending on integration approach (Depth or breadth first) subordinate stubs are
replaced one at a time with actual components.
Tests are conducted as each component is integrated.
On completion of each set of tests, another stub is replaced with the real component.
Regression testing may be conducted to ensure that new errors have not been
introduced.
23. What is meant by regression testing?
Regression testing is used to check for defects propagated to other modules by
changes made to existing program. Thus, regression testing is used to reduce the side
effects of the changes.
PART-B
UNIT- 3 (LEVELS OF TESTING)
1. Need for levels of testing.
Unit Test
Integration Test
System Test
Acceptance Test
Fig: Levels of Testing
Alpha And Beta Test
2. Levels of testing and software development paradigm
Fig: Levels of testing
Two Approaches
Bottom-up
Top-down
Two types of Language
Procedure Oriented
Object Oriented
3. Unit Test
Functions, procedures, classes and methods as units
Fig: Some components suitable for unit test
Unit Test: Need for preparation
Planning
Both black box and White box
Reviewer
Several Tasks
4. Unit Test Planning
Phase I: Describe unit test approach and Risks
Phase II: Identify unit features to be tested
Phase III: Add levels of detail to the planning
5. The class as testable unit
Issue1: Adequately Testing classes
Issue2: Observation of object states and state changes.
Issue3: The retesting of classes-I
Issue4: The retesting of classes-II
Fig: Sample stack class with multiple methods
Fig: Sample Shape class
6. Test harness
The auxiliary code developed to support testing of units and components is
called a test harness. The harness consists of drivers that call the target code
and stubs that represent the modules it calls.
Fig: The test harness
Driver
Stub
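A minimal sketch of the harness idea: the driver supplies inputs and checks results, while a stub returns canned data in place of a real collaborator (all names here are illustrative):

```python
def lookup_price_stub(item_id):
    """Stub: returns canned data instead of querying a real price service."""
    return {"A1": 10.0, "B2": 25.0}[item_id]

def order_total(item_ids, lookup_price):
    """Unit under test: depends on a price-lookup collaborator."""
    return sum(lookup_price(i) for i in item_ids)

def driver():
    """Driver: sets up inputs, calls the unit, and checks the result."""
    result = order_total(["A1", "B2"], lookup_price_stub)
    assert result == 35.0
    return result

driver()
```

Because the stub is deterministic, the unit can be tested before the real price service exists, which is exactly the role the harness plays during unit and integration test.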
7. Integration Test [Nov/Dec-2009]
Goals
Integration strategies for procedures and functions
Top down
Bottom up
Fig: Simple Structure chart for integration test example
Integration strategies for classes
Fig: A generic class cluster
8. System test: Different Types
Functional testing
Performance testing
Stress testing
Configuration testing
Security testing
Recovery testing
The other types of system Testing are,
Reliability & Usability testing.
Fig: Types of System Tests
Fig: Example of special resources needed for a performance test
UNIT 4(TEST MANAGEMENT)
1) Write the different types of goals?
i. Business goal: To increase market share 10% in the next 2 years in the area of
financial software.
ii. Technical goal: To reduce defects by 2% per year over the next 3 years.
iii. Business/technical goal: To reduce hotline calls by 5% over the next 2 years.
iv. Political goal: To increase the number of women and minorities in high
management positions by 15% in the next 3 years.
Review plan: inspections and walkthroughs
Unit test plan
Integration test plan
System test plan
Acceptance test plan
think creatively.
Technical Skills
General software engineering principles and practices, understanding of testing
principles and practices, ability to plan, design, and execute test cases.
18) Define Plan.
A plan is a document that provides a framework or approach for achieving a set of goals.
PART-B
UNIT 4 (TEST MANAGEMENT)
Testing and Debugging goals and Policy
Debugging goal
Debugging policy
Testing Policy: Organization X
Debugging policy: Organization X
Test planning
Planning
Milestone
Overall test objectives
What to test (Scope of the tests)
Who will test?
How to test?
When to test?
When to stop Testing?
Test Plan Components.[Nov/Dec-2012]
Test plan identifier
Introduction
Items to be tested
Features to be tested
Approach
Pass/fail criteria
Suspension and resumption criteria
Test deliverables
Testing tasks
Test environment
Responsibilities
Staffing and training needs
Scheduling
Risks and contingencies
Testing costs
Approvals
Test Plan Attachments
Test design specifications
Test case specifications
Test procedure specifications
Reporting Test Results
Test log
Test log identifier
Description
Activity and event entities
Test incident report
Test incident report identifier
Summary
Impact
Test summary report
6. The role of the 3 critical groups [Nov/Dec-2009]
1. Managers
Task forces, policies, standards
Planning
Resource allocation
Support for education and training
Interact with users
2. Developers/ testers
Apply black and white box methods
Assist with test planning
Test at all levels
Train and mentor
Participate in task forces
Interact with users
3. Users/clients
Specify requirements clearly
Participate in usability test
UNIT: 5 (TEST AUTOMATION)
1. Define Project monitoring or tracking.[Nov/Dec-2012]
Project monitoring refers to the activities and tasks managers engage in periodically to
check the status of each project. Reports are prepared that compare the actual work done to
the work that was planned.
2. Define Project Controlling. [Nov/Dec-2012]
It consists of developing and applying a set of corrective actions to get a project on
track when monitoring shows a deviation from what was planned.
3. Define Milestone.
Milestones are tangible events that are expected to occur at a certain time in the
project's lifetime. Managers use them to determine project status.
4. Define SCM (Software Configuration management).[Nov/Dec-2012]
1. Identification of configuration items
2. Version control
3. Change control
4. Configuration status reporting
5. Configuration audits
Types of reviews
Inspections as a type of technical review
Inspection process
Initiation
Preparation
Inspection meeting
Reporting results
Rework and follow up
Walkthroughs as a type of technical review
Components of review plans
Review goals
Preconditions and items to be reviewed
Roles, participants, team size, and time requirements
Review procedures
Review training
Review checklists
Requirements reviews
Design reviews
Code reviews
Test plan reviews
Maximum: 100 marks
Answer ALL questions
PART A - (10 x 2 = 20 marks)
Or
(b) (i) Explain the challenges and issues faced in testing services organizations. Also write how we
can eliminate the challenges.
(ii) How can we build a test group?
15. (a) (i) Test metrics. (ii) Testing tools.
Or
(b) (i) Write about software configuration management.
(ii) Write the components of review plans.
MODEL QUESTION PAPER
B.E./B.Tech., DEGREE EXAMINATIONS
INFORMATION TECHNOLOGY
SEVENTH SEMESTER
IT2032 - SOFTWARE TESTING
(REGULATION 2008)
Time: 3 hours
Maximum: 100 marks
(or)
13. (b) Develop a use case to describe a user's purchase of a laptop with a credit card from an online
vendor using web-based software. With the use case, design a set of tests you would use during
system test (general).
14. (a) Why is a test plan important for developing a repeatable and managed testing process?
Give an example.
(or)
14. (b) What role do users/clients play in the development of a test plan for a project? Should they be
present at any of the test plan reviews? Justify your answer.
15. (a) If you are developing a patient record system for a health care centre, which of the stop-test
criteria would be most appropriate for this system?
(or)
15. (b)What is the role of the tester in supporting, monitoring and controlling of testing?
REFERENCES:
WEBSITE:
1. www.tutorialspoint.com/software_testing/software_testing_quick_guide.htm
2. https://www.vidyarthiplus.com/vp/thread-9727.html
3. pass-in-annauniversityexams.blogspot.com/.../anna-university-IT2032SoftwareTesting
4. www.rejinpaul.com
5. www.vidyarthiplus.in/2013/05/it2032-software-testing-important.html
BOOKS:
1. Ilene Burnstein, Practical Software Testing, Springer International Edition, 2003.
2. Edward Kit, Software Testing in the Real World Improving the Process, Pearson
Education, 1995.
3. Boris Beizer, Software Testing Techniques, 2nd Edition, Van Nostrand Reinhold, New
York, 1990.
4. Aditya P. Mathur, Foundations of Software Testing: Fundamental Algorithms and
Techniques, Dorling Kindersley (India) Pvt. Ltd., Pearson Education, 2008.
5. Elfriede Dustin, Effective Software Testing, First Edition, Pearson Education, 2003.
6. Renu Rajani and Pradeep Oak, Software Testing: Effective Methods, Tools and Techniques,
Tata McGraw Hill, 2004.