Submitted To:
Mohit Khandelwal
Project In-charge
CS & IT Department
IET Alwar

Submitted By:
Abhishek Pal
Ujjwal Anand
Akshat Patel
Candidates' Declaration
We hereby declare that the work presented in this report, entitled "IET SEARCH", in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in the Department of Computer Science and Engineering, Institute of Engineering and Technology, affiliated to Rajasthan Technical University, is a record of our own investigations carried out under the guidance of Mr. Vinit Bhargava and Mrs. Anupma Mathur, Department of Computer Science and Engineering, IET Alwar.
We have not submitted the matter presented in this report anywhere for the award of any other degree.
Abhishek Pal (12EIACS702)
Ujjwal Anand (12EIACS110)
Akshat Patel (12EIACS703)
Computer Science and Engineering,
Countersigned by
Mr. Vinit Bhargava
Mrs. Anupma Mathur
Preface
The aim of this project is to develop a simple web-based search engine
that demonstrates the main features of a search engine (web crawling,
indexing and ranking) and the interaction between them.
Acknowledgement
We would like to thank our advisors Mr. Vinit Bhargava and Mrs.
Anupma Mathur for providing their valuable time, constant guidance
and support throughout this project. We appreciate and thank our project
in-charge Mr. Mohit Khandelwal and Mr. Nitin Sharma for their time
and suggestions. We would also like to thank our H.O.D. Mr. Rohit
Singhal and our friends for their moral support during the project.
Table of Contents
Candidates' Declaration
Table of Contents
1. Introduction
    1.1 Background Study
    1.2 Project Scope
2. Overall Description
    2.1 Product Perspective
    2.2 Product Features
    2.3 User Classes and Characteristics
    2.4 Operating Environment
    2.5 Design and Implementation Constraints
    2.6 Assumptions and Dependencies
3. External Interface Requirements
    3.1 User Interfaces
    3.2 Hardware Interfaces
    3.3 Software Interfaces
    3.4 Communications Interfaces
4. Other Nonfunctional Requirements
    4.1 Performance Requirements
    4.2 Safety Requirements
    4.3 Security Requirements
    4.4 Software Quality Attributes
5. Design Specifications
    5.1 Assumptions
    5.2 Constraints
    5.3 Design Methodology
    5.4 Risk and Volatile Areas
6. Architecture
7. Database Schema
    7.1 Tables, Fields and Relationships
        7.1.1 Databases
        7.1.2 New Tables
8. Cost Estimation
Appendix A: Glossary
Appendix B: References
1. Introduction
1.1 Background Study
In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993. The web's second search engine, Aliweb, appeared in November 1993. One of the first "all text" crawler-based search engines was WebCrawler, which came out in 1994. In 1998, Google adopted the idea of selling search terms from a small search engine company named goto.com. Around 2000, Google's search engine rose to prominence; the company achieved better results for many searches with an innovation called PageRank. By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! later relied on Google's search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions. Microsoft's rebranded search engine, Bing, was launched on June 1, 2009.
The use of search engines has grown steadily over time, and they have become the main tool people use to find information. A user simply enters words into a search engine, which returns information relevant to those words, ranked according to many criteria. Even so, the user often still has to analyse the results to extract the information actually needed, because a search engine cannot return exactly what the user has in mind. Commercial search engines weigh many criteria when searching and returning information, and their results are also influenced by SEO (Search Engine Optimization); in this project we primarily use only the words given in the search query.
1.2 Project Scope
With a two-month time constraint, we have studied the analysis, design and implementation (including integration of modules) of a search engine. A search engine for college information is the specific area we will be dealing with over the next 7 months, prior to the implementation details. To gain insight into how existing search engines work, a comparative study of the features that various engines offer has been made. A survey of existing search engines has also been conducted to understand what users additionally expect from current search engines. The planning and requirement-gathering stages form the base for further analysis and design; hence they have also been allotted a period of 2 months.
2. Overall Description
2.1 Product Perspective
IET Search Engine is a standalone system. It provides modules for crawling, indexing, sorting and searching web pages.
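The interaction between the four modules can be illustrated with a minimal Python sketch. It is not the project's implementation: the in-memory `WEB` dictionary and the `iet.example` URLs are hypothetical stand-ins for pages fetched over HTTP, and ranking simply counts how often the query words occur in each page, in line with using only the words given for searching.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory web: URL -> (page text, outgoing links).
# A real crawler would download each URL over HTTP instead.
WEB = {
    "http://iet.example/": (
        "IET Alwar home page admissions",
        ["http://iet.example/cse", "http://iet.example/admissions"],
    ),
    "http://iet.example/cse": (
        "Computer Science department faculty",
        ["http://iet.example/"],
    ),
    "http://iet.example/admissions": ("Admissions process and fees", []),
}


def crawl(seed, web, max_pages=100):
    """Crawling: breadth-first traversal from seed, visiting each URL once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url not in web:  # skip links missing from the hypothetical web
            continue
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexing: inverted index mapping word -> {url: occurrence count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(query, index):
    """Searching and sorting: score URLs by query-word counts, best first."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = crawl("http://iet.example/", WEB)
print(search("computer science", build_index(pages)))
```

The crawler keeps a `seen` set so that pages linking back to each other (here the home page and the CSE page) are fetched only once.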
Optimal Requirements: 2400 MHz processor; RAM: 1 GB, 128 MB
5.2 Constraints
This product is a web-based application, hence a major constraint on its performance will be the bandwidth of the server's web connection. A faster connection will result in faster crawling of web pages.
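As a rough illustration of why bandwidth dominates, the time needed to download the pages can be bounded from below by dividing the total data by the link speed. The Python sketch below is hypothetical: the page count, average page size and bandwidth figures are assumed values, and the bound ignores latency, parsing and delays between requests, so a real crawl will be slower.

```python
def min_crawl_seconds(num_pages, avg_page_kb, bandwidth_mbps):
    """Lower bound on download time: total bits divided by link speed.

    Assumes 1 KB = 1024 bytes and 1 Mbps = 1,000,000 bits/s; ignores latency,
    HTML parsing and delays between requests, so a real crawl takes longer.
    """
    total_bits = num_pages * avg_page_kb * 1024 * 8
    return total_bits / (bandwidth_mbps * 1_000_000)


# Hypothetical figures: 10,000 pages of about 100 KB each.
for mbps in (10, 100):
    print(f"{mbps} Mbps: {min_crawl_seconds(10_000, 100, mbps):.1f} s")
```

With these assumed figures the sketch prints 819.2 s at 10 Mbps and 81.9 s at 100 Mbps: the lower bound falls in direct proportion to the bandwidth.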
5.3 Design Methodology
Modular Design
The whole system is divided into two parts, i.e. the user section and the admin section. Hence the modular design of the system is also presented as two modular diagrams.
[Figures: use-case diagrams (actors and use cases), relational diagrams and activity diagrams for the user and admin sections]
6. Architecture
7. Database Schema
7.1 Tables, Fields and Relationships
Database Name : iet_data
Table Names : admin_detail, user_info, missing_keyword, keywords
7.1.1 Databases
Database Name : iet_data
7.1.2 New Tables
Table Name        Field Name           Data Type   Allow Nulls
admin_detail      user_id              int         Not null
                  user_name            varchar
                  user_password        varchar
                  last_login           varchar
user_info         id_keyword           int         Not null
                  url                  varchar
                  title                varchar
                  matches              varchar
                  discription          varchar
missing_keyword   id_search            int         Not null
                  keyword              varchar
                  date_of_search       varchar
keywords          id_keyword           int         Not null
                  User_id              varchar
                  keyword              varchar
                  date_of_creation     varchar
                  date_of_lastupdate   varchar
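The schema can be tried out with a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for the project's database server, which the report does not name. Types are mapped as varchar to TEXT and int to INTEGER, and NOT NULL follows the Allow Nulls column. The report does not state keys or relationships, so joining `user_info` to `keywords` on their shared `id_keyword` column, and logging unmatched words in `missing_keyword`, are assumptions; the `search` helper is hypothetical.

```python
import sqlite3
from datetime import date

# SQLite stand-in for iet_data; column names as in the table above.
SCHEMA = """
CREATE TABLE admin_detail (
    user_id INTEGER NOT NULL, user_name TEXT,
    user_password TEXT, last_login TEXT
);
CREATE TABLE user_info (
    id_keyword INTEGER NOT NULL, url TEXT, title TEXT,
    matches TEXT, discription TEXT
);
CREATE TABLE missing_keyword (
    id_search INTEGER NOT NULL, keyword TEXT, date_of_search TEXT
);
CREATE TABLE keywords (
    id_keyword INTEGER NOT NULL, User_id TEXT, keyword TEXT,
    date_of_creation TEXT, date_of_lastupdate TEXT
);
"""


def search(conn, word):
    """Hypothetical lookup: keywords.keyword -> id_keyword -> user_info rows.

    Assumes the two tables are related through id_keyword. A word with no
    match is recorded in missing_keyword with today's date.
    """
    rows = conn.execute(
        "SELECT u.url, u.title FROM keywords AS k "
        "JOIN user_info AS u ON u.id_keyword = k.id_keyword "
        "WHERE k.keyword = ? ORDER BY u.url",
        (word,),
    ).fetchall()
    if not rows:
        next_id = conn.execute(
            "SELECT COALESCE(MAX(id_search), 0) + 1 FROM missing_keyword"
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO missing_keyword (id_search, keyword, date_of_search) "
            "VALUES (?, ?, ?)",
            (next_id, word, date.today().isoformat()),
        )
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO keywords (id_keyword, User_id, keyword) VALUES (1, 'admin', 'admission')")
conn.execute("INSERT INTO user_info (id_keyword, url, title) "
             "VALUES (1, 'http://iet.example/admissions', 'Admissions')")
print(search(conn, "admission"))
print(search(conn, "hostel"))  # no match, so it is logged as missing
```

Recording unmatched words gives the admin a list of queries for which new pages or keywords could be added.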
8. Cost Estimation
We have used the COCOMO model to estimate the cost of this project. The Constructive Cost Model (COCOMO) is an algorithmic software cost estimation model developed by Barry W. Boehm. The model uses a basic regression formula with parameters that are derived from historical project data and current as well as future project characteristics.
COCOMO applies to three classes of software projects:
Organic projects - "small" teams with "good" experience working with "less than rigid" requirements.
Semi-detached projects - "medium" teams with mixed experience working with a mix of rigid and less than rigid requirements.
Embedded projects - developed within a set of "tight" constraints (hardware, software and operational).
The Basic COCOMO equations are:
$$E = a_b \,(\mathrm{KLOC})^{b_b} \quad \text{(effort applied, in person-months)}$$
$$D = c_b \,(E)^{d_b} \quad \text{(development time, in months)}$$
where KLOC is the estimated number of delivered lines of code for the project, expressed in thousands. The coefficients a_b, b_b, c_b and d_b are given in the following table:
Software project   a_b   b_b    c_b   d_b
Organic            2.4   1.05   2.5   0.38
Semi-detached      3.0   1.12   2.5   0.35
Embedded           3.6   1.20   2.5   0.32
Basic COCOMO is good for a quick estimate of software costs. However, it does not account for differences in hardware constraints, personnel quality and experience, use of modern tools and techniques, and so on.
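The Basic COCOMO equations and the coefficient table translate directly into code. This Python sketch is a generic calculator for illustration, not part of the project; the function name `basic_cocomo` is hypothetical.

```python
# Basic COCOMO coefficients (a_b, b_b, c_b, d_b) for each project class,
# copied from the table above.
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc: float, project_class: str) -> tuple[float, float]:
    """Return (effort in person-months, development time in months).

    Effort E = a_b * KLOC ** b_b; development time D = c_b * E ** d_b.
    """
    a, b, c, d = COEFFICIENTS[project_class]
    effort = a * kloc ** b
    dev_time = c * effort ** d
    return effort, dev_time


for name in COEFFICIENTS:
    e, t = basic_cocomo(1.75, name)
    print(f"{name:14} effort={e:5.2f} PM  time={t:4.2f} months")
```

Without intermediate rounding, 1.75 KLOC in the semi-detached class gives about 5.61 person-months and 4.57 months.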
Our project is of the semi-detached type, so the values of a_b, b_b, c_b and d_b are 3.0, 1.12, 2.5 and 0.35. To estimate the KLOC, we first estimate the number of classes, which are as follows:
Main Class
Web Crawling
Indexing
Searching
DBconnection Class
Database Scanning
Searched Code
These are some of the classes that have to be incorporated in the project. On this basis, the code size is estimated at around 1500-2000 lines.
The estimate may grow in the future, but for now we take the midpoint of the 1500-2000 range, i.e. 1750 lines.
So, taking KLOC = 1.750:
Effort:
$$E = 3.0 \times (1.750)^{1.12} \approx 5.61 \approx 6 \text{ person-months}$$
Development time:
$$D = 2.5 \times (6)^{0.35} \approx 4.68 \approx 5 \text{ months}$$
Three team members are involved in our project. If the salary of one developer is 5,000 per month, then over 5 months the salary per developer would be 25,000.
So the overall cost of the project will be 3 × 25,000 = 75,000/-.
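The estimate above, including its rounding steps, can be reproduced with a short Python sketch. The constants are the report's own figures (1.750 KLOC, semi-detached coefficients, three developers at 5,000 per month); rounding the effort to 6 person-months before computing the development time mirrors the calculation above.

```python
# Figures taken from the estimate above.
KLOC = 1.750                       # midpoint of the 1500-2000 line estimate
A, B, C, D = 3.0, 1.12, 2.5, 0.35  # semi-detached Basic COCOMO coefficients
DEVELOPERS = 3
SALARY_PER_MONTH = 5_000

effort = A * KLOC ** B                 # about 5.61 person-months
effort_rounded = round(effort)         # rounded to 6, as in the report
dev_time = C * effort_rounded ** D     # about 4.68 months
dev_time_rounded = round(dev_time)     # rounded to 5 months
cost_per_developer = SALARY_PER_MONTH * dev_time_rounded  # salary over the project
total_cost = DEVELOPERS * cost_per_developer

print(f"Effort: {effort:.2f} -> {effort_rounded} person-months")
print(f"Development time: {dev_time:.2f} -> {dev_time_rounded} months")
print(f"Total cost: {total_cost:,}")
```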
Appendix A: Glossary
SRS - Software Requirements Specification
Web crawler - Generic term applied to any program which visits websites and systematically retrieves information from them.
HTML - Hypertext Markup Language; the format in which web pages are written so that they can be displayed by a web browser.
Database - A collection of electronically stored data or unit records (facts, bibliographic data, texts) with a common user interface and software for the retrieval and manipulation of data.
Web Host - An intermediary online service which stores items that can be downloaded by the user.
SEO - Search Engine Optimization
WWW - World Wide Web
Appendix B: References
[1] http://en.wikipedia.org/wiki/Web_search_engine
[2] http://en.wikipedia.org/wiki/Web_crawler
[3] http://www.brightplanet.com/2012/11/deep-web-search-engines-vs-web-harvest-enginesfinding-intel-in-a-growing-internet/
[4] Roger S. Pressman, Software Engineering