
The National University of Lesotho

Department of Mathematics and Computer Science


COMPUTER SCIENCE PROJECT

CS 4403
2011/12 Academic Year
Mathematics and Computer Science Digital Library System

MACSDL
June 25th 2012.
Compiled & submitted by:

1. Mosola, N.N 200800142
2. Koali, M.S 200800572
3. Senatsi, K.V 200800535
Supervisor: Mr. L. Poulo

MATHEMATICS AND COMPUTER SCIENCE DIGITAL LIBRARY SYSTEM

ABSTRACT: The world is evolving rapidly, and the pace of technological change keeps rising. In an information age, free access to information is in high demand, and people share what they have in order to obtain what they lack. As knowledge is increasingly shared amongst individuals, that information needs to be managed. The Faculty of Science and Technology at the National University of Lesotho (NUL) comprises several departments. One of them, the Department of Mathematics and Computer Science (MACS), seeks an information management system in which large pools of information can be stored, accessed freely and managed adequately. A web-based MACS digital library (DL) system is the proposed answer. MACS DL will manage shared digital information objects for students and lecturers to enhance learning at NUL. The system will advance the use of Information and Communication Technology (ICT) within NUL, where both students and lecturers need information on a daily basis. Information retrieval will be at the heart of the system, backed by a repository of digital objects. The system will be developed following the prototyping process model and the software life cycle. This document discusses the relevant steps taken while the system is under development.

ACKNOWLEDGEMENTS
Thanks to the Mathematics and Computer Science department at NUL. The project was indeed an eye opener: we picked up many great lessons from it, and they are surely ones to build on in the future. Working together on this project has made us a unit, and we hope to work together again. We were a great team, and with you a new computing era is born. Big and ongoing thanks go to our supervisor and thesis advisor, Mr. Lebeko Poulo, for introducing us to the fascinating subject of Digital Libraries (DLs) and Information Retrieval (IR). Even in this four credit hour course, we learned more about DLs and IR systems than we could have learned in a lifetime in any other field of study. Thank you for giving us a chance to work on such a fascinating and challenging project! Our warm and kind regards go to the student union in the MACS department. This project would not have been possible without their willingness to spare time with us during the requirements elicitation, testing and evaluation phases of this endeavor.


CHAPTER ONE
Introduction
Building a digital library (DL) is inevitably expensive and resource-intensive. Before embarking on such a project, it is important to consider some basic principles underlying the design, implementation and maintenance of a DL. The principles applied in building this project, hereinafter called MACS DL, do not apply to this endeavor alone; they are essential to building the large digital libraries that we know today. Good examples are the ACM Digital Library (http://portal.acm.org/dl), the New Zealand Digital Library (http://www.nzdl.org/cgi-bin/library) and the National Science Digital Library (http://nsdl.org/), to mention but a few.
A digital library is a focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection (cf. Witten, Ian and David Bainbridge (2002), How to Build a Digital Library, Morgan Kaufmann, p. 6).

Brief background of DLs
As the need to make information and resources globally accessible arose, digital library systems (DLS) were born. Their importance grew beyond that of traditional libraries because they digitally preserve collections of valuable resources and information on the Web for educational and research purposes. The basic idea was to create a web-based, easily accessible collection of digital information whose organization and management would be automated to address the inefficiency of traditional libraries. MACS DL is no exception: the principles used in its development follow the same route.

What is MACS DL?
MACS DL is an educational portal built for use at the National University of Lesotho (NUL), under the department of Mathematics and Computer Science (MACS) in the faculty of Science and Technology (FOST), to enhance the mode of course delivery and provide facilities to academics in this faculty.
MACS DL provides the following services to this NUL community: file sharing, browsing documents, searching textual materials, storing digital objects on the server for current and future purposes, and information retrieval (IR).

Motivation
The higher education industry in Lesotho is experiencing an unprecedented growth rate. This trend is largely a result of new enabling technologies that have facilitated the virtual delivery of academic programs. This has in turn made libraries key success factors in the virtual academic environment. As students at NUL, we have observed that the well-known Thomas Mofolo library on the NUL premises is not adequately equipped to serve students, researchers and NUL staff in general. With that in mind, we aim to promote, support, manage and disseminate high quality research, development and innovation in information, library and related fields.

Project aims
We aim to encourage and facilitate the development of information strategies in higher education communities such as NUL. The main reason for building a digital library system is to provide free, remote access to information for multiple users around the NUL campus.

Problem Statement
The National University of Lesotho (NUL) has a vision to be a leading African university. In the faculty of Science and Technology (FOST), the department of Mathematics and Computer Science (MACS) has a vision to facilitate learning and to give both students and lecturers a better way of managing and conducting their academic work. Currently, MACS does not have an academic portal that manages textual digital objects to enhance learning. Students and lecturers rely on internet search engines such as Google and Yahoo for any academic material they need. The MACS department requires a more direct digital library that encourages file sharing and gives easy access to the materials used in the department.

Proposed solution
The proposed solution is a well managed digital library system that serves as a repository of the information in greatest demand, contributed by students and lecturers in the MACS department. Our hope is that this will increase the availability of student research for scholars, empower lecturers and students to conduct research, and advance digital library technology worldwide. MACS DL shall be a repository that archives textual objects for current and future reference, to enhance learning at NUL and provide free access to information. The fundamental reason for building a digital library for the MACS department at NUL is the belief that it will provide better delivery of information than was possible in the past.

Why a [MACS] DL?
Some of the advantages of DLs, though the list is not exhaustive, are the following:
DLs bring the libraries closer to users: Information is more easily accessible, which increases information usage. This is very different from a traditional library, like Thomas Mofolo, where users need to go to the library physically.
Searching and browsing capabilities: Computer systems are better than manual methods for finding information. DLs offer efficient and advanced search, information retrieval and browsing techniques that let users express their information need and browse the material found with relative ease.
Information sharing: Placing digital information on a network makes it available to everyone. A MACS DL maintained on the NUL site will be a vast improvement over expensive physical duplication of little-used material, which is sometimes inaccessible without travelling to the location where it is stored, like Thomas Mofolo library.
Availability of information: The doors of MACS DL will never close; its collections can be used when library buildings (i.e. Thomas Mofolo library) are closed. Materials are never checked out, misplaced, or stolen!

Project Plan
The MACS DL system is sub-divided into two major parts, namely:
The DL: This is by far the more important of the two. The DL is a focused collection of digital objects, organized and maintained in a proper manner. The DL will contain a pool of electronic versions of books and journals.
Search engine: This will assist in information retrieval (IR) and file indexing (FI).
The plan is to have a successfully implemented DL with an incorporated search engine that enables users of the DL to retrieve the information they require. Upon successful completion of these two parts, the system will be deemed to have met the users' requirements, which are discussed later in this document.


CHAPTER TWO
System Requirements Specification (SRS)
System functions and purpose: The system is a managed digital library portal for higher education and research purposes, to be used at NUL under FOST, in the MACS department, by both students and lecturers. MACS DL manages textual collections. The system allows intended users to upload materials to the server, browse through the collection, sort the collection, search for any material on the server, and download material from the collection.
Hardware and Software requirements:
1. Hardware: Computers with a minimum secondary disk space of 20GB and primary memory of 128MB.
2. Software: Apache Tomcat web server, MySQL database server and a Java Integrated Development Environment (IDE).
Performance specification: Using appropriate data structure and algorithm designs, each module/function of the system has been optimized so that it does not burden the processor with prolonged processing.
User interface (UI): The system will provide interactive interfaces that are easy to learn and use, so that users can work with it effectively and efficiently. The system UI follows basic human computer interaction principles and designs.
System data: Any data captured into the system, e.g. users' information, is stored in normalized relational database schemas to conform to data integrity and consistency rules. The system provides tight information security measures so that only users with credentials can access the system data.
System design constraints: MACS DL only manages textual digital objects. The system uses the English language only, bearing in mind that the intended users are academics who understand the language.


Requirements Engineering (RE)
Requirements engineering establishes a solid base for the design and construction of any system. Without it, the resulting software would have a higher probability of not meeting users' needs. To build software that actually solves users' problems and meets the SRS given earlier, the developers conducted an extensive study around the NUL campus to gather views from the targeted/potential users. The following RE steps were followed:
Inception: This is where the scope and nature of the system were defined.
1. Scope: A web based digital library system that manages textual objects.
2. Nature: The system is an academic portal helping students and researchers to share materials and search for any texts on the server.
Elicitation: This step helps to define what is actually required. Interviews with NUL students in the MACS department and with lecturers were conducted to help the developers elicit the users' requirements, identify the problem properly and propose elements of the solution. The following diagrams were used to elicit users' requirements.

Figure 2.1 Use Case scenarios
Use Case Number | Use Case Name | Use Case Description
1 | Browsing | Accessing subsets of data by categorical classification, e.g. browse by author name, alphabetical order, title, date, etc.
2 | Searching | Indexing, information retrieval and querying
3 | Annotate | Adding commentary, generalization and reviews
4 | Upload/Submission | Adding new digital objects to the DLS
5 | Download | Saving a digital object to local storage media


Fig 2.2 Use case diagram. The CLIENT actor is shown with the use cases ANNOTATE, BROWSE, SEARCH, DOWNLOAD and SUBMIT.

Elaboration: The basic requirements gathered were refined and modified to suit the design of the system under development; as a result, an analysis model was produced.

Negotiation: The priorities of the system were clarified. For example, as one of the priorities, the system must be able to upload documents to the server, enable information retrieval and download material from the server. During this step, different approaches to solving the identified problem were considered, and a preliminary set of solution requirements was negotiated amongst the developers.

Specification: From the elaboration and negotiation steps, a detailed specification of the system was developed, as enough resources had been gathered.

Validation: Working iteratively, with prototyping as a standalone process model, the developers visited users of the system frequently to make sure that what was being developed conformed to what the users required.

Management: Throughout the project's life cycle, changes to the initially gathered requirements were managed well, as the prototyping model allows iteration of the steps performed.


As a result of the above RE steps, prototypes were built as an end product. To translate users' needs into technical requirements, the development team used Quality Function Deployment (QFD), which emphasizes what is valuable to the users. This identified the following three types of requirements:
Types of Requirements identified
1. Normal requirements: These were stated by the users during the interviews conducted, and gave the developers an understanding of what should be developed.
2. Expected requirements: These were not explicitly mentioned by the users but were identified by the developers, e.g. ease of searching.
3. Exciting requirements: These were identified by the developers, as they go beyond the users' expectations.

Requirements gathering techniques used
A number of techniques were used to gather requirements from the target population. The following were of great importance in the requirements gathering phase:
Stratified sampling: A small group of students in the MACS department was chosen to represent the entire MACS student union. In the communication and planning phase of the prototyping model followed by the developers, ten (10) students were sampled.
Observing users: Sampled students were observed as they carried out their daily activities, using Google and Yahoo as internet search engines to find the material they required. The same students were also observed as they used the MACS DL search engine.
Interviews: Face-to-face interviews with the sampled population of students were conducted. Initially, a pilot study was carried out to make certain that the methods proposed by the developers were viable and that, in the long run, the solution would be appreciated. As the developers needed concrete answers, and proof for future reference that the interviews were conducted, a video of students using the MACS DL prototype was recorded. This video is in the possession of the developers and shall be made available to the supervisor.


CHAPTER THREE
Project Risk Analysis and Management
SWOT analysis was used to determine the strengths, weaknesses, opportunities and threats of this project. The following table depicts the outcomes of this risk assessment.

Table 3.1 SWOT analysis
Strengths | Weaknesses | Opportunities | Threats
1. Skilled project team members in programming web based IR systems | Apache Tomcat web server storage | New technology | Existing DLs and search engines, such as Google Scholar, 4Shared, etc.
2. Availability of required resources | Not enough metadata can be found in digital objects | Integration with external search engines, such as Google | Web resources
3. New technology facilitating information sharing | Computer illiterate end users | Search engine development | Information overload

The developers conducted a thorough study to assess the risks of embarking on a project of this nature, and the above table shows the results of that SWOT analysis. Other software engineering methods of risk assessment, management and mitigation were employed to analyze the uncertainties that could put the project at risk. These involve:
Identifying technical risks for the MACS DL project
Identifying technology risks for the MACS DL project
Identifying staff risks for the MACS DL project
From the above, the development team came up with the following risk analysis table:


A scale of 1 to 5 was used to estimate the impact of each risk on the project, with the following mapping: 1 = catastrophic, 2 = critical, 3 = marginal, 4 = negligible, 5 = low.

Table 3.2 Risk Table
Risk | Management | Mitigation | Monitoring
Supervisor not readily available on campus | Use groupware systems to contact supervisor | This risk was inevitable (i.e. unavoidable) | Fortnightly progress reports must be sent to the supervisor
Ambiguous project scope | Refine project scope | Request clear definition of scope | Brain-storming sessions
Developers not familiar with technology used | Seek related sources from supervisor and specialists | Quickly learn how to use the technologies required | Project progress report
Requirements change | Elicit requirements, build prototypes | Iteratively refine the requirements to track changes | Perform requirements engineering
Project team member drops out of the project | Ensure timely and consistent check-ups on team members | Meet with team members regularly discussing the project | Measure effectiveness of mitigation, e.g. ensure that every member is doing some work on the project
Impact: 2, 1, 3, 3

The risks identified were then assessed using the methods described above. In a round-robin fashion, the developers assigned each risk an impact value until an agreement was reached; the agreed values are depicted in the tables above. A project is at risk whenever an event is identified that could dent its schedule. The schedule of this project was affected by some of the risks identified above. For example, in the requirements elicitation phase, numerous iterations of the requirements engineering (RE) steps were necessary.


CHAPTER FOUR
Object Oriented Analysis and Design (OOA)
1. Class Responsibility Collaborator (CRC) Modeling
CRC modeling provides a simple means of identifying and organizing classes. NB: CRC modeling is not an official part of the Unified Modeling Language; it is a collection of index cards that represent classes. Using this modeling, the developers were able to identify potential classes to use as the building blocks of the MACS DL system.
Benefits identified
Portability: No computers are needed, so CRC cards can be used anywhere during brainstorming sessions.
Tangible: They allow participants to experience at first hand how the system will work.
Limited size: Index cards can only hold a limited amount of information compared to class diagrams. This enforces a high-level analysis.

Fig. 4.1 Class Responsibility Collaborator (CRC) Cards

Class Name: Searcher
Class Type: Internal entity
Responsibilities: Generates query; Filters query; Locates query; Retrieves results
Collaborators: Inverted File; Retriever; Ranker

Class Name: Inverted File
Class Type: External entity
Responsibilities: Inserts data into an index; Maintains index
Collaborators: Searcher


Class Name: Retriever
Class Type: External entity
Responsibilities: Displays ranked results
Collaborators: Ranker; Searcher

Class Name: Ranker
Class Type: External entity
Responsibilities: Ranks data
Collaborators: Retriever; Searcher

Class Name: Browser
Class Type: External entity
Responsibilities: Retrieves data from links
Collaborators: Searcher; Retriever

Class Name: Downloader
Class Type: External entity
Responsibilities: Copies data to local storage
Collaborators: Searcher; Retriever


Class Name: Digital Library
Class Type: External entity
Responsibilities: Processes a given request
Collaborators: Searcher; Browser; Downloader; Up-loader

Class Name: Up-loader
Class Type: External entity
Responsibilities: Copies digital objects from local storage into the collection
Collaborators: none listed

2. Class Diagrams


The system consists of the following classes, depicted in class diagrams.

Fig 4.2 Class diagrams

BROWSER
-Link : string
+BrowseLink() : string

DOWNLOADER
-File : string
+DownloadFile() : void

DIGITAL_LIBRARY
-Request : string
+ProcessRequest() : void

UPLOADER
-Data : string
+CopyFile() : void

SEARCHER
-SearchQuery : string
-Results : string
+GenerateQuery() : string
+FilterQuery() : string
+LocateQuery() : void
+RetrieveResults() : string

RETRIEVER
-Data : string
+RankResults() : void
+DisplayResults() : void

Inverted_File
-Query : string
-Size : int
+InsertData() : void
+Maintains() : void

Ranker
-Data : string
+RankData() : void

OOA continued
3. Data flow diagram (DFD)
DFDs show how data is captured as input, transformed by the processes and output as results to the users.


Fig 4.3 DFD. Labelled elements: CLIENT, BROWSE, SEARCH, RETRIEVE, INDEX, SUBMIT, ANNOTATE, DOWNLOAD and DATABASE SERVER. Labelled data flows: a request for a digital object, a digital object, digital object(s), Query/Results, Feedback, Indexed object, Comment(s), New digital object and Download Request.

CHAPTER FIVE
Data Structures and Algorithms


The following data structures were used in the development of the DL. The developers studied a range of candidate data structures and judged the following to be the best fit:

Hash Table: This data structure stores all index terms. A hash table location references the posting list for a specific index term.
Why Hash Table? A hash table provides efficient searching: finding the posting list for an index term takes O(1) expected time.

Posting list (implemented using a linked list): This is a linked list in which each node encapsulates a term frequency (the number of times a term appears in the document) and a document id (the document filename). A new node is added to the list every time a document containing the term is indexed. This operation runs in O(1).

Algorithms:

Indexing
0. Read an index object from disk.
0.1. Extract the entire text from a given document.
0.2. Break the text into tokens/terms.
0.3. Filter the stop words from the terms.
0.4. Stem each term, applying the stemming process.
0.5. For each stemmed version of the term:
  Begin
  0.5.1. If the term does not exist
    0.5.1.1. Store the term in the hash table.
    0.5.1.2. Create a corresponding posting list for the term.
  0.5.2. Else
    0.5.2.1. Add a node to the posting list.
  End
0.6. Save the index object to disk.

Searching and Ranking
1. Read an index object from disk.
1.1. Break the user query into tokens/terms.
1.2. Stem each term, applying the stemming process.
1.3. For each stemmed term:
  Begin
  1.3.1. If the term exists
    1.3.1.1. Retrieve its posting list and compute its weight in relation to the query vector Q and all document vectors (D1, ..., Dn), where n is the number of documents in the collection.
  1.3.2. Else
    1.3.2.1. The term weight is zero.
  End
1.4. For each document:
  Begin
  1.4.1. Compute the score (using the ranking formula).
  End
1.5. Sort the documents according to their score.
1.6. Return the sorted documents (the document with the highest score is the most relevant to the given query).

System Design and Engineering
This section discusses how the system was engineered. By this point the developers were equipped with the requirements from the RE phase, the classes to implement from the OOA phase, and the data structures and algorithms to implement. They now had to design and engineer the system to meet the requirements.

Indexing Documents
Overview: Searching, indexing and ranking techniques are at the core of the implementation of this piece of work. This chapter discusses the efficiency of the algorithms used for indexing, searching and ranking documents. Indexing extracts terms from a document when it is uploaded to the server, to indicate what the document is about or to summarize its content. This process takes the extracted terms and places them in an inverted index/file data structure. Searching pertains to posing a query and awaiting results from the digital library (DL) system. Information retrieval is the process of identifying the most relevant information that satisfies the given search query. The point of using an index is to increase the speed and efficiency of searches of the document collection. Without indexing, every search would have to scan the collection sequentially, so the cost of a query would grow with the size of the collection. An inverted index contains two parts: an index of terms, generally called the term index, which stores a distinct list of the terms found in the document collection; and, for each term, a posting list, which is simply a list of the documents that contain the term. When documents are submitted to the DL system, punctuation is removed, all terms are converted to lower case, and stop words are removed.
Stop words are those terms with little information content, e.g. conjunctions. This strategy is discussed in more depth later in this document. Suppose there are two documents, D1 and D2. D1 has the following contents: "Mathematics and Computer Science department", whilst D2 contains: "Department of Social Science".
Key terms: Information retrieval (IR), Inverted Index (II), ranking, stop words, stemming, term weight, posting list.
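The two-document example can be traced with a small sketch. This is an illustration only, not the MACS DL source: the class name InvertedIndexSketch, the Posting type, the short stop-word list and the tokenizing rule are all assumptions, and stemming is left out to keep the example short. It uses a hash table keyed by term, with a linked list of (document id; term frequency) nodes as the posting list.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: all names here are assumptions, not the MACS DL source.
class InvertedIndexSketch {

    // One posting-list node: a document id plus the term frequency in that document.
    static final class Posting {
        final String docId;
        int termFrequency;

        Posting(String docId) {
            this.docId = docId;
            this.termFrequency = 1;
        }

        @Override
        public String toString() {
            return docId + ";" + termFrequency;
        }
    }

    // Assumed stop-word list, just large enough for the example documents.
    static final Set<String> STOP_WORDS = Set.of("and", "of", "the", "a", "in", "on");

    // Hash table: index term -> posting list (a linked list of Posting nodes).
    final Map<String, LinkedList<Posting>> index = new HashMap<>();

    void indexDocument(String docId, String text) {
        // Lower-case the text, then split on anything that is not a letter or digit.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            LinkedList<Posting> postings = index.computeIfAbsent(token, t -> new LinkedList<>());
            // Documents are indexed one at a time, so the current document
            // can only be the last node of the posting list.
            if (!postings.isEmpty() && postings.getLast().docId.equals(docId)) {
                postings.getLast().termFrequency++;
            } else {
                postings.addLast(new Posting(docId));
            }
        }
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.indexDocument("1", "Mathematics and Computer Science department");
        idx.indexDocument("2", "Department of Social Science");
        for (String term : List.of("mathematics", "computer", "science", "department", "social")) {
            System.out.println(term + " -> " + idx.index.get(term));
        }
    }
}
```

In this sketch the terms "science" and "department" each end up with two posting nodes (one per document), while the stop words "and" and "of" never enter the hash table.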


Table 5.1 Inverted file structure analogy
Term | Document;Term frequency
Mathematics | 1;1
Computer | 1;1
Science | 1;1, 2;1
Department | 1;1, 2;1
Social | 2;1

Inverted Index architecture
Fig 5.2 Inverted Index architecture. Components shown: Document, Text Extractor, Stop words list, Stemmer, Index Builder and Inverted Index (Hash Table, Posting List). Labelled flows: tokens and stemmed tokens.

Indexing documents
A document is uploaded through an interface to add it to the collection. The Index Builder class is instantiated with the document name. The document is then indexed using the indexDocument method, which lets the Text Extractor instance extract the text from the document, break it into tokens and filter out the stop words. The Stemmer instance stems the tokens. The Inverted Index class is then instantiated to store the stemmed terms in a hash table, and a posting list is created for each term. The entire process forms the inverted index.


Posting List(s) [implemented as linked lists]
A posting list indicates, for a given term, which documents contain the term. Typically, a linked list data structure is used to store the entries in a posting list. This is because in most retrieval operations a user enters a query and all documents that contain the query terms are obtained. This is done by hashing on the term in the index and finding the associated posting list. Once the posting list is obtained, a simple scan of the linked list yields all the documents that satisfy the query.
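The lookup just described (hash on the term, then scan the linked list) can be sketched as follows. The names PostingLookupSketch, documentsMatching and sampleIndex are hypothetical, posting nodes are reduced to bare document ids, and treating a multi-term query as "documents containing every term" is an assumption, since the text does not say how several terms are combined.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names and the AND semantics are assumptions.
class PostingLookupSketch {

    // Returns the ids of the documents whose posting lists contain every query term.
    static Set<String> documentsMatching(Map<String, LinkedList<String>> index, String... queryTerms) {
        Set<String> result = null;
        for (String term : queryTerms) {
            // Hash lookup of the term, expected O(1).
            LinkedList<String> postings = index.get(term);
            Set<String> docs = new LinkedHashSet<>();
            if (postings != null) {
                // Simple linear scan of the linked list.
                for (String docId : postings) {
                    docs.add(docId);
                }
            }
            if (result == null) {
                result = docs;
            } else {
                result.retainAll(docs); // keep only documents seen for all terms so far
            }
        }
        if (result == null) {
            return new LinkedHashSet<>();
        }
        return result;
    }

    // A hand-built index over the two example documents D1 and D2.
    static Map<String, LinkedList<String>> sampleIndex() {
        Map<String, LinkedList<String>> index = new HashMap<>();
        index.put("mathematics", new LinkedList<>(List.of("D1")));
        index.put("computer", new LinkedList<>(List.of("D1")));
        index.put("science", new LinkedList<>(List.of("D1", "D2")));
        index.put("department", new LinkedList<>(List.of("D1", "D2")));
        index.put("social", new LinkedList<>(List.of("D2")));
        return index;
    }

    public static void main(String[] args) {
        Map<String, LinkedList<String>> index = sampleIndex();
        System.out.println(documentsMatching(index, "science"));
        System.out.println(documentsMatching(index, "science", "social"));
        System.out.println(documentsMatching(index, "physics"));
    }
}
```

The cost of a lookup in this sketch is one hash probe per query term plus the length of each posting list scanned, which is why the index avoids a sequential pass over the whole collection.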

Index Builder
The index builder drives the indexing process. It loops through all the document objects and calls the indexDocument method to add each document to the inverted index. Once all the documents have been processed, the writeIndextoDisk method is used to store the invertedIndex object to disk. This object is read every time a new document is uploaded, to check for duplicates. To support this, the index classes were made SERIALIZABLE, so that each time the program runs the inverted index is read from disk; all of its data is therefore available at runtime, and updates are saved back to the same inverted index file.

Applying stemming process (cf. Porter's stemming algorithm)
Stemming simply refers to changing all forms of a term to a canonical version. For example, studying, studies, and studied all map to study. Stemming reduces words by stripping off suffixes, converting them to neutral stems that are devoid of tense, number, and in some languages case and gender information. This relaxes the match between query terms and words in the documents so that, for example, libraries is deemed equivalent to library. Stemming is not appropriate for all queries, particularly those involving names and other very specific words. A good stemmer also avoids mapping words with different roots to the same term. Porter's stemming algorithm has been used to provide this service to the MACS DL system. Below is a description of Porter's stemming algorithm, which can be found at the following URLs: http://snowball.tartus.org/text/introduction.html, http://snowball.tartus.org/algorithms/lovins/stemmer.html.
Porter's stemming algorithm defines five successively applied steps of word transformation. Each step consists of a set of rules of the form <condition> <suffix> <new suffix>. For example, the rule (m > 0) EED EE means: if the part of the word before the EED ending contains at least one vowel followed by a consonant, change the ending to EE. Words such as agreed therefore become agree, while feed remains unchanged, since the condition is not satisfied and another production rule would be tried instead.
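The single rule (m > 0) EED EE can be worked through in code. The sketch below implements only this one rule, not the full five-step algorithm, and the class and method names are assumptions. It follows Porter's published definitions: a, e, i, o, u are vowels, y counts as a vowel when preceded by a consonant, and the measure m counts vowel-to-consonant transitions in the part of the word before the suffix.

```java
// Illustrative sketch of one Porter rule; not a full stemmer and not the MACS DL source.
class PorterRuleSketch {

    // Porter's definition: a consonant is a letter other than a, e, i, o, u,
    // and other than y preceded by a consonant.
    static boolean isConsonant(String word, int i) {
        char c = word.charAt(i);
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            return false;
        }
        if (c == 'y') {
            return i == 0 || !isConsonant(word, i - 1);
        }
        return true;
    }

    // The measure m: the number of vowel-run followed by consonant-run sequences.
    static int measure(String stem) {
        int m = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean consonant = isConsonant(stem, i);
            if (consonant && previousWasVowel) {
                m++; // a vowel run has just been closed by a consonant
            }
            previousWasVowel = !consonant;
        }
        return m;
    }

    // Rule: (m > 0) EED -> EE, where m is measured on the part before "eed".
    static String applyEedRule(String word) {
        if (word.endsWith("eed")) {
            String stem = word.substring(0, word.length() - 3);
            if (measure(stem) > 0) {
                return stem + "ee";
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(applyEedRule("agreed")); // stem "agr" has m = 1, so the rule fires
        System.out.println(applyEedRule("feed"));   // stem "f" has m = 0, so the word is unchanged
    }
}
```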


The algorithm is very concise, having just about sixty (60) rules, and is very readable for a programmer. It is also computationally efficient compared to other affix-removal and/or statistical stemming algorithms such as N-gram stemming and Hidden Markov Model (HMM) algorithms, to mention but a few, although HMM algorithms are beneficial in fields such as machine translation and natural language processing, where numerous languages form the data set. A flaw of classical stemmers like Porter's stemming algorithm is that they often conflate words with similar syntax but completely different semantics. For example, news and new are both stemmed to new, although they belong to two quite different categories. Dr. Porter not only published the standard implementation of his work, written in the C and Java programming languages, but also developed a whole stemmer framework called Snowball. This framework provides a stemmer definition script language and a translator to ANSI C and Java. The main purpose was to enable programmers to develop their own stemmers for other character sets or languages. Currently there are implementations for Romance, Germanic, Uralic and Scandinavian languages, as well as English, Russian, and Turkish, on the websites given. We chose Porter's stemming algorithm because of its efficiency in dealing with English corpora, and it really helped in paving the way for developing MACS DL.

Applying Stop words removal
Stop words make up a large fraction of the text in most documents. Eliminating such words from consideration speeds up processing, saves a large amount of disk space in indexes, and does not damage retrieval effectiveness. A list of words filtered out during automatic indexing because they make poor index terms is called a stop word list or a negative dictionary. These are words such as: a, and, on, in, the, about, etc. Here we remove words such as articles, prepositions and conjunctions from the documents.
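A minimal sketch of the filtering step, assuming the tokens have already been produced by the text extractor. The stop word list contains only the six example words given above; the class and method names are hypothetical, and a real negative dictionary would be much longer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; the names and the tiny stop word list are assumptions.
class StopWordFilterSketch {

    // The example stop words named in the text.
    static final Set<String> STOP_WORDS = Set.of("a", "and", "on", "in", "the", "about");

    // Returns the tokens that are not stop words, preserving their order.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Compare in lower case so that "The" and "the" are treated alike.
            if (!STOP_WORDS.contains(token.toLowerCase())) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "library", "in", "the", "department");
        System.out.println(removeStopWords(tokens));
    }
}
```

A hash-based set is used so that each membership test takes expected constant time, which keeps the filter linear in the number of tokens.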
The following screen shot depicts an inverted index object after indexing two documents; the output of the indexing module was as follows:


CHAPTER SIX
Searching, Browsing, Ranking and Information Retrieval (IR)
IR aims to retrieve large amounts of data as fast as possible from different kinds of information stored in more than one form, be it visual, audio or textual. The user retrieves information by posing a query, and the information retrieval module returns the information that satisfies the query. This is in contrast to a database system, where a select statement retrieves an exact answer from the database objects that match the query. IR systems do not retrieve a definite answer; instead they produce a ranking of documents that seem to contain information relevant to the query. This ranking relies on the index produced by the indexing process, which was covered earlier in this document. The MACS DL information retrieval mechanism has been engineered to return only the results that best match the provided query, filtering out unwanted results.

Methodology
Several types of IR mechanisms exist, but the MACS DL system employs a method called inverted file indexing. This is regarded as the most well-organized index structure for text query evaluation, which suits the system because it was developed for textual digital objects.

IR system high-level architecture
The general scheme in Figure 6.1 shows the essential structure of a classical IR system. In the first phase, the preprocessing mechanism turns the raw documents of the corpus into tokenized documents, which are then indexed as a list of postings per term. In the second phase, the user gives a query to represent his or her "information need". The query is then transformed into a system query and its relevant documents are retrieved from the index. The retrieved documents are ranked according to their relevance to the query and returned to the user through a user interface, discussed later in this document.

Figure 6.1: Classical IR system architecture
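To make the phrase "a list of postings per term" concrete, the following is a minimal Java sketch of an inverted file. It is an illustration under stated assumptions, not the `Index` class of Appendix B: the names `InvertedIndexSketch` and `Posting`, the whitespace tokenizer, and the sample documents are hypothetical, although the term frequency and document frequency counters mirror the `termFrequency` and `documentFrequency` fields that appear in the appendix listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative inverted file: each term maps to a list of (document, frequency) postings. */
class InvertedIndexSketch {

    /** One posting: a document id and how often the term occurs in that document. */
    static final class Posting {
        final String docId;
        int termFrequency;

        Posting(String docId) {
            this.docId = docId;
            this.termFrequency = 1;
        }

        @Override
        public String toString() {
            return docId + ":" + termFrequency;
        }
    }

    // term -> postings list; a TreeMap keeps the terms sorted for readable output
    private final Map<String, List<Posting>> postings = new TreeMap<>();

    /** Tokenizes the text and records one posting per (term, document) pair. */
    void addDocument(String docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Posting> list = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Documents are added one at a time, so this document's posting, if any, is the last one.
            if (!list.isEmpty() && list.get(list.size() - 1).docId.equals(docId)) {
                list.get(list.size() - 1).termFrequency++;
            } else {
                list.add(new Posting(docId));
            }
        }
    }

    /** Returns the postings for a term, or an empty list if the term was never indexed. */
    List<Posting> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }

    /** Number of documents that contain the term. */
    int documentFrequency(String term) {
        return lookup(term).size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("doc1", "digital libraries");
        index.addDocument("doc2", "digital libraries and information retrieval");
        System.out.println(index.lookup("digital"));   // [doc1:1, doc2:1]
        System.out.println(index.lookup("retrieval")); // [doc2:1]
    }
}
```

Answering a query then only requires reading the postings lists of the query terms, rather than scanning every stored document.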


Term Weighting
This text retrieval module, like the rest, has been designed around a comparison of content identifiers attached both to the stored texts and to the users' information queries. A formal representation of the term vectors is obtained by including in each vector all possible content terms allowed in the system and adding term weight assignments to distinguish among the terms. If $w_{dk}$ (or $w_{qk}$) represents the weight of term $t_k$ in document $D$ (or query $Q$), and $t$ terms in all are available for content representation, the term vectors for document $D$ and query $Q$ can be written as:

$D = (t_0, w_{d0};\ t_1, w_{d1};\ \ldots;\ t_t, w_{dt})$ and $Q = (q_0, w_{q0};\ q_1, w_{q1};\ \ldots;\ q_t, w_{qt})$

Searching process
Searching is the most important part of the DL system, since information is retrieved through the search process. This technique returns results based on their relevance to the query provided. Finally, the related documents are displayed on an output interface as links on a web page. The following screenshot shows the result of a search after three documents were indexed correctly.

Document 1 - a document on digital libraries
Document 2 - a document on digital libraries and information retrieval
Document 3 - a document on distributed databases
Query = Introduction to digital libraries

The computation of the term weights and term frequencies in relation to an uploaded document gave the following output:
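As a rough illustration of how term weights of this kind can be computed (this is not the system output referred to above), the Java sketch below applies the common tf-idf scheme, weight = tf x ln(N / df), to the three example documents. The weighting formula, the class name `TermWeightSketch`, and the pre-tokenized, stop-word-free token lists are assumptions; the report does not state which weighting variant MACS DL uses.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative tf-idf term weighting over the three example documents. */
class TermWeightSketch {

    /** Number of times a term occurs in one document's token list. */
    static int frequency(List<String> tokens, String term) {
        int count = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of documents that contain the term at least once. */
    static int documentFrequency(List<List<String>> docs, String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    /** Assumed weighting: tf * ln(N / df); zero when the term does not occur. */
    static double weight(int tf, int df, int numDocs) {
        if (tf == 0 || df == 0) {
            return 0.0;
        }
        return tf * Math.log((double) numDocs / df);
    }

    /** The three example documents, assumed to be tokenized with stop words removed. */
    static List<List<String>> sampleDocs() {
        return Arrays.asList(
                Arrays.asList("document", "digital", "libraries"),
                Arrays.asList("document", "digital", "libraries", "information", "retrieval"),
                Arrays.asList("document", "distributed", "databases"));
    }

    public static void main(String[] args) {
        List<List<String>> docs = sampleDocs();
        for (String term : Arrays.asList("digital", "libraries", "document")) {
            int df = documentFrequency(docs, term);
            for (int d = 0; d < docs.size(); d++) {
                int tf = frequency(docs.get(d), term);
                System.out.printf("%s in Document %d: tf=%d df=%d weight=%.3f%n",
                        term, d + 1, tf, df, weight(tf, df, docs.size()));
            }
        }
    }
}
```

Under this scheme a term such as "document", which occurs in all three documents, receives a weight of zero, while a term confined to fewer documents is weighted more heavily.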


Ranking retrieved documents
Ranking uses similarity to order the output triggered by a query, starting from the items most likely to satisfy the query, so that the most likely relevant items are displayed first. To rank a document retrieved by a query, the similarity between the two has to be calculated. The formula below is used to measure the similarity between query and item. Ranking is done in two phases:

Coarse grain ranking: Documents are sorted according to the frequency of the query tokens. The document that contains all the query terms is ranked first.
Fine grain ranking: Depends on the weights of the terms. In this phase, the similarity function between document and query is calculated.

This module sorts the retrieved documents by their relevance to the query posed, using the following formula:
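As an illustration only, the two phases can be sketched in Java as a single sort with two keys. The similarity function assumed here is cosine similarity, a common choice for weighted term vectors; it is not necessarily the formula used in MACS DL. The class name `RankingSketch`, the vector layout, and the sample weights are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative two-phase ranking: matched query terms first, then cosine similarity. */
class RankingSketch {

    static final class ScoredDoc {
        final String docId;
        final int matchedTerms;   // coarse grain key
        final double similarity;  // fine grain key

        ScoredDoc(String docId, int matchedTerms, double similarity) {
            this.docId = docId;
            this.matchedTerms = matchedTerms;
            this.similarity = similarity;
        }
    }

    /** Counts the query terms (non-zero query weights) that also occur in the document. */
    static int matchedTerms(double[] doc, double[] query) {
        int matched = 0;
        for (int i = 0; i < query.length; i++) {
            if (query[i] > 0 && doc[i] > 0) {
                matched++;
            }
        }
        return matched;
    }

    /** Assumed similarity: cosine of the angle between two equal-length weight vectors. */
    static double cosine(double[] doc, double[] query) {
        double dot = 0, docNorm = 0, queryNorm = 0;
        for (int i = 0; i < query.length; i++) {
            dot += doc[i] * query[i];
            docNorm += doc[i] * doc[i];
            queryNorm += query[i] * query[i];
        }
        if (docNorm == 0 || queryNorm == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(docNorm) * Math.sqrt(queryNorm));
    }

    /** Sorts by matched terms (descending), breaking ties by similarity (descending). */
    static List<ScoredDoc> rank(List<String> ids, List<double[]> docVectors, double[] query) {
        List<ScoredDoc> scored = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            double[] doc = docVectors.get(i);
            scored.add(new ScoredDoc(ids.get(i), matchedTerms(doc, query), cosine(doc, query)));
        }
        scored.sort(Comparator.comparingInt((ScoredDoc s) -> s.matchedTerms).reversed()
                .thenComparing(Comparator.comparingDouble((ScoredDoc s) -> s.similarity).reversed()));
        return scored;
    }

    public static void main(String[] args) {
        // Hypothetical weights over the term axes [digital, libraries, retrieval].
        List<String> ids = Arrays.asList("Document 3", "Document 2", "Document 1");
        List<double[]> vectors = Arrays.asList(
                new double[] {0, 0, 0}, new double[] {1, 1, 1}, new double[] {1, 1, 0});
        for (ScoredDoc s : rank(ids, vectors, new double[] {1, 1, 0})) {
            System.out.printf("%s matched=%d similarity=%.3f%n", s.docId, s.matchedTerms, s.similarity);
        }
    }
}
```

With these sample weights, Documents 1 and 2 tie in the coarse phase because both contain every query term, and the fine phase places Document 1 first because its vector points in the same direction as the query.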

The following screenshot depicts the result of a query, with the results ranked according to their relevance to the query posed.


In ranking, an artificial measure is used to gauge the similarity of each document to the query, and a fixed number of the closest matching documents are returned as answers.

Metadata browsing
Browsing is often described as the other side of the coin from searching, but really the two are at opposite ends of a spectrum. Searching is purposeful, whereas browsing tends to be casual. Terms such as random, informal, unsystematic and without design are used to capture the unplanned nature of browsing and, often, the lack of a specific goal. Searching implies that you know what you're looking for, whereas browsing implies that you'll know it when you see it. The metadata provided with the documents in a collection can support different browsing activities. Information collections that are entirely devoid of metadata can still be searched, which is one of the real strengths of full-text searching, but they cannot be browsed in any meaningful way unless some additional data is present. The structure that is implicit in metadata is the key to providing browsing facilities. Here are some examples of browsing:

Lists: The simplest structure is an ordered list. It can be alphabetical, in ascending or descending order.
Dates: An automatically generated selector gives a choice of years, months and dates that can be used to browse metadata.
Name: Offers users the flexibility of browsing collections by authors' names, for example Deitel.
Title: Users can browse collections by the titles of the documents in a pool of collections, for example Advanced Java Programming.
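The browsing structures listed above all amount to ordering the collection by one metadata field. A minimal Java sketch follows; the class name `BrowseSketch`, the `Item` fields, and the sample records are assumptions for illustration, and the years in particular are placeholders rather than bibliographic data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative metadata browsing: the same collection ordered by different fields. */
class BrowseSketch {

    /** Minimal metadata record; the field set is an assumption. */
    static final class Item {
        final String title;
        final String author;
        final int year;

        Item(String title, String author, int year) {
            this.title = title;
            this.author = author;
            this.year = year;
        }
    }

    static final Comparator<Item> BY_TITLE =
            Comparator.comparing((Item i) -> i.title, String.CASE_INSENSITIVE_ORDER);
    static final Comparator<Item> BY_AUTHOR =
            Comparator.comparing((Item i) -> i.author, String.CASE_INSENSITIVE_ORDER);
    static final Comparator<Item> BY_YEAR = Comparator.comparingInt((Item i) -> i.year);

    /** Returns the titles of the collection in the requested browsing order. */
    static List<String> browse(List<Item> items, Comparator<Item> order) {
        List<Item> sorted = new ArrayList<>(items); // leave the caller's list untouched
        sorted.sort(order);
        List<String> titles = new ArrayList<>();
        for (Item item : sorted) {
            titles.add(item.title);
        }
        return titles;
    }

    /** Sample collection; the years are placeholders, not bibliographic data. */
    static List<Item> sample() {
        return Arrays.asList(
                new Item("How to Build a Digital Library", "Witten", 2003),
                new Item("Advanced Java Programming", "Deitel", 2005),
                new Item("Digital Libraries", "Arms", 2000));
    }

    public static void main(String[] args) {
        System.out.println(browse(sample(), BY_TITLE));           // alphabetical by title
        System.out.println(browse(sample(), BY_AUTHOR));          // alphabetical by author name
        System.out.println(browse(sample(), BY_YEAR.reversed())); // newest first
    }
}
```

A descending list is obtained by calling `reversed()` on any of the comparators, so one `browse` method serves the list, date, name and title views.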


CHAPTER SEVEN
User Interface (UI) Design
A user interface describes how users of the system interact with it. Human Computer Interaction (HCI) basics and principles were employed in developing the MACS DL user interfaces, to give users a seamless interaction with the system. The common interface styles used are:
Menus
Forms

Principles of UI design
Consistency: The system is expected to be consistent. MACS DL achieved consistency in the choice of colors used. The system has consistent interfaces and styles.
Learn-ability: The system should be easy to learn. MACS DL is very easy to use, providing labels and the necessary information to guide users on how best to use it.
Informative feedback: The system should provide informative feedback to users after an operation has been performed. MACS DL adheres to this principle: each time a query is posed and results are displayed, the system gives users feedback.
Provide error prevention and handling: The system must have mechanisms that prevent users from committing errors and that handle any errors that do occur. MACS DL is no exception, as it prevents errors and system crashes.
Off-load the short term memory: The system reduces the number of steps users have to perform when carrying out an operation. MACS DL was designed with links and proper labels that make it easy for users to remember what to do.
Provide short-cuts for users: The system provides hyperlinks as a form of shortcut for navigating web pages.
System dialogue yielding closure: The system informs its users about its current state at each instance. For example, after a query is posed, the system retrieves the results with a message that reads RESULTS MATCHING THE QUERY to yield closure of the IR operation.
Provide internal locus of control: The system allows users to be in control of it. Every operation the system performs is triggered by users. For example, retrieved documents are only downloaded when clicked.

Technologies used in the UI development
Java Server Pages (JSP) and servlets, to make the system web based
JavaScript
eXtensible Markup Language (XML), to allow file formats
Hypertext Markup Language (HTML)
Cascading Style Sheets (CSS), to provide presentable documents with minimal effort
eXtensible Stylesheet Language (XSL), for supporting XML and HTML that are XML compliant


CHAPTER EIGHT
System Testing and Evaluation
The system was frequently tested for errors after each module was completed. Testing is the process of exercising a program with the specific intent of finding errors prior to delivery to the end user. The system was thoroughly tested, mainly to assess the following:
Errors
Requirements conformance
Performance
Quality

Who did the testing?
The development team did most of the testing, while independent testers were also invited to test the system.

Testing strategies
Unit test
Integration test
White box test
Validation test
System test
Regression test

The following table shows some of the modules and criteria used in the testing phase.

Table 8.1 Test results

Test case                  | Test strategy             | Description                                                           | Results
GUI functionality          | Unit test                 | Testing the action performed when buttons and controls are clicked    | PASS
Code snippets              | Integration test          | Integrating modules to form a complete system                         | PASS
System performance         | White box test            | Accessing the system concurrently                                     | PASS
System functionality       | Integration and Unit test | Integrating system modules and testing each of them for functionality | PASS
Databases connectivity     | Integration test          | Integrating a third party software, Oracle 10g database server        | PASS
Human Computer Interaction | Unit test                 | Each module was tested for interaction with users                     | PASS
Textual objects            | Validation test           | Testing if the material uploaded is text or not                       | PASS
Error handling             | Regression test           | Testing system errors                                                 | PASS
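The "Textual objects" validation test in Table 8.1 can be pictured with a small Java sketch. It is only an assumption about how such a check might be written: the class name `UploadValidationSketch` and the extension-based rule are hypothetical, with the accepted extensions taken from the `pdfFile`, `docxFile`, `pptFile` and `txtFile` methods of the `TextExtractor` class listed in Appendix B.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative validation check: is an uploaded file one of the supported textual formats? */
class UploadValidationSketch {

    // Extensions assumed from the TextExtractor methods pdfFile, docxFile, pptFile and txtFile.
    static final Set<String> TEXT_EXTENSIONS =
            new HashSet<>(Arrays.asList("pdf", "docx", "ppt", "txt"));

    /** Returns true when the file name ends in a supported textual extension. */
    static boolean isTextualObject(String filename) {
        if (filename == null) {
            return false;
        }
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXT_EXTENSIONS.contains(extension);
    }

    /** Prints one test-table style line for a single validation case. */
    static void check(String filename, boolean expected) {
        String verdict = isTextualObject(filename) == expected ? "PASS" : "FAIL";
        System.out.println(verdict + "  " + filename);
    }

    public static void main(String[] args) {
        check("lecture-notes.PDF", true);
        check("thesis.docx", true);
        check("holiday.jpg", false);
        check("README", false);
    }
}
```

A check based on the file extension alone is cheap but easy to fool; inspecting the file content would be more robust, at the cost of reading part of every upload.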

System Evaluation
The system was evaluated so that users could accept it as a usable tool. Direct observation and pilot study evaluation techniques were used to find out the users' views during this phase.
Direct observation: The developers observed directly while a sample of users evaluated the system. Users had to perform all the operations implemented in the MACS DL system and evaluate the results.
Pilot study: A small group of users was asked a set of questions about the system. The pilot study was conducted using a questionnaire, in which users provided their evaluation heuristics. Some of the questions asked were:
Is the system usable?
Is the system useful?

Evaluation results
The results obtained from the system evaluation phase were used to enhance the system's functionality and make it more effective and efficient. They were collected to guide the developers on how to improve the system and to guide users on how best to use it. The following is an in-depth analysis of the results obtained from the evaluation phase:


Direct observation of users: We investigated the factors that influence the perceived ease of use and usefulness of digital libraries among NUL students. Data were collected from undergraduate students at NUL. Individual undergraduate students were the population sample identified and, using the stratified sampling method, students around the NUL campus, in places such as the Thomas Mofolo library and classrooms, were each handed a questionnaire.

Evaluation results and analysis
Out of one hundred and fifty questionnaires that were distributed, only sixty-nine were returned, giving a response rate of 46%. Based on the study, 60% of the respondents were Computer Science and Engineering students, 20% were from social sciences and 10% from humanities.

Table 8.2
A scale of 1 to 4 was used to grade the scores, ranked as follows: 1 = Best, 2 = Good, 3 = Not sure, 4 = Bad

Item evaluated        | Score | Answer
HCI (Usability)       | 1     | Best
HCI (Functionality)   | 1     | Best
Project functionality | 1     | Best

System training
There will be no need to train the users, as the system is usable and easy to learn. Furthermore, the MACS DL system is no different from the existing web based digital library systems in use today, which NUL MACS department students are already accustomed to using.

Conclusions and future prospects
The MACS DL system was a success. It was an exciting endeavor that served as an eye opener to the developers in their academic careers, as plenty of new computing concepts were learned during the execution of this project. The system is ready for deployment and use in an organization as large as NUL. It covers major parts of a search engine implementation, such as stop word removal, stemming, automatic indexing and searching. To make it a complete search engine, other parts such as clustering and thesaurus expansion could be added. The system could also be implemented for any collection of digital objects, such as videos and images. The system currently takes a lot of time to upload large documents; in the future, new implementation strategies could be employed to make this faster.

References
1. Arms, W. Digital Libraries. MIT Press, Cambridge, MA, 2000.
2. McCray, A.T. and Gallagher, M.E. Principles for Digital Library Development. Accessed on September 10th, 2011, from http://www.lhncbc.nlm.nih.gov/dlb/pubs/200105_cacm_mccray.pdf
3. Bin Li. The History of Digital Libraries. Accessed on September 12th, 2011, from http://www.ils.unc.edu/~lib/digital-library.html
4. Salton, G. and Buckley, C. Term-Weighting Approaches in Automatic Text Retrieval. Cambridge, 2000.
5. Frakes, W.B. and Baeza-Yates, R. Information Retrieval: Data Structures & Algorithms, 88-94.
6. Witten et al. How to Build a Digital Library. Morgan Kaufmann Publishers.


APPENDIX A
Acronyms
A1. MACS - Mathematics and Computer Science
A2. NUL - National University of Lesotho
A3. FOST - Faculty of Science and Technology
A4. IR - Information Retrieval
A5. FI - File Indexing
A6. II - Inverted Index
A7. UI - User Interface
A8. CRC - Class Responsibility Collaborator
A9. DFD - Data Flow Diagram
A11. RE - Requirements Engineering
A12. SRS - System Requirements Specification


APPENDIX B
Programs
/* Cascading style sheet */
#header1 { height:200px; padding-top: 20px; padding: 0px 0px 0px 0px; width: 900px; background-repeat:no-repeat; background-position:top; padding-bottom: 3px; } #logos { font-family: Arial,sans-serif; color:#FFFFFF; font-size:18px; font-style:italic; padding: 15px 0px 0px 135px; background:url(images/buka.jpg) left top no-repeat; height: 200px; }

* { border: 0; margin: 0;



} #uploader-button { font-family: Arial, Helvetica, sans-serif; font-size: 12px; font-weight:normal; color: #ffffff; width: 60px; height: 21px; background: url(images/read.gif); background-repeat:no-repeat; background-position:left top; border: none; float:right; }

img { border: 0px; }

body{ font: 12px Arial, Helvetica, sans-serif; color: #000000; background: url(images/body_bg.jpg) top repeat-x #FFFFFF; line-height: 20px; }

#bg{ background: url(images/bg.jpg) center top no-repeat; }


/* search */

#search { float:right; padding-right:45px; padding-top:1px; }

#search form { margin: 0; }

#search fieldset { margin: 0; padding: 0; border: none; }

#search input { float: left; font: 11px Georgia, "Times New Roman", Times, serif; }

#search-text {



width: 230px; height: 19px; padding-top: 4px; padding-left: 10px; padding-right: 12px; border: none; background: url(images/search.png); background-repeat:no-repeat; background-position:left top; color: #000000; }

#search-submit { width: 40px; height: 23px; background: url(images/search2.png); background-repeat:no-repeat; background-position:left top; border: none; }


/*MENU*/ #menu { width:650px; height:55px; }



#menu ul { list-style:none; padding-left:0px; } #menu li { display:inline; } #menu ul li a { font-family: Arial,sans-serif; font-size: 18px; font-weight:normal; color: #008ae8; float: left; width: 85px; height: 30px; display: block; text-align: left; text-decoration: none; padding-top: 5px; padding-left:40px; background: url(images/menu_bg.png); background-repeat:no-repeat; background-position:10px 5px; } #menu a:hover { width: 85px; height: 35px; color: #093285;



text-decoration: none; background: url(images/menu_hov.png); background-repeat:no-repeat; background-position:10px 5px; }

#left_part { width: 100px; float:left; padding: 0px 0px 0px 0px; }

.main_top { background: url(images/main_top.png) no-repeat top; height: 15px; } .main_bot { background: url(images/main_bot.png) no-repeat top; height: 15px; width:750px; padding-bottom: 10px; } .main_bg1 { background: url(images/main_bg.png); padding-left: 8px; color: black;



padding-right: 7px; font-family: Tahoma; }

/*main page*/ #main { width: 900px; margin: 0px auto; background:url(images/main.jpg) right top no-repeat; } #main2 { width: 750px; height: 400px; margin-left: 8px; clear:both; /*background: url(images/left_bg.jpg);*/ background-repeat:repeat-y; background-position:left; }

#header { width:900px; height: 100px; }

#logo { padding: 0px 0px 0px 0px; height: 113px;



}

#logo H2 {

font-family: Arial, Helvetica, sans-serif; color:#000000; font-size:18px; font-style:italic; }

#logo a { text-decoration: none; text-transform: lowercase; font-style: italic; font-size: 16px; color: #000000; }

#logo H2 a{ font-size: 12px; font-family: Arial, Helvetica, sans-serif; font-weight:100; }

/* buttons */

#buttons { text-align:center; height: 30px; margin: 0px auto; padding: 0px 0px 0px 0px;



background: url(images/buttons.png); width: 600px; }

#buttons a { font-family: Georgia, "Times New Roman", Times, serif; font-size: 18px; display: block; float: left; text-decoration: none; color: #0059FF; text-align: center; padding-top: 0px; font-weight:100; width: 170px; }

#buttons .but:hover { text-decoration:underline; }

.top { height:334px; padding-top: 10px; padding-left: 10px; background:url(images/top.jpg) left top no-repeat; } .top_bot { background: url(images/top_bot.jpg) left top no-repeat; height: 28px}

#content



{ width: 876px; margin: 0px auto; background: #E6F6FF; padding: 0px 12px 5px 12px; line-height: 22px; background-repeat:repeat-y; text-align: left; background-position:left; }

#content_razd { background: url(images/content_razd.gif) 586px repeat-y ; }

#content_top { width: 900px; background: url(images/content_top.png) 0px top no-repeat ; height: 10px; }

#content_bot { width: 900px; background: url(images/content_bot.png) 0px bottom no-repeat ; height: 9px; }

.float_l { float:left;}

.col { width: 265px;



float:left; padding: 0px 0px 0px 0px;}

.col_razd { background:url(images/col_text.gif) center repeat-y; height: 124px; width: 40px; float:left; margin-top: 35px; }

h1 { padding: 0px 0px 5px 0px; font-family: Georgia, "Times New Roman", Times, serif; font-size: 16px; font-weight: bold; color:#051B93;}

#left{ width: 558px; float: left; color:#000000; margin-left: 0px; }

.text{ padding: 0px 0px 15px 0px; }

.img_l {

float:left;



margin: 6px 15px 40px 0px; }

.img_r {

float: right; margin: 9px 10px 3px 10px; }

.span_cont {

color: #07249F;

font-size:12px; font-weight:bold; }

#content H2{ font-family: Georgia, "Times New Roman", Times, serif; font-size:16px; font-weight: bold; color: #07249F; text-align: left; padding: 5px 0px 5px 0px; }

.read_r{ text-align: right; padding: 0px 8px 0px 0px; background: url(images/read.gif) right 3px no-repeat; }

.razd_g { background: url(images/razd_g.gif) 0px 2px repeat-x; height: 5px; }


.read_r a { font-size:12px; color: #ffffff; text-decoration: none; padding-right: 9px; }

.next { width: 100%; text-align: right; padding: 0px 0px 0px 0px;}

.next a{ color:#FFFFFF; text-decoration: none; }

.next a:hover { text-decoration: underline; }

.more { text-align:right;} .more a { color: #009FFF; text-decoration:none; }

#right{ float: right;



width: 270px; }

.span_dat { color: #002380; text-decoration: underline;}

#bottom { background: #E6F6FF; margin: 0px auto; color:#000000; padding: 0px 0px 0px 15px; }

#b_col1 { width: 220px; float: left; margin-left: 0px; } #b_col2 { width: 180px; float: left; margin-left: 57px; } #b_col3 { width: 160px; float: left; margin-left: 20px; text-align: left; } #b_col4 {



width: 184px; float: left; margin-left: 35px; text-align: left; }

.a_icons { color:#FF0000; text-decoration:none;} .a_icons:hover { text-decoration: underline;}

#bottom ul { list-style:none; padding: 0px 0px 0px 0px;}

#bottom li { padding: 8px 0px 0px 0px; } #bottom ul a:hover { text-decoration:underline; }

#bottom ul a { color:#000000; text-decoration:none; font-weight: 100;}

.fu_i { padding: 0px 14px 0px 0px; vertical-align: middle ; }


#b_col2 ul { list-style:none; padding: 0px 0px 0px 0px;}

#b_col2 li { padding: 4px 0px 0px 18px; background: url(images/fish2.gif) 0px 11px no-repeat;}

#b_col2 a { color:#FFFFFF; }

#footer{ font-size: 11px; color: #000000; text-align: center; padding: 20px 0px 0px 0px; height: 60px; text-align: center; margin: 0px auto; }

#footer a{ color: #000000; font-size: 11px; text-decoration: none; } #footer a:hover{ color: #000000; font-size: 11px;



text-decoration: underline; }

/* -----------------------------------------------------------------------DO NOT CHANGE THE FOLLOWING ------------------------------------------------------------------------- */

div.pp_overlay {background: #000;display: none;left: 0;position: absolute;top: 0;width: 100%;z-index: 9500;} div.pp_pic_holder {display: none;position: absolute;width: 100px;z-index: 10000;}

//Java source code for Index class, index.java
package InvertedIndex;

import InvertedIndex.Index.PostingListNode;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Hashtable;

public class Index implements Serializable {

    public class documentVector implements Serializable {

        public String docId;
        public double score;
        public ArrayList docVector;

        public documentVector() {
            //compiled code
            throw new RuntimeException("Compiled Code");
        }

        public documentVector(String documentId) {
            //compiled code
            throw new RuntimeException("Compiled Code");
        }
    }

    public class PostingList implements Serializable {

        public PostingListNode first;
        public int documentFrequency;

        public PostingList() {
            //compiled code
            throw new RuntimeException("Compiled Code");
        }

        public void Add(PostingListNode Node) {
            //compiled code
            throw new RuntimeException("Compiled Code");
        }
    }

    public class PostingListNode implements Serializable {

        public String documentId;
        public int docReference;
        public int termFrequency;
        public PostingListNode next;

        public PostingListNode() {
            //compiled code
            throw new RuntimeException("Compiled Code");
        }

        public PostingListNode(String docId, int tf, int docRef) {
            //compiled code
            throw new RuntimeException("Compiled Code");
        }
    }

    private ArrayList PostingLists;
    private int count;
    private Hashtable<String, Integer> IndexTerms;
    public int numOfdocuments;
    public ArrayList docVectors;
    public ArrayList queryVector;
    private ArrayList queryterms;
    public Hashtable<String, Integer> documents;
    private String stopwordsPath;

    public Index(String stopwords_Path) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void addIndexTerm(String termId, String docId, int tf) {
        throw new RuntimeException("Compiled Code");
    }

    public void Search(String query) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void getVectors() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void computeScores() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void sortDocuments() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public ArrayList RetrieveAnswer(String query) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }
}

//Source code for class IndexBuilder
package InvertedIndex;

import java.util.ArrayList;
import java.util.Hashtable;

public class IndexBuilder {

    public Index invertedIndex;
    private String document;
    private String response;
    private Hashtable<String, Integer> termsfrequency;
    public ArrayList QueryResults;
    private TextExtractor Extractor;

    public IndexBuilder(String stopwords_path) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public IndexBuilder(String docId, String stopwords_path) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private int frequency(ArrayList tokens, String term) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void indexDocument() throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void SaveIndexToDisk(String path) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void ReadIndexFromDisk(String path) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void AnswerQuery(String query) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public static void main(String[] args) throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }
}

//Java source code for class StemText
package InvertedIndex;

import java.util.ArrayList;

class StemText {

    private char[] b;
    private int i;
    private int i_end;
    private int j;
    private int k;
    private static final int INC = 50;

    public StemText() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void add(char ch) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void add(char[] w, int wLen) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public String toString() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public int getResultLength() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public char[] getResultBuffer() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final boolean cons(int i) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final int m() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final boolean vowelinstem() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final boolean doublec(int j) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final boolean cvc(int i) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final boolean ends(String s) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void setto(String s) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void r(String s) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void step1() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void step2() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void step3() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void step4() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void step5() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private final void step6() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void stem() {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public ArrayList stemIndexTerms(ArrayList textTokens) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }
}

//Source code for class stopwords
package InvertedIndex;

import java.io.BufferedReader;
import java.io.IOException;
import java.util.Hashtable;

public class StopWords {

    public Hashtable<String, Integer> stopWords;
    private BufferedReader stopWordsFile;
    private int count;

    public StopWords(String path) throws IOException {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }
}

//Source code for class TextExtractor
package InvertedIndex;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class TextExtractor {

    private File file;
    private String filename;
    public String textFromFile;
    public ArrayList Tokens;
    private String stopwordsPath;

    public TextExtractor(String stopwords_Path) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public TextExtractor(String Filename, String stopwords_Path) {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void ExtractText() throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    public void indexTerms() throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private void pdfFile() throws Exception {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private void docxFile() throws IOException, ParserConfigurationException, SAXException {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private void pptFile() throws IOException {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }

    private void txtFile() throws IOException {
        //compiled code
        throw new RuntimeException("Compiled Code");
    }
}



//Source code for


//Java source code creating a home page interface


<%@page contentType="text/html" pageEncoding="UTF-8"%> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <title>Mathematics & Computer Science Digital Library System</title> <meta name="keywords" content="" /> <meta name="description" content="" /> <script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script> <script type="text/javascript" src="lib/jquery.tools.js"></script> <script type="text/javascript" src="lib/jquery.custom.js"></script> <link href="styles.css" rel="stylesheet" type="text/css" /> <link href="style.css" rel="stylesheet" type="text/css" /> </head> <script language="JAVASCRIPT" type="TEXT/JAVASCRIPT"> function confirmMessage() { //display a confirmation box yielding closure of a system operation { alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); passfield.type = 'text'; });

function focusCheckDefaultValue(field, type, defaultValue) {
  if (field.value == defaultValue) { field.value = ''; }
  if (type == 'pass') { field.type = 'password'; }
}
function blurCheckDefaultValue(field, type, defaultValue) {
  if (field.value == '') { field.value = defaultValue; }
  if (type == 'pass' && field.value == defaultValue) { field.type = 'text'; }
  else if (type == 'pass' && field.value != defaultValue) { field.type = 'password'; }
}

</script>

<body>

<div id="bg">



<div id="main"> <div id="content">

<div class="navi"></div>

<div id ="header1">

<div id="menu"> <ul> <!--create button links--> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div> <div id ="logos"></div> <div id="search"> <form method="get" action="searchResults.jsp"> <fieldset> <input type="text" name="search" id="search-text" size="25" value ="Search" onFocus="javascript:focusCheckDefaultValue(this, '', 'Search');" onBlur="javascript:blurCheckDefaultValue(this, '', 'Search');" >

<input type="submit" id="search-submit" value="" /> </fieldset> </form> </div>

</div> <br/><br/>


<div align="left"> <img src="images/img11.jpg" class="img_l" align="left" alt="" /><br/><br/> <span class="span_cont">About MACS DL </span><br /> MACS DL is an educational portal for higher learning, offering a large collection of books, journals and other learning resources. </div>

<form enctype="multipart/form-data" action="uploadfile.jsp" method="post" > <br/><br/><br/> <center> <table border="2"> <tr> <td colspan="2"> <p align ="center"><b>Upload and share your files with the NUL community</b></p> </td>

</tr> <tr> <td> <b>Choose a file to upload:</b> </td> <td> <input name="inputfile" type="file"> </td> </tr> <tr> <td colspan="2" align="right"> <p onclick="confirmMessage()"></p> <input type="submit" id ="uploader-button" value="UPLOAD" /> </td>



</tr> </table> </center> </form> <div class="razd_g"></div><br />

<div class="col"> <h1>Add to the MACS DL</h1> <img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your files<br/>to the server, download and get stuff you need most!

</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">Browse by date</h1> <img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.

</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">SEARCH MACS DL</h1> <img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search button. Get the results instantly! </div> <div style="clear: both"></div> <div style="height:15px; width: 100%"></div> <div class="razd_g"></div>

<div style="clear: both"></div>

</div> <div id="content_bot"></div> <!-- content ends -->



<div style="height:15px; width: 100%"></div> <!-- bottom end --> <!-- footer begins -->

<div id="footer"> <p>Copyright 2012<p>Design by <a href="http://www.nul.ls" title="MACS DL">Mosola Napo N</a> <!--End of notice --></p><!-- end of copyright notice--> </div> <!-- footer ends --> </div>

</div> </body> </html>

//JSP source code for uploading files, uploadfile.jsp

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<%@page language="java"%>
<%@page import="InvertedIndex.*"%>
<%@page import ="java.io.File,java.io.FileInputStream,java.io.InputStream"%>
<%@page import="java.io.*,java.util.*, javax.servlet.*" %>
<%@page import="javax.servlet.http.*,javax.servlet.ServletException"%>
<%@page import="org.apache.commons.fileupload.*" %>
<%@page import="org.apache.commons.fileupload.disk.*"%>
<%@page import="org.apache.commons.fileupload.servlet.*" %>
<%@page import="org.apache.commons.io.output.*" %>

<%
//Upload the document to the server.
File file;

// Verify the content type (it is null when the page is requested directly)
String contentType = request.getContentType();
if (contentType != null && contentType.indexOf("multipart/form-data") >= 0) {

    DiskFileItemFactory factory = new DiskFileItemFactory();

    String Path="C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
    factory.setRepository(new File(Path));

    String filename=null;
    // Create a new file upload handler
    ServletFileUpload upload = new ServletFileUpload(factory);
    try {
        // Parse the request to get file items.
        List fileItems = upload.parseRequest(request);

        // Process the uploaded file items
        Iterator i = fileItems.iterator();
        while ( i.hasNext () ) {
            FileItem fi = (FileItem)i.next();
            if (fi.isFormField()) { continue; } // skip ordinary form fields
            // Some browsers send the full client path; keep only the file name
            filename=org.apache.commons.io.FilenameUtils.getName(fi.getName());
            file=new File(Path+filename);
            fi.write( file ) ;
%>
You have successfully uploaded the file by the name of:<br> <%=filename%>
<%
        }
    } catch(Exception ex) {
%>
<%=ex%>
<%
    }
%>
<%@page import="org.apache.tika.metadata.Metadata"%>
<%@page import="org.apache.tika.parser.AutoDetectParser"%>
<%@page import="org.apache.tika.sax.BodyContentHandler"%>
<%@page import="java.sql.*"%>
<%
    Connection conn=null;
    InputStream input=null;
    try {
        //Class.forName("com.mysql.jdbc.Driver");
        //conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/dl", "root", "admin");

        Class.forName("oracle.jdbc.driver.OracleDriver");
        conn=DriverManager.getConnection("jdbc:oracle:thin:dl/admin@localhost:1521/XE");

        // Extract the metadata of the uploaded file with Apache Tika
        input = new FileInputStream(new File(Path+filename));
        Metadata metadata = new Metadata();
        BodyContentHandler handler = new BodyContentHandler();
        AutoDetectParser parser = new AutoDetectParser();
        parser.parse(input, handler, metadata);
        String Author= metadata.get(Metadata.AUTHOR);
        String Title=metadata.get(Metadata.TITLE);
        String last_modified=metadata.get(Metadata.LAST_MODIFIED);
%>
Author: <%=Author %><b></b>Title: <%=Title %><b></b>Last_Modified: <%=last_modified %>
<%
        if(Author!=null&&Title!=null&&last_modified!=null) {
            // Bind the values as parameters instead of concatenating them into the SQL text
            PreparedStatement stat=conn.prepareStatement("insert into browse Values(?,?,?,?)");
            stat.setString(1, Author.toLowerCase());
            stat.setString(2, Title.toLowerCase());
            stat.setString(3, last_modified.toLowerCase());
            stat.setString(4, filename);
            stat.executeUpdate();
            stat.close();
        }
    } catch(Exception exc) {
        // Report the failure instead of silently ignoring it
%>
<%=exc%>
<%
    } finally {
        if (input != null) { input.close(); }
        if (conn != null) { conn.close(); }
    }
%>
<%
    //Index the uploaded document
    IndexBuilder index = new IndexBuilder(Path+filename,Path+"stopwords.txt");
    index.ReadIndexFromDisk(Path+"invertedIndex.object");
    index.indexDocument();
    index.SaveIndexToDisk(Path+"invertedIndex.object");
} else {
%>
No document uploaded!
<%
}
%>
<meta http-equiv="refresh" content="0; URL=macsdl.jsp">
<meta name="keywords" content="automatic redirection">
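
The upload page builds the stored file's path from the name sent by the browser. The following is a minimal sketch, not part of MACSDL, of how that name could be reduced to its last path segment before it is appended to the documents directory. The class UploadNames and the method baseName are hypothetical names introduced only for this illustration.

```java
final class UploadNames {
    /**
     * Returns the last path segment of a client-supplied file name,
     * or null when nothing usable is left (hypothetical helper).
     */
    static String baseName(String clientName) {
        if (clientName == null) {
            return null;
        }
        // Treat both '/' and '\' as separators, whatever the server platform is
        int cut = Math.max(clientName.lastIndexOf('/'), clientName.lastIndexOf('\\'));
        String name = clientName.substring(cut + 1).trim();
        boolean unusable = name.isEmpty() || name.equals(".") || name.equals("..");
        return unusable ? null : name;
    }

    public static void main(String[] args) {
        System.out.println(baseName("C:\\Users\\student\\thesis.pdf"));
        System.out.println(baseName("../../notes.txt"));
    }
}
```

A caller would reject the upload when baseName returns null and otherwise store the file under the returned name only.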

//Java server page for Browsing: Browse by Author


<%@page contentType="text/html" pageEncoding="UTF-8"%> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <title>Mathematics & Computer Science Digital Library System</title> <meta name="keywords" content="" /> <meta name="description" content="" /> <script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script> <script type="text/javascript" src="lib/jquery.tools.js"></script> <script type="text/javascript" src="lib/jquery.custom.js"></script> <link href="styles.css" rel="stylesheet" type="text/css" /> <link href="style.css" rel="stylesheet" type="text/css" /> </head> <script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">

function confirmMessage() { //display a confirmation box asking the visitor if they want to get a message


{ alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); if (passfield) { passfield.type = 'text'; } });

function focusCheckDefaultValue(field, type, defaultValue) { if (field.value == defaultValue) { field.value = ''; } if (type == 'pass') { field.type = 'password'; } } function blurCheckDefaultValue(field, type, defaultValue) { if (field.value == '') { field.value = defaultValue; } if (type == 'pass' && field.value == defaultValue) { field.type = 'text'; }



else if (type == 'pass' && field.value != defaultValue) { field.type = 'password'; } }

</script>

<body>

<div id="bg">

<div id="main"> <div id="content">

<div class="navi"></div> <!-- automatically create the navigation points depending on the number of items -->

<div id ="header1">

<div id="menu"> <ul> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div> <div id ="logos"></div> <div align="center"> <br/> <center>



<a href="Browse.jsp" >Browse by author</a><br/><br/> <a href="BrowseByTittle.jsp" >Browse by title</a><br/><br/> <a href="BrowsebyDate.jsp" >Browse by date</a><br/><br/> </center> </div>

</div> <br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/> <div class="razd_g"></div><br />

<div class="col"> <h1>Add to the MACS DL</h1> <img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your files<br/>to the server, download and get stuff you need most!

</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">Browse by date</h1> <img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.

</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">SEARCH MACS DL</h1> <img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search button. Get the results instantly! </div> <div style="clear: both"></div> <div style="height:15px; width: 100%"></div> <div class="razd_g"></div>

<div style="clear: both"></div>


</div> <div id="content_bot"></div> <!-- content ends --> <div style="height:15px; width: 100%"></div> <!-- bottom end --> <!-- footer begins -->

<div id="footer"> <p>Copyright 2012<p>Design by <a href="http://www.nul.ls" title="MACS DL">Mosola Napo N</a> <!--End of notice --></p><!-- end of copyright notice--> </div> <!-- footer ends --> </div>

</div>

</body> </html>

//Java server page for Browsing: Browse by Title

<%@page contentType="text/html" pageEncoding="UTF-8"%> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <title>Mathematics & Computer Science Digital Library System</title> <meta name="keywords" content="" /> <meta name="description" content="" />



<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script> <script type="text/javascript" src="lib/jquery.tools.js"></script> <script type="text/javascript" src="lib/jquery.custom.js"></script> <link href="styles.css" rel="stylesheet" type="text/css" /> </head> <script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">

function confirmMessage() { //display a confirmation box asking the visitor if they want to get a message

{ alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); if (passfield) { passfield.type = 'text'; } });

function focusCheckDefaultValue(field, type, defaultValue) { if (field.value == defaultValue) { field.value = ''; } if (type == 'pass') { field.type = 'password'; } }



function blurCheckDefaultValue(field, type, defaultValue) { if (field.value == '') { field.value = defaultValue; } if (type == 'pass' && field.value == defaultValue) { field.type = 'text'; } else if (type == 'pass' && field.value != defaultValue) { field.type = 'password'; } } </script> <body>

<div id="bg"> <div id="main"> <div id="content"> <div class="navi"></div> <!-- automatically create the navigation points depending on the number of items -->

<div id ="header1"> <div id="menu"> <ul> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div>



<div id ="logos"></div> </div> <br/><br/> <div id="main">
<%@page import="java.io.*"%>
<%@page import="java.sql.*,java.util.*" %>
<%
String nam=request.getParameter("Name");
if(nam!=null) {
    Connection conn=null;
    ResultSet results=null;
    PreparedStatement stat=null;

    Class.forName("oracle.jdbc.driver.OracleDriver");
    conn=DriverManager.getConnection("jdbc:oracle:thin:dl/admin@localhost:1521/XE");
    try {
        // Bind the search text as a parameter instead of concatenating it into the SQL text
        stat=conn.prepareStatement("Select reference from browse Where title Like ?");
        stat.setString(1, "%"+nam.toLowerCase()+"%");
        results = stat.executeQuery();
        while (results.next()) {
            String filename=results.getString("reference");
%>
<!--embed src="test.pdf" width="800px" height="110px"></embed--->
<!--a href="test.pdf">test</a-->
<br><br><br>
<center> <h1>Browse Results:</h1>
<div id="main"> <div class="main_top"></div> <div class="main_bg1"> <tr> <td> <a href="downloadfile.jsp?<%=filename%>"><h2><%=filename%> </h2> </a><br><br> </td> </tr> </div> <div class="main_bot"></div> </div>
</center>
<%
        }
    } finally {
        // Release the database resources even if the query fails
        if (results != null) { results.close(); }
        if (stat != null) { stat.close(); }
        conn.close();
    }
} else {%>
<div id="search"> <b>Enter The Title Of The Book:</b> <form method="get" action="BrowseByTittle.jsp"> <fieldset> <input type="text" name="Name" id="search-text" size="25" value ="Title" onFocus="javascript:focusCheckDefaultValue(this, '', 'Title');" onBlur="javascript:blurCheckDefaultValue(this, '', 'Title');" > <input type="submit" id="search-submit" value="" /> </fieldset> </form> </div>
<% }%>
<br/><br/><br/><br/><br/><br/> <div class="razd_g"></div><br />



<div class="col"> <h1>Add to the MACS DL</h1> <img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your files<br/>to the server, download and get stuff you need most! </div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">Browse by date</h1> <img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely. </div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">SEARCH MACS DL</h1> <img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search button. Get the results instantly! </div> <div style="clear: both"></div> <div style="height:15px; width: 100%"></div> <div class="razd_g"></div> <div style="clear: both"></div> </div> <div id="content_bot"></div> <!-- content ends --> <div style="height:15px; width: 100%"></div> </div> </div> </div>

</body> </html>
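
The title and date browse pages both run a LIKE query against the browse table using text typed by the user. As an illustration only, the sketch below shows one way such a lookup could be written with a bound parameter, so that the user's text is never part of the SQL string and the wildcard characters % and _ are matched literally. The class BrowseQueries and its methods are hypothetical; the table and column names are taken from the listing above, and the ESCAPE clause is assumed to be supported by the database in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BrowseQueries {
    /** Builds a "contains" LIKE pattern, escaping \, % and _ with a backslash. */
    static String containsPattern(String userInput) {
        String escaped = userInput.toLowerCase(Locale.ROOT)
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    /** Returns the stored file references whose title contains the given text. */
    static List<String> referencesByTitle(Connection conn, String title) throws SQLException {
        // The SQL text is constant; the user's input is supplied only as a parameter
        String sql = "SELECT reference FROM browse WHERE title LIKE ? ESCAPE '\\'";
        List<String> refs = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, containsPattern(title));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    refs.add(rs.getString("reference"));
                }
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        // Only the pattern builder is exercised here; no database is needed
        System.out.println(containsPattern("Graph Theory"));
    }
}
```

The try-with-resources blocks close the statement and result set even when the query fails, which the scriptlet version has to do by hand.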

//Java server page for Browsing: Browse by date


<%@page contentType="text/html" pageEncoding="UTF-8"%>



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Mathematics & Computer Science Digital Library System</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script>
<script type="text/javascript" src="lib/jquery.tools.js"></script>
<script type="text/javascript" src="lib/jquery.custom.js"></script>
<link href="styles.css" rel="stylesheet" type="text/css" />
</head>
<script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage() {
  //display a confirmation box asking the visitor if they want to get a message
  { alert("File successfully uploaded to server"); }
}
$(document).ready(function() {
  //this page has no element with this id, so guard against a null result
  var passfield = document.getElementById('password_field_id');
  if (passfield) { passfield.type = 'text'; }
});

function focusCheckDefaultValue(field, type, defaultValue) { if (field.value == defaultValue) { field.value = '';



} if (type == 'pass') { field.type = 'password'; } } function blurCheckDefaultValue(field, type, defaultValue) { if (field.value == '') { field.value = defaultValue; } if (type == 'pass' && field.value == defaultValue) { field.type = 'text'; } else if (type == 'pass' && field.value != defaultValue) { field.type = 'password'; } } </script> <body> <div id="bg"> <div id="main"> <div id="content"> <div class="navi"></div> <!-- automatically create the navigation points depending on the number of items --> <div id ="header1"> <div id="menu"> <ul> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li>



<li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div> <div id ="logos"></div> </div> <br/><br/> <div id="main">
<%@page import="java.io.*"%>
<%@page import="java.sql.*,java.util.*" %>
<%
String nam=request.getParameter("Name");
if(nam!=null) {
    Connection conn=null;
    ResultSet results=null;
    PreparedStatement stat=null;
    Class.forName("oracle.jdbc.driver.OracleDriver");
    conn=DriverManager.getConnection("jdbc:oracle:thin:dl/admin@localhost:1521/XE");
    try {
        // Bind the search text as a parameter instead of concatenating it into the SQL text
        stat=conn.prepareStatement("Select reference from browse Where date_modified Like ?");
        stat.setString(1, "%"+nam.toLowerCase()+"%");
        results = stat.executeQuery();
        while (results.next()) {
            String filename=results.getString("reference");
%>
<!--embed src="test.pdf" width="800px" height="110px"></embed--->
<!--a href="test.pdf">test</a-->
<br><br><br>
<center> <h1>Browse Results:</h1>
<div id="main"> <div class="main_top"></div> <div class="main_bg1"> <tr> <td> <a href="downloadfile.jsp?<%=filename%>"><h2><%=filename%> </h2> </a><br><br> </td> </tr> </div> <div class="main_bot"></div> </div>
</center>
<%
        }
    } finally {
        // Release the database resources even if the query fails
        if (results != null) { results.close(); }
        if (stat != null) { stat.close(); }
        conn.close();
    }
} else {%>
<div id="search"> <b>Enter Year Of Publication:</b> <form method="get" action="BrowsebyDate.jsp"> <fieldset> <input type="text" name="Name" id="search-text" size="25" value ="Year" onFocus="javascript:focusCheckDefaultValue(this, '', 'Year');" onBlur="javascript:blurCheckDefaultValue(this, '', 'Year');" > <input type="submit" id="search-submit" value="" /> </fieldset> </form> </div>
<% }%>
<br/><br/><br/><br/><br/><br/> <div class="razd_g"></div><br /> <div class="col"> <h1>Add to the MACS DL</h1> <img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your files<br/>to the server, download and get stuff you need most! </div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">Browse by date</h1> <img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely. </div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">SEARCH MACS DL</h1> <img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search button. Get the results instantly! </div> <div style="clear: both"></div> <div style="height:15px; width: 100%"></div> <div class="razd_g"></div> <div style="clear: both"></div> </div> <div id="content_bot"></div> <!-- content ends --> <div style="height:15px; width: 100%"></div> </div> </div> </div>

</body> </html>


//Java Source code for search results


<%@page contentType="text/html" pageEncoding="UTF-8"%> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <title>Mathematics & Computer Science Digital Library System</title> <meta name="keywords" content="" /> <meta name="description" content="" /> <script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script> <script type="text/javascript" src="lib/jquery.tools.js"></script> <script type="text/javascript" src="lib/jquery.custom.js"></script> <link href="styles.css" rel="stylesheet" type="text/css" /> </head> <script language="JAVASCRIPT" type="TEXT/JAVASCRIPT"> function confirmMessage() { //display a confirmation box asking the visitor if they want to get a message

{ alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); if (passfield) { passfield.type = 'text'; } });

function focusCheckDefaultValue(field, type, defaultValue)



{ if (field.value == defaultValue) { field.value = ''; } if (type == 'pass') { field.type = 'password'; } } function blurCheckDefaultValue(field, type, defaultValue) { if (field.value == '') { field.value = defaultValue; } if (type == 'pass' && field.value == defaultValue) { field.type = 'text'; } else if (type == 'pass' && field.value != defaultValue) { field.type = 'password'; } } </script> <body> <div id="bg"> <div id="main"> <div id="content"> <div class="navi"></div> <div id ="header1">



<div id="menu"> <ul> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div> <div id ="logos"></div> <div id="search"> <form method="get" action="searchResults.jsp"> <fieldset> <input type="text" name="search" id="search-text" size="25" value ="Search" onFocus="javascript:focusCheckDefaultValue(this, '', 'Search');" onBlur="javascript:blurCheckDefaultValue(this, '', 'Search');" >

<input type="submit" id="search-submit" value="" /> </fieldset> </form> </div>

</div> <br/><br/>

<center> <br/><br/><br/> <h1>Search Results related to the query</h1> <div id="main2"> <div class="main_top"></div> <div class="main_bg1"> <!--p style="line-height: 200%; margin-bottom: 3px" >First Name :</p--> <tr>



<td>

<%@page import="InvertedIndex.*,java.io.*"%>
<%
//Display the documents that match the search query
String Path= "C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
IndexBuilder invertedIndex = new IndexBuilder(Path+"stopwords.txt");

invertedIndex.ReadIndexFromDisk(Path+"invertedIndex.object");
invertedIndex.AnswerQuery(request.getParameter("search"));

File filename;
for(int i=0;i<invertedIndex.QueryResults.size();i++) {
    filename=new File((String)invertedIndex.QueryResults.get(i));
    String file=filename.getName();
%>
<h1> <a href="downloadfile.jsp?<%=file%>"> <%=file%> </a></h1>
<%
}
%>
</td> </tr><br /><br/>

</div> <div class="main_bot"></div> </div> </center>



<!-- content ends --> </div> </div> </body> </html>
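
The search page relies on the IndexBuilder class to answer a query from an inverted index that was saved when documents were uploaded. As a rough illustration of the idea only, the sketch below keeps a map from each term to the set of documents that contain it, skips stopwords, and returns the documents that contain every remaining query term. The class TinyInvertedIndex, its method names and the AND-style matching are assumptions made for this example; they do not describe how IndexBuilder itself ranks or matches documents.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinyInvertedIndex {
    private final Set<String> stopwords = new HashSet<>();
    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    TinyInvertedIndex(Collection<String> stopwords) {
        for (String w : stopwords) {
            this.stopwords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    /** Lower-cases the text, splits on non-alphanumerics and drops stopwords. */
    private List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    void indexDocument(String docName, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
        }
    }

    /** Returns, in alphabetical order, the documents containing every query term. */
    List<String> answerQuery(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<String>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex(java.util.Arrays.asList("the", "of"));
        index.indexDocument("a.pdf", "The theory of graphs");
        index.indexDocument("b.pdf", "Graphs and matrices");
        System.out.println(index.answerQuery("graphs theory"));
    }
}
```

A query made up only of stopwords yields an empty result in this sketch, because no term is left to look up.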

//Source code for downloading a file from Server, downloadfile.jsp

<%@page import="java.io.*"%>
<%
String filename=request.getQueryString();
// Keep only the last path segment so the request cannot point outside the documents folder
filename=new File(filename).getName();
String Path="C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
File file=new File(Path+filename);
BufferedInputStream reader= new BufferedInputStream(new FileInputStream(file));
try {
    //servlet=response.getOutputStream();
    response.setContentType("APPLICATION/OCTET-STREAM");
    response.setHeader("Content-Disposition","attachment;filename="+file.getName());
    //response.setContentLength((int)file.length());
    //start to read file contents in bytes
    int iterator=0;
    // Assign the byte that was read (=), then compare it with the end-of-stream marker
    while((iterator=reader.read())!= -1)
        out.write(iterator);
    reader.close();
    out.close();
}
//Errors were caught
catch(Exception error){
    application.log("Download failed", error);
}
%>
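
The download page maps a requested name to a file in the documents folder and copies its bytes to the response. The sketch below, given under stated assumptions rather than as the project's implementation, shows the two steps in plain Java: confining the requested name to a base directory, and copying a stream with a buffer. The class SafeDownload and the methods resolveInside and copy are hypothetical names used only here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SafeDownload {
    /** Resolves requestedName inside baseDir, or returns null if it would escape it. */
    static Path resolveInside(Path baseDir, String requestedName) {
        if (requestedName == null || requestedName.isEmpty()) {
            return null;
        }
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(requestedName).normalize();
        boolean inside = candidate.startsWith(base) && !candidate.equals(base);
        return inside ? candidate : null;
    }

    /** Copies all bytes from in to out and returns how many were copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // Assign the number of bytes read, then test for end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get("documents");
        System.out.println(resolveInside(base, "notes.pdf") != null);
        System.out.println(resolveInside(base, "../secret.txt") != null);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(new byte[] {1, 2, 3}), out));
    }
}
```

Copying through a byte buffer to a byte stream keeps binary files such as PDFs intact, which a character writer does not guarantee.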
