CS4403
2011/12 ACADEMIC YEAR
Mathematics and Computer Science Digital Library System
MACSDL
June 25th 2012.
Compiled & submitted by:
1. Mosola, N.N 200800142
2. Koali, M.S 200800572
3. Senatsi, K.V 200800535
Supervisor: Mr. L. Poulo
ABSTRACT: The world is evolving rapidly, and the pace of technological change keeps rising. In an information age, free access to information is in high demand: people share what they have and obtain what they lack. As knowledge is increasingly shared among individuals, that information needs to be managed. The Faculty of Science and Technology at the National University of Lesotho (NUL) has several departments. One of them, the Department of Mathematics and Computer Science (MACS), seeks an information management system in which large pools of information can be stored, accessed freely and managed adequately. A web-based MACS digital library (DL) system is the answer. MACS DL will manage shared digital information objects for students and lecturers to enhance learning at NUL. The system will advance the use of Information and Communication Technology (ICT) at NUL, where both students and lecturers need information on a daily basis. Information retrieval will be at the heart of the system, backed by a repository of digital objects. The system will be developed following the prototyping process model and the software life cycle. This document discusses the relevant steps taken while the system is under development.
ACKNOWLEDGEMENTS
Thanks go to the Mathematics and Computer Science department at NUL. The project was an eye opener: we picked up many lessons from it that we will build on in the future. Working together on this project has made us a unit, and we hope to work together again. Big and ongoing thanks go to our supervisor and thesis advisor, Mr. Lebeko Poulo, for introducing us to the fascinating subject of Digital Libraries (DLs) and Information Retrieval (IR). Even in this four credit hour course, we learned a great deal about DLs and IR systems. Thank you for giving us a chance to work on such a fascinating and challenging project! Our warm regards go to the student union in the MACS department. This project would not have been possible without their willingness to spare time with us during the requirements elicitation, testing and evaluation phases of this endeavor.
CHAPTER ONE
Introduction
Building a digital library (DL) is inevitably an expensive and resource-intensive undertaking. Before embarking on such a project, it is important to consider some basic principles underlying the design, implementation and maintenance of a DL. The principles applied in building this project, hereinafter called MACS DL, do not apply to this endeavor alone; they are essential to building the large digital libraries that we know today. Good examples are the ACM Digital Library (http://portal.acm.org/dl), the New Zealand Digital Library (http://www.nzdl.org/cgi-bin/library) and the National Science Digital Library (http://nsdl.org/), to mention but a few.
A digital library is a focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection. (Cf. Witten, Ian and David Bainbridge (2002), How to Build a Digital Library, Morgan Kaufmann, p. 6.)
Brief background of DLs
As the need arose to make information and resources accessible globally, digital library systems (DLS) were born. Their importance grew relative to traditional libraries because they digitally preserve collections of valuable resources and information on the Web for educational and research purposes. The basic idea was to create a web-based, easily accessible collection of digital information whose organization and management would be automated to address the inefficiency of traditional libraries. MACS DL is no exception, as the principles used in its development follow the same route.
What is MACS DL?
MACS DL is an educational portal built for use at the National University of Lesotho (NUL), under the department of Mathematics and Computer Science (MACS) in the Faculty of Science and Technology (FOST), to enhance the mode of course delivery and provide facilities to academics in this faculty.
MACS DL provides this NUL community with services such as file sharing, document browsing, searching of textual materials, storage of digital objects on the server for current and future use, and information retrieval (IR).
Motivation
The higher education industry in Lesotho is experiencing an unprecedented growth rate. This trend is largely a result of new enabling technologies that have facilitated the virtual delivery of academic programs. This has in turn led to libraries becoming key success factors in the virtual academic environment. As students at NUL, we have observed that the well-known Thomas Mofolo library on the NUL premises is not adequately equipped to provide services to the students, researchers, and NUL staff in general. With that in mind, we aim to promote, support, manage and disseminate high quality research, development and innovation in information, library and related fields.
Project aims
We aim to encourage and facilitate the development of information strategies in higher education communities such as NUL. The main reason for building a digital library system is to give multiple users around the NUL campus free, remote access to information.
Problem Statement
The National University of Lesotho (NUL) has a vision to be a leading African university. In the Faculty of Science and Technology (FOST), the department of Mathematics and Computer Science (MACS) has a vision to facilitate learning and to give both students and lecturers a better way of managing and conducting their academic work. Currently, MACS does not have an academic portal that manages textual digital objects to enhance learning. Students and lecturers rely on internet search engines such as Google and Yahoo for any academic material they need. The MACS department requires a more direct digital library that encourages file sharing and gives easy access to materials used in the department.
Proposed solution
The proposed solution is a well-managed digital library system that will serve as a repository of information in high demand, contributed by students and lecturers in the MACS department. Our hope is that this will increase the availability of student research for scholars, empower lecturers and students to conduct research, and advance digital library technology. MACS DL shall be a repository that archives textual objects for current and future reference, to enhance learning at NUL and provide free access to information. The fundamental reason for building a digital library for the MACS department at NUL is the belief that it will deliver information better than was possible in the past.
Why a [MACS] DL?
Some of the advantages of DLs include, but are not limited to, the following:
DLs bring libraries closer to users: Information is more easily accessible, which increases information usage. This differs from a traditional library, like Thomas Mofolo, which users need to visit physically.
Searching and browsing capabilities: Computer systems are better than manual methods for finding information. DLs offer efficient and advanced search, information retrieval and browsing techniques that enable users to search for the information they need and browse the results with relative ease.
Information sharing: Placing digital information on a network makes it available to everyone. A MACS DL maintained on the NUL site will be a vast improvement over expensive physical duplication of little-used material, which is sometimes inaccessible without travelling to the location where it is stored, such as the Thomas Mofolo library.
Availability of information: The doors of MACS DL will never close; its collections can be used when library buildings (e.g. the Thomas Mofolo library) are closed. Materials are never checked out, misplaced, or stolen!
Project Plan
The MACS DL system is divided into two major parts, namely:
The DL: This is by far the more important of the two. The DL is a focused collection of digital objects organized and maintained in a proper manner. The DL will contain a pool of electronic versions of books and journals.
Search engine: This will assist in information retrieval (IR) and file indexing (FI).
The plan is to have a successfully implemented DL with an incorporated search engine that enables users of the DL to retrieve the information they require. Upon successful completion of these two parts, the system will be deemed to have met the users' requirements, which are discussed later in this document.
CHAPTER TWO
System Requirements Specification (SRS)
System functions and purpose: The system is a managed digital library portal for higher education and research purposes, to be used at NUL under FOST, in the MACS department, by both students and lecturers. MACS DL manages textual collections. The system allows intended users to upload materials to the server, browse through the collection, sort the collection, search for any material on the server, and download material from the collection.
Hardware and Software requirements:
1. Hardware: Computers with a minimum secondary disk space of 20GB and primary memory of 128MB.
2. Software: Apache Tomcat web server, MySQL database server and a Java Integrated Development Environment (IDE).
Performance specification: Using data structure and algorithm design, each module/function of the system has been optimized to avoid burdening the processor with prolonged processes.
User interface (UI): The system will provide interactive interfaces that are easy to learn and use, enabling it to be used effectively and efficiently. The system UI follows basic human-computer interaction principles.
System data: Any data captured by the system, e.g. users' information, is stored in normalized relational database schemas to conform to data integrity and consistency rules. The system provides information security measures that allow access to system data only to users with valid credentials.
System design constraints: MACS DL only manages textual digital objects. The system uses the English language only, bearing in mind that the intended users are academics who understand the language.
Requirements Engineering (RE)
Requirements engineering establishes a solid base for the design and construction of any system. Without it, the resulting software would have a higher probability of not meeting users' needs. To build elegant software that actually solves users' problems and meets the SRS described earlier, the developers conducted an extensive study around the NUL campus to gather views from the targeted and potential users. The following RE steps were followed:
Inception: This is where the scope and nature of the system were defined.
1. Scope: A web-based digital library system that manages textual objects.
2. Nature: The system is an academic portal that helps students and researchers share materials and search for any texts on the server.
Elicitation: This step helps to define what is actually required. Interviews with MACS students and lecturers at NUL were conducted to help the developers elicit the users' requirements, identify the problem properly and propose elements of the solution. The following use case scenarios were used to elicit users' requirements.
Figure 2.1 Use Case scenarios
Use Case Number 1. Use Case Name: Browsing. Use Case description: Accessing subsets of data by categorical classification, e.g. browse by author name, alphabetical order, title, date, etc.
Use Case Number 2. Use Case Name: Searching. Use Case description: Indexing, information retrieval and querying.
Use Case Number 3. Use Case Name: Annotate. Use Case description: Adding commentary, generalization and reviews.
Use Case Number 4. Use Case Name: Upload/Submission. Use Case description: Adding new digital objects to the DLS.
Use Case Number 5. Use Case Name: Download.
Elaboration: The basic requirements gathered were refined and modified to suit the design of the system under development; as a result, an analysis model was produced.
Negotiation: The priorities of the system were clarified. For example, as one of its priorities, the system must be able to upload documents to the server, enable information retrieval and download material from the server. During this step, different approaches to solving the identified problem were proposed, and a preliminary set of solution requirements was negotiated amongst the developers.
Specification: From the elaboration and negotiation steps, a detailed specification of the system was developed, as enough resources had been gathered.
Validation: With prototyping as the process model, the developers visited users of the system frequently and iteratively to make sure that what was being developed conformed to what the users required.
Management: Throughout the project's life cycle, changes to the initially gathered requirements were managed, as the prototyping model allows iteration over the steps performed.
As a result of the above RE steps, prototypes were built as an end product. To translate users' needs into technical requirements, the development team used Quality Function Deployment (QFD), emphasizing what is valuable to the users and identifying the following three types of requirements:
Types of Requirements identified
1. Normal requirements: These were stated by the users during the interviews conducted, and gave the developers an understanding of what should be developed.
2. Expected requirements: These were not explicitly mentioned by the users but were identified by the developers, e.g. ease of searching.
3. Exciting requirements: These were identified by the developers, as they go beyond the users' expectations.
Requirements gathering techniques used
A number of techniques were used to gather requirements from the target population. The following were of great importance in the requirements gathering phase:
Stratified sampling: A small group of students in the MACS department was chosen to represent the entire MACS student union. In the communication and planning phase of the prototyping model followed by the developers, ten (10) students were sampled.
Observing users: Sampled students were observed as they carried out their daily activities, using internet search engines such as Google and Yahoo to search for the material they required. The sampled students were also observed as they used the MACS DL search engine.
Interviews: Face-to-face interviews were conducted with the sampled students. Initially, a pilot study was carried out to make certain that the methods proposed by the developers were viable and that, in the long run, the solution would be appreciated. As the developers needed concrete answers and proof, for future reference, that interviews were conducted, a video of students using the MACS DL prototype was recorded. This video is in the possession of the developers and shall be made available to the supervisor.
CHAPTER THREE
Project Risk Analysis and Management
SWOT analysis was used to determine the strengths, weaknesses, opportunities and threats of this project. The following table depicts the outcomes of this extensive risk assessment.
Table 3.1 SWOT analysis
Strengths:
1. Project team members skilled in programming web-based IR systems
2. Availability of required resources
3. New technology facilitating information sharing
Weaknesses:
1. Apache Tomcat web server storage
2. Not enough metadata can be found in digital objects
3. Computer-illiterate end users
Opportunities:
1. New technology
2. Integration with external search engines, such as Google
3. Search engine development
Threats:
1. Existing DLs and search engines, such as Google Scholar, 4Shared, etc.
2. Web resources
3. Information overload
The developers conducted a thorough study to assess the risks of embarking on a project of this nature, and the above table shows the results of that SWOT analysis. Other software engineering methods of risk assessment, management and mitigation were employed to analyze the uncertainties that could put the project at risk. These involve:
Identifying technical risks for the MACS DL project
Identifying technology risks for the MACS DL project
Identifying staff risks for the MACS DL project
From the above, the development team came up with the following risk analysis table:
A scale of 1 to 5 was used to estimate the impact of each risk on the project, with the following mapping: 1 = catastrophic, 2 = critical, 3 = marginal, 4 = negligible, 5 = low.
Table 3.2 Risk Table
Risk: Supervisor not readily available on campus
Management: Use groupware systems to contact the supervisor
Mitigation: This risk was inevitable (i.e. unavoidable)
Monitoring: Fortnightly progress reports must be sent to the supervisor

Risk: Ambiguous project scope
Management: Refine project scope
Mitigation: Request clear definition of scope
Monitoring: Brainstorming sessions

Risk: Developers not familiar with the technology used
Management: Seek related sources from the supervisor and specialists
Mitigation: Quickly learn how to use the technologies required
Monitoring: Project progress report

Risk: Requirements change
Management: Elicit requirements, build prototypes
Mitigation: Iteratively refine the requirements to track changes
Monitoring: Perform requirements engineering

Risk: Project team member drops out of the project
Management: Ensure timely and consistent check-ups on team members
Mitigation: Meet with team members regularly to discuss the project
Monitoring: Measure the effectiveness of mitigation, e.g. ensure that every member is doing some work on the project

Impact (values in the order listed in the table): 2, 1, 3, 3
The risks identified were then assessed using the methods described above. In a round-robin fashion, the developers each assigned an impact value to every risk until an agreement was reached; the agreed values are shown in the tables above. A project is at risk whenever an event is identified that could affect its schedule. The schedule of this project was affected by some of the risks identified above. For example, in the requirements elicitation phase, numerous iterations over the identified requirements were a must during requirements engineering (RE).
CHAPTER FOUR
Object Oriented Analysis Design (OOA)
1. Class Responsibility Collaborator (CRC) Modeling
CRC modeling provides a simple means of identifying and organizing classes. NB: CRC modeling is not an official part of the Unified Modeling Language; it uses a collection of index cards that represent classes. Using this technique, the developers were able to identify potential classes that they could use as the building blocks of the MACS DL system.
Benefits identified
Portability: No computers are needed, so CRC cards can be used anywhere during brainstorming sessions.
Tangible: They allow participants to experience at first hand how the system will work.
Limited size: Index cards can only hold a limited amount of information compared to class diagrams. This enforces a high-level analysis.
Fig. 4.1 Class Responsibility Collaborator (CRC) Cards
Class Name : Searcher Class Type : Internal Entity Responsibilities : Generates query, Filters query, Locates query, Retrieves results Collaborators : Inverted File, Retriever, Ranker
Class Name : Inverted File Class Type : External Entity Responsibilities : Inserts data into an index, Maintains index Collaborators : Searcher
Class Name : Retriever Class Type : External Entity Responsibilities : Displays ranked results Collaborators : Ranker Searcher
Class Name : Ranker Class Type :External Entity Responsibilities Ranks data Collaborators Retriever Searcher
Class Name : Browser Class Type :External Entity Responsibilities Retrieving data from links Collaborators Searcher Retriever
Class Name : Downloader Class Type : External Entity Responsibilities Copies data to local storage Collaborators Searcher Retriever
Class Name : Digital Library Class Type : External Entity Responsibilities Processes a given request Collaborators Searcher Browser Downloader Up-loader
Class Name : Up-loader Class Type : External Entity Responsibilities Copies digital objects from local storage into the collection Collaborators
2. Class Diagrams
The system consists of the following classes, depicted in class diagrams.
Fig 4.2 Class diagrams
SEARCHER -SearchQuery : string -Results : string +GenarateQuery() : string +FilterQuery() : string +LocateQuery() : void +RetrieveResults() : string
RETRIEVER -Data : string +RankResults() : void +DisplayResults() : void
Inverted_File -Query : string -Size : int +InsertData() : void +Maintains() : void
OOA continued 3. Data flow diagram (DFD) DFDs show how data is captured as input, transformed in the processes and output as results to the users.
[Figure: data flow diagram. Entities and processes: CLIENT, BROWSE, SEARCH, SUBMIT, ANNOTATE, DOWNLOAD, RETRIEVE, INDEX, DATABASE SERVER. Data flows: digital object(s), comment(s), query/results, feedback, indexed object, download request.]
CHAPTER FIVE
Data Structures and Algorithms
The following data structures were used in the development of the DL. The developers studied a range of candidate data structures and judged the following to be the best fit:
Hash table: This data structure stores all index terms. Each hash table location references the posting list for a specific index term. Why a hash table? A hash table provides efficient searching: finding the posting list for an index term takes O(1) time on average.
Posting list (implemented using a linked list): This is a linked list in which each node encapsulates a term frequency (the number of times a term appears in the document) and a document id (the document filename). A new node is added to the list every time a document containing the term is indexed. This operation runs in O(1).
Algorithms:
Indexing
0. Read the index object from disk.
0.1. Extract the entire text from a given document.
0.2. Break the text into tokens/terms.
0.3. Filter the stop words from the terms.
0.4. Stem each term, applying the stemming process.
0.5. For each stemmed version of the term:
0.5.1. If the term does not exist in the hash table:
0.5.1.1. Store the term in the hash table.
0.5.1.2. Create a corresponding posting list for the term.
0.5.2. Else:
0.5.2.1. Add a node to the posting list.
0.6. Save the index object to disk.
Searching and Ranking
1. Read the index object from disk.
1.1. Break the user query into tokens/terms.
1.2. Stem each term, applying the stemming process.
1.3. For each stemmed term:
1.3.1. If the term exists in the index:
1.3.1.1. Retrieve its posting list and compute its weight in relation to the query vector Q and all document vectors (D1 ... Dn), where n is the number of documents in the collection.
1.3.2. Else:
1.3.2.1. The term weight is zero.
1.4. For each document:
1.4.1. Compute the score (using the ranking formula).
1.5. Sort the documents according to their score.
1.6. Return the sorted documents (the document with the highest score is the most relevant to the given query).
System Design and Engineering
This section discusses how the system was engineered. The developers were now equipped with the requirements from the RE phase, the classes to implement from the OOA phase, and the data structures and algorithms to implement. They then had to design and engineer the system to meet the requirements.
Indexing Documents
Overview: Searching, indexing and ranking techniques are at the core of the implementation of this piece of work. This chapter discusses the efficiency of the algorithms for searching, indexing and ranking documents. Indexing extracts terms from a given document when it is uploaded to the server, to indicate what the document is about or to summarize its content. This process takes the extracted terms and places them in an inverted index (inverted file) data structure. Searching pertains to posing a query and awaiting results from the digital library (DL) system. Information retrieval is the process of identifying the most relevant information that satisfies the given search query. The point of using an index is to increase the speed and efficiency of searches over the document collection. Without an index, every search would have to scan the collection sequentially, which increases the running time of each query. An inverted index contains two parts: an index of terms, generally called the term index, which stores a distinct list of terms found in the document collection, and, for each term, a posting list, which is simply a list of the documents that contain the term. When documents are submitted to the DL system, punctuation is removed, all terms are converted to lower case, and stop words are removed.
Stop words are those terms with little information content, e.g. conjunctions. This strategy is discussed in depth later in this document. Suppose there are two documents, D1 and D2. D1 has the following contents: "Mathematics and Computer Science department", whilst D2 contains: "Department of Social Science".
Key terms: Information retrieval (IR), Inverted Index (II), ranking, stop words, stemming, term weight, posting list.
Table 5.1 Inverted file structure analogy
Term : Document;Term frequency
Mathematics : 1;1
Computer : 1;1
Science : 1;1, 2;1
Department : 1;1, 2;1
Social : 2;1
Inverted Index architecture
Fig 5.2 Inverted Index architecture. [Figure components: Document, Text Extractor (tokens), Stemmer (stemmed tokens), Index Builder, Inverted Index (Hash Table, Posting List).]
Indexing documents
A document is uploaded through an interface to add it to the collection. The Index Builder class is instantiated and constructed with the document name. The document is then indexed using the indexDocument method, which lets the Text Extractor instance extract text from the document, break the text into tokens and filter out the stop words. The Stemmer instance stems the tokens. The Inverted Index class is then instantiated to store the stemmed terms in a hash table, and a posting list is created for each term. The entire process forms the inverted index.
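The indexing flow described above can be sketched as follows. This is a minimal Python illustration, not the project's implementation: the function names `tokenize` and `build_index`, the tiny stop word set, and the omission of stemming are all assumptions made to keep the example short. A Python dict plays the role of the hash table, and each posting list is a plain list of (document id, term frequency) pairs.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "and", "of", "the", "on", "in"}

def tokenize(text):
    """Lower-case the text, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(documents):
    """Map each term to a posting list of (doc_id, term_frequency) pairs."""
    index = {}  # the dict acts as the hash table of index terms
    for doc_id, text in documents.items():
        for term, tf in Counter(tokenize(text)).items():
            # Create the posting list on first sight, otherwise append to it.
            index.setdefault(term, []).append((doc_id, tf))
    return index

# The two example documents D1 and D2 from the text.
docs = {
    1: "Mathematics and Computer Science department",
    2: "Department of Social Science",
}
index = build_index(docs)
for term in sorted(index):
    print(term, index[term])
```

With the two example documents, the resulting postings agree with Table 5.1: "science" and "department" each point to both documents, while the remaining terms point to one.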
Posting List(s) [implemented as linked lists] A posting list indicates, for a given term, which documents contain the term. Typically, a Linked list data structure is used to store the entries in a posting list. This is because in most retrieval operations, a user enters a query and all documents that contain the query are obtained. This is done by hashing on the term in the index and finding the associated posting list. Once the posting list is obtained, a simple scan of the linked list yields all the documents that satisfy the query.
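A posting list of this kind might look like the sketch below. The class names `PostingNode` and `PostingList` and the choice to insert at the head of the list are assumptions for illustration; the text only states that a linked list is used and that adding a node is O(1).

```python
class PostingNode:
    """One entry of a posting list: a document id and a term frequency."""

    def __init__(self, doc_id, term_frequency, next_node=None):
        self.doc_id = doc_id
        self.term_frequency = term_frequency
        self.next = next_node


class PostingList:
    """Singly linked list of postings for one index term."""

    def __init__(self):
        self.head = None

    def add(self, doc_id, term_frequency):
        # Inserting at the head is O(1): no traversal is needed.
        self.head = PostingNode(doc_id, term_frequency, self.head)

    def documents(self):
        # A simple scan of the list yields every document containing the term.
        node = self.head
        while node is not None:
            yield node.doc_id, node.term_frequency
            node = node.next


# The hash table (a dict here) maps each term to its posting list.
hash_table = {"science": PostingList()}
hash_table["science"].add("D1", 1)
hash_table["science"].add("D2", 1)
postings = list(hash_table["science"].documents())
print(postings)
```

Because new nodes go to the head, the most recently indexed document comes first in the scan; appending at the tail with a tail pointer would keep insertion order at the same O(1) cost.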
Index Builder
The index builder drives the indexing process. It loops through all the document objects and calls the indexDocument method to add each document to the inverted index. Once all the documents have been processed, the writeIndextoDisk method is used to store the invertedIndex object to disk. The stored index is read every time a new document is uploaded, to check for duplicates. The index object was made serializable so that, each time the program runs, the inverted index is read from disk; all data in it is therefore available at runtime, and updates are saved back to the inverted index file.
Applying the stemming process (cf. Porter's stemming algorithm)
Stemming simply refers to changing all term forms to canonical versions. For example, studying, studies, and studied all map to study. Stemming reduces words by stripping off suffixes, converting them to neutral stems that are devoid of tense, number, and, in some languages, case and gender information. This relaxes the match between query terms and words in the documents so that, for example, libraries is deemed equivalent to library. Stemming is not appropriate for all queries, particularly those involving names and other very specific words. Skipping stemming in such cases avoids mapping words with different roots to the same term. Porter's stemming algorithm has been used to provide this service to the MACS DL system. A description of the algorithm can be found at the following URLs: http://snowball.tartus.org/text/introduction.html, http://snowball.tartus.org/algorithms/lovins/stemmer.html.
Porter's stemming algorithm defines five successively applied steps of word transformation. Each step consists of a set of rules in the form <condition> <suffix> <new suffix>. For example, the rule (m > 0) EED EE means: if the part of the word before the EED ending contains at least one vowel-consonant sequence, change the ending to EE. Words such as agreed therefore become agree, while feed remains unchanged because the condition is not satisfied, so another production rule would be tried.
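The single rule discussed above, (m > 0) EED EE, can be illustrated in isolation. The sketch below is not a full Porter stemmer: it implements only the measure m (the number of vowel-consonant sequences in the stem) and this one rule, and the function names are hypothetical.

```python
def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; y is a vowel after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in the form [C](VC)^m[V]."""
    m = 0
    previous_was_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and previous_was_vowel:
            m += 1
        previous_was_vowel = not consonant
    return m


def apply_eed_rule(word):
    """Apply (m > 0) EED -> EE; return the word unchanged if the rule does not fire."""
    if word.endswith("eed"):
        stem = word[:-3]
        if measure(stem) > 0:
            return stem + "ee"
    return word


print(apply_eed_rule("agreed"))  # stem "agr" has m = 1, so the rule fires
print(apply_eed_rule("feed"))    # stem "f" has m = 0, so the word is unchanged
```

For "agreed" the stem is "agr", which contains one vowel-consonant sequence, so the ending becomes "ee". For "feed" the stem is "f", with m = 0, so the rule does not apply.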
The algorithm is very concise, having about sixty (60) rules, and is very readable for a programmer. It is also efficient in terms of computational complexity compared to other affix-removal and statistical stemming algorithms such as N-gram stemming and the Hidden Markov Model (HMM) algorithm, to mention but a few. HMM algorithms are nevertheless beneficial in fields such as machine translation and natural language processing, where numerous languages form the data set. A flaw of classical stemmers like Porter's stemming algorithm is that they often conflate words with similar syntax but completely different semantics. For example, news and new are both stemmed to new although they belong to two quite different categories. Dr. Porter not only published the standard implementation of his work, written in the C and Java programming languages, but also developed a whole stemmer framework called Snowball. This framework provides a stemmer definition script language and a translator to ANSI C and Java. Its main purpose is to enable programmers to develop their own stemmers for other character sets or languages. Currently there are implementations for Romance, Germanic, Uralic and Scandinavian languages as well as English, Russian, and Turkish on the websites given. We chose Porter's stemming algorithm because of its efficiency in dealing with English corpora, and it helped pave the way for developing MACS DL.
Applying stop word removal
Stop words make up a large fraction of the text in most documents. Eliminating such words from consideration speeds up processing, saves a large amount of disk space in indexes, and does not damage retrieval effectiveness. A list of words filtered out during automatic indexing because they make poor index terms is called a stop word list or a negative dictionary. These are words such as: a, and, on, in, the, about, etc. Here we remove words such as articles, prepositions and conjunctions from the documents.
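Stop word removal itself is a simple filter. The sketch below assumes a short, hypothetical stop word list built from the examples named in the text (the list used in MACS DL is not given) and a function name, `remove_stop_words`, chosen for illustration.

```python
# Hypothetical stop word list; a real negative dictionary would be much longer.
STOP_WORD_LIST = frozenset({"a", "an", "and", "on", "in", "the", "about", "of", "to"})


def remove_stop_words(tokens, stop_words=STOP_WORD_LIST):
    """Return the tokens that are not in the stop word list, preserving order."""
    return [t for t in tokens if t.lower() not in stop_words]


sentence = "The department of Mathematics and Computer Science is in the Faculty"
kept = remove_stop_words(sentence.split())
print(kept)
```

Using a set for the list keeps each membership test at O(1) on average, which matters because the filter runs on every token of every uploaded document.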
The following screen shot depicts an inverted index object after indexing two documents; the output of the indexing module was as follows:
CHAPTER SIX
Searching, Browsing, Ranking and Information Retrieval (IR)
IR aims to retrieve large amounts of data as fast as possible from different kinds of information stored in more than one form, be it visual, audio or textual. The user retrieves information by posing a query, and the information retrieval module/function retrieves the information that satisfies the query. This is in contrast to a database system, where an exact answer matching a query is retrieved from a database object using a select statement. IR systems do not retrieve a definite answer, but produce a ranking of documents that seem to contain information relevant to the query given to the system. This ranking relies on the index built during indexing, which was covered earlier in this document. The MACS DL information retrieval mechanism has been engineered to return only the results that best match the provided query, filtering out unwanted results.
Methodology
Several different types of IR mechanisms exist, but the MACS DL system employs a method called inverted file indexing. This index structure is well suited to text query evaluation, and the system was developed for use on textual digital objects.
IR systems high-level architecture
The general scheme in Figure 6.1 shows the essential structure of a classical IR system. The first phase is preprocessing: the raw documents of the corpus are tokenized and then indexed as a list of postings per term. In the second phase the user gives a query to represent his or her "information need". The query is then transformed into a system query, and the relevant documents are retrieved from the index. The retrieved documents are ranked according to their relevance to the query and returned to the user through a user interface, discussed later in this document.
Figure 6.1: Classical IR system architecture
Term Weighting

This text retrieval module, like the rest, is designed around a comparison of the content identifiers attached both to the stored texts and to the users' information queries. A formal representation of the term vectors is obtained by including in each vector all possible content terms allowed in the system and adding term weight assignments to distinguish amongst terms. If wk represents the weight of term k in document D or query Q, and t terms in all are available for content representation, the term vectors for document D and query Q can be written as:

D = (t0, w0; t1, w1; ...; tn, wn) and Q = (q0, w0; q1, w1; ...; qr, wr).

Searching process

Searching is the most important part of the DL system, because information is retrieved through the search process. The technique returns results according to their relevance to the query provided. The related documents are then displayed on an output interface as links on a web page. The following screenshot shows the result of searching after three documents were indexed correctly.

Document 1 - a document on digital libraries
Document 2 - a document on digital libraries and information retrieval
Document 3 - a document on distributed databases
Query = Introduction to digital libraries

The computations of the term weights and term frequencies in relation to an uploaded document gave the following output:
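As a rough illustration of how term weights of this kind can be computed (this is not the system's own output), the sketch below assumes the common tf x idf scheme, w = tf * ln(N / df). The report does not state which weighting formula MACS DL uses, so the formula, the class name TermWeightSketch and its methods are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; the tf x idf weighting is an assumption.
class TermWeightSketch {
    // Raw term frequencies for one tokenized text.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Number of documents in the corpus that contain each term.
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String t : new HashSet<>(doc)) df.merge(t, 1, Integer::sum);
        }
        return df;
    }

    // Assumed weighting: w = tf * ln(n / df); terms unknown to the corpus get 0.
    static Map<String, Double> weights(List<String> tokens, Map<String, Integer> df, int n) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(tokens).entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            w.put(e.getKey(), d == 0 ? 0.0 : e.getValue() * Math.log((double) n / d));
        }
        return w;
    }

    public static void main(String[] args) {
        // The three example documents from the text, already tokenized by hand.
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("document", "digital", "libraries"),
            Arrays.asList("document", "digital", "libraries", "information", "retrieval"),
            Arrays.asList("document", "distributed", "databases"));
        Map<String, Integer> df = documentFrequencies(corpus);
        System.out.println(weights(corpus.get(1), df, corpus.size()));
    }
}
```

Under this assumed scheme a term that occurs in every document, such as "document" in the example, receives a weight of zero, while a term found in a single document receives the largest weight.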
Ranking retrieved documents

Ranking uses similarity to order the output triggered by a query, starting from the items most likely to satisfy the query, so that the most relevant items are displayed first. To rank a document retrieved by a query, the similarity between the two has to be calculated. The formula below is used to measure the similarity between query and item. Ranking is done in two phases:

Coarse grain ranking: Documents are sorted according to the frequency of the query tokens. A document that contains all query terms is ranked first.

Fine grain ranking: This phase depends on the weights of terms. The similarity function is calculated between document and query, and the module sorts the retrieved documents by their relevance to the query posed, using the following formula:
The following screenshot depicts the result of a query, with the results ranked according to their relevance to the query posed.
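Independently of the screenshot, the two ranking phases described above can be sketched in Java as follows. This is a minimal sketch under assumptions: cosine similarity is used as the fine grain similarity function (a common choice, but not confirmed by this report), the coarse grain score is taken to be the number of distinct query terms a document contains, and the names RankingSketch, matchedTerms, cosine, rank and vector are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; cosine similarity is an assumed similarity function.
class RankingSketch {
    // Coarse grain: how many distinct query terms occur in the document.
    static int matchedTerms(Map<String, Double> doc, Map<String, Double> query) {
        int n = 0;
        for (String term : query.keySet()) if (doc.containsKey(term)) n++;
        return n;
    }

    // Fine grain (assumed): cosine similarity between weighted term vectors.
    static double cosine(Map<String, Double> doc, Map<String, Double> query) {
        double dot = 0, docNorm = 0, queryNorm = 0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            dot += e.getValue() * doc.getOrDefault(e.getKey(), 0.0);
            queryNorm += e.getValue() * e.getValue();
        }
        for (double w : doc.values()) docNorm += w * w;
        return (docNorm == 0 || queryNorm == 0) ? 0 : dot / Math.sqrt(docNorm * queryNorm);
    }

    // Ids of matching documents, ordered by coarse score, then fine score, both descending.
    static List<String> rank(Map<String, Map<String, Double>> docs, Map<String, Double> query) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : docs.entrySet()) {
            if (matchedTerms(e.getValue(), query) > 0) ids.add(e.getKey());
        }
        ids.sort(Comparator
            .comparingInt((String id) -> matchedTerms(docs.get(id), query))
            .thenComparingDouble(id -> cosine(docs.get(id), query))
            .reversed());
        return ids;
    }

    // Helper for the example: gives every listed term a weight of 1.0.
    static Map<String, Double> vector(String... terms) {
        Map<String, Double> v = new HashMap<>();
        for (String t : terms) v.put(t, 1.0);
        return v;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> docs = new HashMap<>();
        docs.put("doc1", vector("digital", "libraries"));
        docs.put("doc2", vector("digital", "libraries", "information", "retrieval"));
        docs.put("doc3", vector("distributed", "databases"));
        System.out.println(rank(docs, vector("introduction", "digital", "libraries")));
    }
}
```

In this example doc1 and doc2 both contain two of the query terms, so the coarse phase ties them and the fine phase decides the order; doc3 shares no term with the query and is left out of the results.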
In ranking, an artificial measure is used to gauge the similarity of each document to the query, and a fixed number of the closest matching documents are returned as answers.

Metadata browsing

Browsing is often described as the other side of the coin from searching, but really the two are at opposite ends of a spectrum. Searching is purposeful, whereas browsing tends to be casual. Terms such as random, informal, unsystematic, and without design are used to capture the unplanned nature of browsing and, often, the lack of a specific goal. Searching implies that you know what you're looking for, whereas browsing implies that you'll know it when you see it. The metadata provided with the documents in a collection can support different browsing activities. Information collections that are entirely devoid of metadata can still be searched, which is one of the real strengths of full-text searching, but they cannot be browsed in any meaningful way unless some additional data is present. The structure that is implicit in metadata is the key to providing browsing facilities. Here are some examples of browsing:

Lists: The simplest structure is an ordered list. It can be alphabetical, in either ascending or descending order.
Dates: An automatically generated selector gives a choice of years, months and dates that can be used to browse metadata.
Name: Offers users the flexibility of browsing collections using authors' names. For example, Deitel.
Title: Users can browse collections using the titles of the documents in a pool of collections. For example, Advanced Java Programming.
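The browsing structures listed above can be sketched over a small in-memory metadata list. This is only an illustration under assumptions: the class BrowseSketch, its nested Item type and the methods titles, byYear and byAuthor are hypothetical names, and the placeholder records in main are invented for the example (only the Arms entry comes from the reference list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and sample records are hypothetical.
class BrowseSketch {
    // Minimal metadata record: title, author and year.
    static final class Item {
        final String title;
        final String author;
        final int year;
        Item(String title, String author, int year) {
            this.title = title;
            this.author = author;
            this.year = year;
        }
    }

    // "Lists"/"Title" browsing: titles in alphabetical order, ascending or descending.
    static List<String> titles(List<Item> items, boolean ascending) {
        List<String> out = new ArrayList<>();
        for (Item item : items) out.add(item.title);
        Comparator<String> order = String.CASE_INSENSITIVE_ORDER;
        out.sort(ascending ? order : order.reversed());
        return out;
    }

    // "Dates" browsing: titles grouped by year, with the years in ascending order.
    static Map<Integer, List<String>> byYear(List<Item> items) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Item item : items) {
            groups.computeIfAbsent(item.year, y -> new ArrayList<>()).add(item.title);
        }
        return groups;
    }

    // "Name" browsing: titles whose author matches the given name, ignoring case.
    static List<String> byAuthor(List<Item> items, String author) {
        List<String> out = new ArrayList<>();
        for (Item item : items) {
            if (item.author.equalsIgnoreCase(author)) out.add(item.title);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item("Digital Libraries", "Arms", 2000),
            new Item("Placeholder Title B", "Author X", 2011),   // invented record
            new Item("Placeholder Title A", "Author X", 2011));  // invented record
        System.out.println(titles(items, true));
        System.out.println(byYear(items));
        System.out.println(byAuthor(items, "author x"));
    }
}
```

Each view is derived purely from the metadata fields, which is why a collection without metadata can be searched but not browsed in this way.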
CHAPTER SEVEN
User Interface (UI) Design

A user interface describes how users of the system interact with it. Human Computer Interaction (HCI) basics and principles were employed in developing the MACS DL user interfaces, to give users a seamless interaction with the system. The common interface styles used are:

Menus
Forms
Principles of UI design
Consistency: The system is expected to be consistent. MACS DL achieves consistency in the choice of colors used, and its interfaces and styles are uniform throughout.
Learn-ability: The system should be easy to learn. MACS DL is easy to use, providing labels and the information needed to guide users on how best to utilize it.
Informative feedback: The system should provide informative feedback to users after an operation is performed. MACS DL adheres to this principle: each time a query is posed and results are displayed, the system gives users feedback.
Provide error prevention and handling: The system must have mechanisms that prevent users from committing errors and that handle any errors that do occur. MACS DL follows this principle, guarding against user errors and system crashes.
Off-load the short term memory: The system should reduce the number of steps users have to perform when carrying out an operation. MACS DL was designed with links and clear labels that are easy for users to remember.
Provide short-cuts for users: The system provides hyperlinks as shortcuts for navigating web pages.
System dialogue yielding closure: The system informs its users about its current state at each instance. For example, after a query is posed, the system returns the results with a message that reads RESULTS MATCHING THE QUERY to yield closure of the IR operation.
Provide internal locus of control: The system allows users to be in control of it. Every operation the system performs is triggered by users. For example, retrieved documents are only downloaded when clicked.
Technologies used in the UI development

Java Server Pages (JSP) and servlets, to make the system web based
JavaScript
eXtensible Mark-up Language (XML), to allow file formats
Hypertext Mark-up Language (HTML)
Cascading Style Sheets (CSS), to provide presentable documents with minimal effort
eXtensible Stylesheet Language (XSL), for supporting XML and HTML that are XML compliant
CHAPTER EIGHT
System Testing and Evaluation

The system was tested for errors after each module was completed. Testing is the process of exercising a program with the specific intent of finding errors prior to delivery to the end user. The system was thoroughly tested, mainly to assess the following:

Errors
Requirements conformance
Performance
Quality

Who did the testing?

The development team did most of the testing, and independent testers were also invited to test the system.

Testing strategies

Unit test
Integration test
White box test
Validation test
System test
Regression test
The following table depicts some of the modules and criteria used in the testing phase.

Table 8.1 Test results

Test case | Test strategy | Description | Results
GUI functionality | Unit test | Testing action performed when buttons and controls are clicked | PASS
Code snippets | Integration test | [...] to form a [...] | PASS
System performance | | | PASS
System functionality | Integration and Unit test | Integrating system modules and testing each of them for functionality | PASS
Databases connectivity | Integration test | |
Human Interaction | | |
Textual objects | Validation test | |
Error handling | Regression test | | PASS
System Evaluation

The system was evaluated to establish whether users accept it as a usable tool. Direct observation and pilot study evaluation techniques were used to find out the users' views during this phase.

Direct observation: The developers observed sampled users directly as they evaluated the system. Users had to perform all the operations implemented in the MACS DL system and evaluate the results.

Pilot study: A small group of users was asked a set of questions about the system. The pilot study was conducted using a questionnaire, through which users provided their evaluations. Some of the questions asked were:

Is the system usable?
Is the system useful?
Evaluation results

The results obtained from the system evaluation phase were used to enhance the system's functionality and make it more effective and efficient. They were collected to guide the developers on how to improve the system and the users on how best to use it. The following is an in-depth analysis of the results obtained from the evaluation phase:
Direct observation of users: We investigated the factors that influence the perceived ease of use and usefulness of digital libraries among NUL students. Data were collected from undergraduate students at NUL. Individual undergraduate students formed the population sample, and using a stratified sampling method, students around the NUL campus, in places such as the Thomas Mofolo library and classrooms, were each handed a questionnaire.

Evaluation results and analysis

Out of one hundred and fifty questionnaires that were distributed, only sixty-nine were returned, giving a response rate of 46%. Based on the study, 60% of the respondents were Computer Science and Engineering students, 20% were from social sciences and 10% from humanities.

Table 8.2 The following scale of 1 to 4 was used to grade the scores: 1 = Best, 2 = Good, 3 = Not sure, 4 = Bad

Item evaluated | Score | Answer
HCI (Usability) | 1 | Best
HCI (Functionality) | 1 | Best
Project functionality | 1 | Best

System training

There will be no need to train the users, as the system is usable and easy to learn. Furthermore, MACS DL resembles the web based digital library systems already in use today, which NUL MACS department students are accustomed to using.
Conclusions and future prospects

The MACS DL system was a success, and an exciting endeavor that served as an eye opener to the developers in their academic careers, as plenty of new computing concepts were learned during the execution of this project. The system is ready for deployment and use in an organization as large as NUL. It covers major parts of a search engine implementation, such as stop-word removal, stemming, automatic indexing and searching. To make it a complete search engine, other components such as clustering and thesaurus expansion could be added. The system could also be implemented for any collection of digital objects, such as videos and images. At present the system takes a long time to upload large documents; in the future, new implementation strategies could be employed to make this faster.
References

1. Arms, W. Digital Libraries. MIT Press, Cambridge, MA, 2000.
2. Alexa T. McCray and Marie E. Gallagher, Principles for Digital Library Development. Accessed on September 10th, 2011, from http://www.lhncbc.nlm.nih.gov/dlb/pubs/200105_cacm_mccray.pdf
3. Bin Li, The History of Digital Libraries. Accessed on September 12th, 2011, from http://www.ils.unc.edu/~lib/digital-library.html
4. Gerard Salton and Christopher Buckley, Term-Weighting Approaches in Automatic Text Retrieval, Cambridge, 2000.
5. William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures & Algorithms, 88-94.
6. Witten et al., How to Build a Digital Library, Morgan Kaufmann Publishers.
APPENDIX A
Acronyms

A1. MACS - Mathematics and Computer Science
A2. NUL - National University of Lesotho
A3. FOST - Faculty of Science and Technology
A4. IR - Information Retrieval
A5. FI - File Indexing
A6. II - Inverted Index
A7. UI - User Interface
A8. CRC - Class Responsibility Collaborator
A9. DFD - Data Flow Diagram
A11. RE - Requirements Engineering
A12. SRS - System Requirements Specification
APPENDIX B
Programs //Cascading Style sheet
#header1 { height:200px; padding-top: 20px; padding: 0px 0px 0px 0px; width: 900px; background-repeat:no-repeat; background-position:top; padding-bottom: 3px; } #logos { font-family: Arial,sans-serif; color:#FFFFFF; font-size:18px; font-style:italic; padding: 15px 0px 0px 135px; background:url(images/buka.jpg) left top no-repeat; height: 200px; }
* { border: 0; margin: 0;
body{ font: 12px Arial, Helvetica, sans-serif; color: #000000; background: url(images/body_bg.jpg) top repeat-x #FFFFFF; line-height: 20px; }
/* search */
#search input { float: left; font: 11px Georgia, "Times New Roman", Times, serif; }
#search-text {
#search-submit { width: 40px; height: 23px; background: url(images/search2.png); background-repeat:no-repeat; background-position:left top; border: none; }
/*MENU*/
.main_top { background: url(images/main_top.png) no-repeat top; height: 15px; } .main_bot { background: url(images/main_bot.png) no-repeat top; height: 15px; width:750px; padding-bottom: 10px; } .main_bg1 { background: url(images/main_bg.png); padding-left: 8px; color: black;
/*main page*/ #main { width: 900px; margin: 0px auto; background:url(images/main.jpg) right top no-repeat; } #main2 { width: 750px; height: 400px; margin-left: 8px; clear:both; /*background: url(images/left_bg.jpg);*/ background-repeat:repeat-y; background-position:left; }
#logo H2 {
#logo a { text-decoration: none; text-transform: lowercase; font-style: italic; font-size: 16px; color: #000000; }
/* buttons */
#buttons { text-align:center; height: 30px; margin: 0px auto; padding: 0px 0px 0px 0px;
#buttons a { font-family: Georgia, "Times New Roman", Times, serif; font-size: 18px; display: block; float: left; text-decoration: none; color: #0059FF; text-align: center; padding-top: 0px; font-weight:100; width: 170px; }
.top { height:334px; padding-top: 10px; padding-left: 10px; background:url(images/top.jpg) left top no-repeat; } .top_bot { background: url(images/top_bot.jpg) left top no-repeat; height: 28px}
#content
#content_top { width: 900px; background: url(images/content_top.png) 0px top no-repeat ; height: 10px; }
#content_bot { width: 900px; background: url(images/content_bot.png) 0px bottom no-repeat ; height: 9px; }
.float_l { float:left;}
.col_razd { background:url(images/col_text.gif) center repeat-y; height: 124px; width: 40px; float:left; margin-top: 35px;
h1 { padding: 0px 0px 5px 0px; font-family: Georgia, "Times New Roman", Times, serif; font-size: 16px; font-weight: bold; color:#051B93;}
.img_l {
float:left;
.img_r {
.span_cont {
color: #07249F;
font-size:12px; font-weight:bold; }
#content H2{ font-family: Georgia, "Times New Roman", Times, serif; font-size:16px; font-weight: bold; color: #07249F; text-align: left; padding: 5px 0px 5px 0px; }
.read_r{ text-align: right; padding: 0px 8px 0px 0px; background: url(images/read.gif) right 3px no-repeat; }
.next { width: 100%; text-align: right; padding: 0px 0px 0px 0px;}
#bottom { background: #E6F6FF; margin: 0px auto; color:#000000; padding: 0px 0px 0px 15px;
#b_col1 { width: 220px; float: left; margin-left: 0px; } #b_col2 { width: 180px; float: left; margin-left: 57px; } #b_col3 { width: 160px; float: left; margin-left: 20px; text-align: left; } #b_col4 {
#b_col2 li { padding: 4px 0px 0px 18px; background: url(images/fish2.gif) 0px 11px no-repeat;}
#b_col2 a { color:#FFFFFF; }
#footer{ font-size: 11px; color: #000000; text-align: center; padding: 20px 0px 0px 0px; height: 60px; text-align: center; margin: 0px auto;
#footer a{ color: #000000; font-size: 11px; text-decoration: none; } #footer a:hover{ color: #000000; font-size: 11px;
div.pp_overlay {background: #000;display: none;left: 0;position: absolute;top: 0;width: 100%;z-index: 9500;} div.pp_pic_holder {display: none;position: absolute;width: 100px;z-index: 10000;}
public class Index implements Serializable { public class documentVector implements Serializable {
public void Add(PostingListNode Node) { //compiled code throw new RuntimeException("Compiled Code"); } }
public String documentId; public int docReference; public int termFrequency; public PostingListNode next;
public void Search(String query) throws Exception { //compiled code throw new RuntimeException("Compiled Code"); } public void getVectors() { //compiled code throw new RuntimeException("Compiled Code"); } public void computeScores() { //compiled code
public Index invertedIndex; private String document; private String response; private Hashtable<String, Integer> termsfrequency; public ArrayList QueryResults; private TextExtractor Extractor;
public IndexBuilder(String stopwords_path) throws Exception { //compiled code throw new RuntimeException("Compiled Code"); }
private int frequency(ArrayList tokens, String term) { //compiled code throw new RuntimeException("Compiled Code"); }
public void indexDocument() throws Exception { //compiled code throw new RuntimeException("Compiled Code"); }
public void SaveIndexToDisk(String path) throws Exception { //compiled code throw new RuntimeException("Compiled Code"); }
public void ReadIndexFromDisk(String path) throws Exception { //compiled code throw new RuntimeException("Compiled Code"); }
public void AnswerQuery(String query) throws Exception { //compiled code throw new RuntimeException("Compiled Code"); }
public static void main(String[] args) throws Exception { //compiled code throw new RuntimeException("Compiled Code");
import java.util.ArrayList;
class StemText {
private char[] b; private int i; private int i_end; private int j; private int k; private static final int INC = 50;
public void add(char ch) { //compiled code throw new RuntimeException("Compiled Code"); }
public void add(char[] w, int wLen) { //compiled code throw new RuntimeException("Compiled Code"); }
private final boolean cons(int i) { //compiled code throw new RuntimeException("Compiled Code"); }
private final int m() { //compiled code throw new RuntimeException("Compiled Code"); }
private final boolean vowelinstem() { //compiled code throw new RuntimeException("Compiled Code"); }
private final boolean cvc(int i) { //compiled code throw new RuntimeException("Compiled Code"); }
private final boolean ends(String s) { //compiled code throw new RuntimeException("Compiled Code"); }
private final void setto(String s) { //compiled code throw new RuntimeException("Compiled Code"); }
private final void r(String s) { //compiled code throw new RuntimeException("Compiled Code"); }
private final void step1() { //compiled code throw new RuntimeException("Compiled Code"); }
private final void step2() { //compiled code throw new RuntimeException("Compiled Code"); }
private final void step3() { //compiled code throw new RuntimeException("Compiled Code"); }
private final void step4() { //compiled code throw new RuntimeException("Compiled Code"); }
private final void step5() { //compiled code throw new RuntimeException("Compiled Code"); }
private final void step6() { //compiled code throw new RuntimeException("Compiled Code"); }
public void stem() { //compiled code throw new RuntimeException("Compiled Code"); } public ArrayList stemIndexTerms(ArrayList textTokens) { //compiled code throw new RuntimeException("Compiled Code"); } } //Source code for class stopwords package InvertedIndex;
public Hashtable<String, Integer> stopWords; private BufferedReader stopWordsFile; private int count;
public StopWords(String path) throws IOException { //compiled code throw new RuntimeException("Compiled Code"); } } //Source code for class TextExtractor package InvertedIndex;
import java.io.File; import java.io.IOException; import java.util.ArrayList; import javax.xml.parsers.ParserConfigurationException; import org.xml.sax.SAXException;
public class TextExtractor { private File file; private String filename; public String textFromFile; public ArrayList Tokens; private String stopwordsPath; public TextExtractor(String stopwords_Path) {
</script>
<body>
<div id="bg">
<div class="navi"></div>
<div id ="header1">
<div id="menu"> <ul> <!--create button links--> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div> <div id ="logos"></div> <div id="search"> <form method="get" action="searchResults.jsp"> <fieldset> <input type="text" name="search" id="search-text" size="25" value ="Search" onFocus="javascript:focusCheckDefaultValue(this, '', 'Search');" onBlur="javascript:blurCheckDefaultValue(this, '', 'Search');" >
</div> <br/><br/>
<div align="left"> <img src="images/img11.jpg" class="img_l" align="left"alt="" /><br/><br/> <span class="span_cont">About MACS DL </span><br /> MACS DL is an educational portal for higher learning, with unlimited amounts of large pools of books,journals etc, everything you ever needed. </div>
<form enctype="Multipart/form-data" action="uploadfile.jsp" method="post" > <br/><br/><br/> <center> <table border="2"> <tr> <center> <td colspan="2"> <p align ="center"><b>Upload and share your files with the NUL community</b> </td> </center>
</tr> <tr> <td> <b>Choose a file to upload:</b> </td> <td> <input name="inputfile" type="file"> </td> </tr> <tr> <td colspan="2"> <p onclick="confirmMessage()"></p> </td> align="right"><input type="submit" id ="uploader-button" value="UPLOAD"
<div class="col"> <h1>Add to the MACS DL</h1> <img src="images/col_img1.jpg" class="img_l" alt="" />Add you objects and share with the NUL community by uploading your files<br/>to the server, download and get stuff you need most!
</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">Browse by date</h1> <img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.
</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">SEARCH MACS DL</h1> <img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search button. Get the results instantly! </div> <div style="clear: both"></div> <div style="height:15px; width: 100%"></div> <div class="razd_g"></div>
<div id="footer"> <p>Copyright 2012<p>Design by <a href="http://www.nul.ls" title="MACS DL">Mosola Napo N</a> <!--End of notice --></p><!-- end of copyright notice--> </div> <!-- footer ends --> </div>
</div> </body> </html> //Java Source code for Uploading files <%@page contentType="text/html" pageEncoding="UTF-8"%> <%@page language="java"%> <%@page import="InvertedIndex.*"%> <%@page import ="java.io.File,java.io.FileInputStream,java.io.InputStream"%> <%@page import="java.io.*,java.util.*, javax.servlet.*" %> <%@page import="javax.servlet.http.*,javax.servlet.ServletException"%> <%@page import="org.apache.commons.fileupload.*" %> <%@page import="org.apache.commons.fileupload.disk.*"%> <%@page import="org.apache.commons.fileupload.servlet.*" %> <%@page import="org.apache.commons.io.output.*" %>
String filename=null; // Create a new file upload handler ServletFileUpload upload = new ServletFileUpload(factory); try { // Parse the request to get file items. List fileItems = upload.parseRequest(request);
// Process the uploaded file items Iterator i = fileItems.iterator(); while ( i.hasNext () ) { FileItem fi = (FileItem)i.next(); filename=fi.getName(); file=new File(Path+filename); fi.write( file ) ; %> You have successfully uploaded the file by the name of:<br> <%=filename%> <% } }catch(Exception ex) {
Class.forName("oracle.jdbc.driver.OracleDriver"); conn=DriverManager.getConnection ("jdbc:oracle:thin:dl/admin@localhost:1521/XE"); String resourceLocation = Path+filename; File file2 = new File(resourceLocation); InputStream input = new FileInputStream(file2); Metadata metadata = new Metadata(); BodyContentHandler handler = new BodyContentHandler(); AutoDetectParser parser = new AutoDetectParser(); parser.parse(input, handler, metadata); String Author= metadata.get(Metadata.AUTHOR); String Title=metadata.get(Metadata.TITLE); String last_modified=metadata.get(Metadata.LAST_MODIFIED);
<html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <title>Mathematics & Computer Science Digital Library System</title> <meta name="keywords" content="" /> <meta name="description" content="" /> <script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script> <script type="text/javascript" src="lib/jquery.tools.js"></script> <script type="text/javascript" src="lib/jquery.custom.js"></script> <link href="styles.css" rel="stylesheet" type="text/css" /> <link href="style.css" rel="stylesheet" type="text/css" /> </head> <script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage() { //display a confirmation box asking the visitor if they want to get a message
{ alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); passfield.type = 'text'; });
function focusCheckDefaultValue(field, type, defaultValue) { if (field.value == defaultValue) { field.value = ''; } if (type == 'pass') { field.type = 'password'; } } function blurCheckDefaultValue(field, type, defaultValue) { if (field.value == '') { field.value = defaultValue; } if (type == 'pass' && field.value == defaultValue) { field.type = 'text'; }
</script>
<body>
<div id="bg">
<div class="navi"></div> <!-- create automatically the point dor the navigation depending on the numbers of items -->
<div id ="header1">
<div id="menu"> <ul> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div> <div id ="logos"></div> <div align="center"> <br/> <center>
<div class="col"> <h1>Add to the MACS DL</h1> <img src="images/col_img1.jpg" class="img_l" alt="" />Add you objects and share with the NUL community by uploading your files<br/>to the server, download and get stuff you need most!
</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">Browse by date</h1> <img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.
</div> <div class="col_razd"></div> <div class="col"> <h1 class="tit">SEARCH MACS DL</h1> <img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search button. Get the results instantly! </div> <div style="clear: both"></div> <div style="height:15px; width: 100%"></div> <div class="razd_g"></div>
</div> <div id="content_bot"></div> <!-- content ends --> <div style="height:15px; width: 100%"></div> <!-- bottom end --> <!-- footer begins -->
<div id="footer"> <p>Copyright 2012<p>Design by <a href="http://www.nul.ls" title="MACS DL">Mosola Napo N</a> <!--End of notice --></p><!-- end of copyright notice--> </div> <!-- footer ends --> </div>
</div>
</body> </html>
//Java
<%@page contentType="text/html" pageEncoding="UTF-8"%> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <title>Mathematics & Computer Science Digital Library System</title> <meta name="keywords" content="" /> <meta name="description" content="" />
function confirmMessage() { //display a confirmation box asking the visitor if they want to get a message
{ alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); passfield.type = 'text'; });
function focusCheckDefaultValue(field, type, defaultValue) { if (field.value == defaultValue) { field.value = ''; } if (type == 'pass') { field.type = 'password'; } }
<div id="bg"> <div id="main"> <div id="content"> <div class="navi"></div> <!-- create automatically the point dor the navigation depending on the numbers of items -->
<div id ="header1"> <div id="menu"> <ul> <li id="button1"><a href="macsdl.jsp" title="">Home</a></li> <li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li> <li id="button2"><a href="#" title="">Contacts</a></li> </ul> </div>
Class.forName("oracle.jdbc.driver.OracleDriver"); conn=DriverManager.getConnection ("jdbc:oracle:thin:dl/admin@localhost:1521/XE"); stat=conn.createStatement(); results = stat.executeQuery("Select reference from browse "+ "Where title Like '%"+ nam.toLowerCase()+"%'"); while (results.next()) { String filename=results.getString("reference"); %> <!--embed src="test.pdf" width="800px" height="110px"></embed---> <!--a href="test.pdf">test</a--> <br><br><br> <center> <h1>Browse Results:</h1> <div id="main"> <div class="main_top"></div> <div class="main_bg1"> <tr>
</body> </html>
<%
</body> </html>
{ alert("File successfully uploaded to server"); } } $(document).ready(function() { var passfield = document.getElementById('password_field_id'); passfield.type = 'text'; });
</div> <br/><br/>
<center> <br/><br/><br/> <h1>Search Results related to the query</h1> <div id="main2"> <div class="main_top"></div> <div class="main_bg1"> <!--p style="line-height: 200%; margin-bottom: 3px" >First Name :</p--> <tr>
<%@page import="InvertedIndex.*,java.io.*"%> <%//Display browsed items String Path= "C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/"; IndexBuilder invertedIndex = new IndexBuilder(Path+"stopwords.txt");
invertedIndex.ReadIndexFromDisk(Path+"invertedIndex.object"); invertedIndex.AnswerQuery(request.getParameter("search"));
File filename; for(int i=0;i<invertedIndex.QueryResults.size();i++) { filename=new File((String)invertedIndex.QueryResults.get(i)); String file=filename.getName(); %> <h1> <a href="downloadfile.jsp?<%=file%>"> <%=file%> </a></h1> <% } %> </td> </tr><br /><br/>
//Source code for downloading a file from Server, downloadfile.jsp
<%
String filename=request.getQueryString();
String Path="C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
File file=new File(Path+filename);
BufferedInputStream reader= new BufferedInputStream(new FileInputStream(file));
try {
    //servlet=response.getOutputStream();
    response.setContentType("APPLICATION/OCTET-STREAM");
    response.setHeader("Content-Disposition","attachment;filename="+file.getName());
    //response.setContentLength((int)file.length());
    //start to read file contents in bytes
    int iterator=0;
    //read one byte at a time until the end of the file (-1) is reached
    while((iterator=reader.read())!= -1)
        out.write(iterator);
    reader.close();
    out.close();
}
//Errors were caught
catch(Exception error){ }
%>