Speech Synthesizing Sentient Query Analyser and Responder

SPEECH SYNTHESIZING SENTIENT QUERY ANALYSER AND RESPONDER
A PROJECT REPORT Submitted by
S.SANTHOSH KUMAR V.KARTHIKEYAN
in partial fulfillment for the award of the degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
SHREE MOTILAL KANHAIYALAL FOMRA INSTITUTE OF TECHNOLOGY, KELAMBAKKAM ANNA UNIVERSITY : CHENNAI 600 025
MAY 2011
ANNA UNIVERSITY : CHENNAI 600 025

BONAFIDE CERTIFICATE
Certified that this project report SPEECH SYNTHESIZING SENTIENT QUERY ANALYSER AND RESPONDER is the bonafide work of R.Rajarajan and S.Vikram Srinath who carried out the project work under my supervision.
SIGNATURE
MR.BHARATHI RAJA
Head of the Department
Department of Computer Science,
Shree Motilal Kanhaiyalal Fomra Institute of Technology,
Kelambakkam-603103

SIGNATURE
MR.VIJAI ANAND
SUPERVISOR
Department of Computer Science,
Shree Motilal Kanhaiyalal Fomra Institute of Technology,
Kelambakkam-603103

INTERNAL EXAMINER
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We would like to express our thanks to our honorable Correspondent Mr.ShreeKumar Fomra for the facilities and support given by him in the college.
We would like to express our sincere thanks to our benevolent Principal Dr.R.John Stephen, Ph.D., for his valuable guidance, able supervision, encouragement and enthusiasm throughout this project.
We are deeply indebted to Mr. S.BarathiRaja, M.E., Head of the Department, Department of Computer Science and Engineering, for extending all facilities and directions to complete this project.
We profoundly express our gratitude and indebtedness to our guide Mr.VijaiAnand, M.E., Lecturer, Department of Computer Science and Engineering, S M K Fomra Institute of Technology, Chennai, for his valuable guidance and encouragement throughout this project work.
We would like to thank all other staff members, our parents and all our friends for their valuable comments and encouraging remarks.
ABSTRACT
The main aim of this project is to develop a speech search engine that performs both speech-to-text and text-to-speech conversion, analyses the query given by the user and responds to it. With our application we strive to bridge the gap in human-machine communication by using various speech synthesis techniques. It takes voice input as the search query and responds with the right result, as accurately as possible, while keeping the response as human-like as possible. It listens to the words spoken by the user, converts them to text and uses that text as the query. It then finds the most relevant result and reads the most relevant text from it aloud.
TABLE OF CONTENTS
CHAPTER  TITLE
    LIST OF FIGURES
    LIST OF ABBREVIATIONS
1   INTRODUCTION
        About the Project
2   SYSTEM ANALYSIS
    2.1 Existing system
    2.2 Proposed system
3   REQUIREMENTS SPECIFICATION
    3.1 Introduction
    3.2 Hardware and Software specification
    3.3 Technologies Used
        3.3.1 Introduction to Visual Studio
        3.3.2 Introduction to .NET Framework
        3.3.3 Introduction to VB.net
        3.3.4 World of Open Source
            3.3.4.1 PHP
            3.3.4.2 Ruby
        3.3.5 APACHE Server
            3.3.5.1 Introduction to APACHE server
        3.3.6 Introduction to MySQL
4   SYSTEM DESIGN
    4.1 Architecture Diagram
    4.2 Sequence Diagram
    4.3 Use Case Diagram
5   SYSTEM DESIGN DETAILED
    5.1 Modules
    5.2 Module explanation
6   VERIFICATION AND VALIDATION
    CODING
    SNAP SHOTS
    REFERENCE
LIST OF FIGURES
TITLE
Architecture diagram of .NET framework
Architecture diagram of SSQAR
Sequence Diagram of SSQAR
Use Case Diagram of SSQAR
LIST OF ABBREVIATIONS
IEEE    : The Institute of Electrical and Electronics Engineers, Inc.
HTML    : Hyper Text Markup Language
HTTP    : Hyper Text Transfer Protocol
SRS     : Software Requirements Specification
API     : Application Programming Interface
PHP     : Hypertext Preprocessor
OS      : Operating System
SSQAR   : Speech synthesizing sentient query analyzer and responder
CHAPTER 1 INTRODUCTION
About the project:

Speech synthesizing sentient query analyzer and responder is an end-to-end voice process in which the user can put queries directly to the system. It fetches and analyzes those queries and brings back an accurate result in voice mode. In simple words, this is a live question-and-answer exchange between the human and the computer.
The human voice is received through a microphone, and Microsoft's speech recognition API, MS Speech API (SAPI) 5.3, converts the spoken query into a textual query. Regular expressions and pattern matching are then used to prune the textual query, leaving a keyword. This keyword is fed into online search engines (Google web services). From the list of results, the most accurate answer is extracted by a data mining process. The result builder module adds the appropriate grammar to that answer. Finally, the textual result is converted into speech, again with the help of MS Speech API (SAPI) 5.3.
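The pruning step can be pictured with a small Ruby sketch. It is only an illustration under stated assumptions: the report does not list the expressions actually used, so the patterns in FILLER_PATTERNS and the helper name prune_query are hypothetical.

```ruby
# Hypothetical filler patterns; the report does not list the real expressions.
FILLER_PATTERNS = [
  /\A(please\s+)?(can|could|would)\s+you\s+(tell|show)\s+me\s+/i,
  /\A(what|who|where|when)\s+(is|are|was|were)\s+/i,
  /\A(tell\s+me\s+about|search\s+for|look\s+up)\s+/i,
  /\A(the|a|an)\s+/i
].freeze

# Strip trailing punctuation and leading filler phrases from recognised text,
# leaving the words that would be sent to the search service.
def prune_query(text)
  query = text.strip.gsub(/[?.!]+\z/, "")
  loop do
    before = query
    FILLER_PATTERNS.each { |pattern| query = query.sub(pattern, "") }
    break if query == before # stop once no pattern removes anything more
  end
  query.strip
end

puts prune_query("Could you tell me what is the capital of France?")
# prints "capital of France"
```

Repeating the substitutions until nothing changes lets stacked phrases ("could you tell me" followed by "what is") be removed in one call.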
CHAPTER 2 SYSTEM ANALYSIS
2.1 Existing system

The existing system requires at least minimal keyboard or mouse interaction. It lacks direct connectivity to the cloud and integration with web services, which confines it to local storage with limited data-handling capability. It does not allow spelling correction through letter-by-letter dictation. It responds to keyword tokens only and tends to fail or perform inefficiently when further inputs are given.
2.2 Proposed system
The proposed system takes input in ordinary spoken human language. It has direct cloud connectivity to retrieve up-to-date information. It is integrated with various web services using a secured API key. It also has offline local storage so that frequently asked queries can be answered efficiently. It uses regular expressions and text pattern matching to retrieve previously executed queries.
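The idea of answering frequently asked queries from local storage can be sketched in Ruby as follows. This is a minimal in-memory illustration, not the report's implementation: the class name QueryCache, its methods and the lookup lambda standing in for the online search are all assumptions made for the example.

```ruby
# Minimal in-memory stand-in for the offline local store (hypothetical design).
class QueryCache
  def initialize
    @answers = {}
    @hits = Hash.new(0)
  end

  # Normalise case and whitespace so that repeated questions share one key.
  def key_for(query)
    query.downcase.strip.gsub(/\s+/, " ")
  end

  # Return the stored answer, or run the block (standing in for the online
  # lookup) once and remember its result.
  def fetch(query)
    key = key_for(query)
    @hits[key] += 1
    @answers.fetch(key) { @answers[key] = yield(key) }
  end

  # Number of times this query has been asked so far.
  def hits(query)
    @hits[key_for(query)]
  end
end

cache = QueryCache.new
online_calls = 0
lookup = ->(key) { online_calls += 1; "answer for #{key}" }

cache.fetch("Capital of  France", &lookup)
cache.fetch("capital of france", &lookup)
puts online_calls
# prints 1, because the second request is served from the cache
```

A real local store would persist to disk or a database; the hash is used here only to keep the sketch self-contained.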
CHAPTER 3 REQUIREMENT SPECIFICATIONS

3.1 INTRODUCTION
The requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process. It lists the requirements of a particular software system, including functional, performance and security requirements. The requirements also provide usage scenarios from a user, an operational and an administrative perspective. The purpose of the software requirements specification is to provide a detailed overview of the software project, its parameters and goals. It describes the project's target audience and its user interface, hardware and software requirements. It defines how the client, team and audience see the project and its functionality.
3.2 HARDWARE AND SOFTWARE SPECIFICATION

3.2.1 HARDWARE REQUIREMENTS
Hard Disk         : 80GB and Above
RAM               : 512 MB and Above
Processor         : Pentium IV and Above

3.2.2 SOFTWARE REQUIREMENTS
Operating System  : Windows 2000 & Above
Language          : VB.NET, PHP, Ruby
Server            : Apache
Database          : MySQL
3.3 TECHNOLOGIES USED
3.3.1 INTRODUCTION TO VISUAL STUDIO:
Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft. It can be used to develop console and graphical user interface applications along with Windows Forms applications, web sites, web applications, and web services, in both native code and managed code, for all platforms supported by Microsoft Windows, Windows Mobile, Windows CE, .NET Framework, .NET Compact Framework and Microsoft Silverlight.
Visual Studio includes a code editor supporting IntelliSense as well as code refactoring. The integrated debugger works both as a source-level debugger and a machine-level debugger. Other built-in tools include a forms designer for building GUI applications, web designer, class designer, and database schema designer. It accepts plug-ins that enhance the functionality at almost every level, including adding support for source-control systems (like Subversion and Visual SourceSafe) and adding new toolsets like editors and visual designers for domain-specific languages or toolsets for other aspects of the software development lifecycle (like the Team Foundation Server client: Team Explorer).
Visual Studio supports different programming languages by means of language services, which allow the code editor and debugger to support (to varying degrees) nearly any programming language, provided a language-specific service exists.
ARCHITECTURE OF VISUAL STUDIO:
Visual Studio does not support any programming language, solution or tool intrinsically. Instead, it allows plugging in various types of functionality. Specific functionality is coded as a VSPackage. When installed, the functionality is available as a Service. The IDE provides three services: SVsSolution, which provides the ability to enumerate projects and solutions; SVsUIShell, which provides windowing and UI functionality (including tabs, toolbars and tool windows); and SVsShell, which deals with registration of VSPackages. In addition, the IDE is also responsible for coordinating and enabling communication between services. All editors, designers, project types and other tools are implemented as VSPackages. Visual Studio uses COM to access the VSPackages. The Visual Studio SDK also includes the Managed Package Framework (MPF), which is a set of managed wrappers around the COM-interfaces that allow the Packages to be written in any CLI compliant language. However, MPF does not provide all the functionality exposed by the Visual Studio COM interfaces. The services can then be consumed for creation of other packages, which add functionality to the Visual Studio IDE.
Support for programming languages is added by using a specific VSPackage called a Language Service. A language service defines various interfaces which the VSPackage implementation can implement to add support for various functionalities. Functionalities that can be added this way include syntax coloring, statement completion, brace matching, parameter information tooltips, member lists and error markers for background compilation. If the interface is implemented, the functionality will be available for the language. Language services are implemented on a per-language basis. The implementations can reuse code from the parser or the compiler for the language. Language services can be implemented either in native code or managed code. For native code, either the native COM interfaces or the Babel Framework (part of Visual Studio SDK) can be used. For managed code, the MPF includes wrappers for writing managed language services.
Visual Studio does not include any source control support built in, but it defines two alternative ways for source control systems to integrate with the IDE. A Source Control VSPackage can provide its own customized user interface. In contrast, a source control plug-in using the MSSCCI (Microsoft Source Code Control Interface) provides a set of functions that are used to implement various source control functionality, with a standard Visual Studio user interface. MSSCCI was first used to integrate Visual SourceSafe with Visual Studio 6.0 but was later opened up via the Visual Studio SDK. Visual Studio .NET 2002 used MSSCCI 1.1, and Visual Studio .NET 2003 used MSSCCI 1.2. Visual Studio 2005, 2008 and 2010 use MSSCCI Version 1.3, which adds support for rename and delete propagation as well as asynchronous opening.
Visual Studio supports running multiple instances of the environment (each with its own set of VSPackages). The instances use different registry hives to store their configuration state and are differentiated by their AppId (Application ID). The instances are launched by an AppId-specific .exe that selects the AppId, sets the root hive and launches the IDE. VSPackages registered for one AppId are integrated with other VSPackages for that AppId. The various product editions of Visual Studio are created using the different AppIds. The Visual Studio Express edition products are installed with their own AppIds, but the Standard, Professional and Team Suite products share the same AppId. Consequently, one can install the Express editions side-by-side with other editions, unlike the other editions, which update the same installation. The Professional edition includes a superset of the VSPackages in the Standard edition, and the Team Suite includes a superset of the VSPackages in both other editions. The AppId system is leveraged by the Visual Studio Shell in Visual Studio 2008.
3.3.2 INTRODUCTION TO .NET FRAMEWORK
The .NET Framework (pronounced dot net) is a software framework for Microsoft Windows operating systems. It includes a large library, and it supports several programming languages which allow language interoperability (each language can use code written in other languages). The .NET library is available to all the programming languages that .NET supports. The framework's Base Class Library provides user interface, data access, database connectivity, cryptography, web application development, numeric algorithms, and network communications. The class library is used by programmers, who combine it with their own code to produce applications.
Programs written for the .NET Framework execute in a software (as contrasted to hardware) environment, known as the Common Language Runtime (CLR). The CLR is an application virtual machine so that programmers need not consider the capabilities of the specific CPU that will execute the program. The CLR also provides other important services such as security, memory management, and exception handling. The class library and the CLR together constitute the .NET Framework. The .NET Framework is intended to be used by most new applications created for the Windows platform. To develop new applications, software developers must also install Microsoft's SDK for Windows 7 or .NET Framework 4 (or newer) or Visual Studio 2010.
Principal design features:

Interoperability
Because computer systems commonly require interaction between new and older applications, the .NET Framework provides means to access functionality that is implemented in programs that execute outside the .NET environment. Access to COM components is provided in the System.Runtime.InteropServices and System.EnterpriseServices namespaces of the framework; access to other functionality is provided using the P/Invoke feature.
Common Language Runtime Engine
The Common Language Runtime (CLR) is the execution engine of the .NET Framework. All .NET programs execute under the supervision of the CLR, guaranteeing certain properties and behaviors in the areas of memory management, security, and exception handling.
Language Independence
The .NET Framework introduces a Common Type System, or CTS. The CTS specification defines all possible data types and programming constructs supported by the CLR and how they may or may not interact with each other conforming to the Common Language Infrastructure (CLI) specification. Because of this feature, the .NET Framework supports the exchange of types and object instances between libraries and applications written using any conforming .NET language.
Base Class Library
The Base Class Library (BCL), part of the Framework Class Library (FCL), is a library of functionality available to all languages using the .NET Framework. The BCL provides classes which encapsulate a number of common functions, including file reading and writing, graphic rendering, database interaction, XML document manipulation and so on.
Simplified Deployment
The .NET Framework includes design features and tools that help manage the installation of computer software to ensure that it does not interfere with previously installed software, and that it conforms to security requirements.
Security
The design is meant to address some of the vulnerabilities, such as buffer overflows, that have been exploited by malicious software. Additionally, .NET provides a common security model for all applications.
Portability
The design of the .NET Framework allows it theoretically to be platform agnostic, and thus cross-platform compatible. That is, a program written to use the framework should run without change on any type of system for which the framework is implemented. While Microsoft has never implemented the full framework on any system except Microsoft Windows, the framework is engineered to be platform agnostic, and cross-platform implementations are available for other operating systems. Microsoft submitted the specifications for the Common Language Infrastructure (which includes the core class libraries, Common Type System, and the Common Intermediate Language), the C# language, and the C++/CLI language to both ECMA and the ISO, making them available as open standards. This makes it possible for third parties to create compatible implementations of the framework and its languages on other platforms.
Architecture of .NET framework

Common Language Infrastructure (CLI)
The purpose of the Common Language Infrastructure is to provide a language-neutral platform for application development and execution, including functions for exception handling, garbage collection, security, and interoperability. By implementing the core aspects of the .NET Framework within the scope of the CLI, this functionality will not be tied to a single language but will be available across the many languages supported by the framework. Microsoft's implementation of the CLI is called the Common Language Runtime, or CLR.
Assemblies
The Common Intermediate Language (CIL) code is housed in .NET assemblies. As mandated by specification, assemblies are stored in the Portable Executable (PE) format, common on the Windows platform for all DLL and EXE files. The assembly consists of one or more files, one of which must contain the manifest, which has the metadata for the assembly. The complete name of an assembly (not to be confused with the filename on disk) contains its simple text name, version number, culture, and public key token. The public key token is a short hash of the public key with which the assembly is signed, so two assemblies that share the same complete name are treated as identical by the framework. The matching private key is known only to the creator of the assembly. It is used for strong naming and guarantees that a new version of the assembly comes from the same author (strong naming is required to add an assembly to the Global Assembly Cache).
Class Library
The .NET Framework includes a set of standard class libraries. The class library is organized in a hierarchy of namespaces. Most of the built-in APIs are part of either System.* or Microsoft.* namespaces. These class libraries implement a large number of common functions, such as file reading and writing, graphic rendering, database interaction, and XML document manipulation, among others. The .NET class libraries are available to all CLI compliant languages. The .NET Framework class library is divided into two parts: the Base Class Library and the Framework Class Library.
The Base Class Library (BCL) includes a small subset of the entire class library and is the core set of classes that serve as the basic API of the Common Language Runtime. The classes in mscorlib.dll and some of the classes in System.dll and System.Core.dll are considered to be a part of the BCL. The BCL classes are available in both .NET Framework as well as its alternative implementations including .NET Compact Framework, Microsoft Silverlight and Mono.
The Framework Class Library (FCL) is a superset of the BCL classes and refers to the entire class library that ships with .NET Framework. It includes an expanded set of libraries, including Windows Forms, ADO.NET, ASP.NET, Language Integrated Query, Windows Presentation Foundation and Windows Communication Foundation, among others. The FCL is much larger in scope than standard libraries for languages like C++, and comparable in scope to the standard libraries of Java.
Memory management
The .NET Framework CLR frees the developer from the burden of managing memory (allocating and freeing up when done); instead it does the memory management itself, even though there are no actual guarantees as to when the Garbage Collector will perform its work unless a collection is requested explicitly (for example by calling GC.Collect). To this end, the memory for instantiations of .NET types (objects) is allocated contiguously from the managed heap, a pool of memory managed by the CLR. As long as there exists a reference to an object, which might be either a direct reference to an object or via a graph of objects, the object is considered to be in use by the CLR. When there is no reference to an object, and it cannot be reached or used, it becomes garbage. However, it still holds on to the memory allocated to it. The .NET Framework includes a garbage collector which runs periodically, on a separate thread from the application's thread, that enumerates all the unusable objects and reclaims the memory allocated to them.
The .NET Garbage Collector (GC) is a non-deterministic, compacting, mark-and-sweep garbage collector. The GC runs only when a certain amount of memory has been used or there is enough pressure for memory on the system. Since it is not guaranteed when the conditions to reclaim memory are reached, the GC runs are non-deterministic. Each .NET application has a set of roots, which are pointers to objects on the managed heap (managed objects). These include references to static objects and objects defined as local variables or method parameters currently in scope, as well as objects referred to by CPU registers. When the GC runs, it pauses the application, and for each object referred to in the root, it recursively enumerates all the objects reachable from the root objects and marks them as reachable. It uses .NET metadata and reflection to discover the objects encapsulated by an object, and then recursively walks them. It then enumerates all the objects on the heap (which were initially allocated contiguously) using reflection. All objects not marked as reachable are garbage. This is the mark phase.
Since the memory held by garbage is not of any consequence, it is considered free space. However, this leaves chunks of free space between objects which were initially contiguous. The objects are then compacted together to make used memory contiguous again. Any reference to an object invalidated by moving the object is updated to reflect the new location by the GC. The application is resumed after the garbage collection is over.
The GC used by .NET Framework is actually generational. Objects are assigned a generation; newly created objects belong to Generation 0. The objects that survive a garbage collection are tagged as Generation 1, and the Generation 1 objects that survive another collection are Generation 2 objects. The .NET Framework uses up to Generation 2 objects. Higher generation objects are garbage collected less frequently than lower generation objects. This helps increase the efficiency of garbage collection, as older objects tend to have a longer lifetime than newer objects. Thus, by removing older (and thus more likely to survive a collection) objects from the scope of a collection run, fewer objects need to be checked and compacted.
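The mark, compact and promote steps described above can be imitated with a toy model in Ruby. This is only a sketch of the idea, not how the CLR is implemented: HeapObject, mark_reachable and run_collection are names invented for the example, every run is a full collection (the skipping of older generations is not modelled), and an explicit stack replaces the recursive walk.

```ruby
# Toy heap object: a name, outgoing references, a mark bit and a generation.
HeapObject = Struct.new(:name, :refs, :marked, :generation)

def new_object(name)
  HeapObject.new(name, [], false, 0)
end

# Mark phase: flag everything reachable from the roots.
def mark_reachable(roots)
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if obj.marked # already visited, which also makes cycles safe
    obj.marked = true
    stack.concat(obj.refs)
  end
end

# Keep only marked objects (in their original order, as a stand-in for
# compaction), promote survivors by one generation up to 2, clear the marks.
def run_collection(heap, roots)
  mark_reachable(roots)
  survivors = heap.select(&:marked)
  survivors.each do |obj|
    obj.generation = [obj.generation + 1, 2].min
    obj.marked = false
  end
  survivors
end

a, b, c, d = %w[a b c d].map { |n| new_object(n) }
a.refs << b # b is reachable through the root a
c.refs << d # neither c nor d is reachable from any root
heap = run_collection([a, c, b, d], [a])
puts heap.map(&:name).join(",")
# prints "a,b"
```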
3.3.3 INTRODUCTION TO VB.NET
Visual Basic .NET (VB.NET) is an object-oriented computer programming language that can be viewed as an evolution of the classic Visual Basic (VB) which is implemented on the .NET Framework. Microsoft currently supplies two major implementations of Visual Basic: Microsoft Visual Studio, which is commercial software and Microsoft Visual Studio Express, which is free of charge.
Versions
There are four versions and five releases of Visual Basic .NET implemented by the Visual Basic Team.

Visual Basic .NET (VB 7)
The original Visual Basic .NET was released alongside Visual C# and ASP.NET in 2002. Significant changes broke backward compatibility with older versions and caused a rift within the developer community.

Visual Basic .NET 2003 (VB 7.1)
Visual Basic .NET 2003 was released with version 1.1 of the .NET Framework. New features included support for the .NET Compact Framework and a better VB upgrade wizard. Improvements were also made to the performance and reliability of the .NET IDE (particularly the background compiler) and runtime. In addition, Visual Basic .NET 2003 was available in the Visual Studio.NET Academic Edition (VS03AE). VS03AE is distributed to a certain number of scholars from each country without cost.

Visual Basic 2005 (VB 8.0)
Visual Basic 2005 is the name given to the next release of Visual Basic .NET, Microsoft having decided to drop the .NET portion of the title. For this release, Microsoft added many features, including:
* Edit and Continue
* Design-time expression evaluation
* The My pseudo-namespace, which provides:
  * easy access to certain areas of the .NET Framework that otherwise require significant code to access
  * dynamically-generated classes (notably My.Forms)
* Improvements to the VB-to-VB.NET converter
* The Using keyword, simplifying the use of objects that require the Dispose pattern to free resources
* Just My Code, which when debugging hides (steps over) boilerplate code written by the Visual Studio .NET IDE and system library code
* Data Source binding, easing database client/server development
The above functions (particularly My) are intended to reinforce Visual Basic .NET's focus as a rapid application development platform and further differentiate it from C#. Visual Basic 2005 introduced features meant to fill in the gaps between it and other "more powerful" .NET languages, adding:
* .NET 2.0 language features such as:
  * Generics
  * Partial classes, a method of defining some parts of a class in one file and then adding more definitions later; particularly useful for integrating user code with auto-generated code
  * Nullable types
* Support for unsigned integer data types commonly used in other languages

'IsNot' operator patented
One other feature of Visual Basic 2005 is the IsNot operator that makes 'If X IsNot Y' equivalent to 'If Not X Is Y', which gained notoriety when it was found to be the subject of a Microsoft patent application.

Visual Basic 2008 (VB 9.0)
Visual Basic 9.0 was released together with the Microsoft .NET Framework 3.5 on 19 November 2007. For this release, Microsoft added many features, including:
* A true conditional operator, "If(boolean, value, value)", to replace the "IIf" function
* Anonymous types
* Support for LINQ
* Lambda expressions
* XML Literals
* Type Inference

Visual Basic 2010 (VB 10.0)
In April 2010, Microsoft released Visual Basic 2010. Microsoft had planned to use the Dynamic Language Runtime (DLR) for that release but shifted to a coevolution strategy between Visual Basic and sister language C# to bring both languages into closer parity with one another. Visual Basic's innate ability to interact dynamically with CLR and COM objects has been enhanced to work with dynamic languages built on the DLR such as IronPython and IronRuby. The Visual Basic compiler was improved to infer line continuation in a set of common contexts, in many cases removing the need for the "_" line continuation character. Also, existing support of inline Functions was complemented with support for inline Subs as well as multi-line versions of both Sub and Function lambdas.
3.3.4 WORLD OF OPEN SOURCE SOFTWARE
Open-source software (OSS) is computer software that is available in source code form for which the source code and certain other rights normally reserved for copyright holders are provided under a software license that permits users to study, change, improve and at times also to distribute the software. Some open source licenses meet the requirements of the Open Source Definition. Some open source software is available within the public domain. Open source software is very often developed in a public, collaborative manner. Open-source software is the most prominent example of open-source development and often compared to (technically defined) user-generated content or (legally defined) open content movements. A report by Standish Group states that adoption of open-source software models has resulted in savings of about $60 billion per year to consumers.
3.3.4.1 PHP
PHP is a general-purpose scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document. It also has evolved to include a command-line interface capability and can be used in standalone graphical applications. PHP can be deployed on most web servers and as a standalone interpreter, on almost every operating system and platform free of charge. PHP is installed on more than 20 million websites and 1 million web servers. PHP was originally created by Rasmus Lerdorf in 1995. The main implementation of PHP is now produced by The PHP Group and serves as the de facto standard for PHP as there is no formal specification. PHP is free software released under the PHP License; it is incompatible with the GNU General Public License (GPL) due to restrictions on the usage of the term PHP. While PHP originally stood for "Personal Home Page", it is now said to stand for "PHP: Hypertext Preprocessor", a recursive acronym.
USAGE
PHP is a general-purpose scripting language that is especially suited to server-side web development where PHP generally runs on a web server. Any PHP code in a requested file is executed by the PHP runtime, usually to create dynamic web page content. It can also be used for command-line scripting and client-side GUI applications. PHP can be deployed on most web servers, many operating systems and platforms, and can be used with many relational database management systems (RDBMS). It is available free of charge, and the PHP Group provides the complete source code for users to build, customize and extend for their own use.
PHP primarily acts as a filter, taking input from a file or stream containing text and/or PHP instructions and outputting another stream of data; most commonly the output will be HTML. Since PHP 4, the PHP parser compiles input to produce bytecode for processing by the Zend Engine, giving improved performance over its interpreter predecessor. Originally designed to create dynamic web pages, PHP now focuses mainly on server-side scripting, and it is similar to other server-side scripting languages that provide dynamic content from a web server to a client, such as Microsoft's ASP.NET, Sun Microsystems' JavaServer Pages, and mod_perl. PHP has also attracted the development of many frameworks that provide building blocks and a design structure to promote rapid application development (RAD). Some of these include CakePHP, Symfony, CodeIgniter, and Zend Framework, offering features similar to other web application frameworks.
The LAMP architecture has become popular in the web industry as a way of deploying web applications. PHP is commonly used as the P in this bundle alongside Linux, Apache and MySQL, although the P may also refer to Python or Perl or some combination of the three. WAMP packages (Windows / Apache / MySQL / PHP) and MAMP packages (Macintosh / Apache / MySQL / PHP) are also available.
As of April 2007, over 20 million Internet domains had web services hosted on servers with PHP installed and mod_php was recorded as the most popular Apache HTTP Server module. PHP is used as the server-side programming language on 75% of all web servers. Web content management systems written in PHP include MediaWiki, Joomla, eZ Publish, WordPress, Drupal and Moodle. Websites created using these tools are written in PHP, as are the user-facing portions of Wikipedia, Facebook and Digg.
SECURITY
The National Vulnerability Database maintains a list of vulnerabilities found in computer software. The overall proportion of PHP-related vulnerabilities on the database amounted to 20% in 2004, 28% in 2005, 43% in 2006, 36% in 2007, 35% in 2008, and 30% in 2009. Most of these PHP-related vulnerabilities can be exploited remotely: they allow attackers to steal or destroy data from data sources linked to the web server (such as an SQL database), send spam, or contribute to DoS attacks using malware, which itself can be installed on the vulnerable servers.
These vulnerabilities are caused mostly by not following best-practice programming rules; technical security flaws of the language itself or of its core libraries are not frequent (23 in 2008, about 1% of the total). Recognizing that programmers make mistakes, some languages include taint checking to automatically detect the lack of input validation that causes many of these issues. Such a feature is being developed for PHP, but its inclusion in a release has been rejected several times in the past.
Hosting PHP applications on a server requires careful and constant attention to deal with these security risks. There are advanced protection patches such as Suhosin and Hardening-Patch, designed especially for web hosting environments. PHPIDS adds a security layer to a PHP application to defend against intrusions. It detects cross-site scripting (XSS), SQL injection, header injection, directory traversal, remote file execution, local file inclusion and denial of service (DoS).
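The input validation mentioned above can be illustrated with a short sketch. It is written in Ruby, the language of the report's other code example, and the same idea applies in PHP. The whitelist pattern and the helper names valid_query? and render_query are assumptions made for illustration; they are not part of the project's code.

```ruby
require 'erb'

# Hypothetical whitelist: letters, digits, spaces and a little
# punctuation, between 1 and 200 characters in total.
SAFE_QUERY = /\A[A-Za-z0-9 .,?'-]{1,200}\z/

# Returns true only when the whole input matches the whitelist.
def valid_query?(input)
  SAFE_QUERY.match?(input)
end

# Escapes HTML metacharacters so that user text cannot inject markup (XSS).
def render_query(input)
  "<p>You searched for: #{ERB::Util.html_escape(input)}</p>"
end
```

Validating on input and escaping on output are complementary: the first rejects unexpected data early, the second keeps any text that does get through from being interpreted as markup.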
3.3.4.2 RUBY
Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto. It was influenced primarily by Perl, Smalltalk, Eiffel, and Lisp. Ruby supports multiple programming paradigms, including functional, object-oriented, imperative and reflective. It also has a dynamic type system and automatic memory management; it is therefore similar in varying respects to Python, Perl, Lisp, Dylan, Pike, and CLU.
The standard 1.8.7 implementation is written in C, as a single-pass interpreted language. There is currently no specification of the Ruby language, so the original implementation is considered to be the de facto reference. As of 2010, there are a number of complete or upcoming alternative implementations of the Ruby language, including YARV, JRuby, Rubinius, IronRuby, MacRuby, and HotRuby. Each takes a different approach, with IronRuby, JRuby and MacRuby providing just-in-time compilation and MacRuby also providing ahead-of-time compilation. The official 1.9 branch uses YARV, as will 2.0 (in development), and will eventually supersede the slower Ruby MRI.
Ruby's Flexibility
Ruby is seen as a flexible language, since it allows its users to freely alter its parts. Essential parts of Ruby can be removed or redefined at will, and existing parts can be added to. Ruby tries not to restrict the coder. For example, addition is performed with the plus (+) operator. But if you'd rather use the readable word plus, you could add such a method to Ruby's built-in Numeric class.
    class Numeric
      def plus(x)
        self.+(x)
      end
    end

    y = 5.plus 6
    # y is now equal to 11

Ruby's operators are syntactic sugar for methods. You can redefine them as well.
Libraries and Repositories
The Ruby Application Archive (RAA), as well as RubyForge, serve as repositories for a wide range of Ruby applications and libraries, containing more than seven thousand items. Although the number of applications available does not match the volume of material available in the Perl or Python community, there is a wide range of tools and utilities which serve to foster further development in the language. RubyGems has become the standard package manager for Ruby libraries. It is very similar in purpose to Perl's CPAN, although its usage is more like apt-get. Recently, many new and existing libraries have found a home on GitHub, which is focused on Git.
Advantages in using Ruby
Ruby has a wealth of other features, among which are the following:
* Ruby has exception handling features, like Java or Python, to make it easy to handle errors.
* Ruby features a true mark-and-sweep garbage collector for all Ruby objects. There is no need to maintain reference counts in extension libraries. As Matz says, "This is better for your health."
* Writing C extensions in Ruby is easier than in Perl or Python, with a very elegant API for calling Ruby from C. This includes calls for embedding Ruby in software, for use as a scripting language. A SWIG interface is also available.
* Ruby can load extension libraries dynamically if the OS allows.
* Ruby features OS-independent threading. Thus, for all platforms on which Ruby runs, you also have multithreading, regardless of whether the OS supports it or not, even on MS-DOS!
* Ruby is highly portable: it is developed mostly on GNU/Linux, but works on many types of UNIX, Mac OS X, Windows 95/98/Me/NT/2000/XP, DOS, BeOS, OS/2, etc.
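As a small illustration of the exception handling feature listed above, the following sketch converts user input to a number and falls back to a default when the conversion fails. The helper name parse_count is hypothetical and chosen only for this example.

```ruby
# Hypothetical helper: converts text to an integer, returning a default
# value when the text is not a valid number.
def parse_count(text, default = 0)
  Integer(text)   # raises ArgumentError for "abc", TypeError for nil
rescue ArgumentError, TypeError => e
  warn "invalid count #{text.inspect}: #{e.message}"
  default
end
```

A method body can carry its own rescue clause, so no explicit begin/end block is needed here.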
3.3.5 APACHE SERVER
The Apache HTTP Server, commonly referred to as Apache, is web server software notable for playing a key role in the initial growth of the World Wide Web. In 2009 it became the first web server software to surpass the 100 million web site milestone. Apache was the first viable alternative to the Netscape Communications Corporation web server (currently known as Oracle iPlanet Web Server), and has since evolved to rival other web servers in terms of functionality and performance. Typically Apache is run on a Unix-like operating system.
Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. The application is available for a wide variety of operating systems, including UNIX, GNU, FreeBSD, Linux, Solaris, Novell NetWare, AmigaOS, Mac OS X, Microsoft Windows, OS/2, TPF, and eComStation. Released under the Apache License, Apache is open-source software. Since April 1996 Apache has been the most popular HTTP server software in use. As of February 2011, Apache served 59.13% of all websites and 66.62% of the million busiest.
Features
Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Some common language interfaces support Perl, Python, Tcl, and PHP. Popular authentication modules include mod_access, mod_auth, mod_digest, and mod_auth_digest, the successor to mod_digest. Other features include SSL and TLS support (mod_ssl), a proxy module (mod_proxy), a URL rewriter (also known as a rewrite engine, implemented under mod_rewrite), custom log files (mod_log_config), and filtering support (mod_include and mod_ext_filter).
Popular compression methods on Apache include the external extension module mod_gzip, implemented to help reduce the size (weight) of web pages served over HTTP. ModSecurity is an open-source intrusion detection and prevention engine for web applications. Apache logs can be analyzed through a web browser using free scripts such as AWStats/W3Perl or Visitors.
Virtual hosting allows one Apache installation to serve many different actual websites. For example, one machine with one Apache installation could simultaneously serve www.example.com, www.test.com, test47.testserver.test.com, etc. Apache features configurable error messages, DBMS-based authentication databases, and content negotiation. It is also supported by several graphical user interfaces (GUIs).
Performance
Although the main design goal of Apache is not to be the "fastest" web server, Apache does have performance comparable to other "high-performance" web servers. Instead of implementing a single architecture, Apache provides a variety of Multi-Processing Modules (MPMs) which allow Apache to run in a process-based, hybrid (process and thread) or event-hybrid mode, to better match the demands of each particular infrastructure. This means that choosing the correct MPM and the correct configuration is important. Where compromises in performance need to be made, the design of Apache is to reduce latency and increase throughput, relative to simply handling more requests, thus ensuring consistent and reliable processing of requests within reasonable time frames.
The Apache version considered by the Apache Foundation as providing high performance is the multi-threaded version, which mixes the use of several processes and several threads per process. While this architecture works faster than the previous multi-process topology (because threads have a lower overhead than processes), it does not match the performance of the event-based architecture provided by other servers, especially when they process events with several worker threads.
This difference can be explained by the overhead that one thread per connection brings (as opposed to a couple of worker threads per CPU, each processing many connection events). Each thread needs to maintain its own stack and environment, and switching from one thread to another is also an expensive task for CPUs.
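The contrast between one thread per connection and a few worker threads handling many events can be sketched as follows. This is only an illustration of the worker-pool idea, not how Apache or any other server is implemented; the function name handle_with_pool and the string "events" are assumptions.

```ruby
# Processes many connection "events" with a small, fixed pool of worker
# threads instead of creating one thread per connection.
def handle_with_pool(events, worker_count = 2)
  queue = Queue.new
  events.each { |event| queue << event }
  worker_count.times { queue << :stop }   # one stop marker per worker

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker handles many events, so its stack is reused.
      while (event = queue.pop) != :stop
        results << "handled #{event}"
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end
```

With this layout the number of threads stays constant however many events arrive, which is the property the paragraph above attributes to event-based servers.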
3.3.6 INTRODUCTION TO MySQL
MySQL is a relational database management system (RDBMS) that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My. The SQL phrase stands for Structured Query Language. The MySQL development project has made its source code available under the terms of the GNU General Public License, as well as under a variety of proprietary agreements. MySQL was owned and sponsored by a single for-profit firm, the Swedish company MySQL AB, which is now owned by Oracle Corporation.
Free-software projects that require a full-featured database management system often use MySQL. For commercial use, several paid editions are available and offer additional functionality. Free software projects that use MySQL include Joomla, WordPress, MyBB, phpBB, Drupal and other software built on the LAMP software stack. MySQL is also used in many high-profile, large-scale World Wide Web products, including Wikipedia, Google (though not for searches) and Facebook.
Deployment
MySQL can be built and installed manually from source code, but this can be tedious, so it is more commonly installed from a binary package unless special customizations are required. On most Linux distributions the package management system can download and install MySQL with minimal effort, though further configuration is often required to adjust security and optimization settings.
Though MySQL began as a low-end alternative to more powerful proprietary databases, it has gradually evolved to support higher-scale needs as well. It is still most commonly used in small to medium scale single-server deployments, either as a component in a LAMP-based web application or as a standalone database server. Much of MySQL's appeal originates in its relative simplicity and ease of use, which is enabled by an ecosystem of open-source tools such as phpMyAdmin.
In the medium range, MySQL can be scaled by deploying it on more powerful hardware, such as a multi-processor server with gigabytes of memory. There are, however, limits to how far performance can scale on a single server, so on larger scales multi-server MySQL deployments are required to provide improved performance and reliability. A typical high-end configuration can include a powerful master database which handles data write operations and is replicated to multiple slaves that handle all read operations. The master server synchronizes continually with its slaves, so in the event of failure a slave can be promoted to become the new master, minimizing downtime. Further improvements in performance can be achieved by caching the results from database queries in memory using memcached, or by breaking down a database into smaller chunks called shards, which can be spread across a number of distributed server clusters.
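The read/write split and the sharding idea described above can be sketched in a few lines. The server names, the request_id parameter and the choice of a CRC32 hash are all assumptions made for illustration; a real deployment would hold database connections and use the routing offered by its driver or proxy.

```ruby
require 'zlib'

# Hypothetical server names standing in for real connections.
MASTER   = 'db-master'
REPLICAS = %w[db-replica-1 db-replica-2].freeze
SHARDS   = %w[shard-0 shard-1 shard-2].freeze

# Writes go to the master; reads are spread across the replicas (slaves).
def server_for(sql, request_id = 0)
  if sql.strip.match?(/\A(select|show)\b/i)
    REPLICAS[request_id % REPLICAS.size]
  else
    MASTER
  end
end

# Picks a shard from a stable hash of the key, so the same key always
# maps to the same shard.
def shard_for(key)
  SHARDS[Zlib.crc32(key.to_s) % SHARDS.size]
end
```

Using a stable hash rather than a random choice matters for sharding, because a row must be found later on the same shard where it was stored.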
Platforms and Interfaces
The MySQL server and official libraries are mostly implemented in ANSI C and ANSI C++. Its SQL parser is written in yacc, and it uses a home-brewed lexical analyzer, sql_lex.cc. MySQL works on many different system platforms, including AIX, BSDi, FreeBSD, HP-UX, eComStation, i5/OS, IRIX, Linux, Mac OS X, Microsoft Windows, NetBSD, Novell NetWare, OpenBSD, OpenSolaris, OS/2 Warp, QNX, Solaris, Symbian, SunOS, SCO OpenServer, SCO UnixWare, Sanos and Tru64. A port of MySQL to OpenVMS also exists.
Many programming languages with language-specific APIs include libraries for accessing MySQL databases. These include MySQL Connector/Net for integration with Microsoft's Visual Studio (languages such as C# and VB are most commonly used) and the JDBC driver for Java. In addition, an ODBC interface called MyODBC allows additional programming languages that support the ODBC interface, such as ASP or ColdFusion, to communicate with a MySQL database. The HTSQL URL-based query method also ships with a MySQL adapter, allowing direct interaction between a MySQL database and any web client via structured URLs.
CHAPTER 4 SYSTEM DESIGN

4.1 ARCHITECTURE DIAGRAM OF SSSQAR
4.2 SEQUENCE DIAGRAM OF SSSQAR
4.3 USE CASE DIAGRAM OF SSSQAR
CHAPTER 5 SYSTEM DESIGN-DETAILED

5.1 MODULES

* Speech Analysis
* Integrating REST
* Query Analysis
* Result Builder
* Text to Speech
5.2 MODULES EXPLANATION
Speech Analysis

This module is responsible for the analysis of the analog speech signal. It sends the input to the speech-to-text converter, which connects to MSAPI 5.3 and uses the necessary functions through direct API calls. The spoken words are then converted to text.
Integrating REST
The text query produced by the speech analysis engine is sent to a RESTful router. The router takes these queries and sends them to the REST API, which takes care of the query analysis and other executions. The JSON response from the web services is also parsed by the REST API.
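A minimal sketch of this step, assuming a hypothetical endpoint and response shape (the report does not name the web service or its JSON format), could build the request URL from the recognised text and extract the result snippets from the JSON body:

```ruby
require 'json'
require 'uri'

# Hypothetical endpoint; the report does not name the web service used.
SEARCH_ENDPOINT = 'https://api.example.com/search'

# Builds the request URL for a recognised text query.
def build_request_url(query)
  "#{SEARCH_ENDPOINT}?#{URI.encode_www_form(q: query, format: 'json')}"
end

# Extracts result snippets from a JSON response body. The assumed shape is
# {"results": [{"title": ..., "snippet": ...}]}.
def parse_results(body)
  JSON.parse(body).fetch('results', []).map { |result| result['snippet'] }
end
```

Keeping URL building and response parsing as separate functions lets each be tested without making a network call.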
Query Analysis
This module is responsible for the necessary query formation and execution. It builds various combinations of regular expressions and searches the cloud using web services, which offer various tools for implementation. It finds the nearest page with all the matching patterns in it and gathers the necessary data using the data mining mechanism.
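One possible reading of this module, sketched under stated assumptions, is to turn each significant query word into a regular expression and pick the page that matches the most patterns. The stop-word list, the scoring rule and the names query_patterns and best_page are hypothetical; the report does not specify how patterns are combined or how the nearest page is ranked.

```ruby
# Small hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = %w[a an the is of what who].freeze

# Builds one case-insensitive whole-word pattern per significant query word.
def query_patterns(query)
  words = query.downcase.scan(/[a-z0-9]+/) - STOP_WORDS
  words.uniq.map { |word| /\b#{Regexp.escape(word)}\b/i }
end

# Scores each page by the number of patterns it matches and returns the
# best-scoring page (nil when no pages are given).
def best_page(query, pages)
  patterns = query_patterns(query)
  pages.max_by { |text| patterns.count { |pattern| pattern.match?(text) } }
end
```

For example, for the query "capital of France" a page containing both "capital" and "France" is preferred over a page containing only "France".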
Result Builder
This module is responsible for preparing the output of the system. It takes the data from the previous module and adds the necessary grammar to it. It checks the result for any encoding errors and escapes the necessary escape sequences and special characters. Then it sends the data to the next module.
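The clean-up performed by this module might look like the sketch below. It covers only an encoding check, removal of leftover markup, whitespace normalisation and sentence termination; which characters actually need escaping depends on the speech API, so that part is not attempted. The function name build_result is an assumption.

```ruby
# Repairs invalid byte sequences, removes leftover markup, normalises
# whitespace and makes sure the text reads as a sentence.
def build_result(raw)
  text = raw.dup.force_encoding('UTF-8').scrub('')   # drop invalid bytes
  text = text.gsub(/<[^>]*>/, ' ')                    # strip leftover tags
  text = text.gsub(/\s+/, ' ').strip
  return text if text.empty?

  text = text[0].upcase + text[1..-1]                 # capitalise first letter
  text.match?(/[.!?]\z/) ? text : "#{text}."
end
```

Scrubbing the encoding first matters, because regular expression operations on a string with invalid bytes raise an error in Ruby.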
Text to Speech
The text data from the result builder is parsed and sent to the speech builder. The speech builder connects to MSAPI again and sends the text result using an API call. It then outputs the text as voice using the necessary configuration settings.
CHAPTER 6 VERIFICATION AND VALIDATION

Once the program exists, it must be tested to see whether it is free of bugs. High-quality products must meet users' needs and expectations. Furthermore, the product should do so with minimal or no defects, the focus being on improving products prior to delivery rather than correcting them after delivery. The ultimate goal of building high-quality software is user satisfaction. There are two basic approaches to system testing. Validation is the task of predicting correspondence, which cannot be determined until the system is in place. Verification is the exercise of determining correctness.
Testing strategies
The extent of testing a system is controlled by many factors, such as the risk involved, the limitations of the resources and deadlines. We deploy a testing strategy that does the best job of finding the defects in the product within the given constraints. The different testing strategies are:
Black Box Testing:
The concept of black box testing is used to represent a system whose inside workings are not available for inspection. In black box testing, we try various inputs and examine the resulting outputs. Black box testing works well for testing objects in an object-oriented environment. The inputs and outputs to inspect are defined through use cases or other analysis information.
White Box Testing:
White box testing assumes that the specific logic is important and must be tested to guarantee the system's proper functioning. The main use of white box testing is error-based testing: it looks for bugs in code that has a low probability of execution and may therefore have been overlooked previously. It is also known as path testing. There are two types of path testing:
Statement testing coverage: every statement in the object's method is covered by executing it at least once.
Branch testing coverage: enough tests are performed to ensure that every branch alternative is executed at least once.
Top down testing
A top-down strategy supports the user interface and event-driven systems. It serves two purposes: first, the top-down approach can test navigation through screens and verify that it matches the requirements; second, users at an early stage can see how the final application will look and feel.
Bottom up testing
Bottom up testing starts with the details of the system and proceeds to higher levels by a progressive aggregation of details until they collectively fit requirements of the system. In this testing the methods and classes which are independent are tested.
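Returning to the statement and branch coverage criteria described under white box testing, the difference can be shown on a very small function. The function clamp_rate is hypothetical and not part of the project's code; it has a single if with no else part, which is where the two criteria differ.

```ruby
# Hypothetical function under test: caps a speech rate at a maximum.
def clamp_rate(rate, max = 10)
  rate = max if rate > max   # a branch with no else part
  rate
end

# Statement coverage: this single case executes every statement.
statement_tests = [[15, 10]]

# Branch coverage also needs an input for which the condition is false.
branch_tests = statement_tests + [[5, 5]]

branch_tests.each do |input, expected|
  actual = clamp_rate(input)
  raise "clamp_rate(#{input}) gave #{actual}" unless actual == expected
end
```

The first test alone reaches every statement, yet it never exercises the path where the condition is false, which is why branch coverage is the stronger criterion.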
SOURCE CODE
SCREEN SHOTS
