
ABSTRACT

In a country like India, where many different scripts are in use, automatic identification of handwritten script facilitates many important applications, such as the automatic transcription of multilingual documents and the selection of a script-specific OCR engine in a multilingual environment. Existing script identification techniques rely on features extracted from document images at the character, word, text-line or block level. Script identification belongs to the broader field of document analysis, which has grown rapidly as a research area in recent years. We propose a novel method for multi-script identification at the block level. We describe a system that automatically identifies the script used in documents stored electronically as images. The system can learn to distinguish any number of scripts. It develops a set of representative symbols (templates) for each script by clustering textual symbols from a set of training documents. To identify the script of a new document, the system compares a subset of the document's symbols with each script's templates, screens out rare or unreliable templates, and chooses the script whose templates provide the best match. In addition, the growing use of handheld devices that accept handwritten input has created increasing demand for algorithms that can efficiently analyze and retrieve handwritten data.
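The template-matching procedure outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: it assumes each textual symbol has already been reduced to a fixed-length numeric feature vector, substitutes plain k-means for the unspecified clustering step, treats a template as rare or unreliable when its cluster holds fewer than `min_count` training symbols, and scores each script by the mean Euclidean distance from the sampled document symbols to that script's nearest surviving template. The names `build_templates`, `identify_script`, `sample_size` and `min_count` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
Template = Tuple[Vector, int]  # (centroid, number of training symbols in its cluster)


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Vector], centroids: List[Vector]) -> List[List[Vector]]:
    """Group each point with its nearest centroid."""
    clusters: List[List[Vector]] = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def build_templates(points: List[Vector], k: int, iters: int = 20) -> List[Template]:
    """Cluster one script's training symbols into k templates.
    Plain k-means is an assumption; seeds are evenly spaced training points,
    so the result is deterministic."""
    k = min(k, len(points))
    step = len(points) / k
    centroids = [tuple(points[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    clusters = assign(points, centroids)
    return [(c, len(members)) for c, members in zip(centroids, clusters)]


def identify_script(doc_symbols: List[Vector], templates: Dict[str, List[Template]],
                    sample_size: int = 50, min_count: int = 2) -> Optional[str]:
    """Return the script whose reliable templates best match a sample of the document."""
    sample = doc_symbols[:sample_size]  # compare only a subset of the document's symbols
    if not sample:
        raise ValueError("document has no symbols")
    best_script, best_score = None, math.inf
    for script, temps in templates.items():
        # Screen out rare templates: clusters with too few training symbols (assumed criterion).
        reliable = [c for c, n in temps if n >= min_count]
        if not reliable:
            continue
        # Mean distance from each sampled symbol to its closest reliable template.
        score = sum(min(dist(s, c) for c in reliable) for s in sample) / len(sample)
        if score < best_score:
            best_script, best_score = script, score
    return best_script


if __name__ == "__main__":
    train = {
        "A": [(0.0, 0.0), (0.1, 0.0), (0.0, 1.0), (0.1, 1.0)],
        "B": [(5.0, 5.0), (5.1, 5.0), (6.0, 5.0), (6.1, 5.0)],
    }
    templates = {s: build_templates(pts, k=2) for s, pts in train.items()}
    print(identify_script([(0.02, 0.05), (0.07, 0.95)], templates))  # prints A
```

With two toy "scripts" whose symbols lie near (0, 0)/(0, 1) and near (5, 5)/(6, 5), a document whose symbols are close to the first group is assigned to script `A`. Raising `min_count` above every cluster size screens out all templates, and the function then returns `None`, which a real system would need to treat as "script unknown".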

TABLE OF CONTENTS

CHAPTER NO.   TITLE

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF KEYWORDS

1. INTRODUCTION
1.1 History of Script Identification
1.2 Script Identification
1.2.1 Character, Word or Line Analysis
1.2.2 Text Block Analysis
1.2.3 Hybrid Analysis
1.3 Feasibility Study
1.3.1 Existing System
1.3.2 Proposed System
1.4 System Requirements
1.4.1 Hardware Requirements
1.4.2 Software Requirements
1.4.3 Software Description

2. LITERATURE REVIEW

3. SYSTEM DESIGN
3.1 Pre-Processing
3.2 Feature Extraction
3.3 Neural Network

4. IMPLEMENTATION
4.1 Pre-Processing
4.1.1 Codes
4.2 Feature Extraction
4.3 Neural Network Training
4.3.1 Using the Neural Network Fitting Tool GUI

5. SYSTEM TESTING
5.1 Testing the Whole System
5.2 Black-Box Testing and White-Box Testing

6. CONCLUSION AND FUTURE WORK
6.1 Conclusion
6.2 Future Work

APPENDIX
Snapshots

APPENDIX
Bibliography
Reference

LIST OF FIGURES

FIGURE NO.

FIGURE NAME

PAGE NO.

3.1 3.2 3.3 3.4 3.5 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 A.1 A.2 A.3 A.4 A.5

Overall Architecture of script Identification Classification Process of Identification Pre-Processing Image Retrieval Process Neural Network Training Performance Graph of Identification Regression Graph of Identification Neural Network Fitting Tool Starting Page Neural Network Fitting Tool Neural Network Fitting Data Set Chooser Validation and Test Data Network Size Assumption Train Network Neural Network Training Regression Graph Regression Graph (Plot Regression) Evaluate Network Save the Result identifying single Number Identifying Block of Numbers Identifying Block of Numbers Command Prompt for displaying Numbers Input for the Alphabets Combined with Number

18 19 21 23 24 30 32 33 36 37 38 39 40 41 42 43 44 45 46 58 59 60 71

72

A.6

Output for the Script containing Alphabets with Numbers 73

LIST OF TABLES

Table No.   Table Name                                                  Page No.
1.          Summarization of the Methods on Script Identification       5
2.          Difference between Black-Box and White-Box Testing          50

LIST OF KEYWORDS
1. OCR: Optical Character Recognition
2. SOFM: Self-Organizing Feature Maps
3. LVQ: Learning Vector Quantization
4. MATLAB: Matrix Laboratory
5. ECM: Enterprise Content Management
6. ANN: Artificial Neural Network
7. FRS: Functional Requirement Specification
8. SRS: System Requirement Specification
