Alexandria ACM SC - Introduction To Natural Language Processing

Introduction to Natural Language Processing
Ahmad M. Bakr, Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Egypt
Agenda
Introduction.
Basic text processing techniques.

Information Retrieval.
Sentiment Analysis.
Named Entity Recognition.
Question Answering.
Relation Extraction.
Introduction
NLP is a branch of artificial intelligence that deals with analyzing, understanding, and generating the languages that humans use naturally, so that people can interface with computers in those languages. Natural language processing aims to teach computers to understand the way humans learn and use language.
Introduction
Speech processing: get flight information or book a hotel over the phone.
Information extraction: discover the names of people and the events they participate in from a document.
Machine translation: translate a document from one human language into another.
Question answering: find answers to natural language questions in a text collection or database.
Summarization: generate a short biography of Noam Chomsky from one or more news articles.
Text Processing
Text processing is the manipulation of text, especially the transformation of text from one format to another, usually from plain text (a set of paragraphs) to a form that is easy to use in calculations.
The Vector Space Model (VSM) is one of the forms applications use to represent a document as a vector of the weights of its words.
dj = {W1, W2, W3, ..., Wn}
Each word is assigned a weight (e.g., TF-IDF).

Weight = Term Frequency * 1/(Document Frequency)
In practice the inverse document frequency is usually log-scaled: Weight = TF * log(N / DF), where N is the number of documents in the collection.
The similarity between two documents can be calculated as the similarity between the vectors of these documents (for example, the cosine of the angle between them).
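The weighting and similarity steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `tf_idf_vectors` and `cosine_similarity`, the whitespace tokenizer, and the sample documents are all assumptions, and the code uses the common log-scaled variant log(N/DF) in place of a plain 1/DF.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    # Assumption: lowercase whitespace tokenization is good enough for a demo.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sample collection.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices fell sharply",
]
vecs = tf_idf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
```

Here the first two documents share the words "the" and "cat", so `sim_related` is positive, while the third document shares no words with the first, so `sim_unrelated` is 0.0.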
Information Retrieval
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.
Information is usually indexed to speed up queries. The inverted index is one of the basic structures for indexing text by its words: it maps each word to the documents that contain it.
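A minimal sketch of an inverted index, assuming whitespace tokenization and integer document ids; the names `build_inverted_index` and `search` are illustrative only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every query word (AND query)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical toy collection.
docs = ["a b c", "c b a", "a d"]
index = build_inverted_index(docs)
hits = search(index, "a b c")
```

Both "a b c" and "c b a" match the query, because a plain inverted index records which documents contain a word but not where the word occurs, so word order is ignored.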
Can we use an inverted index to search for a sentence such as "A B C"?
Document Index Graph
Sentiment Analysis
Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information from source materials.
Sentiment Analysis
Techniques:
Maintaining a list of words for each class.
Example: "This is a nice movie", "This is a bad movie".
Using classifiers trained with sentences for each class separately.
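The word-list technique can be sketched as follows. The two word lists are tiny hypothetical examples, and the name `classify_sentiment` is an assumption; a real system would need far larger lexicons and handling of negation.

```python
import re

# Hypothetical word lists, one per class.
POSITIVE_WORDS = {"nice", "good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "awful"}

def classify_sentiment(sentence):
    """Count the words of each class and return the majority class."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"

label_nice = classify_sentiment("This is a nice movie")
label_bad = classify_sentiment("This is a bad movie")
```

With these lists, the two example sentences are labeled "positive" and "negative" respectively, and a sentence with no listed words falls back to "neutral".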
Named Entity Recognition

NER is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, percentages, etc.
Named Entity Recognition

Approaches:
Database-based recognition (e.g., WordNet).
Rule-based models.
Statistical models (e.g., HMM and Maximum Entropy).

Wikipedia-based NER

Index all page titles.
Two-phase algorithm:

Given a text, search all titles (phase one).
Score the candidate titles (phase two).
What factors should the scoring formula consider?
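The two phases can be sketched as below. Everything here is an assumption made for illustration: the tiny `TITLE_INDEX` stands in for an index of real page titles, and the scoring function simply prefers longer title matches, which is only one of many factors a real scoring formula might consider.

```python
# Hypothetical title index: normalized title -> canonical page title.
TITLE_INDEX = {
    "alexandria": "Alexandria",
    "alexandria university": "Alexandria University",
    "egypt": "Egypt",
}

def find_candidates(text, title_index, max_len=4):
    """Phase one: return (start, end, title) for every token span matching a title."""
    tokens = text.split()
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            phrase = " ".join(tokens[start:end]).strip(".,?!").lower()
            if phrase in title_index:
                candidates.append((start, end, title_index[phrase]))
    return candidates

def score(candidate):
    """Phase two (assumed heuristic): longer matches score higher."""
    start, end, _title = candidate
    return end - start

def recognize(text, title_index):
    """Keep the best-scoring candidates whose token spans do not overlap."""
    chosen, used = [], set()
    for cand in sorted(find_candidates(text, title_index), key=score, reverse=True):
        span = set(range(cand[0], cand[1]))
        if not span & used:
            chosen.append(cand)
            used |= span
    return [title for _start, _end, title in sorted(chosen)]

entities = recognize("Ahmad studies at Alexandria University in Egypt.", TITLE_INDEX)
```

In this example the longer match "Alexandria University" wins over the overlapping candidate "Alexandria", which shows why some form of scoring is needed after the title lookup.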
Question Answering
What is Question Answering?
QA is a computer science discipline within the fields of information retrieval and natural language processing (NLP) that is concerned with building systems that automatically answer questions posed by humans in natural language.
Question Answering
A QA implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, QA systems pull answers from an unstructured collection of natural language documents.
Question Answering
Question Classification
The question classifier module determines the type of question and the type of answer.

Examples:
1) "Who discovered x-rays?" should be classified into the type human (individual).
2) "Where is Alexandria located?" should be classified into the type place.
Rule-based approaches.
Using classifiers trained with the possible question types.
The question is put in the form of a parse tree to capture the relationships between its entities (i.e., subjects, objects, etc.). The main purpose of the parse tree is to understand the question and the links between its entities.
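A rule-based question classifier can be as simple as a list of patterns tried in order. The sketch below is illustrative only: the rule set, the type labels beyond "human" and "place", and the name `classify_question` are assumptions.

```python
import re

# Hypothetical rules: (pattern on the question, expected answer type).
RULES = [
    (re.compile(r"^who\b", re.IGNORECASE), "human"),
    (re.compile(r"^where\b", re.IGNORECASE), "place"),
    (re.compile(r"^when\b", re.IGNORECASE), "time"),
    (re.compile(r"^how (many|much)\b", re.IGNORECASE), "quantity"),
]

def classify_question(question):
    """Return the answer type of the first matching rule, or "unknown"."""
    text = question.strip()
    for pattern, answer_type in RULES:
        if pattern.search(text):
            return answer_type
    return "unknown"

type_who = classify_question("Who discovered x-rays?")
type_where = classify_question("Where is Alexandria located?")
```

Rules like these are easy to write and inspect but brittle; a trained classifier is the usual alternative when the question types are many or the phrasing varies.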
Question Answering
Query Formulation
Apply text processing techniques to form a query from the question. Techniques such as:

Stemming (Swimming → Swim)
Adding synonyms (USA → United States of America)

Giving weights to the words of the question (nouns take higher weights)
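The three techniques can be combined in a small sketch. All names and data here are assumptions: `stem` is a deliberately naive suffix stripper (a real system would use an established stemmer such as Porter's), `SYNONYMS` is a toy dictionary, and the `NOUNS` set stands in for a part-of-speech tagger.

```python
import re

# Hypothetical resources for the demo.
STOPWORDS = {"who", "what", "where", "the", "for", "a", "of", "in", "is"}
SYNONYMS = {"usa": ["united states of america"]}
NOUNS = {"swimming", "medals", "usa"}  # stand-in for a part-of-speech tagger

def stem(word):
    """Naive stemmer: strip one common suffix, then undo consonant doubling."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # "swimm" -> "swim"
            if word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word

def formulate_query(question):
    """Return {query term: weight}; nouns get weight 2.0, other words 1.0."""
    query = {}
    for token in re.findall(r"[a-z0-9]+", question.lower()):
        if token in STOPWORDS:
            continue
        weight = 2.0 if token in NOUNS else 1.0
        query[stem(token)] = weight
        for synonym in SYNONYMS.get(token, []):
            query[synonym] = weight  # synonyms inherit the word's weight
    return query

query = formulate_query("Who won swimming medals for the USA?")
```

For this question the query contains the stems "swim" and "medal", the term "usa" with its expansion "united states of america", all with weight 2.0, and "won" with weight 1.0.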
Question Answering
Search knowledge base
The main target is to identify the paragraphs that possibly contain answers to the user's question.
The knowledge base is usually indexed.
Answer Extraction
Parse the candidate paragraphs to extract sentences with possible answers.
Construct the parse tree of the matched sentences.
The parse tree gives insight into the relationships between the entities of a candidate sentence.
Rank the possible answers based on their relevance to the question.
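The extraction and ranking steps can be approximated without a parser. The sketch below swaps the parse-tree comparison for a much simpler stand-in, word overlap between the question and each candidate sentence; the function names and the sample paragraphs are assumptions for illustration.

```python
import re

def tokenize(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_sentences(question, paragraphs):
    """Split paragraphs into sentences and rank them by word overlap with the question."""
    question_terms = tokenize(question)
    scored = []
    for paragraph in paragraphs:
        # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            overlap = len(question_terms & tokenize(sentence))
            if overlap:
                scored.append((overlap, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _overlap, sentence in scored]

# Hypothetical candidate paragraphs returned by the search step.
paragraphs = [
    "Wilhelm Roentgen discovered x-rays in 1895. He was a German physicist.",
    "Alexandria is a city in Egypt.",
]
ranked = rank_sentences("Who discovered x-rays?", paragraphs)
```

Word overlap ignores who did what to whom, which is exactly the information the parse tree is meant to supply, so this ranking is only a rough first filter.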
