
Artificial Intelligence

Machine Learning

Institute of Information and Communication Technology University of Sindh, Jamshoro

Dr. Zeeshan Bhatti

BSSW-PIV

Chapter 5


MACHINE LEARNING: A DEFINITION

Definition:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

WHY IS MACHINE LEARNING IMPORTANT?

Some tasks cannot be defined well, except by examples (e.g., recognizing people).

Relationships and correlations can be hidden within large amounts of data. Machine Learning/Data Mining may be able to find these relationships.

Human designers often produce machines that do not work as well as desired in the environments in which they are used.

WHY IS MACHINE LEARNING IMPORTANT? (CONT’D)

The amount of knowledge available about certain tasks might be too large for explicit encoding by humans (e.g., medical diagnostics).

Environments change over time. New knowledge about tasks is constantly being discovered by humans. It may be difficult to continuously re-design systems “by hand”.

WHY “LEARN”?

Machine learning is programming computers to optimize a performance criterion using example data or past experience.

There is no need to “learn” to calculate payroll.

Learning is used when:

  • Human expertise does not exist (navigating on Mars),

  • Humans are unable to explain their expertise (speech recognition)

  • Solution changes in time (routing on a computer network)

  • Solution needs to be adapted to particular cases (user biometrics)

WHY MACHINE LEARNING?

No human experts

  • industrial/manufacturing control

  • mass spectrometer analysis, drug design, astronomic discovery

Black-box human expertise

  • face/handwriting/speech recognition

  • driving a car, flying a plane

Rapidly changing phenomena

  • credit scoring, financial modeling

  • diagnosis, fraud detection

Need for customization/personalization

  • personalized news reader

  • movie/book recommendation

RELATED FIELDS

Machine learning overlaps with many neighboring fields:

  • data mining
  • control theory
  • statistics
  • decision theory
  • information theory
  • cognitive science
  • databases
  • evolutionary models
  • psychological models
  • neuroscience

Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.

WHAT WE TALK ABOUT WHEN WE TALK ABOUT “LEARNING”

Learning general models from data of particular examples.

Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.

Example in retail: Customer transactions to consumer behavior:

People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)

Build a model that is a good and useful approximation to the data.

DATA MINING/KDD

Definition := “KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad)

Applications:

  • Retail: Market basket analysis, Customer relationship management (CRM)

  • Finance: Credit scoring, fraud detection

  • Manufacturing: Optimization, troubleshooting

  • Medicine: Medical diagnosis

  • Telecommunications: Quality of service optimization

  • Bioinformatics: Motifs, alignment

  • Web mining: Search engines ...

WHAT IS MACHINE LEARNING?

Machine Learning

  • Study of algorithms that

  • improve their performance

  • at some task

  • with experience

Optimize a performance criterion using example data or past experience.

Role of Statistics: Inference from a sample

Role of Computer science: Efficient algorithms to

  • Solve the optimization problem

  • Represent and evaluate the model for inference

GROWTH OF MACHINE LEARNING

Machine learning is the preferred approach to

  • Speech recognition, Natural language processing

  • Computer vision

  • Medical outcomes analysis

  • Robot control

  • Computational biology

This trend is accelerating

  • Improved machine learning algorithms

  • Improved data capture, networking, faster computers

  • Software too complex to write by hand

  • New sensors / IO devices

  • Demand for self-customization to user, environment

  • It turns out to be difficult to extract knowledge from human experts; this contributed to the failure of expert systems in the 1980’s.

Alpydin & Ch. Eick: ML Topic1


APPLICATIONS

Association Analysis

Supervised Learning

  • Classification

  • Regression/Prediction

Unsupervised Learning

Reinforcement Learning

LEARNING ASSOCIATIONS

Basket analysis:

P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.

Example: P(chips | beer) = 0.7

Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
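As a sketch, the conditional probability P(Y | X) can be estimated directly from the transaction table above by counting baskets; the function name `conditional_prob` is ours, not from the slides:

```python
# Estimating P(Y | X) from the market-basket transactions shown above.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

def conditional_prob(y, x, baskets):
    """P(y | x) = (# baskets containing both x and y) / (# baskets containing x)."""
    with_x = [b for b in baskets.values() if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

print(conditional_prob("Diaper", "Beer", transactions))  # all 3 beer baskets contain Diaper -> 1.0
```

With real data, counts like these are the basis of association-rule measures such as confidence.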

CLASSIFICATION

Example: Credit scoring

Differentiating between low-risk and high-risk customers from their income and savings


Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk


Model

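A minimal sketch of this threshold discriminant; the θ values below are invented for illustration, not thresholds from any data set:

```python
# Credit-scoring discriminant from the slide:
# IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
THETA1 = 30_000   # income threshold (assumed value)
THETA2 = 10_000   # savings threshold (assumed value)

def credit_risk(income, savings):
    """Classify a customer with the two-threshold rule above."""
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(45_000, 15_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk
```

In a learned classifier, θ1 and θ2 would be fitted from labeled customer data rather than set by hand.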

CLASSIFICATION: APPLICATIONS

Aka Pattern recognition

Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style

Character recognition: Different handwriting styles.

Speech recognition: Temporal dependency.

  • Use of a dictionary or the syntax of the language.

  • Sensor fusion: Combine multiple modalities; e.g., visual (lip image) and acoustic for speech.

Medical diagnosis: From symptoms to illnesses.

Web Advertising: Predict if a user clicks on an ad on the Internet.

FACE RECOGNITION

Training examples of a person


Test images


AT&T Laboratories, Cambridge UK

http://www.uk.research.att.com/facedatabase.html


SUPERVISED LEARNING: USES

Example: decision trees, tools that create rules.

Prediction of future cases: Use the rule to predict the output for future inputs

Knowledge extraction: The rule is easy to understand.

Compression: The rule is simpler than the data it explains.

Outlier detection: Exceptions that are not covered by the rule, e.g., fraud.


UNSUPERVISED LEARNING

Learning “what normally happens”

No output

Clustering: Grouping similar instances

Other applications: Summarization, Association Analysis

Example applications

  • Customer segmentation in CRM

  • Image compression: Color quantization

  • Bioinformatics: Learning motifs
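The clustering idea above can be sketched with a minimal k-means implementation; the 2-D points and k = 2 are made-up toy data (think of two groups of customers in a feature space):

```python
def kmeans(points, k, iters=20):
    """Minimal k-means: assign points to the nearest centroid, then re-average."""
    centroids = list(points[:k])   # naive init: first k points (fine for this toy example)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                      + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        for i, c in enumerate(clusters):
            if c:                  # keep the old centroid if a cluster goes empty
                centroids[i] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids

# Two well-separated groups of 2-D points (made-up data).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(data, 2)))
```

Production implementations add smarter initialization and convergence checks, but the assign/re-average loop is the whole algorithm.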

REINFORCEMENT LEARNING

Topics:

  • Policies: what actions should an agent take in a particular situation

  • Utility estimation: how good is a state (used by policy)

No supervised output but delayed reward

Credit assignment problem (what was responsible for the outcome)

Applications:

  • Game playing

  • Robot in a maze

  • Multiple agents, partial observability, ...
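These ideas (delayed reward, credit assignment, a robot in a maze) can be sketched with tabular Q-learning; the five-state corridor, reward scheme, and hyperparameters below are all assumptions made for illustration:

```python
import random

# Tiny 1-D "maze": states 0..4, reward only on reaching the goal state 4.
N, GOAL = 5, 4
ACTIONS = (-1, +1)                         # move left / move right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2          # assumed hyperparameters
rng = random.Random(0)

for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if rng.random() < EPS:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N - 1)     # walls clamp the move
        r = 1.0 if s2 == GOAL else 0.0     # delayed reward: only at the goal
        # Q-learning update: credit flows backward from the rewarding state
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

The Q-table is the utility estimate, and the greedy readout at the end is the learned policy; the update rule is what solves the credit assignment problem.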

ARCHITECTURE OF A LEARNING SYSTEM

The slide figure shows the standard learning-agent architecture; its components and data flows are:

  • ENVIRONMENT: supplies percepts to the agent and receives its actions.

  • Performance element: selects actions using its current knowledge.

  • Learning element: receives feedback from the critic and makes changes to the performance element’s knowledge.

  • Critic: judges the percepts against a performance standard and produces the feedback.

  • Problem generator: uses the learning goals to propose new problems (exploratory actions) for the performance element.

 

LEARNING ELEMENT

Design affected by:

performance element used

  • e.g., utility-based agent, reactive agent, logical agent

functional component to be learned

  • e.g., classifier, evaluation function, perception-action function

representation of functional component

  • e.g., weighted linear function, logical theory, HMM

feedback available

  • e.g., correct action, reward, relative preferences

DIMENSIONS OF LEARNING SYSTEMS

type of feedback

  • supervised (labeled examples)

  • unsupervised (unlabeled examples)

  • reinforcement (reward)

representation

  • attribute-based (feature vector)

  • relational (first-order logic)

use of knowledge

  • empirical (knowledge-free)

  • analytical (knowledge-guided)

DESIGNING A LEARNING SYSTEM: AN EXAMPLE

1. Problem Description

2. Choosing the Training Experience

3. Choosing the Target Function

4. Choosing a Representation for the Target Function

5. Choosing a Function Approximation Algorithm

6. Final Design

1. PROBLEM DESCRIPTION: A CHECKERS LEARNING PROBLEM

Task T: Playing Checkers

Performance Measure P: Percent of games won against opponents

Training Experience E: To be selected ==> Games Played against itself

2. CHOOSING THE TRAINING EXPERIENCE

Direct versus Indirect Experience [Indirect Experience gives rise to the credit assignment problem and is thus more difficult]

Teacher versus Learner Controlled Experience [the teacher might provide training examples; the learner might suggest interesting examples and ask the teacher for their outcome; or the learner can be completely on its own with no access to correct outcomes]

How Representative is the Experience? [Is the training experience representative of the task the system will actually have to solve? It is best if it is, but such a situation cannot systematically be achieved]


3. CHOOSING THE TARGET FUNCTION

Given a set of legal moves, we want to learn how to choose the best move [since the best move is not necessarily known, this is an optimization problem].

ChooseMove: B --> M is called a Target Function [ChooseMove, however, is difficult to learn. An easier and related target function to learn is V: B --> R, which assigns a numerical score to each board. The better the board, the higher the score.]

Operational versus Non-Operational Description of a Target Function [An operational description must be given]

Function Approximation [The actual function can often not be learned and must be approximated]

4. CHOOSING A REPRESENTATION FOR THE TARGET FUNCTION

Expressiveness versus Training set size [The more expressive the representation of the target function, the closer to the “truth” we can get. However, the more expressive the representation, the more training examples are necessary to choose among the large number of “representable” possibilities.]

Example of a representation:

  • x1/x2 = # of black/red pieces on the board

  • x3/x4 = # of black/red kings on the board

  • x5/x6 = # of black/red pieces threatened by red/black

The wi’s are adjustable or “learnable” coefficients:

V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
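The linear evaluation function V̂(b) is straightforward to compute; the weight values below are invented placeholders, not learned coefficients:

```python
# Linear board-evaluation function V-hat(b) = w0 + w1*x1 + ... + w6*x6.
def v_hat(features, weights):
    """features = (x1, ..., x6); weights = (w0, w1, ..., w6)."""
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

weights = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]   # assumed, not learned, values
board = (3, 0, 1, 0, 0, 0)                         # x1..x6 for some board b
print(v_hat(board, weights))                       # 1*3 + 3*1 = 6.0
```

Learning then reduces to choosing the seven weights, which is what the function approximation step below addresses.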


5. CHOOSING A FUNCTION APPROXIMATION ALGORITHM

Generating Training Examples of the form <b, Vtrain(b)> [e.g. <x1=3, x2=0, x3=1, x4=0, x5=0, x6=0, +100> (= black won)]

  • Useful and Easy Approach: Vtrain(b) <- V̂(Successor(b))

Training the System

  • Defining a criterion for success [What is the error that needs to be minimized?]

  • Choose an algorithm capable of finding weights of a linear function that minimize that error [e.g. the Least Mean Square (LMS) training rule].
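One step of the LMS rule for the linear V̂ can be sketched as follows; the learning rate and the toy training pair are assumptions for illustration:

```python
# One LMS update for the linear evaluation function:
# each weight is nudged in proportion to the error and its feature value.
ETA = 0.1   # learning rate (assumed value)

def lms_update(weights, features, v_train):
    """Return new weights after one LMS step on the pair <b, Vtrain(b)>."""
    w0, *ws = weights
    v_hat = w0 + sum(w * x for w, x in zip(ws, features))
    error = v_train - v_hat
    new_w0 = w0 + ETA * error                            # bias term: x0 is implicitly 1
    new_ws = [w + ETA * error * x for w, x in zip(ws, features)]
    return [new_w0] + new_ws

weights = [0.0] * 7
board = (3, 0, 1, 0, 0, 0)                   # x1..x6 from the example above
weights = lms_update(weights, board, 100.0)  # Vtrain(b) = +100 (black won)
print(weights[1])                            # w1: 0 + 0.1 * 100 * 3 = 30.0
```

Repeating this update over many training pairs drives the squared error down, which is exactly the criterion chosen above.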

6. FINAL DESIGN FOR CHECKERS LEARNING

The Performance Module: Takes as input a new board and outputs a trace of the game it played against itself.

The Critic: Takes as input the trace of a game and outputs a set of training examples of the target function

The Generalizer: Takes as input training examples and outputs a hypothesis which estimates the target function. Good generalization to new cases is crucial.

The Experiment Generator: Takes as input the current hypothesis (currently learned function) and outputs a new problem (an initial board state) for the performance system to explore.


ISSUES IN MACHINE LEARNING (I.E., GENERALIZATION)

What algorithms are available for learning a concept?

How well do they perform?

How much training data is sufficient to learn a concept with high confidence?

When is it useful to use prior knowledge?

Are some training examples more useful than others?

What are best tasks for a system to learn?

What is the best way for a system to represent its knowledge?