
Psychology of Learning PSY211 Operant/Instrumental Conditioning: Reinforcement

B. Charles Tatum

Operant/Instrumental Conditioning
Definition: A form of learning (conditioning) in which the organism is free to respond to (operate on) the environment, and changes in behavior occur as a result of the stimulus consequences (reinforcement or punishment) of its spontaneous actions.

Trial-and-Error (Trial-and-Success) Learning
Thorndike's Puzzle Box / Skinner's Operant Chamber / Tolman's Maze
Law of Effect

Reinforcement and Punishment


Reinforcement: Any stimulus event that increases the likelihood of a preceding response.
  Positive Reinforcement: The presentation of a stimulus increases the likelihood of the preceding response (e.g., food, money, praise, drugs, electrical stimulation of pleasure centers in the brain). Sometimes called reward.
  Negative Reinforcement: The removal of a stimulus increases the likelihood of the preceding response (e.g., removing a hand from a hot stove, improving grades to lift a restriction, working hard to avoid being fired).

Punishment: Any stimulus event that decreases (suppresses) the likelihood of a preceding response.
  Positive Punishment: The presentation of a stimulus (usually aversive, such as a slap, a scolding, or a dirty look) decreases (suppresses) the likelihood of a preceding response.
  Negative Punishment: The removal of a stimulus (usually something pleasant, such as TV privileges or a desirable object) decreases (suppresses) the likelihood of a preceding response. When the stimulus that is removed is a reinforcer, this is called extinction.

Response -> Stimulus Consequence (Onset/Offset)
  Reinforcer (Positive/Negative) -> Future Responses Increase
  Punisher (Positive/Negative/Extinction) -> Future Responses Decrease

Stimulus Consequence   Future Responses Increase                 Future Responses Decrease
Produce (Onset)        Positive Reinforcement / Reward           Positive Punishment
                       (e.g., praise)                            (e.g., spanking)
Remove (Offset)        Negative Reinforcement                    Negative Punishment / Extinction
                       (e.g., nagging)                           (e.g., time out)
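The four-way classification (stimulus produced or removed, future responding increased or decreased) can be written as a small lookup. This is only an illustrative sketch; the function name `classify_consequence` and the string labels "onset", "offset", "increase", and "decrease" are assumptions made for the example, not terminology fixed by the slides.

```python
def classify_consequence(stimulus_change: str, response_trend: str) -> str:
    """Name the operant process from the stimulus change and the response trend.

    stimulus_change: "onset" (stimulus produced) or "offset" (stimulus removed).
    response_trend:  "increase" or "decrease" in future responding.
    The labels are hypothetical names chosen for this sketch.
    """
    table = {
        ("onset", "increase"): "positive reinforcement",
        ("offset", "increase"): "negative reinforcement",
        ("onset", "decrease"): "positive punishment",
        ("offset", "decrease"): "negative punishment",
    }
    key = (stimulus_change, response_trend)
    if key not in table:
        raise ValueError(f"unrecognized combination: {key}")
    return table[key]


if __name__ == "__main__":
    # Praise is presented and the behavior becomes more frequent.
    print(classify_consequence("onset", "increase"))
    # TV privileges are removed and the behavior becomes less frequent.
    print(classify_consequence("offset", "decrease"))
```

The point of the lookup is that the name of the process depends only on two observations: what happened to the stimulus, and what happened to later responding.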

Reinforcement and Punishment as Reciprocal Processes


The Problem Employee
  Attending Meetings + Doughnuts = Increased Attendance (Positive Reinforcement)
  Playing Computer Games + No Doughnuts = Reduced Game Playing (Negative Punishment/Extinction)

The Confused Soldier
  "Hey Dude" to an Officer + Criticism = Reduced Verbal Salutations (Positive Punishment)
  Saluting an Officer + No Criticism = Increased Saluting as a Form of Address (Negative Reinforcement)

Reinforcement and Punishment as Dynamic Processes


Parental Dynamics
  Child Response = Dirty Words -> Parent Stimulus = Spanking -> Future Result: Reduced Foul Language (Positive Punishment)
  Parent Response = Spanking -> Child Stimulus = Nice Words -> Future Result: Increased Use of Spankings (Positive Reinforcement)

Marital Dynamics
  Husband Response = Bring Gifts -> Wife Stimulus = Stop Sulking -> Future Result: Increased Gift Giving (Negative Reinforcement)
  Wife Response = Sulking -> Husband Stimulus = Gifts -> Future Result: Increased Sulking (Positive Reinforcement)

Primary versus Secondary Reinforcement


Primary: Naturally or innately reinforcing stimuli (e.g., food, water, sex).
Secondary (Conditioned): Reinforcers that depend on their association with other reinforcers (e.g., praise, recognition, money).
  UCS (Food) -> UCR (Satisfaction)
  CS (Money) -> CR (Satisfaction)
Generalized Reinforcer: A secondary reinforcer that has been paired with a wide variety of primary reinforcers (e.g., money, praise).

Comparison Between Classical (Pavlovian) and Operant (Instrumental) Conditioning


                           Classical/Pavlovian        Operant/Instrumental
Responses                  Elicited (Reflex)          Emitted (Spontaneous)
Stimuli                    Unconditioned (UCS)        Unobserved (Internal)
                           Conditioned (CS)           Reinforcing/Punishing
                                                      Discriminative (External)
Peripheral Nervous System  Autonomic (Involuntary)    Somatic (Voluntary)
Association                S-S                        S-R-S
Examples                   Light - Air Puff           Snap Fingers - Roll Over - Treat
                           Tone - Knee Tap            Deadline - Work Late - Bonus
                           Bell - Food Powder         Exam - Study - Good Grades

Acquisition, Extinction, and Spontaneous Recovery of an Operantly/Instrumentally Conditioned Response


[Figure: Response strength across time or trials. Acquisition occurs over reinforced time or trials. Extinction, beginning with an extinction burst, occurs over nonreinforced time or trials. Spontaneous recovery follows each break (interruption).]
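The acquisition and extinction phases can be illustrated with a toy simulation. This is a minimal sketch under a stated assumption: the update rule (move a fixed fraction of the way toward 1 on reinforced trials, and toward 0 on nonreinforced trials) is a simple learning rule chosen for the example and does not come from the slides. It does not reproduce the extinction burst or spontaneous recovery. The function name `simulate` and the parameter `rate` are hypothetical.

```python
def simulate(reinforced_trials: int, nonreinforced_trials: int,
             rate: float = 0.2) -> list:
    """Return response strength (between 0 and 1) after each trial.

    Assumed update rule (not from the slides): on a reinforced trial the
    strength moves a fraction `rate` of the way toward 1; on a
    nonreinforced trial it moves the same fraction of the way toward 0.
    """
    strength = 0.0
    history = []
    # Acquisition phase: every response is followed by reinforcement.
    for _ in range(reinforced_trials):
        strength += rate * (1.0 - strength)
        history.append(strength)
    # Extinction phase: reinforcement is withheld.
    for _ in range(nonreinforced_trials):
        strength -= rate * strength
        history.append(strength)
    return history


if __name__ == "__main__":
    for trial, value in enumerate(simulate(10, 10), start=1):
        print(f"trial {trial:2d}: strength {value:.3f}")
```

Under this rule the curve rises quickly at first and then levels off during acquisition, and falls in the same decelerating way during extinction.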

Phases and Principles of Operant/Instrumental Conditioning


Acquisition: Gradual increase in responding when a reinforcing stimulus follows the behavior (e.g., toilet training, athletic skills, stupid pet tricks)
  Successive Approximation (Shaping)
  Chaining: Performing behaviors in a sequence (e.g., ordering take-out)
    Forward Chaining: Train first-to-last
    Backward Chaining: Train last-to-first
  Superstitious Behavior
  Conditions of Reinforcement
    Reward Delay (delayed gratification)
    Reward Contingency (predictability is good)
    Reward Preference (chocolate better than raisins)
    Reward Amount
      Diminishing returns
      Contrast effects
      Frequency effects

Changes in Effectiveness of Reward Amount

[Figure: Work Proficiency (vertical axis, 0 to 100) as a function of Salary Increase (horizontal axis, 0 to 100).]

Phases and Principles of Operant/Instrumental Conditioning (Continued)


Acquisition (continued)
  Response Characteristics
    Skeletal muscles (voluntary [somatic] nervous system) are easier to condition than smooth muscles and glands (involuntary [autonomic] nervous system)
    Simple responses are easier to condition than complex responses
  Motivational Level: Learning is faster and stronger when the learner is deprived of rewards (more so for primary than for secondary rewards)
  Competing Rewards: Conditioning is slow and weak if other (competing) behaviors are also being rewarded
  Awareness
    Not necessary for conditioning
    Leads to faster conditioning

Phases and Principles of Operant/Instrumental Conditioning (Continued)


Extinction: Reduced responding when the reinforcing stimulus is removed (e.g., ignoring bed-time tantrums)
  Extinction Burst
  Reinforcement Variability (Schedules of Reinforcement)
    Continuous
    Intermittent (partial)
  Stimulus Variability (e.g., extinguishing a smoking habit)
  Response Variability (e.g., extinguishing athletic skills)
Spontaneous Recovery: Return of the extinguished behavior following an interruption (e.g., child stays with grandma)
Resurgence: Return of a behavior following the extinction of another behavior (e.g., extinction of day-care tantrums produces a return to bed-time tantrums). Similar to regression, but not always a return to a more primitive behavior.

Theories of Reinforcement

Hedonic Theory: Reinforcement strengthens behavior because it produces pleasurable sensations
  Problems with the theory
    Masochism: Pain (an unpleasant sensation) can act as a reinforcer
    Negative reinforcement: Removal of aversive stimuli reduces discomfort but is not pleasurable
    Tautology: If it makes you feel good, it's a reinforcer. If it's a reinforcer, it makes you feel good (circular reasoning).

Drive Reduction (Hull)
  Drive: A motivational force. Tension from unfulfilled needs or desires
    Primary Drives (e.g., hunger, thirst)
    Secondary Drives (e.g., success, popularity)
  Reinforcer: Any stimulus that reduces drive by fulfilling needs and desires (e.g., food, water, money)
  Difficulties with the theory
    Some reinforcers do not reduce drives (e.g., electrical stimulation of the brain, copulation without ejaculation)
    Some motivations do not create states of tension that need to be reduced (e.g., exploratory behavior)

Theories of Reinforcement (continued)


Relative Value (Premack)
  Reinforcers viewed as behaviors (e.g., food smell vs. chewing behavior)
  Relative value: Some behaviors are more probable (more preferred) than others (e.g., partying vs. studying)
  Premack Principle: High probability (preferred) behavior reinforces low probability (non-preferred) behavior
  Problems with the theory
    How to explain strong secondary reinforcers (e.g., why is verbal praise such a powerful reward?)
    Sometimes low probability behavior reinforces high probability behavior if the less likely behavior has been prevented (e.g., deprivation of study time)

Response Deprivation (Timberlake & Allison): Relative value of responses depends on relative deprivation. Behaviors that are not allowed to occur will reinforce other, less deprived, behaviors (e.g., Prohibition in the 1920s made drinking booze a much stronger reward).
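The contrast between the Premack principle and response deprivation can be made concrete with a short sketch. The inequality used for response deprivation, I / C > O_i / O_c (required instrumental amount over permitted contingent amount, compared with the same ratio at free baseline), is the usual textbook statement of the Timberlake and Allison condition; it is not spelled out in the slides, so treat it as an assumption here. The function names and all numbers are hypothetical.

```python
def premack_predicts_reinforcement(p_contingent: float,
                                   p_instrumental: float) -> bool:
    """Premack principle: the contingent behavior reinforces the
    instrumental behavior only if it is the more probable one at baseline."""
    return p_contingent > p_instrumental


def is_response_deprived(required_instrumental: float,
                         allowed_contingent: float,
                         baseline_instrumental: float,
                         baseline_contingent: float) -> bool:
    """Assumed response deprivation condition: I / C > O_i / O_c.

    The schedule demands `required_instrumental` units of one behavior for
    `allowed_contingent` units of access to the other. The contingent
    behavior is deprived if this ratio exceeds the free-baseline ratio.
    """
    values = (required_instrumental, allowed_contingent,
              baseline_instrumental, baseline_contingent)
    if any(v <= 0 for v in values):
        raise ValueError("all amounts must be positive")
    # Cross-multiplied form of I / C > O_i / O_c (valid for positive values).
    return (required_instrumental * baseline_contingent
            > baseline_instrumental * allowed_contingent)


if __name__ == "__main__":
    # Hypothetical baseline: 60 min partying, 20 min studying per evening.
    # Hypothetical schedule: 60 min partying earns only 5 min of studying.
    print(premack_predicts_reinforcement(p_contingent=0.25, p_instrumental=0.75))
    print(is_response_deprived(60, 5, 60, 20))
```

With these made-up numbers, studying is the less probable behavior, so the Premack principle alone does not predict that it will reinforce partying, while the deprivation condition is met (60/5 exceeds 60/20). This is the kind of case the slide lists as a problem for the Premack account.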

Escape Conditioning
Example #1: UCS (hot water) -> UCR (pain reaction) -> Operant Response (turn nozzle) -> Negative Reinforcer (remove hot water)
Example #2: UCS (shock) -> UCR (pain reaction) -> Operant Response (jump hurdle) -> Negative Reinforcer (remove shock)

Avoidance Conditioning
Example #1: Warning Signal (flushing toilet) -> Operant Response (turn nozzle) -> Negative Reinforcer (avoid hot water)
Example #2: Warning Signal (ringing bell) -> Operant Response (jump hurdle) -> Negative Reinforcer (avoid shock)

Theories of Avoidance
Two Processes
  Classical Conditioning
    UCS: A noxious stimulus that produces an unpleasant reaction (e.g., flinch, startle reaction) or an escape response (e.g., jump aside, run away)
    CS: Some signal that precedes the noxious stimulus (light, bell, flush)
  Operant Conditioning
    Operant Response: Response that removes the noxious stimulus
    Negative Reinforcer: Termination of a noxious stimulus
  Explanation: The CS becomes noxious and the animal learns to escape the noxious CS
  Problems
    Avoidance continues even after the CS loses its aversive qualities
    Avoidance response does not extinguish even though the CS is no longer paired with the UCS

UCS (hot water) -> UCR (jump aside)
CS (flush) -> Operant Response (turn nozzle) -> Negative Reinforcer (remove hot water)

One Process: Only operant conditioning is involved in avoidance. The warning signal (CS) becomes a discriminative stimulus.
