Page 2
Text mining is a new area and many questions remain
unanswered.
For example, further work is needed to understand:
Page 3
Text analysis has huge potential for insurance, as shown by research and industrial studies.
There has been recognition for some time now that data about incidents contain information which allows for a proactive risk management approach (Feyer and Williamson 1998).
Page 4
PwC Australia Case Study. Using Text Mining in Insurance.
Client: Major Australian Insurer
[Project cycle: Client issue → Data design → Analysis → Implement → Revisit data & assumptions]
Client Issue:
Perceived inadequacies in the level of information captured by the current injury coding system led to the need to assess the potential value that textual information and text mining could add to the organisation:
To explore the possibilities and benefits of augmenting the existing accident coding system using free text
To see if adding textual information would result in increased precision of claim cost prediction
To suggest how text mining could be used for improvement in other areas of the business
To assist in making a decision on investing in a commercial text mining software package that would best suit the client's needs.
Page 5
Assessing the value of textual information for the client.
Page 6
Data Description.
Page 7
Text Mining Process.
1. Prepare TextData
2. Discover concepts (text mining)
3. Reduce concepts
4. Select predictive concepts
5. Derive domain relevant concepts
6. Predictive modelling with concepts only
7. Evaluate results
8. Predictive modelling with features only
9. Predictive modelling with concepts and features
10. Compare results and conclude
Page 8
Text Mining Process.
Stage 1. Does textual information have predictive value?
Step 2. Discover concepts (text mining).
Concepts are words or word combinations resident in the text.
[Process diagram, step 2 highlighted]
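The deck does not name the text mining tool used in the engagement, so purely as an illustration, concept discovery over free-text accident descriptions can be sketched as a unigram/bigram count; the toy records below are assumptions, not case-study data:

```python
# Illustrative sketch only: extract candidate "concepts" (words and word
# combinations) from free-text accident descriptions. The records below
# are invented; they are not data from the case study.
from sklearn.feature_extraction.text import CountVectorizer

descriptions = [
    "worker slipped on wet floor and lacerated left leg",
    "stress reaction after fall from ladder",
    "fracture of right wrist while lifting boxes",
]

# Unigrams and bigrams, with common stop words removed, give a crude concept list.
vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=1, stop_words="english")
concept_matrix = vectorizer.fit_transform(descriptions)

print(vectorizer.get_feature_names_out())   # candidate concepts
print(concept_matrix.toarray())             # concept counts per claim
```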
Page 9
Text Mining Process.
Stage 1. Does textual information have predictive value?
Step 3. Reduce the number of concepts.
About 8000 concepts were discovered in the Discover concepts phase.
[Process diagram, step 3 highlighted]
Page 10
Text Mining Process.
Stage 1. Does textual information have predictive value?
Step 4. Select predictive concepts: assess predictability by using TreeNet to identify the most predictive concepts.
[Concept importance (top of chart): LEG 100, LACERATED 99.43, FRACTURE 92.56, STRESS 92.27]
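TreeNet itself is commercial software; as a rough stand-in, the same idea (rank concept indicators by their contribution to a boosted-tree model and rescale the top score to 100, as in the chart) can be sketched with scikit-learn's gradient boosting. The concept names, data and parameter values are assumptions:

```python
# Sketch: rank concept indicators by importance in a boosted-tree model,
# rescaled so the strongest concept scores 100. X_concepts and claim_cost
# stand in for the prepared data; they are invented here.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
concept_names = ["LEG", "LACERATED", "FRACTURE", "STRESS", "EYE"]
X_concepts = rng.integers(0, 2, size=(1000, len(concept_names)))
claim_cost = 500 * X_concepts[:, 1] + 800 * X_concepts[:, 2] + rng.gamma(2.0, 100, 1000)

model = GradientBoostingRegressor(max_leaf_nodes=6, n_estimators=300, learning_rate=0.05)
model.fit(X_concepts, claim_cost)

scores = 100 * model.feature_importances_ / model.feature_importances_.max()
for name, score in sorted(zip(concept_names, scores), key=lambda p: -p[1]):
    print(f"{name:<12}{score:6.2f}")
```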
Page 11
TreeNet Overview
A TreeNet model can be written as F(X) = F0 + β1·T1(X) + β2·T2(X) + ... + βM·TM(X),
where each Ti is a small tree. The first tree in the series contributes a relatively
large amount to the model, while subsequent trees contribute successively
smaller corrections. A model normally consists of 400 to 800 small
trees, each typically no larger than four to eight terminal nodes.
The final model is a collection of weighted and summed trees.
Page 12
TreeNet vs boosting
Page 13
TreeNet
The first tree in the series contributes a relatively large amount to the model,
while subsequent trees contribute successively smaller corrections.
A model normally consists of 400 to 800 small trees, each typically no larger
than four to eight terminal nodes.
The final model is a collection of weighted and summed trees.
Page 14
MART or TreeNet
In any predictive modelling situation:
Y – target or response variable
X – inputs or predictors
F(X) – values predicted by the model
Loss function L(Y, F) measures the error between Y and F(X). Typical choices of L(Y, F) are:
Squared error: L(Y, F) = (Y - F(X))²
Absolute error: L(Y, F) = |Y - F(X)|
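For concreteness, the two loss functions written as a trivial code sketch (averaged over observations):

```python
import numpy as np

def squared_error(y, f):
    """L(Y, F) = (Y - F(X))^2, averaged over observations."""
    return np.mean((y - f) ** 2)

def absolute_error(y, f):
    """L(Y, F) = |Y - F(X)|, averaged over observations."""
    return np.mean(np.abs(y - f))
```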
Page 15
TreeNet Optimization Strategy:
Page 16
MART or TreeNet
Page 17
MART or TreeNet
Page 18
MART or TreeNet
Page 19
MART algorithm for a given choice of loss function, tree size K and number of trees M
Page 20
MART Algorithm. Example for the least squares loss function (linear regression):
Initial guess: {F0(Xi)} = {mean(Yi)}
FOR m = 1 TO M
  Negative gradient gm is the vector of residuals: {Yi - Fm-1(Xi)} = {Residual_i}
  Fit a K-node regression tree to the current residuals. This partitions the observations into K mutually exclusive groups.
  For each node: hm(Xi) = within-node mean(Residual_i)
  Update: {Fm(Xi)} = {Fm-1(Xi)} + hm(Xi)
END FOR
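A minimal sketch of this least-squares loop, using scikit-learn's DecisionTreeRegressor for the K-node trees (a regression tree's leaf prediction is exactly the within-node mean of the residuals). The data and parameter values are invented for illustration; the shrinkage introduced on the next slide is omitted here:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mart_least_squares(X, y, M=200, K=6):
    """MART for squared-error loss: M trees with at most K terminal nodes."""
    f0 = y.mean()                      # initial guess F0(Xi) = mean(Yi)
    F = np.full(len(y), f0)
    trees = []
    for _ in range(M):
        residuals = y - F              # negative gradient = current residuals
        tree = DecisionTreeRegressor(max_leaf_nodes=K)   # K-node regression tree
        tree.fit(X, residuals)
        F += tree.predict(X)           # update: Fm = Fm-1 + hm (within-node means)
        trees.append(tree)
    return f0, trees

def mart_predict(f0, trees, X):
    return f0 + sum(tree.predict(X) for tree in trees)

# Invented toy data, purely to show the loop running.
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))
y = 2 * X[:, 0] + np.sin(5 * X[:, 1]) + rng.normal(scale=0.1, size=500)
f0, trees = mart_least_squares(X, y)
print(round(np.mean((y - mart_predict(f0, trees, X)) ** 2), 4))
```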
Page 21
TreeNet: further guard against overfitting
It turns out to be beneficial to slow down the learning rate by introducing a shrinkage parameter v, 0 < v < 1, into the update step:
{Fm(Xi)} = {Fm-1(Xi)} + v * hm(Xi)
Parameters v and M are connected: for the same level of accuracy, smaller v requires larger M. The best strategy appears to be to set v to less than 0.1 and choose M by early stopping (Friedman, 2001).
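The same recipe (v below 0.1, M chosen by early stopping on a validation set) can be sketched with scikit-learn's gradient boosting, where learning_rate plays the role of v; the data and parameter values are assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(2000, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.2, size=2000)

# learning_rate is the shrinkage parameter v; a large n_estimators cap (M)
# is cut short by early stopping on a held-out validation fraction.
model = GradientBoostingRegressor(
    learning_rate=0.05,        # v < 0.1, as suggested by Friedman (2001)
    n_estimators=5000,         # upper bound on M
    max_leaf_nodes=6,          # small trees
    validation_fraction=0.2,
    n_iter_no_change=20,       # stop once the validation score stops improving
)
model.fit(X, y)
print("trees actually used (M):", model.n_estimators_)
```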
Page 22
MART or TreeNet
Accuracy of MART.
(Hastie, Tibshirani and Friedman, 2001):
Classification problem: spam vs. non-spam email
CART: 8.7% error rate
MARS: 5.5% error rate
MART: 4% error rate
Page 23
TreeNet advantages:
Page 24
TreeNet disadvantages:
Page 25
Text Mining Process.
Stage 1. Does textual information have predictive value?
Step 5. Derive domain relevant concepts: deriving additional features at point five.
This encompassed the grouping and combining of concepts, so that the most predictive concepts were combined with those similar in meaning (eg, stress and anxiety, laceration and abrasion) to increase frequencies.
[Concept importance chart: LEG 100, LACERATED 99.43, FRACTURE 92.56, STRESS 92.27, EYE 86.56, HERNIA 84.11, TRUCK 82.62, BURN 73.06, LADDER 58, ...]
[Process diagram, steps 4-10]
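A hypothetical sketch of this grouping step: map concepts that are close in meaning onto one derived feature, so that its frequency is the sum of its members' frequencies. The synonym groups below are invented for illustration, not the groupings used in the engagement:

```python
# Sketch: combine similar concepts into derived features to increase
# frequencies. The synonym groups here are illustrative assumptions only.
concept_groups = {
    "STRESS": ["STRESS", "ANXIETY"],
    "LACERATION": ["LACERATED", "ABRASION", "CUT"],
}

def derive_features(concept_counts):
    """concept_counts: dict of concept -> count for one claim narrative."""
    derived = {}
    for feature, members in concept_groups.items():
        derived[feature] = sum(concept_counts.get(m, 0) for m in members)
    return derived

print(derive_features({"STRESS": 1, "ANXIETY": 2, "CUT": 1}))
# {'STRESS': 3, 'LACERATION': 1}
```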
Page 26
Text Mining Process.
Stage 1. Does textual information have predictive value?
Step 6. Predictive modelling with concepts only.
Build a CART predictive model with the concepts identified by TreeNet and the derived concepts.
[Process diagram, step 6 highlighted]
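As an illustration, scikit-learn's DecisionTreeRegressor is a CART-style learner that can stand in for the CART software named on the slide; a model built on the selected and derived concept indicators might look like the sketch below, with invented data:

```python
# Sketch: a CART-style regression tree on concept indicators only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
concepts = ["LEG", "LACERATED", "FRACTURE", "STRESS", "LACERATION_GROUP"]
X = rng.integers(0, 2, size=(2000, len(concepts)))
claim_cost = 700 * X[:, 2] + 400 * X[:, 3] + rng.gamma(2.0, 150, 2000)

X_train, X_test, y_train, y_test = train_test_split(X, claim_cost, random_state=0)
cart = DecisionTreeRegressor(max_depth=4, min_samples_leaf=50)
cart.fit(X_train, y_train)
print("test R^2:", round(cart.score(X_test, y_test), 3))
```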
Page 27
Text Mining Process.
Stage 1. Does textual information have predictive value?
Step 7. Evaluate results, referring to gains charts and model precision.
The TreeNet model using concepts only was 75.7% precise on test data.
Concept name   Importance
LEG            100
LACERATED      99.43
FRACTURE       92.56
STRESS         92.27
EYE            86.56
HERNIA         84.11
TRUCK          82.62
BURN           73.06
LADDER         58
[Process diagram, steps 7-10]
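A gains chart of this kind ranks claims by predicted cost and shows what share of actual cost falls into each predicted band; a minimal sketch, with invented arrays standing in for the test-set actuals and predictions, is:

```python
# Sketch: cumulative gains by decile of predicted claim cost.
# y_test and y_pred stand in for the evaluation data; the arrays below
# are invented.
import numpy as np

rng = np.random.default_rng(4)
y_test = rng.gamma(2.0, 300, 1000)
y_pred = y_test * rng.normal(1.0, 0.4, 1000)    # imperfect predictions

order = np.argsort(-y_pred)                     # rank claims, highest predicted first
cum_share = np.cumsum(y_test[order]) / y_test.sum()
for decile in range(1, 11):
    idx = int(len(y_test) * decile / 10) - 1
    print(f"top {decile * 10:3d}% predicted -> {cum_share[idx]:.1%} of actual cost")
```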
Page 28
Text Mining Process.
Stage 2. Does textual information add value to existing injury codings?
Created models with demographic information.
[Process diagram: 7. Evaluate results → 8. Predictive modelling with features only → 9. Predictive modelling with concepts and features → 10. Compare results and conclude]
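Steps 8 to 10 amount to fitting one model on the existing injury codings and demographics alone, another with the text-derived concepts appended, and comparing them on held-out data. A hedged sketch of that comparison, using invented data and scikit-learn gradient boosting as a stand-in for TreeNet:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 3000
features = rng.normal(size=(n, 6))              # injury codes + demographics (invented)
concepts = rng.integers(0, 2, size=(n, 10))     # text-derived concept indicators (invented)
cost = features[:, 0] + 2 * concepts[:, 0] + rng.normal(scale=0.5, size=n)

X_features = features
X_both = np.hstack([features, concepts])
idx_train, idx_test = train_test_split(np.arange(n), random_state=0)

for name, X in [("features only", X_features), ("features + concepts", X_both)]:
    model = GradientBoostingRegressor(learning_rate=0.05, n_estimators=500, max_leaf_nodes=6)
    model.fit(X[idx_train], cost[idx_train])
    print(name, "test R^2:", round(model.score(X[idx_test], cost[idx_test]), 3))
```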
Page 29
Textual information adds predictive power to the model:
Medical Benefits claim cost for sprains for the next 6 months (top 5%)
Page 30
© 2004 PricewaterhouseCoopers. All rights reserved. PricewaterhouseCoopers refers to the network of member firms of
PricewaterhouseCoopers International Limited, each of which is a separate and independent legal entity.