
Electronic copy available at: http://ssrn.com/abstract=1533346
Improving Auditors' Fraud Judgments Using a Frequency Response Mode






Natalia Kochetova-Kozloski
Saint Mary's University

William F. Messier, Jr.
University of Nevada, Las Vegas
Norwegian School of Economics and Business Administration (NHH)

Aasmund Eilifsen
Norwegian School of Economics and Business Administration








Forthcoming in Contemporary Accounting Research.





*Corresponding Author:
Sobey School of Business, Saint Mary's University
SB 318, 903 Robie St., Halifax, Nova Scotia, B3H 3C3 Canada
Tel (902) 420-5800
E-mail natalia.k@smu.ca



We thank Liesbeth Bruynseels, Gerd Gigerenzer, Derek Koehler, Susan McCracken, Steve Salterio, Hun-
Tong Tan, and Scott Vandervelde for their helpful comments on the paper. We appreciate comments
provided by two anonymous reviewers, the participants of the midyear meeting of the Auditing Section,
the annual meeting of AAA, the CAAA conference, the Spring Camp at Tilburg University, and the
workshops at Georgia State University and Waterloo Decision Research group at the University of
Waterloo. The paper benefited from capable research assistance provided by Andrei Atapin, Pako Chan,
and Aly Habdihai. Professor Messier received financial support for this research from the Kenneth and
Tracy Knauss Endowed Chair in Accounting at UNLV and the PricewaterhouseCoopers Professor II
position at NHH.


Abstract
One hundred and fifty auditors participated in a study that examines whether auditors'
probabilistic judgments are closer to a Bayesian benchmark when auditors make judgments using
a frequency response mode versus a probability response mode. We test a series of hypotheses
that examine the effect of using a frequency response mode by professional auditors both within
and outside their knowledge domain (fraud or medical case context) on assessing the likelihood
of rare events. The results show that the auditors' responses across the two case contexts (fraud
and medical case) using a frequency response mode are closer to the Bayesian benchmark. In
addition, we find that (1) the deviations in the auditors' responses from the Bayesian benchmark
across both response modes are significantly smaller for the fraud case in the low base rate
condition only and (2) the deviations in the auditors' responses from the Bayesian benchmark for the
fraud case using a frequency response mode relative to the probability response mode are smaller
in the low base rate condition than in the other two base rate conditions. These findings contribute
to research on auditor judgment and decision-making, and demonstrate how the use of a
frequency response mode can improve auditors' assessment of fraud.



Keywords: Base rate neglect; Fraud risk; Frequency response mode; Probabilistic judgment.

JEL: C11, M40, M42



1. Introduction
During the performance of an audit, an auditor obtains and evaluates evidence about the
likelihood of events that may impact the financial statements (e.g., "Are accounts receivable fairly
stated?" or "Is management acting fraudulently?"). Seldom is such evidence perfectly diagnostic of
the true state of the client's financial information. Thus, it is important for auditors to
formulate appropriate probabilistic judgments based on available evidence (Smith and Kida
1991). If the auditor makes an incorrect decision about the true state of the client's financial
statements (i.e., under- or over-estimates the probability of an event, such as misstatement due to
error or fraud), it can result in loss of reputation and potential litigation, and/or inefficient use of
the firm's resources. Prior research in auditing (e.g., Kinney 1984, 1989; Leslie 1984) has
proposed that a Bayesian approach be used to model the auditor's probability judgment process
for such tasks as assessing the likelihood of a control failure, material misstatement, or
management fraud.
While a Bayesian approach has been proposed as a normative model of decision-making
under uncertainty, a stream of behavioral research in the early 1970s by Kahneman and Tversky
(e.g., 1972; 1973) concluded that individuals' judgments of probabilistic events do not conform
to a Bayesian approach. Kahneman and Tversky (and numerous other researchers) find that
individuals are prone to a number of biases when solving Bayesian-type problems. Their
research suggests that such biases occur because of limited human information processing
abilities, and that individuals follow heuristics to cope.
One of the key probabilistic judgment tasks where auditors may fall prey to heuristics and
biases is the assessment of the likelihood of management fraud. Auditing researchers have
documented that auditor performance in assessing fraud likelihood is problem-laden (e.g., Asare
and Wright 2004; Wilks and Zimbelman 2004; see Nieschwietz, Schultz, and Zimbelman 2000
for a review). As accounting firms develop more sophisticated data-mining tools and profile
fraudulent clients in their decision support systems (e.g., KPMG's KRisk (Bell, Bedard,
Johnstone, and Smith 2002) and PwC's FRISK (Winograd, Gerson, and Berlin 2000)), one of the
important sub-tasks in assessing the probability of fraud is the auditor's utilization of information
provided by such systems. The question arises: does an auditor normatively (in a Bayesian sense)
utilize statistical properties of a fraud-profiling decision aid, assuming that specific
characteristics of a given client (or a group of clients) have been documented to be associated
with management fraud in x% of cases in the firm's database? If we can identify a way to help
auditors do so (i.e., reduce susceptibility to heuristics and biases in a probabilistic task of judging
the likelihood of fraud given internal firm knowledge), we can advance audit practice one step
further in pursuing greater audit quality.
In this paper, we test an approach suggested by Gigerenzer and his colleagues (e.g.,
Gigerenzer, Hoffrage, and Kleinbölting 1991; Gigerenzer and Hoffrage 1995; Cosmides and
Tooby 1994, 1996) that has the potential to move auditors' judgments of uncertain events (e.g.,
management fraud) closer to the Bayesian benchmark. Their research shows that statistical
reasoning within a Bayesian framework can be improved when information is presented as
natural frequencies instead of probabilities.[1] This approach may assist auditors in being able to
better use information about the client with "red flag"-type characteristics and utilize a firm's
decision support system output in making judgments about the probability of fraud for a given
client or group of clients (Bell et al. 2002). In doing so, we also test the boundary conditions of
theories developed in psychology in their application to professional judgment, which has been a
long-standing and fruitful tradition among judgment and decision-making researchers in
accounting and auditing (Kotchetova and Salterio 2004, 562).

[1] Natural frequencies are absolute frequencies as encoded through direct experience and have not been
normalized with respect to the base rates of the event or non-event (Hoffrage and Gigerenzer 2004,
251).

We test three hypotheses related to the use of a frequency response mode. Our first
(overall) hypothesis examines whether auditors' judgments are closer to a benchmark based on
Bayes' Theorem when they receive case information and make judgments using a frequency
response mode versus a probability response mode. The results for the test of the overall
hypothesis show that the auditors' judgments across the fraud and medical cases using a
frequency response mode, relative to the probability response mode, are closer to the Bayesian
benchmark. The findings for the other two hypotheses are as follows. First, the deviations in the
auditors' responses from the Bayesian benchmark across both response modes are significantly
smaller for the fraud case but only when the base rate is low. Second, the deviations in the
auditors' responses from the Bayesian benchmark for only the fraud case using a frequency response
mode, as compared to the probability response mode, are smaller in the low (rare) base rate
condition than in the other two base rate conditions (moderate and high).
These findings contribute to research on auditor judgment and decision-making, and on
assessing fraud. First, we show, in general, that the use of a frequency response mode moves
auditor judgments closer to a normative Bayesian benchmark. Second, we demonstrate that the
use of a frequency response mode is most effective within the auditors' professional problem
domain (assessment of the likelihood of fraud), as compared to outside their professional
problem domain (assessment of the likelihood of a disease). Finally, and perhaps most
importantly, we demonstrate how a frequency response mode, relative to a probability response
mode, helps auditors assess low base rate events (i.e., fraud) (Deloitte 2009; Francis 2004;
Beasley, Carcello, Hermanson and Neal 2010). These findings are in contrast to prior research in
auditing (Holt 1987; Joyce and Biddle 1981b). Based on these findings, auditing firms can
implement training strategies that will allow for a more effective use of fraud databases and
outputs from decision support systems in assessing the likelihood of fraud (Bell et al. 2002;
Winograd et al. 2000).
In the next section of the paper we provide background for our study and the hypotheses
tested. A section that presents the methodology follows. The results of the study are presented
next. The last section contains a discussion of the results and limitations.
2. Background and Hypotheses Development

Probabilistic Judgment in Accounting and Auditing
Prior research has examined auditors' ability to use base rates and individuating
information in making probabilistic judgments (Holt 1987; Johnson 1983; Joyce and Biddle
1981a, b; Kida 1984). Generally, the results show that auditors do not respond sufficiently to
base rate information in their judgments across a variety of contexts (e.g., likelihood of
collectability of accounts receivable, fraud, and corporate bankruptcy). However, auditors do not
appear to ignore base rates (Smith and Kida 1991, 481).
For example, Joyce and Biddle (1981b) tested for auditors' base rate sensitivity in a
series of experiments using a fraud prediction task (experiments 1, 2a-c). In their experiments,
auditors were presented with a diagnostic task requiring them to assess the probability of
management fraud given individuating data about the client manager's personality profile and a
description of a decision support system used to test the existence of management fraud.
Consistent with normative Bayesian principles, the auditors' judgments regressed towards the
base rate (correct direction), but the magnitude of the revision was insufficient, especially when
the base rate was low (Joyce and Biddle 1981b, 347). More recent research using professionals,
such as human resource managers (Whyte and Sue-Chan 2002) and commodity and stock traders
(Anderson and Sunder 1995), has produced similar results in highly familiar tasks.
Base rate neglect is usually explained by the representativeness heuristic (Gigerenzer et
al. 1988; Kahneman and Tversky 1972). The representativeness heuristic is used when an
individual assesses the probability that an item belongs to a population based on the extent to
which the item is similar in its essential properties to that population. Events that are more
similar to the population tend to be judged to have a greater probability of occurrence than less
similar events, independent of their actual base rate in a relevant population (Kahneman and
Tversky 1972; Libby 1981). However, the representativeness heuristic does not explain why
even in the absence of item description (i.e., in the situations where individuating information is
presented in the form of a hit rate and false positive statistics), experienced auditors do not use
base rates in a Bayesian fashion (Holt 1987). In addition, various interventions, such as increased
level of data specificity (Kida 1984), do not improve attention to base rates in probabilistic
judgments.
Probability versus Frequency Response Format
There is a stream of research in psychology that argues that there is a fundamental
distinction between frequency and single-event probability judgments. In contrast to Kahneman
and Tversky's heuristics and biases paradigm, Gigerenzer and his colleagues proposed that if
people are asked to estimate the probability of a single event, the question does not connect to
probability theory in their minds, whereas the frequency of such an event does (Gigerenzer, Hell,
and Blank 1988, 1991; Gigerenzer, Hoffrage, and Kleinbölting 1991; Gigerenzer and Hoffrage
1995; Gigerenzer and Goldstein 1996; Gigerenzer 2004). They argue that this is due to two
reasons. First, Bayesian computations are simpler when information is encoded in a frequency
format rather than in a probability format (Gigerenzer and Hoffrage 1995; Hoffrage and
Gigerenzer 2004). Second, the estimation of the likelihood of a single event and the judgment of
frequency are cognitively different processes (Cosmides and Tooby 1994, 1996; Gigerenzer,
Hoffrage, and Kleinbölting 1991).[2]

It is easy to demonstrate the first reason using Bayes' Theorem. Let's use the low base
rate medical case example included in our study. The medical case, presented in standard
probability format (cf., Gigerenzer and Hoffrage 1995), is as follows:
The prevalence of breast cancer among women at age forty who participate in routine
screening is 1%. If a woman has breast cancer, the probability that a mammography is
positive is 79%. If a woman does not have breast cancer, the probability is 9.6% that she
will also get a positive mammography.

A forty year-old woman has a positive mammography during routine screening.

What is the probability that she actually has breast cancer? ______%

The information presented is in terms of a single-event probability. The normal Bayesian
algorithm for computing the posterior probability p(H|D) with the values included in the medical
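case is

\[
p(H \mid D) = \frac{p(H)\,p(D \mid H)}{p(H)\,p(D \mid H) + p(\neg H)\,p(D \mid \neg H)}
= \frac{(0.01)(0.79)}{(0.01)(0.79) + (0.99)(0.096)} \approx 0.077
\]
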
[2] There are certainly disagreements between Kahneman and Tversky (1996) and Gigerenzer (1996) over
the validity of each other's approach. There are also numerous reviews in psychology that touch upon the
important issues in this debate (see Mellers et al. 1998; Payne et al. 1992; Shafir and LeBoeuf 2002).
From a practical perspective, we are simply testing whether the use of a frequency response mode can
improve auditors' judgments.

The same case presented in standard frequency format (cf., Gigerenzer and Hoffrage 1995) is as
follows:

Ten (10) out of every 1000 women at age forty who participate in routine screening has
breast cancer. Eight (8) of every 10 women with breast cancer will get a positive
mammography. Ninety-five (95) out of every 990 women without breast cancer will also
get positive mammography.

You have a sample of 100 women in this age group who had a positive mammography
during routine screening.

How many of these women out of 100 actually have breast cancer? ____ out of 100

The Bayesian algorithm for computing the posterior probability p(H|D) from the
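frequency format requires solving the following equation:

\[
p(H \mid D) = \frac{d \,\&\, h}{(d \,\&\, h) + (d \,\&\, \neg h)} = \frac{8}{8 + 95} \approx 0.078
\]
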
where d&h (data and hypothesis) is the number of cases with the symptom and the disease and
d&¬h is the number of cases having the symptom but not the disease. Thus, the answer using the
frequency response mode is 7.767 (or 8) out of 100. The two algorithms yield
nearly the same result (approximately 7.7%); the small difference arises because the frequency
version rounds the case counts to whole women. However, the calculation of the answer is
cognitively easier in the frequency format because it involves calculations with natural numbers,
whereas the probability format involves fractions. Also, as documented in a series of
experiments by Gigerenzer and Hoffrage (1995), a frequency format enables subjects to invoke
computational shortcuts that produce results that are very close to a Bayesian benchmark. In our
example, a more parsimonious menu of information (number of cases with both symptom and
disease and number of false positives) is used, relative to the informational requirement of a
standard Bayesian probability format that entails an additional piece of information: the base rate
of symptoms in a population.[3]
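The two routes to the posterior in the mammography example can be compared with a short Python sketch. It is only an illustration of the arithmetic described above, not part of the experimental materials; the function names are hypothetical, and the inputs are the values stated in the two versions of the case (1%, 79%, 9.6% and 8 versus 95 women).

```python
from fractions import Fraction


def posterior_from_probabilities(base_rate, hit_rate, false_positive_rate):
    """Bayes' theorem in probability format: p(H|D)."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def posterior_from_frequencies(hits, false_alarms):
    """Natural-frequency shortcut: (d&h) / ((d&h) + (d&not-h))."""
    return Fraction(hits, hits + false_alarms)


# Probability format: 1% prevalence, 79% hit rate, 9.6% false-positive rate.
prob_answer = posterior_from_probabilities(0.01, 0.79, 0.096)

# Frequency format: 8 women with cancer and 95 without test positive.
freq_answer = posterior_from_frequencies(8, 95)

print(f"probability format: {prob_answer:.4f}")
print(f"frequency format:   {float(freq_answer):.4f} ({freq_answer})")
```

The frequency version needs only two whole numbers and one division, which is the computational simplification the paragraph describes.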


[3] Other shortcut algorithms that produce an outcome very close to Bayesian include pictorial beam
analysis, rare event shortcut, big hit-rate shortcut, and comparison shortcut (Gigerenzer and Hoffrage
1995).

Cosmides and Tooby (1994; 1996) offer an evolutionary argument to support the second
reason for why frequency connects better with the human mind. They argue that human
inductive reasoning is developed through the observation of frequencies (real, discrete numbers)
in the natural environment. Cosmides and Tooby (1996) propose that encountered frequencies of
actual events were the only form of data available to humans throughout the history of mankind;
single-event probabilities were intrinsically unobservable because they have a format that did not
exist in a natural environment. As a result, cognitive mechanisms that evolved for adaptive
decision-making are better able to accept frequencies as input, maintain information in frequency
representations, and use these representations for effective inductive reasoning.
Gigerenzer, Hoffrage and Kleinbölting (1991) also provide support for cognitive
differences in estimating the likelihood of a single event and judging frequency. They
hypothesize that the structure of a probabilistic mental model is different from the structure of
the frequency estimation task on the following parameters: reference class and relevant cues. In a
frequency estimation task, the reference class is represented by a series of similar experiences or
event occurrences in similar situations, and the relevant cues are base rates known from
experience or offered by the problem information. In a probability estimation task, a general type
of objects or events in the task represents the reference class, and the relevant cues are
characteristics on which members of the reference class differ. Gigerenzer, Hoffrage and
Kleinbölting's (1991) argument suggests that the reference class and relevant cues are not the
same between frequency and probability tasks, thereby leading to different responses.[4]

[4] While Dougherty, Franco-Watkins, and Thomas (2008) have questioned the psychological plausibility
of the assumptions that underlie probabilistic mental models, Gigerenzer, Hoffrage and Goldstein (2008)
maintain that the structure of the frequency estimation task is as described above.

Lastly, Cosmides and Tooby (1996) point out the following advantages to storing and
operating with information in frequency format. First, a frequency format preserves the number
of events on which the judgment is based (i.e., the size of the reference class and reliability
index of the information are known). A probability format, on the other hand, does not preserve
the number of events on which probability is based (i.e., there is no "n"). If there is no "n", the
reliability of probability information is uncertain. Second, frequency representations can be
easily updated with each new instance because the reference class and categorizations of events
are preserved; single-event probabilities have to be re-calculated, thereby creating an additional
cognitive step and increasing complexity of the task. For example, if a new instance of an event
(e.g., a disease) is encountered in the population of interest (e.g., person with certain symptoms),
the reference class of persons with symptoms and the number of events with relevant cues are
expanded by a value of 1, without additional computation (i.e., "5 out of 20" becomes "6
out of 21", compared to the computational transfer from .25 to .286). Third, a frequency format
allows for reference classes to be easily reconstructed in accordance with the new criteria,
whereas a probability format would involve a recalculation of the ratio or a percentage and will
make a decision maker's task more complex.
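The updating argument can be made concrete with a minimal Python sketch. The two helper functions are hypothetical names introduced only for illustration. The frequency record is updated by incrementing two counts, whereas the probability can be updated only if the sample size n is supplied separately, because the probability alone does not carry it.

```python
def update_frequency(cases, total, new_case_has_event):
    """Add one new observation to a frequency record ("cases out of total")."""
    return cases + (1 if new_case_has_event else 0), total + 1


def update_probability(p, n, new_case_has_event):
    """Recompute a stored probability; the sample size n must be known separately."""
    return (p * n + (1 if new_case_has_event else 0)) / (n + 1)


# Frequency format: "5 out of 20" becomes "6 out of 21" by simple counting.
cases, total = update_frequency(5, 20, True)

# Probability format: .25 has to be converted back to counts before updating.
p_new = update_probability(0.25, 20, True)

print(f"{cases} out of {total}")
print(f"{p_new:.3f}")
```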
The research by Gigerenzer and others indicates that using a frequency response mode
results in judgments that are closer to a Bayesian benchmark. However, none of this research
uses professional accountants or managers in a task that has a degree of on-the-job realism,
thereby leaving applicability of their results to an audit setting unknown. Since Joyce and Biddle
(1981a, b) and others have established that auditors indeed fall prey to base rate neglect bias, it is
logical to investigate whether auditors' performance can be improved using a frequency response
mode. Therefore, we propose the following overall hypothesis:

HYPOTHESIS 1: Auditors' judgments will be closer to the Bayesian benchmark when they
receive case information and make required judgments using a frequency response
format versus a probability response format.

While this hypothesis replicates results in psychology, we believe that it is important to test this
hypothesis for two reasons. First, to our knowledge, no research in auditing has examined the
effect of using a frequency response mode in audit settings. Thus, we are testing the boundary
conditions of a de-biasing approach that has been developed using a probabilistic judgment task
in a non-professional context by non-professional participants. Second, if a frequency response
mode appears to be successful in reducing base-rate neglect, accounting firms may be able to
improve auditors' performance by training them to use frequencies (Larrick 2004; Sedlmeier
1999).
Role of Case Context
Smith and Kida (1991, 480) state that "a more central issue is whether experienced
professionals performing familiar tasks will fall prey to this (base rate neglect) bias" (italics
added). Much of Smith and Kida's conclusion is based on their review of the Joyce and Biddle
(1981a, b) experiments. Yet other studies report mixed results about experts' performance in
familiar versus unfamiliar problem domains. For example, Holt (1987, 573) reported no
difference between an audit task and the general-purpose task (the "cab problem") for the
auditors included in her study.[5] Nelson (1993, 1996), in several experiments, observed a strong
inverse base rate effect for auditors in a medical diagnosis setting but not in a financial statement
auditing context.[6] The auditing literature suggests that contextual features of the task may give
rise to a salience effect (Haynes and Kachelmeier 1998, 107-112), similar to that observed by
Nelson (1993, 1996). This effect impacts auditors' performance in a variety of judgment and
decision-making tasks, including assessment of fraud probabilities (and other risks). Since
auditor judgment performance is related to task-specific expertise (Bédard 1991; Brown and
Solomon 1991; Choo 1996), we predict that auditors will perform more normatively (closer to
the Bayesian benchmark) in a task that is within their expertise problem domain, as opposed to
outside their problem domain (see Bonner and Pennington (1991) for a review of this literature).

[5] The only experiment in Holt's study (1987, 573) that varied case context was Experiment 2, and it was
performed using accounting students, rather than practicing auditors. The auditing literature since has
argued that auditing expertise is a task-level variable (Bonner and Pennington 1991). In Holt's (1987)
Experiment 2 the participants do not have the task expertise for risk assessment, and, specifically, for
fraud risk assessment. We believe her result is driven by the nature of the subject pool she used.
[6] The setting examined by Nelson (1993, 1996) included diagnostic and non-diagnostic cues (i.e.,
individuating information) and experiential learning of event frequencies. Our setting is more abstract
because we do not use diagnostic/non-diagnostic cues. We simply describe the problem in a medical or
fraud setting (see Appendix for sample experimental cases). This works against finding an effect for H2
because our problem contains fewer [un]familiar cues. Nelson (1996) observed a marginally significant
inverse base rate in an abstract setting.

Further, research on effects of task domain on judgment performance in psychology and
information technology has shown that participants invoke their expertise if the task is set in
their domain knowledge. For example, Ranyard and Charlton (2006) and Windschitl and Weber
(1999) show that the context of the task and participants' knowledge jointly affected sports
gambling choices and interpretation of probability phrases. With respect to task domain, Ranyard
and Charlton (2006, 25) conclude, "integration of stated probabilities and background
information is ubiquitous in important real life decisions under uncertainty." In an IT context,
Mao and Benbasat (2001) demonstrate that contextualized access to knowledge contained in an
expert system (i.e., the ability to access task domain knowledge within the context of problem-solving)
leads to a greater utilization of the expert system and a greater degree of congruence
between users' judgment and the system (as a measure of judgment performance). Taken
together, research in and outside accounting suggests that auditors will make more normative
judgments (closer to the Bayesian benchmark) when they are presented with a case about the
probability of fraud (i.e., their professional domain), as opposed to probability of a disease (i.e.,
outside of their professional domain), even in the absence of company/patient-specific diagnostic
cues.
Finally, there is no indication in the various research studies by Gigerenzer and his
colleagues that problem domain interacts with a probability vs. frequency response mode in its
effect on subject performance. As a matter of fact, none of these studies vary levels of case context
within vs. outside of participants' background. Thus, we do not have a basis to expect that the case
context effect would not hold in the frequency response mode. Therefore, we propose the
following hypothesis:
HYPOTHESIS 2: Auditors' judgments will be closer to the Bayesian benchmark in the
fraud case than the medical case regardless of response mode.

Response Mode and Rare Events
With respect to probabilistic tasks involving rare events (e.g., fraud, certain diseases),
Mellers and McGraw (1999) suggested that a frequency format for the task, even if base rates
and individuating information are provided in a summary form, makes it easier for participants to
understand the problem because rare events are represented as elements of a set. That is,
a frequency format allows people to visualize nested sets of events (following the earlier example,
number of people with disease, without disease, with symptoms, without symptoms), extract
joint events (disease given symptom; without disease given symptom), and compute the
frequency of the event given symptoms (disease given symptom relative to symptom cases).
Mellers and McGraw (1999) used a mammography problem and a cab problem to
demonstrate that a frequency format facilitated Bayesian reasoning when events were rare and
individuating information was presented in statistical summary form.
Prior research (e.g., Holt 1987; Joyce and Biddle 1981b; Nelson 1993, 1996) has found
that auditor/participants performed poorly in the low base rate condition. It is possible that the
auditors in these studies perceived the statistical information in their fraud/auditing tasks as
inconsistent with their own first-hand experiences, prior training, or decision support system
information they had seen within their firms. In recent years, the profession has issued new
standards on fraud, and public accounting firms have increased fraud training for their auditors.
In addition, the large public accounting firms have been conducting surveys on the occurrence of
fraud.[7] Based on this increased awareness of fraud by practicing auditors, we examine this issue
by setting base rates at three levels that correspond to low (1%), moderate (7%), and high (25%)
base rates.

[7] For example, KPMG Forensic conducts a fraud survey every 2 or 3 years (KPMG 2003, 2006, 2009)
and PwC conducts a biennial Global Crime Survey (PwC 2005, 2007).

In order to ensure congruity with prior research, and to incorporate more realism in our
manipulated levels of fraud base rate, we included three base rates (1%, 7% and 25%) in our
experimental design. The low (1%) base rate was set to be consistent with prior research in
auditing (Holt 1987; Joyce and Biddle 1981b) and psychology (e.g., Mellers and McGraw 1999).
We also believe that the low base rate approximates the detected rate of fraudulent financial
reporting. Based on a review of events related to audit failure (i.e., audit litigation, business
failures, SEC enforcement actions, and earnings restatements), Francis (2004) concluded that the
failure rate is less than 1%. Two recent studies (Deloitte 2009; Beasley, Carcello, Hermanson
and Neal 2010) of SEC Accounting and Auditing Enforcement Releases (AAERs) support Francis's
estimate. For example, Beasley et al. (2010) examine AAERs issued during the ten-year period
between January 1998 and December 2007 and find 347 companies were cited for fraudulent
financial reporting. Given that there are more than 15,000 publicly traded companies in the US,
this results in a base rate lower than 1%. The moderate (7%) base rate was set based on the rate
of financial reporting fraud in the KPMG Fraud Survey (2003, 3-4). It was the only publicly
available fraud rate from a public accounting firm that we could find at the time we conducted
the experiment. However, we doubt that 7% is a realistic rate for financial reporting fraud for a
number of reasons. First, KPMG Forensic commissioned telephone interviews with executives
from 459 public companies and state and federal government agencies. Thus, the data is subject
to the usual limitations that accompany such survey methods. Second, we believe that including
government agencies may bias the response rate for fraudulent financial reporting since
materiality levels are likely to be much lower for government agencies than public companies.
Third, this telephone survey was conducted at the height of the crisis surrounding Enron,
WorldCom, and other frauds. This may have resulted in an overestimate of the actual financial
reporting fraud rate. The reported rate of financial reporting fraud was only 3% in the 1998
survey. Thus, the 2003 Survey reported more than a 130% increase in financial reporting fraud.
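The arithmetic behind the two figures in this paragraph can be checked with a few lines of Python. This is a rough sketch under stated assumptions: 15,000 is used as the denominator although the text says "more than 15,000", and the annualized rate assumes the 347 cited companies were spread evenly over the ten years. The text does not say whether "lower than 1%" refers to an annual or a ten-year rate, so the annual reading below is an interpretation, not the authors' stated calculation.

```python
companies_cited = 347       # Beasley et al. (2010), AAERs from 1998 to 2007
public_companies = 15_000   # assumption: the text says "more than 15,000"
years = 10

# Share of companies cited at some point during the ten-year window.
cumulative_rate = companies_cited / public_companies

# Assumption: citations spread evenly across the years.
annual_rate = cumulative_rate / years

# Relative change in the survey-reported fraud rate, 3% (1998) to 7% (2003).
rate_1998, rate_2003 = 0.03, 0.07
relative_increase = (rate_2003 - rate_1998) / rate_1998

print(f"ten-year rate:     {cumulative_rate:.2%}")
print(f"annualized rate:   {annual_rate:.2%}")
print(f"relative increase: {relative_increase:.0%}")
```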
We also used a high base rate of 25% in order to provide for comparability to prior
studies in auditing (Holt 1987; Joyce and Biddle 1981b). However, we believe that a 25% base
rate for fraud is very unrealistic. The under-weighting of base rate information in Joyce and
Biddle (1981b) and Holt (1987) could be due to the arbitrary manipulation of the base rates of
fraud in their experiments (i.e., levels of 10%, 30%, 50%, and 70%). If base rates differed from
participants' real-world experience, they could have raised doubt in participants' minds about the
realism of the experimental task.
Research in psychology has shown that the perceived base rate of an event affects the
interpretation of probability statements (Wallsten, Fillenbaum, and Cox 1986; Weber and Hilton
1990). Based on characteristics of the three base rates above, we believe that the low (1%) base
rate will be perceived as the most realistic base rate for fraudulent financial reporting (i.e.,
management fraud). Thus, if H1 holds and if practicing auditors are aware that 1% is a realistic
base rate of fraudulent financial reporting, we should observe the benefits of the frequency response
mode, relative to the probability response mode, to be greater in the low (1%) base rate condition
than in the other base rate conditions (7% and 25%).⁸ We propose the following hypothesis:
HYPOTHESIS 3: The auditors' judgments for the fraud case using a frequency response
mode, as compared to a probability response mode, will be closer to the Bayesian
benchmark in the low base rate condition than in the moderate or high base rate
conditions.

3. Methodology
Experimental Design and Independent Variables
The experimental design is a 2 x 3 x 2 mixed factorial design with two factors
manipulated between subjects and one factor manipulated within subjects. The between-subject
factors are: (1) response mode (RESPONSE MODE) (probability response mode vs. frequency
response mode) and (2) the base rate (BASE RATE) of fraud or breast cancer (low at 1% or 10
out of 1000 vs. moderate at 7% or 70 out of 1000 vs. high at 25% or 250 out of 1000). The
within-subject factor is case context (CASE) (a fraud case vs. a medical case).⁹
Response mode was manipulated by using appropriate language in the body of the experimental
problems and in the outcome question requesting assessment of the likelihood of fraud (breast
cancer) for a client (patient) given information provided in the case (see the Appendix for exact
wording). In the probability response mode condition, participants received base rates and false
positive rates stated in percentages, and answered the following outcome question: "What is your
assessment of the probability that the client (woman) was engaged in fraudulent activity (has
breast cancer)? _____%." In the frequency response mode condition, participants received the
same base rates and false positive rates, stated as frequencies, and answered the following
question about the likelihood of fraud: "In your opinion, of the 100 randomly selected clients
(women), how many were engaged in fraudulent activity (have breast cancer)? ____." False
positive rates did not vary across conditions and were stated at 9.6%. The second factor
manipulated was base rate. Base rates were set at low (1%), moderate (7%), and high (25%).

⁸ Since we do not use individuating (diagnostic cue) information in our case, we are not concerned about
the inverse base rate effect in our particular setting (see Nelson 1993, 1996).
⁹ We also manipulated the order in which participants viewed the two cases as a third between-subject
variable. The two orders were: (1) fraud problem presented first, followed by the medical problem and (2)
medical problem presented first, followed by the fraud problem. Order was not significant in any of our
analyses, and is therefore not included as part of the design or analysis.

Finally, the case context included two cases: a fraud case and a medical case. The fraud
case was largely based on Joyce and Biddle's (1981b) management fraud case, with appropriate
modifications for our manipulations. For the medical case, we used a breast cancer problem
developed by Eddy (1982) to examine physicians' reasoning about single-event probabilities.
The fraud and medical cases were standardized with respect to the information menu: their
length and format were as similar as possible given the context of the respective scenarios. Panel A
of Table 1 summarizes the experimental design.
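
As an illustration of the design described above, the following Python sketch enumerates the six between-subject cells (response mode by base rate) and attaches the two within-subject cases to each. The function and variable names are hypothetical, and the formatted base rate is illustrative only; the exact wording shown to participants appears in the Appendix.

```python
from itertools import product

RESPONSE_MODES = ("probability", "frequency")               # between subjects
BASE_RATES = {"low": 0.01, "moderate": 0.07, "high": 0.25}  # between subjects
CASES = ("fraud", "medical")                                # within subjects
REFERENCE_CLASS = 1000  # frequencies are stated out of 1000, as in the text


def format_rate(rate, mode):
    """Express a rate as a percentage or as a count out of the reference class."""
    if mode == "probability":
        return f"{rate * 100:g}%"
    return f"{round(rate * REFERENCE_CLASS)} out of {REFERENCE_CLASS}"


# One entry per between-subject cell; every participant sees both cases.
cells = [
    {
        "response_mode": mode,
        "base_rate": level,
        "shown_as": format_rate(rate, mode),
        "cases": CASES,
    }
    for mode, (level, rate) in product(RESPONSE_MODES, BASE_RATES.items())
]

for cell in cells:
    print(cell)
```

Each of the six cells is crossed with both cases, which gives the twelve design cells of the 2 x 3 x 2 layout.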
[Insert Table 1 here]
Participants and Procedure
The participants in the experiment were auditors in the graduate program at the
Department of Accounting, Auditing and Law, Norwegian School of Economics and Business
Administration in Bergen, Norway. The director of the program and one of the researchers
administered the experiment during a class session for two annual classes. To enter the graduate
program in accounting and auditing, the participants must have completed a four-year business
administration degree or passed the examinations required for registered auditors (the first level
of certification in Norway). Completion of the program, passing a rigorous examination, and
three years of practical training, including a final test thereof, allows a candidate to be eligible
for state authorization (the highest level of certification).

One hundred and fifty fully completed questionnaires were used in our analyses.¹⁰ All
participants were employed as auditors when the experiment was administered, with an average
audit experience of 27.5 months. Of the 150 participants, 50 (33%) were staff or associates, 89
(59%) were senior auditors, and 11 participants (7%) did not report their level within the firm
(but reported greater than zero audit experience).¹¹ Twenty-eight participants (19%) reported
that they had encountered a fraudulent client in their audits.¹² In summary, participants in the
experiment are representative of the target population: staff and senior auditors. Auditors at this
level should be familiar with the fraud risk assessment task based on their education, firm
training, and participation in fraud brainstorming meetings (SAS 99 and ISA 240).
Since our experiment included the assessment of single-event probabilities, participants
were asked to report their knowledge of statistics. The mean training in statistics was 52.7
contact hours; the average self-rated proficiency in statistics was 3.23 on a scale from 1 (very
low) to 7 (very high). Forty-two auditors (28%) reported familiarity with Bayes' Theorem.
Participants' confidence in their probability or frequency assessments was higher in the fraud
case than in the medical case: the mean confidence rating was 3.61 for the fraud case and 1.70 for
the medical case, on a scale from 1 (very low) to 5 (very high) (p<.001).

¹⁰ A small percentage of the students in the program have no prior audit experience. Those participants
were excluded from the dataset.
¹¹ Results are not sensitive to excluding these 11 participants from our analyses.
¹² We suspect that the rate of fraud (19%) reported by our participants is high for the following reasons.
In the debriefing questionnaire we asked the participants, "Have you ever encountered a client whose
management was involved in fraud?" We should have been more precise in requesting this information.
First, we did not distinguish between fraudulent financial reporting and misappropriation of assets. Thus,
the participants' responses may have included fraud related to misappropriation of assets. Second, the
question did not ask about the materiality of the fraud. Thus, it is possible that the participants have seen
immaterial fraud and reported such in responding to the question. Third, we did not specify whether the
question related to public company audit clients, private entities, or government and non-profit entities.
Thus, our participants may have thought of fraud in the overall population of clients, not limited to public
or private company clients.

Because we conducted our experiment in Norway and the experimental materials were
written in English, we asked participants to self-rate their knowledge of the English language.
The participants' mean self-reported rating of English proficiency was 4.00 on a scale from 1
(very low) to 7 (very high) (p=.003 for a test of the mean against a value of 3.5), with a standard
deviation of 1.2. Finally, participants found the case materials realistic: 82% reported the audit
case as realistic, and 92% reported the medical case as realistic. These frequencies are
significantly above 50% at p<.001 using binomial tests for proportions.
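
The binomial tests mentioned above can be illustrated with an exact one-sided tail probability. This is a sketch under stated assumptions: it assumes that all 150 participants answered both realism questions, so that 82% and 92% correspond to roughly 123 and 138 participants. The text reports only the percentages, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, n, p0=0.5):
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


N = 150  # assumed number of respondents (not reported per question)

for label, share in (("audit case", 0.82), ("medical case", 0.92)):
    successes = round(share * N)
    p_value = binomial_upper_tail(successes, N)
    print(f"{label}: {successes}/{N} rated realistic, one-sided p = {p_value:.3g}")
```

Under these assumptions both tail probabilities fall far below .001, in line with the significance level reported above.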
Dependent Variables
We used the absolute value of the deviation of the participant's response from the
respective Bayesian response: F-dev = |Auditor's Fraud Response − Fraud Bayesian Response|
and M-dev = |Auditor's Medical Response − Medical Bayesian Response|. In order to have a
uniform scale measure of the dependent variable for statistical tests, we converted percentages
and frequencies to proportions in probability and frequency conditions, respectively. Table 1,
Panel B provides the normative Bayesian responses (benchmarks) for each cell in the
experimental design.
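
To make the dependent variable concrete, the sketch below computes the probability-mode Bayesian benchmark from Bayes' theorem and the absolute deviation of a response from it. It takes the false positive rate (.096) and the test accuracy (.79) from Table 1, Panel B, and treats test accuracy as the probability of a positive signal when fraud (or cancer) is present, which is an assumption about how to read that column. The function names and the example response of 80 are hypothetical.

```python
def bayesian_benchmark(base_rate, hit_rate=0.79, false_positive_rate=0.096):
    """Posterior probability of fraud (cancer) given a positive signal."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def absolute_deviation(response, base_rate, scale=100):
    """|response - benchmark| after converting the response to a proportion.

    `response` is a percentage (probability mode) or a count out of 100
    (frequency mode); both are divided by `scale` to obtain a proportion.
    """
    return abs(response / scale - bayesian_benchmark(base_rate))


for label, base_rate in (("low", 0.01), ("moderate", 0.07), ("high", 0.25)):
    print(f"{label}: benchmark = {bayesian_benchmark(base_rate):.4f}")

# Hypothetical participant in the low base rate condition who answers 80.
print(f"deviation = {absolute_deviation(80, 0.01):.4f}")
```

For the low and moderate base rates, the sketch reproduces the probability-mode benchmarks of .0767 and .3825 listed in Table 1, Panel B. It does not attempt to reproduce the slightly different frequency-mode benchmarks.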
4. Results
Descriptive Statistics
Table 2 presents descriptive statistics for participants' likelihood assessments and the
absolute deviation of the participants' assessments from the Bayesian benchmark by each
treatment condition. The third and fourth columns show the absolute deviations of the auditors'
assessments from the Bayesian benchmark.
[Insert Table 2 here]
Hypotheses Tests
Overall Test of Response Mode

The first (overall) hypothesis examines whether auditors' judgments are closer to the
Bayesian benchmark when they receive case information and make judgments using a frequency
response format versus a probability response format. Table 3, Panel A, reports the results of
testing this hypothesis using the deviations from the Bayesian benchmark as a dependent
variable.¹³ In this analysis, RESPONSE MODE is marginally significant (F=1.870, p=.087,
one-tailed). As shown in Table 3, Panel B, the mean deviation for the frequency response mode was .335,
compared to .381 in the probability response mode. Thus, our results support H1 in that the
auditors' responses are closer to the Bayesian benchmark using a frequency response mode.
[Insert Table 3 here]
The Effect of Case Context
H2 examines whether the auditors' responses are closer to the Bayesian benchmark in the
fraud case or in the medical case, regardless of response mode. As shown in Table 3, there is a
significant main effect for CASE (F=4.817, p=.015, one-tailed). The adjusted means in the fraud
and medical cases are 0.336 and 0.379, respectively (Table 3, Panel C). However, the CASE x
BASE RATE interaction is significant (F=5.353, p=.006). Simple effects tests show that
participants' deviations from the Bayesian benchmark are significantly smaller in the fraud case
than the medical case when the base rate is low (0.345 vs. 0.479; F=5.817, p=.010, one-tailed),
but not when the base rate is moderate (0.372 vs. 0.364; F=0.398, p=.266, one-tailed) or high
(0.291 vs. 0.295; F=0.43, p=.418, one-tailed). Figure 1 shows this interaction. Thus, across both
response modes, the auditors' responses are closer to the Bayesian benchmark for the fraud case,
but only in the low base rate condition. Based on this result, one could argue that a realistic base
rate must be part of the problem domain in order for auditors to accurately judge fraud.

¹³ Prior research has used auditors' raw judgments to test hypotheses. When we use the raw
judgments, the results are essentially the same as the results using the deviation data.

[Insert Figure 1 here]
The Effect of Response Mode on Rare Events
H3 examines whether the auditors' responses to the fraud case using a frequency
response mode relative to a probability response mode are closer to the Bayesian benchmark in
the low (1%) base rate condition than in the moderate (7%) or high (25%) base rate conditions.
We test this hypothesis by performing a 2 x 3 ANOVA with RESPONSE MODE (frequency
versus probability) and BASE RATE (low, moderate, and high) as the independent variables,
and the deviations of the participants' likelihood assessments from the Bayesian benchmark in
the fraud case as the dependent variable. Table 4, Panel A reports the results. As expected, there
is a significant RESPONSE MODE x BASE RATE interaction (F=4.784, p=.010, two-tailed).
Figure 2 plots the interaction. We use simple effects tests to analyze the interaction and test H3.
For the low base rate (1%), we find that the deviations from the Bayesian benchmark are smaller
in the frequency response mode (mean = .236) than in the probability response mode (mean =
.454) (F=5.409, p=.012, one-tailed). In the moderate base rate (7%), the effect of response mode
is marginally significant (F=1.999, p=.082, one-tailed): the mean deviation from the Bayesian
benchmark in the frequency response mode (.351) is smaller than the mean in the probability
response mode (.394).¹⁴ Finally, the means in the high base rate (25%) condition are not
significantly different (F=1.398, p=.122, one-tailed), although the mean in the frequency
response mode (.330) is larger than the mean in the probability response mode (.252).¹⁵


¹⁴ The effects of response mode differ between low and moderate base rate conditions (F=3.297,
p=.0072, two-tailed). See Additional Analysis for further discussion.
¹⁵ We can only speculate on why this result occurred. It is difficult to predict how an auditor might
respond when using the frequency response mode with an unrealistic situation (a high base rate for fraud).
It is possible that the auditors notice the unrealistic nature of the data when the base rate is high and it is
specified in a frequency response mode. In other words, the data does not fit their "mental calculator"
(Cosmides and Tooby 1996) and they may be lost or confused by such an unrealistic base rate.

We reran the ANOVA combining the two unrealistic (moderate and high) base rate
conditions (not tabulated). The RESPONSE MODE x BASE RATE interaction is significant
(F=7.641, p=.001, two-tailed). Consistent with the prior analysis, the simple effects tests showed
that the deviations from the Bayesian benchmark are, on average, significantly lower in the
frequency response mode (mean = .236) than in the probability response mode (mean = .454) for the low base
rate condition (F=5.409, p=.014, one-tailed). However, there is no significant difference between
the two response modes when the moderate and high base rate conditions are combined (F=.129,
p=.361, one-tailed). As shown in Table 4, Panel B, the means for moderate and high base rate
conditions combined are .327 in probability response mode and .341 in frequency response
mode.
These findings support H3 and indicate that the benefits of the frequency response mode in
the fraud case are greatest in the low base rate condition. This result is in contrast to the findings
of Joyce and Biddle (1981b) and Holt (1987). Our findings may be the result of new fraud
standards, increased awareness of actual fraud rates, and increased fraud training by firms.
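
The pattern behind H3 can be summarized by the difference between the two response modes at each base rate. The sketch below uses the fraud-case cell means reported in the text above; the variable names are hypothetical, and the differences are descriptive only (the significance tests are those reported above).

```python
# Mean absolute deviations from the Bayesian benchmark in the fraud case,
# as reported in the text for each base rate and response mode.
MEAN_DEVIATION = {
    "low (1%)": {"probability": 0.454, "frequency": 0.236},
    "moderate (7%)": {"probability": 0.394, "frequency": 0.351},
    "high (25%)": {"probability": 0.252, "frequency": 0.330},
}

# Positive values mean the frequency mode is closer to the benchmark.
benefit = {
    level: round(means["probability"] - means["frequency"], 3)
    for level, means in MEAN_DEVIATION.items()
}

for level, value in benefit.items():
    print(f"{level}: frequency-mode benefit = {value:+.3f}")

print("largest benefit:", max(benefit, key=benefit.get))
```

The difference is largest in the low base rate condition, shrinks in the moderate condition, and reverses sign in the high condition, which is the shape of the interaction described above.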
[Insert Table 4 and Figure 2 here]
Additional Analysis
To provide further insight into the pattern of means in the fraud case, we also
looked at differences between base rates by each level of response mode (not tabulated). In the
frequency response mode, the deviations from the Bayesian benchmark for the low base rate
condition (mean = .236) are, on average, lower than either of the means for the other base rates:
moderate base rate condition (mean = .351) (F=3.043, p=.044, one-tailed) and high base rate
condition (mean = .330) (F=1.862, p=.089, one-tailed). The mean for the combination of the two
unrealistic base rates (.341) is also greater than the low base rate mean (F=3.101, p=.041, one-
tailed). This pattern of results is consistent with H3.

In the probability response mode, we find a different pattern of results. There is no
significant difference between the mean deviations from the Bayesian benchmark in the low base
rate condition (.454) and the moderate base rate condition (.394) (F=.729, p=.199, one-tailed).
However, the mean in the low base rate condition is significantly higher than the mean in the high base rate
condition (.253) (F=9.078, p=.002, one-tailed) and the mean of the combination of the moderate
and high base rate conditions (.327) (F=4.625, p=.018, one-tailed). Finally, the mean in the
moderate base rate condition (.394) is significantly higher than the mean in the high base rate
condition (.252) (F=9.600, p=.002, one-tailed). This pattern of means indicates that participants
in the probability response mode performed best in the high base rate condition and worst in the
low base rate condition. We believe this finding may be an indication of the cognitive difficulties
of processing base rates and estimating probabilities when base rates are very low (cf. Joyce
and Biddle 1981a, b). It is possible that estimating the probability of a rare event is more
challenging than estimating the probability of a higher base rate event.
Base Rate Neglect: A Caveat
While we generally find support for our hypotheses related to the frequency response
mode, it is important to note that the auditors in our study continue to show base rate neglect.
Table 2 shows the deviations from the Bayesian benchmark for each response mode and base
rate level. Each of these deviations is significantly different from zero at p<.001.
5. Discussion and Limitations
In this paper, we test an approach suggested by Gigerenzer and his colleagues that shows
that statistical reasoning within a Bayesian framework can be improved when information is
presented as natural frequencies instead of probabilities. Such an approach has the potential to
help auditors improve their judgments about important events such as fraud. We test three
hypotheses related to the use of a frequency response mode.
Our results for the overall hypothesis (H1) show that the auditors' responses using the
frequency response mode were closer to the Bayesian benchmark. We also find the following
results. First, across both response modes, the deviations of the auditors' responses from the
Bayesian benchmark were significantly smaller for the fraud case, but only when the base rate
was low. Second, the auditors' responses using the
frequency response mode for the low (1%) base rate condition, as compared to the probability
response mode, are closer to the Bayesian benchmark. This result reinforces the overall finding
for the main hypothesis and shows that the frequency response mode has its best effect in the low
(rare, which is more realistic) base rate condition. We believe this is the first paper in accounting
to document the effect of frequency response mode on judgment performance in the context of
such a rare event. This result may be due to increased awareness by auditors that fraud is a rare
event. The ASB, IAASB, and PCAOB have substantially revised the auditing standards on fraud
(PCAOB 2008: AU 316, based on SAS Nos. 82 and 99; IAASB 2008: ISA 240). Firms have also
gathered information on the occurrence of fraud (KPMG 2003, 2006, 2009; PwC 2005, 2007),
and increased fraud awareness training. Such guidance and related training by audit firms may
make auditors more aware of the base rate of fraud.
Overall, our findings suggest that using a frequency response mode may help auditors
better assess rare events such as fraud (Francis 2004; Loebbecke et al. 1989). The results also
suggest that some consideration should be given to training auditors in acquiring low base rate
information in summary form using a frequency response mode. For example, Sedlmeier and
Gigerenzer (2001) offer a Bayesian reasoning training method that is based on a frequency
response mode, which they termed representation training. In two experiments, they compared
representation (frequency) training with rule (Bayesian formula) training. Their performance
criteria were an immediate learning effect, transfer to new problems, and long-term temporal
stability. They found that representation training had a higher immediate learning effect and
greater temporal stability than rule training. Based on Sedlmeier and Gigerenzer (2001) and on
our results, we believe that a frequency format has the potential to overcome time-related
deterioration of statistical skills and to help auditors successfully acquire base rate information
and integrate it in a Bayesian fashion.
Our study has a number of limitations. For example, we do not distinguish between
frequency of causes and frequency of effects of fraud. However, this lack of distinction is by
design because we concentrate mainly on the effect of internal problem representation, driven by
response format, on probability judgment and less so on the nature of the problem itself. Also, we
do not examine sub-population frequency perceptions and do not differentiate between the use of
sub-population and population frequency knowledge as was done in the literature on recall and
evaluation of error explanations (Tuttle 1996). The base rates in this study apply to the general
population of clients; no special sub-population of frauds is assumed. Further, we use a task
where no individuating (client-specific, red-flag style) information is provided to the
participants. We do so to provide a direct continuity with respect to the findings in prior research
upon which we built in this paper (Joyce and Biddle 1981a, b; Gigerenzer et al. studies).¹⁶

The aforementioned limitations should provide fruitful ground for future research. In
addition, future research may focus on summary form training methods, including frequency
representation that would help auditors to integrate information in a Bayesian fashion in the
context of events with rare incidence in the client population.

¹⁶ Including such information would also introduce the issue of participants' perceptions of how
diagnostic individuating information is and potentially give rise to a dilution effect and/or inverse base
rate effect. Since this is the first paper in auditing to examine the effectiveness of using frequency
response mode, we wanted to avoid these potential complications.

References
Anderson, M. J., and S. Sunder. 1995. Professional traders as intuitive Bayesians. Organizational
Behavior and Human Decision Processes 64(2): 185-202.

Asare, S. K., and A. M. Wright. 2004. The effectiveness of alternative risk assessment and
program planning tools in a fraud setting. Contemporary Accounting Research 21(2): 325-351.

Ashton, R. H. 1990. Pressure and performance in accounting decision settings: paradoxical
effects of incentives, feedback, and justification. Journal of Accounting Research 28
(Supplement): 148-180.

Bédard, J. 1991. Expertise and its relation to audit decision quality. Contemporary Accounting
Research 8(1): 198-222.

Bell, T. B., J. C. Bedard, K. M. Johnstone, and E. F. Smith. 2002. KRisk(SM): A computerized
decision aid for client acceptance and continuance risk assessments. Auditing: A Journal of
Practice & Theory 21(2): 97-114.

Bonner, S. E., and N. Pennington. 1991. Cognitive processes and knowledge as determinants of
auditor expertise. Journal of Accounting Literature 10:1-50.

Brown, C. E., and I. Solomon. 1991. Configural information processing in auditing: The role of
domain-specific knowledge. The Accounting Review 66(1): 100-119.

Beasley, M. S., J. V. Carcello, D. R. Hermanson and T. L. Neal. 2010. Fraudulent financial
reporting: 1998-2007. New York, NY: The Committee of Sponsoring Organizations of the
Treadway Commission (COSO).

Choo, F. 1996. Auditors' knowledge content and judgment performance: A cognitive script
approach. Accounting, Organizations, and Society 21(4): 339-359.

Cosmides, L., and J. Tooby. 1994. Origins of domain specificity: the evolution of functional
organization. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in
cognition and culture. New York, NY: Cambridge University Press.

Cosmides, L., and J. Tooby. 1996. Are humans good intuitive statisticians after all? Rethinking
some conclusions from the literature on judgment under uncertainty. Cognition 58: 1-73.

Deloitte. 2009. Ten things about financial statement fraud - Third Edition. New York, NY:
Deloitte.

Dougherty, M. R., A. M. Franco-Watkins, and R. Thomas. 2008. Psychological plausibility of
the theory of probabilistic mental models and the fast and frugal heuristics. Psychological
Review 115(1): 191-213.


Eddy, D. M. 1982. Probabilistic reasoning in clinical medicine: Problems and opportunities. In
D. Kahneman, P. Slovic, and A. Tversky (Eds.), Judgments under Uncertainty: Heuristics and
Biases. Cambridge, MA: Cambridge University Press.

Emby, C., and D. Finley. 1997. Debiasing framing effects in auditors' internal control judgments
and testing decisions. Contemporary Accounting Research 14(2): 55-78.

Francis, J. R. 2004. What do we know about audit quality? British Accounting Review 36: 345-
368.

Gigerenzer, G. 1996. On narrow norms and vague heuristics: A reply to Kahneman and Tversky.
Psychological Review 103(3): 592-596.

__________. 2004. Fast and frugal heuristics: The tools of bounded rationality. In D. Koehler &
N. Harvey (Eds.), Blackwell Handbook of Judgment and Decision Making. Oxford, UK:
Blackwell.

__________, W. Hell, and H. Blank. 1988. Presentation and content: the use of base rates as a
continuous variable. Journal of Experimental Psychology 14(3): 513-525.

__________. 1991. How to make cognitive illusions disappear. European Review of Social
Psychology 2: 83-111.

__________, U. Hoffrage, and H. Kleinbölting. 1991. Probabilistic mental models: a
Brunswikian theory of confidence. Psychological Review 98(4): 506-528.

Gigerenzer, G., U. Hoffrage, and D. G. Goldstein. 2008. Fast and frugal heuristics are plausible
models of cognition: Reply to Dougherty, Franco-Watkins, and Thomas. Psychological Review
115(1): 191-213.

__________, and U. Hoffrage. 1995. How to improve Bayesian reasoning without instruction:
frequency formats. Psychological Review 102(4): 684-704.

__________, and D. G. Goldstein. 1996. Reasoning the fast and frugal way: models of
bounded rationality. Psychological Review 103(4): 650-669.

Haynes, C. M., and S. J. Kachelmeier. 1998. The effects of accounting contexts on accounting
decisions: A synthesis of cognitive and economic perspectives in accounting experimentation.
Journal of Accounting Literature 17: 97-136.

Hoffrage, U. and G. Gigerenzer. 2004. Chapter 13. How to improve the diagnostic inferences of
medical experts. In Kurz-Milcke, E., and G. Gigerenzer (Eds.), Experts in Science and Society
New York, NY: Kluwer/Plenum.

Holt, D. 1987. Auditors' base rates revisited. Accounting, Organizations, and Society 12(6): 571-
578.

IAASB. 2008. ISA 240 The Auditor's Responsibilities Relating to Fraud in an Audit of the
Financial Statements (Redrafted), in 2008 Handbook of International Auditing, Assurance, and
Ethics Pronouncements - Part II. Retrieved on May 12, 2008, from
http://www.ifac.org/Store/Details.tmpl?SID=12048375762286923&Cart=1210626167400226

Johnson, W. B. 1983. Representativeness in judgmental predictions of corporate bankruptcy.
The Accounting Review 58(1): 78-97.

Joyce, E., and G. Biddle. 1981a. Anchoring and adjustment in probabilistic inference in auditing.
Journal of Accounting Research 19(1): 120-145.

______. 1981b. Are auditors' judgments sufficiently regressive? Journal of Accounting Research
19(2): 323-349.

Kahneman, D., and A. Tversky. 1972. Subjective probability: a judgment of representativeness.
Cognitive Psychology 3: 430-454.

Kahneman, D., and A. Tversky. 1973. On the psychology of prediction. Psychological Review
80(4): 237-251.

_______. 1996. On the reality of cognitive illusions. Psychological Review 103(3): 582-591.

Kida, T. 1984. The effect of causality and specificity on data use. Journal of Accounting
Research 22(1): 145-152.

Kinney, Jr., W. R. 1984. Discussant's response to an analysis of the audit framework focusing on
inherent risk and the role of statistical sampling in compliance testing. Proceedings of the 1984
Touche Ross/University of Kansas Symposium on Auditing Problems. Lawrence, KS: School of
Business, University of Kansas, 126-32.

________. 1989. Achieved audit risk and the audit outcome space. Auditing: A Journal of
Practice & Theory 8 (Supplement): 67-97.

Kotchetova, N., and S. Salterio. 2004. Judgment and decision-making accounting research; A
quest to improve the production, certification, and use of accounting information. In Koehler, D.,
and N. Harvey (Eds.), Blackwell Handbook of Judgment and Decision Making, Oxford, UK:
Blackwell Publishing Ltd.

KPMG. 2003. KPMG Fraud Survey 2003. KPMG LLP.

________. 2006. KPMG Fraud Survey 2006. KPMG LLP.

________. 2009. KPMG Fraud Survey 2009. KPMG LLP.

Larrick, R. P. 2004. Debiasing. In Koehler, D., and N. Harvey (Eds.), Blackwell Handbook of
Judgment and Decision Making, Oxford, UK: Blackwell Publishing Ltd.

Leslie, D. A. 1984. An analysis of the audit framework focusing on inherent risk and the role of
statistical sampling in compliance testing. Proceedings of the 1984 Touche Ross/University of
Kansas Symposium on Auditing Problems. Lawrence, KS: School of Business, University of
Kansas, 89-125.

Libby, R. 1981. Accounting and human information processing: Theory and applications. Upper
Saddle River, NJ: Prentice Hall.

Loebbecke, J. K., M. M. Eining, and J. J. Willingham. 1989. Auditors' experience with material
irregularities: frequency, nature and detectability. Auditing: A Journal of Practice & Theory
(Fall): 1-28.

Mao, J. Y., and I. Benbasat. 2001. The effects of contextualized access to knowledge on
judgment. International Journal of Human-Computer Studies 55:787-814.

Mellers, B. A., A. Schwartz, and A. D. J. Cooke. 1998. Judgment and decision making. Annual
Review of Psychology 49: 447-477.

Mellers, B. A., and A. P. McGraw. 1999. How to improve Bayesian reasoning: Comment on
Gigerenzer and Hoffrage (1995). Psychological Review 106(2): 417-424.

Nelson, M. W. 1993. The effects of error frequency and accounting knowledge on error
diagnosis in analytical review. The Accounting Review 68(4): 804-824.

________. 1996. Context and the inverse base rate effect. Journal of Behavioral Decision
Making 9: 23-40.

Nieschwietz, R. N., J. J. Schutz, Jr., and M. F. Zimbelman. 2000. Empirical research on external
auditors' detection of financial statement fraud. Journal of Accounting Literature 19: 190-246.

Payne, J. W., J. R. Bettman, and E.J. Johnson. 1992. Behavioral decision theory: a constructive
processing approach. Annual Review of Psychology 43: 87-131.

PCAOB. 2008. AU Section 316 Consideration of Fraud in a Financial Statement Audit. Retrieved
on May 8, 2008, from
http://www.pcaobus.org/standards/interim_standards/auditing_standards/index_au.asp?series=300&section=300

Peecher, M., and M. D. Piercey. 2008. Judging audit quality in light of adverse outcomes:
evidence of outcome bias and reverse outcome bias. Contemporary Accounting Research 25(1):
243-274.

PricewaterhouseCoopers. 2005. Global Economic Crime Survey 2005.
PricewaterhouseCoopers (www.pwc.com/crimesurvey).

________. 2007. Economic Crime: People, Culture and Control. PricewaterhouseCoopers
(www.pwc.com/crimesurvey).

Ranyard, R., and J. P. Charlton. 2006. Cognitive processes underlying lottery and sports
gambling decisions: The role of stated probabilities and background knowledge. European
Journal of Cognitive Psychology 18(2): 234-254.

Sedlmeier, P. 1999. Improving Statistical Reasoning: Theoretical Models and Practical
Implications. Mahwah, NJ: Erlbaum.

Sedlmeier, P. and G. Gigerenzer. 2001. Teaching Bayesian reasoning in less than two hours.
Journal of Experimental Psychology: General 130(3): 380-400.

Shafir, E., and R. A. LeBoeuf. 2002. Rationality. Annual Review of Psychology 53: 491-517.

Smith, J. F., and T. Kida. 1991. Heuristics and biases: expertise and task realism in auditing.
Psychological Bulletin 109(3): 472-489.

Tuttle, B. M. 1996. Using base rate frequency perceptions to diagnose financial statement error
causes. Auditing: A Journal of Practice & Theory 15(1): 104-121.

Tversky, A., and D. Kahneman. 1982. Evidential impact of base rates. In D. Kahneman, P.
Slovic, and A. Tversky (Eds.), Judgments under Uncertainty: Heuristics and Biases. Cambridge:
Cambridge University Press.

Wallsten, T. S., S. Fillenbaum, and J. A. Cox. 1986. Base rate effects on the interpretation of
probability and frequency expressions. Journal of Memory and Language 25: 571-587.

Weber, E. U., and D. J. Hilton. 1990. Contextual effects in the interpretations of probability
words: Perceived base rate and severity of events. Journal of Experimental Psychology: Human
Perception and Performance 16(4): 781-789.

Whyte, G., and C. Sue-Chan. 2002. The neglect of base rate data by human resources managers
in employee selection. Canadian Journal of Administrative Sciences 19(1): 1-11.

Wilks, T. J., and M. Zimbelman. 2004. Decomposition of fraud-risk assessments and auditors'
sensitivity to fraud cues. Contemporary Accounting Research 21(3): 719-746.

Windschitl, P. D., and E. U. Weber. 1999. The interpretation of "likely" depends on the context,
but 70%=70% - right? The influence of associative processes on perceived certainty. Journal
of Experimental Psychology: Learning, Memory, and Cognition 25: 1514-1533.

Winograd, B. M., J. S. Gerson, and B. L. Berlin. 2000. Audit practices of
PricewaterhouseCoopers. Auditing: A Journal of Practice & Theory 19(2): 175-182.


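Figure 1
Interaction Plot of Case and Base Rate

Dependent Variable: Deviations of the Participants' Likelihood Assessments from the Bayesian
Benchmark (see Table 3)

[Line plot. Horizontal axis: CASE (Medical, Fraud). Vertical axis: Estimated Marginal Means
(0.25 to 0.50). One line per BASE RATE level: Low (1%), Moderate (7%), High (25%).
Plotted means, matched to Table 3, Panel C:
Low (1%): .48 (Medical), .35 (Fraud)
Moderate (7%): .36 (Medical), .37 (Fraud)
High (25%): .30 (Medical), .29 (Fraud)]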

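Figure 2
Interaction Plot of Response Mode and Base Rate

Dependent Variable: Deviations of the Participants' Likelihood Assessments from the Bayesian
Benchmark in the Fraud Case (see Table 4)

[Line plot. Horizontal axis: Response Mode (Probability, Frequency). Vertical axis: Estimated
Marginal Means (.20 to .50). Lines: Low Base Rate (1%), Moderate Base Rate (7%), High Base
Rate (25%), and Moderate (7%) and High (25%) Base Rate combined.
Plotted means, matched to Table 4, Panel B:
Low Base Rate (1%): .45 (Probability), .24 (Frequency)
Moderate Base Rate (7%): .39 (Probability), .35 (Frequency)
High Base Rate (25%): .25 (Probability), .33 (Frequency)
Moderate (7%) and High (25%) Base Rate: .33 (Probability), .34 (Frequency)]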

TABLE 1
Experimental design and Bayesian benchmark information

Panel A: Experimental Design

                                Base Rate
Response Mode   Low (1%)        Moderate (7%)   High (25%)
Probability     Fraud Case*     Fraud Case      Fraud Case
                Medical Case    Medical Case    Medical Case
Frequency       Fraud Case      Fraud Case      Fraud Case
                Medical Case    Medical Case    Medical Case

Notes:
Base Rate and Response Mode are manipulated between subjects;
Case type (CASE) is manipulated within subjects.
Additional between-subject factor:
* Order of cases (ORDER) was also manipulated at two levels: fraud case first and medical case
first.


Panel B: Bayesian Benchmark

Base Rate       False      Test       Bayesian Benchmark    Bayesian Benchmark
                Positive   Accuracy   in Probability Mode   in Frequency Mode
Low (1%)        .096       .79        .0767                 .0776
Moderate (7%)   .096       .79        .3825                 .3855
High (25%)      .096       .79        .7328                 .7353
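
The Panel B benchmarks follow from Bayes' theorem applied to the case parameters. The Python sketch below is an illustration only, not the authors' code; the function name `bayes_posterior` and the dictionary names are assumptions introduced here. It also assumes that the frequency-mode benchmark is computed from the rates stated in the frequency-format cases in the Appendix (8 out of 10 hits, 95 out of 990 false positives) in place of .79 and .096. Under those assumptions the results agree with Panel B to within one unit in the fourth decimal place.

```python
def bayes_posterior(base_rate, hit_rate, false_positive_rate):
    """P(condition | positive signal) by Bayes' theorem."""
    true_positives = base_rate * hit_rate
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Base rate conditions from Table 1.
BASE_RATES = {"Low (1%)": 0.01, "Moderate (7%)": 0.07, "High (25%)": 0.25}

# Probability mode: test accuracy .79 and false positive rate .096 (Panel B).
probability_mode = {
    label: bayes_posterior(rate, 0.79, 0.096) for label, rate in BASE_RATES.items()
}

# Frequency mode (assumption): rates as worded in the frequency-format cases,
# i.e. 8 out of 10 hits and 95 out of 990 false positives.
frequency_mode = {
    label: bayes_posterior(rate, 8 / 10, 95 / 990) for label, rate in BASE_RATES.items()
}

for label in BASE_RATES:
    print(f"{label:14s} probability: {probability_mode[label]:.4f}  "
          f"frequency: {frequency_mode[label]:.4f}")
```

The small gap between the two benchmark columns arises solely from the rounding of .79 to 8 out of 10 and of .096 to 95 out of 990 in the frequency wording.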




TABLE 2
Descriptive Statistics for the Dependent Variables (Mean, Standard Deviation)

                                      Raw Score                       Absolute Deviation from
                                      (F-raw, M-raw)                  Bayesian Benchmark (F-dev, M-dev)
Response Mode  Base Rate              Fraud Case      Medical Case    Fraud Case      Medical Case
Probability    Low (n=24)             .5059 (.3657)   .5945 (.3585)   .4537 (.3333)   .5399 (.3228)
               Moderate (n=27)        .7092 (.2655)   .7751 (.1289)   .3941 (.1370)   .3911 (.1283)
               High (n=24)            .6454 (.3064)   .6415 (.3050)   .2525 (.1879)   .2549 (.1842)
               Overall Mean (n=75)    .6237 (.3206)   .6746 (.2843)   .3679 (.2430)   .3951 (.2488)
Frequency      Low (n=24)             .2620 (.3547)   .4535 (.4062)   .2362 (.3215)   .4187 (.3604)
               Moderate (n=27)        .2731 (.3463)   .3669 (.3612)   .3509 (.0694)   .3370 (.1121)
               High (n=24)            .4962 (.3524)   .5484 (.3733)   .3302 (.2639)   .3353 (.2416)
               Overall Mean (n=75)    .3438 (.3630)   .4563 (.3829)   .3058 (.2454)   .3637 (.2582)
Column Mean (n=150)                   .4837 (.3691)   .5654 (.3535)   .3368 (.2453)   .3794 (.2532)
Notes:
The values in each cell are the mean (standard deviation) for each of the following variables:
Raw Scores = the participants' responses to the fraud or medical cases under each response
mode.
Absolute Deviation from Bayesian Benchmark = the absolute difference between the
participant's raw score and the appropriate Bayesian benchmark for that response mode. See
Table 1 for the appropriate Bayesian benchmark by treatment condition.
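
As a reading aid for the absolute-deviation columns, the following Python sketch computes the measure for a handful of made-up responses. The response values and the function name `absolute_deviations` are hypothetical and are not data from the study; only the benchmark (.0767, low base rate, probability mode) comes from Table 1. The sketch illustrates why a cell's mean absolute deviation need not equal the gap between its mean raw score and the benchmark: responses below the benchmark still contribute positive deviations.

```python
from statistics import mean


def absolute_deviations(raw_scores, benchmark):
    """Absolute distance of each response from the Bayesian benchmark."""
    return [abs(score - benchmark) for score in raw_scores]


# Benchmark for the low base rate in probability mode (Table 1, Panel B).
BENCHMARK_LOW_PROBABILITY = 0.0767

# Hypothetical responses on a 0-1 scale; NOT taken from the study.
hypothetical_raw = [0.02, 0.05, 0.79, 0.90]

deviations = absolute_deviations(hypothetical_raw, BENCHMARK_LOW_PROBABILITY)
mean_raw = mean(hypothetical_raw)
mean_abs_dev = mean(deviations)
gap_of_means = abs(mean_raw - BENCHMARK_LOW_PROBABILITY)

print(f"mean raw score:          {mean_raw:.4f}")
print(f"mean absolute deviation: {mean_abs_dev:.4f}")
print(f"|mean raw - benchmark|:  {gap_of_means:.4f}")
```

In this toy sample the mean absolute deviation exceeds the gap between the mean raw score and the benchmark, the same pattern visible in the low base rate, probability-mode cell of the table (.4537 versus .5059 - .0767).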

TABLE 3
Repeated Measures ANOVA using the Deviations of the Participants' Likelihood
Assessments from the Bayesian Benchmark

Dependent variable: F-dev, M-dev (n = 150)

Panel A: Between-subjects and Within-subjects Effects

Source                                SS       df    F         p value*
Between-Subjects Effects:
  Intercept                           38.363   1     446.649   .000
  RESPONSE MODE                       .161     1     1.870     .087 (H1)
  BASE RATE                           .709     2     4.129     .018
  RESPONSE MODE x BASE RATE           .756     2     4.399     .014
  Error                               12.368   144

Within-Subjects Effects:
  CASE                                .140     1     4.817     .015 (H2)
  CASE x RESPONSE MODE                .016     1     .556      .457
  CASE x BASE RATE                    .311     2     5.353     .006
  CASE x RESPONSE MODE x BASE RATE    .042     2     .731      .483
  Error(CASE)                         4.179    144

Notes:
RESPONSE MODE = (probability vs. frequency)
BASE RATE = (low vs. moderate vs. high)
CASE = (fraud vs. medical case)
ORDER was included in the analysis as a fixed factor to account for order of case presentation,
and was not significant in any of the analyses reported in this and other tables.
* p-values are one-tailed if hypothesized in a direction (H1, H2), and two-tailed otherwise.
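
Each F statistic in Panel A is the effect's mean square (SS divided by df) over the mean square of the matching error term. The Python sketch below recomputes the ratios from the tabled sums of squares and degrees of freedom; the helper name `f_ratio` and the data layout are assumptions introduced here. Because the tabled sums of squares are rounded to three decimals, the recomputed values match the tabled F statistics only approximately (within about 0.01).

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic: effect mean square divided by error mean square."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Error terms (SS, df) from Panel A.
BETWEEN_ERROR = (12.368, 144)
WITHIN_ERROR = (4.179, 144)

# effect name -> (SS, df, matching error term, tabled F)
EFFECTS = {
    "RESPONSE MODE": (0.161, 1, BETWEEN_ERROR, 1.870),
    "BASE RATE": (0.709, 2, BETWEEN_ERROR, 4.129),
    "RESPONSE MODE x BASE RATE": (0.756, 2, BETWEEN_ERROR, 4.399),
    "CASE": (0.140, 1, WITHIN_ERROR, 4.817),
    "CASE x RESPONSE MODE": (0.016, 1, WITHIN_ERROR, 0.556),
    "CASE x BASE RATE": (0.311, 2, WITHIN_ERROR, 5.353),
    "CASE x RESPONSE MODE x BASE RATE": (0.042, 2, WITHIN_ERROR, 0.731),
}

recomputed = {
    name: f_ratio(ss, df, *error) for name, (ss, df, error, _) in EFFECTS.items()
}

for name, (_, _, _, tabled) in EFFECTS.items():
    print(f"{name:34s} recomputed F = {recomputed[name]:.3f}   tabled F = {tabled:.3f}")
```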

Panel B: Adjusted Means by Response Mode and Base Rate (Standard Error)

                                  BASE RATE
RESPONSE MODE         1% (n=49)     7% (n=52)     25% (n=49)    Row Mean (n=150)
Probability (n=75)    .497 (.042)   .393 (.040)   .254 (.042)   .381 (.024)
Frequency (n=75)      .327 (.041)   .344 (.041)   .333 (.041)   .335 (.024)
Column Mean (n=150)   .412 (.030)   .368 (.029)   .293 (.030)   .358 (.017)




Panel C: Adjusted Means by CASE and Base Rate (Standard Error)

                                  BASE RATE
CASE                  1% (n=49)     7% (n=52)     25% (n=49)    Row Mean (n=150)
Fraud Case (n=75)     .345 (.034)   .372 (.033)   .291 (.034)   .336 (.019)
Medical Case (n=75)   .479 (.035)   .364 (.034)   .295 (.035)   .379 (.020)
Column Mean (n=150)   .412 (.030)   .368 (.029)   .293 (.030)   .358 (.017)


TABLE 4
ANOVA using the Deviations of the Participants' Likelihood Assessments from the
Bayesian Benchmark in the Fraud Case

Dependent variable: F-dev (n=150)

Panel A: Between-subjects Effects

Source                         SS       df    F         p-value
Corrected Model                .846     5     3.001     .013
Intercept                      16.936   1     300.240   .000
RESPONSE MODE                  .139     1     2.469     .118
BASE RATE                      .171     2     1.513     .224
RESPONSE MODE * BASE RATE      .540     2     4.784     .010
Error                          8.123    144
Total                          25.986   150
Corrected Total                8.969    149

Notes:
RESPONSE MODE = (probability vs. frequency)
BASE RATE = (low vs. moderate vs. high)

Panel B: Adjusted Means by Response Mode and Base Rate (Standard Error)

                                         BASE RATE
RESPONSE MODE         1% (n=49)     7% (n=52)     25% (n=49)    7% and 25% (n=101)   Row Mean (n=150)
Probability (n=75)    .454 (.048)   .394 (.046)   .252 (.048)   .327 (.034)          .367 (.027)
Frequency (n=75)      .236 (.048)   .351 (.048)   .330 (.048)   .341 (.034)          .306 (.027)
Column Mean (n=150)   .345 (.034)   .372 (.033)   .291 (.034)   .334 (.024)          .336 (.019)


Appendix
Sample Experimental Cases
(Base Rate conditions in italics)

Fraud Case: Probability Response Format

A public accounting firm has assembled a large database of management descriptions as a part of
a study intended to determine whether management descriptions are helpful in detecting
management fraud. Upon completion of the database, the accounting firm has established that
the rate of material management fraud in the overall client base is [1%, 7% or 25%]. In addition,
the accounting firm estimated that if a client is involved in material management fraud, the
description of the client management will match the fraudulent profile from a database in 79%
of cases. However, the firm estimates that in 9.6% of cases the descriptions in the database will
identify the client management as fraudulent when it is, in fact, honest.

A client was selected from the firm's client base. The database indicates the description of this
client's management matches the fraudulent profile.

What is the probability that the management of this client is engaged in material fraud? ______%


Fraud Case: Frequency Response Format

A public accounting firm has assembled a large database of management descriptions as a part of
a study intended to determine whether management descriptions are helpful in detecting
management fraud. Upon completion of the database, the accounting firm has established that
[ten (10), seventy (70), or 250 (250)] out of each 1000 clients in their customer base are involved
in material management fraud. In addition, the accounting firm estimated that if a client is
involved in material management fraud, the description of the client management will match the
fraudulent profile from a database in eight (8) cases out of ten (10). However, the firm
estimates that in ninety-five (95) out of 990 cases the descriptions in the database will identify
the client management as fraudulent when it is, in fact, honest.

100 clients were selected from the firm's client base. The database indicates the descriptions of
all of these clients' managements match the fraudulent profile.

How many clients out of these 100 are engaged in material management fraud? _____ out of 100



Breast Cancer Case: Probability Response Format

The prevalence of breast cancer among women at age forty who participate in routine screening
is [1%, 7% or 25%]. If a woman has breast cancer, the probability that a mammography is
positive is 79%. If a woman does not have breast cancer, the probability is 9.6% that she will
also get a positive mammography.

A forty-year-old woman has a positive mammography during routine screening.

What is the probability that she actually has breast cancer? ______%



Breast Cancer Case: Frequency Response Format

[Ten (10), seventy (70), or 250 (250)] out of every 1000 women at age forty who participate in
routine screening have breast cancer. Eight (8) of every 10 women with breast cancer will get a
positive mammography. Ninety-five (95) out of every 990 women without breast cancer will also
get a positive mammography.

You have a sample of 100 women in this age group who had a positive mammography during
routine screening.

How many of these women out of 100 actually have breast cancer? ____ out of 100
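
The frequency-format cases (fraud and breast cancer share the same numbers) lend themselves to a simple counting solution. The Python sketch below works through it with exact fractions; the function name `expected_cases_per_100` is an assumption introduced here, and the calculation illustrates the normative answer implied by the case numbers. It is not part of the experimental materials.

```python
from fractions import Fraction


def expected_cases_per_100(cases_per_1000):
    """Expected number of true cases among 100 positively flagged individuals."""
    with_condition = Fraction(cases_per_1000)
    without_condition = 1000 - with_condition
    # 8 of every 10 true cases are flagged.
    flagged_true = with_condition * Fraction(8, 10)
    # 95 of every 990 individuals without the condition are flagged.
    flagged_false = without_condition * Fraction(95, 990)
    share_true = flagged_true / (flagged_true + flagged_false)
    return float(100 * share_true)


for base in (10, 70, 250):
    print(f"{base:3d} per 1000 -> {expected_cases_per_100(base):.1f} out of 100")
```

In the low base rate condition the count is easy to follow by hand: 8 of the 10 true cases and 95 of the 990 others are flagged, so 8 of 103 flagged individuals, or roughly 8 out of 100, are true cases. The moderate and high conditions give roughly 39 and 74 out of 100.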
