

Concepts of testing, measurement, assessment and evaluation

Test
An instrument or systematic procedure for measuring a sample of behaviour by posing a set of
questions in a uniform manner. Because a test is a form of assessment, tests also answer the
question, "How well does the individual perform, either in comparison with others or in comparison
with a domain of performance tasks?"
Measurement
The process of obtaining a numerical description of the degree to which an individual possesses a
particular characteristic. Measurement answers the question, "How much?" Example:- When the teacher
calculates the percentage of problems that the student Geetha has answered correctly, and gives her
a score of 70/100, that is measurement.
A test is used to gather information.
That information is presented in the form of measurement.
That measurement is then used to make evaluation.
Assessment
Any of a variety of procedures used to obtain information about student performance. It includes
traditional paper-and-pencil tests as well as extended responses (e.g. essays) and performances of
authentic tasks (e.g. laboratory experiments). Assessment answers the question, "How well does the
individual perform?" Note that the term assessment is used to mean the same as evaluation.
Concept of Evaluation
Evaluation has a wider meaning. It goes beyond measurement. When we make a judgement from useful
information, including measurement, that is evaluation. Example:- The teacher may judge that the
student Geetha is doing well in mathematics, because she scored 70/100 while most of the class scored
50/100. This is an example of evaluation using quantitative data (measurable information). The
teacher might also make an evaluation based on qualitative data, such as her observations that
Geetha works hard, has an enthusiastic attitude towards mathematics and finishes her assignments.
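The Geetha example can be sketched in a few lines of Python to show where measurement ends and evaluation begins. This is only an illustrative sketch: the function names, the made-up class scores, and the rule "above the class median counts as doing well" are assumptions, not something the text prescribes.

```python
from statistics import median


def percent_correct(num_correct: int, num_items: int) -> float:
    """Measurement: turn a raw count into a percentage ("How much?")."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return 100.0 * num_correct / num_items


def evaluate(score: float, class_scores: list[float]) -> str:
    """Evaluation: a value judgement about the score.

    Hypothetical rule: a score above the class median is judged as
    "doing well". A real teacher would also weigh qualitative data.
    """
    typical = median(class_scores)
    return "doing well" if score > typical else "needs support"


# Measurement step: 70 problems correct out of 100.
geetha_score = percent_correct(70, 100)

# Made-up class in which most students scored 50/100.
class_scores = [50, 50, 45, 50, 55, 70]

# Evaluation step: judge the measurement against the class.
print(geetha_score, evaluate(geetha_score, class_scores))
```

The number returned by percent_correct is the measurement; the label returned by evaluate is the judgement, and it changes if the comparison group or the rule changes even though the measurement stays the same.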
Evaluation is a science of providing information for decision making.
It includes measurement, assessment and testing.
It is a process that involves:
Information gathering
Information processing
Judgement forming
Decision making
From the above, we can arrive at the following concept of evaluation:
Evaluation is a concept that has emerged as a prominent process of assessing, testing and
measuring. Its main objective is qualitative improvement.
Evaluation is a process of making value judgements about a level of performance or achievement.
Making value judgements in the evaluation process presupposes a set of objectives.
Evaluation implies a critical assessment of educative process and its outcome in the light of the

Purposes of Evaluation
Evaluation is the process of determining the extent to which the objectives are achieved.
It is concerned not only with the appraisal of achievement, but also with its improvement.
Evaluation is a continuous and dynamic process. Evaluation helps in forming the following

Types of Decisions
Placement or Classification
Among the above decisions, we shall learn how evaluation assists a teacher in taking instructional
decisions. Evaluation assists in taking certain instructional decisions like:
1. To what extent are students ready for the learning experience?
2. To what extent can they cope with the pace of the learning experiences provided?
3. How can individual differences within the group be tackled?
4. What are the learning problems of the students?
5. What is the intensity of such problems?
6. What modifications are needed in the instruction to suit the needs of students?
Evaluation is an integral part of the teaching and learning process. This is explained in the following

What should teachers learn about evaluation?
1. choosing evaluation methods appropriate for instructional decisions;
2. developing evaluation methods appropriate for instructional decisions;
3. administering, scoring and interpreting the results of both externally produced and
teacher-produced evaluation methods;
4. using evaluation results when making decisions about individual students, planning
teaching, developing curriculum and school improvement;
5. developing valid pupil grading procedures, which use pupil assessments;
6. communicating evaluation results to students, parents, other lay audiences, and other
7. recognizing unethical, illegal and otherwise inappropriate evaluation methods and
uses of evaluation information.

Definition and Types of Evaluation
Evaluation consists of the objective assessment of a project, programme or policy at all of its stages, i.e. planning,
implementation and measurement of outcomes. It should provide reliable and useful information so that the
knowledge obtained can be applied in the decision-making process. It often concerns the process of
determining the value or importance of a measure, policy or programme.
According to the above-mentioned Regulation, the aim of evaluation is to improve the quality, effectiveness and
consistency of the assistance from the Funds and the strategy and implementation of operational programmes
with respect to the specific structural problems affecting the Member States and regions concerned, while taking
account of the objective of sustainable development and the relevant Community legislation concerning
environmental impact and strategic environmental assessment.
Evaluation, as a process of systematic assessment of interventions financed from structural funds, is steadily
gaining importance. In the 2007-2013 programming period, the results of evaluation studies will play
an important role in shaping the cohesion policy of the European Union. During the debate
on the next budget, which follows the current financial perspective after 2013, they will be among the key arguments for
either preserving the policy in its existing shape or revising its assumptions.
Categories of Evaluation
By purpose, evaluation is classified into the following categories:
Strategic evaluation (whose purpose is to assess and analyse the evolution of the National Strategic Reference
Framework (NSRF) and the Operational Programmes (OP) with respect to national and Community priorities);
Operational evaluation (whose purpose is to support the process of NSRF and OP monitoring).
Strategic evaluation concerns mainly the analysis and assessment of interventions at the level of strategic
goals. The object of strategic evaluation consists of the analysis and appraisal of the relevance of general
directions of interventions determined at the programming stage. One of the significant aspects of strategic
evaluation consists of the verification of the adopted strategy with respect to the current and anticipated social
and economic situation.
Operational evaluation is closely linked to the process of NSRF and OP management and monitoring. The
purpose of operational evaluation consists of providing support to the institutions responsible for the
implementation of NSRF and OP with regard to the achievement of the assumed operational objectives by
providing practically useful conclusions and recommendations. According to Regulation 1083/2006, operational
evaluation should be carried out, in particular, in the case when monitoring has revealed significant deviations
from the originally assumed objectives and when requests are submitted for the review of an operational
programme or its part.
By timing, evaluation is classified into the following types:
ex ante evaluation (prior to the launch of NSRF or OP implementation),
ongoing evaluation (in the course of NSRF or OP implementation),
ex post evaluation (after completion of NSRF or OP implementation).
The ex ante evaluation of NSRF and OP was completed in 2006. The results of the ex ante
evaluations of NSRF and OP, performed by external evaluators, were taken into account in the final version of the
National Strategic Reference Framework for 2007-2013 and of the individual Operational Programmes.

Ex post evaluation is done by the European Commission in cooperation with the member states and with the
Management Bodies. Regardless of the evaluation conducted by the European Commission, member states may
perform ex post evaluation on their own account.
Ongoing evaluation is a process aimed at better understanding the current outcomes of an
intervention and at formulating recommendations that are useful for programme
implementation. In the next few years, ongoing evaluation will become key to the effective implementation of
cohesion policy in Poland.

Evaluation Approaches & Types

There are various types of evaluations but two main philosophical
approaches: formative and summative. After a brief introduction to these two approaches, we shall
share several specific types of evaluations that fall under the formative and summative approaches.
Formative evaluation is an on-going process that allows for feedback to be implemented
during a program cycle. Formative evaluations (Boulmetis & Dutwin, 2005):
Concentrate on examining and changing processes as they occur
Provide timely feedback about program services
Allow you to make program adjustments on the fly to help achieve program goals
Types of formative evaluation include the following:
Needs assessment determines who needs the program, how great the need is, and what might work
to meet the need.
Structured conceptualization helps stakeholders define the program, the target population, and the
possible outcomes.
Implementation evaluation monitors the fidelity of the program delivery.
Process evaluation investigates the process of delivering the program, including alternative delivery
procedures.
Summative evaluation occurs at the end of a program cycle and provides an overall
description of program effectiveness. Summative evaluation examines program outcomes to
determine overall program effectiveness. Summative evaluation is a method for answering some of
the following questions:
Were your program objectives met?
Will you need to improve and modify the overall structure of the program?
What is the overall impact of the program?
What resources will you need to address the program's weaknesses?
Summative evaluation will enable you to make decisions regarding specific services and the future
direction of the program that cannot be made during the middle of a program cycle. Summative
evaluations should be provided to funders and constituents with an interest in the program.
Goal-based evaluation determines if the intended goals of a program were achieved. Has my program
accomplished its goals?
Outcome evaluation investigates whether the program caused demonstrable effects on specifically
defined target outcomes. What effect does program participation have on students?
Impact evaluation is broader and assesses the overall or net effects, intended or unintended, of
the program. What impact does this program have on the larger organization (e.g., high school or
college), community, or system?
Cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing
outcomes in terms of their dollar costs and values. How efficient is my program with respect to cost?
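
To make the efficiency question concrete, here is a minimal Python sketch of the two calculations, using their usual textbook definitions: cost-effectiveness as dollars spent per unit of outcome, and cost-benefit as the ratio (or difference) of monetized benefits to costs. The class name, field names, and all figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProgramResult:
    """Hypothetical summary of one program cycle (all figures invented)."""
    name: str
    total_cost: float         # dollars spent on the program
    outcome_units: float      # e.g. number of students who met the target
    monetized_benefit: float  # dollar value placed on those outcomes


def cost_per_outcome(p: ProgramResult) -> float:
    """Cost-effectiveness: dollars spent per unit of outcome achieved."""
    if p.outcome_units <= 0:
        raise ValueError("outcome_units must be positive")
    return p.total_cost / p.outcome_units


def benefit_cost_ratio(p: ProgramResult) -> float:
    """Cost-benefit: dollars of benefit per dollar of cost."""
    if p.total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return p.monetized_benefit / p.total_cost


def net_benefit(p: ProgramResult) -> float:
    """Cost-benefit: benefits minus costs, in dollars."""
    return p.monetized_benefit - p.total_cost


tutoring = ProgramResult("tutoring", total_cost=20_000,
                         outcome_units=40, monetized_benefit=30_000)
print(cost_per_outcome(tutoring), benefit_cost_ratio(tutoring),
      net_benefit(tutoring))
```

Cost-effectiveness needs only a count of outcomes, so it is useful for comparing programs that pursue the same outcome; cost-benefit requires putting a dollar value on the outcomes, which is the harder and more contestable step.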

Below is a figure depicting the different ways formative and summative evaluation can be utilized.

Introduction to Evaluation
Evaluation is a methodological area that is closely related to, but
distinguishable from more traditional social research. Evaluation
utilizes many of the same methodologies used in traditional social
research, but because evaluation takes place within a political and
organizational context, it requires group skills, management ability,
political dexterity, sensitivity to multiple stakeholders and other skills
that social research in general does not rely on as much. Here we
introduce the idea of evaluation and some of the major terms and
issues in the field.
Definitions of Evaluation
Probably the most frequently given definition is:
Evaluation is the systematic assessment of the worth or merit of some object.
This definition is hardly perfect. There are many types of evaluations
that do not necessarily result in an assessment of worth or merit --
descriptive studies, implementation analyses, and formative
evaluations, to name a few. Better perhaps is a definition that
emphasizes the information-processing and feedback functions of
evaluation. For instance, one might say:
Evaluation is the systematic acquisition and assessment of information to provide useful feedback about
some object.
Both definitions agree that evaluation is a systematic endeavor and
both use the deliberately ambiguous term 'object' which could refer to
a program, policy, technology, person, need, activity, and so on. The
latter definition emphasizes acquiring and assessing
information rather than assessing worth or merit because all
evaluation work involves collecting and sifting through data, making
judgements about the validity of the information and of inferences we
derive from it, whether or not an assessment of worth or merit results.
The Goals of Evaluation
The generic goal of most evaluations is to provide "useful feedback" to
a variety of audiences including sponsors, donors, client-groups,
administrators, staff, and other relevant constituencies. Most often,
feedback is perceived as "useful" if it aids in decision-making. But the
relationship between an evaluation and its impact is not a simple one --
studies that seem critical sometimes fail to influence short-term
decisions, and studies that initially seem to have no influence can
have a delayed impact when more congenial conditions arise. Despite
this, there is broad consensus that the major goal of evaluation should
be to influence decision-making or policy formulation through the
provision of empirically-driven feedback.
Evaluation Strategies
'Evaluation strategies' means broad, overarching perspectives on
evaluation. They encompass the most general groups or "camps" of
evaluators; although, at its best, evaluation work borrows eclectically
from the perspectives of all these camps. Four major groups of
evaluation strategies are discussed here.
Scientific-experimental models are probably the most historically
dominant evaluation strategies. Taking their values and methods from
the sciences -- especially the social sciences -- they prioritize
impartiality, accuracy, objectivity and the validity of the
information generated. Included under scientific-experimental models
would be: the tradition of experimental and quasi-experimental
designs; objectives-based research that comes from education;
econometrically-oriented perspectives including cost-effectiveness
and cost-benefit analysis; and the recent articulation of theory-driven
evaluation.
The second class of strategies are management-oriented systems
models. Two of the most common of these are PERT,
the Program Evaluation and Review Technique, and CPM,
the Critical Path Method. Both have been widely used in business and
government in this country. It would also be legitimate to include the
Logical Framework or "Logframe" model developed at the U.S. Agency for
International Development and general systems theory and operations
research approaches in this category. Two management-oriented
systems models were originated by evaluators: the UTOS model
where U stands for Units, T for Treatments, O for Observing
Operations and S for Settings; and the CIPP model where
the C stands for Context, the I for Input, the first P for Process and the
second P for Product. These management-oriented systems models
emphasize comprehensiveness in evaluation, placing evaluation
within a larger framework of organizational activities.
The third class of strategies are the qualitative/anthropological
models. They emphasize the importance of observation, the need to
retain the phenomenological quality of the evaluation context, and the
value of subjective human interpretation in the evaluation process.
Included in this category are the approaches known in evaluation as
naturalistic or 'Fourth Generation' evaluation; the various qualitative
schools; critical theory and art criticism approaches; and, the
'grounded theory' approach of Glaser and Strauss among others.
Finally, a fourth class of strategies is termed participant-oriented
models. As the term suggests, they emphasize the central
importance of the evaluation participants, especially clients and users
of the program or technology. Client-centered and stakeholder
approaches are examples of participant-oriented models, as are
consumer-oriented evaluation systems.
With all of these strategies to choose from, how to decide? Debates
that rage within the evaluation profession -- and they do rage -- are
generally battles between these different strategists, with each
claiming the superiority of their position. In reality, most good
evaluators are familiar with all four categories and borrow from each
as the need arises. There is no inherent incompatibility between these
broad strategies -- each of them brings something valuable to the
evaluation table. In fact, in recent years attention has increasingly
turned to how one might integrate results from evaluations that use
different strategies, carried out from different perspectives, and using
different methods. Clearly, there are no simple answers here. The
problems are complex and the methodologies needed will and should
be varied.
Types of Evaluation
There are many different types of evaluations depending on the object
being evaluated and the purpose of the evaluation. Perhaps the most
important basic distinction in evaluation types is that
between formative and summative evaluation. Formative evaluations
strengthen or improve the object being evaluated -- they help form it
by examining the delivery of the program or technology, the quality of
its implementation, and the assessment of the organizational context,
personnel, procedures, inputs, and so on. Summative evaluations, in
contrast, examine the effects or outcomes of some object -- they
summarize it by describing what happens subsequent to delivery of
the program or technology; assessing whether the object can be said
to have caused the outcome; determining the overall impact of the
causal factor beyond only the immediate target outcomes; and,
estimating the relative costs associated with the object.
Formative evaluation includes several evaluation types:
needs assessment determines who needs the program, how great the need is, and what might work to
meet the need
evaluability assessment determines whether an evaluation is feasible and how stakeholders can help
shape its usefulness
structured conceptualization helps stakeholders define the program or technology, the target
population, and the possible outcomes
implementation evaluation monitors the fidelity of the program or technology delivery
process evaluation investigates the process of delivering the program or technology, including
alternative delivery procedures
Summative evaluation can also be subdivided:
outcome evaluations investigate whether the program or technology caused demonstrable effects on
specifically defined target outcomes
impact evaluation is broader and assesses the overall or net effects -- intended or unintended -- of the
program or technology as a whole
cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing
outcomes in terms of their dollar costs and values
secondary analysis reexamines existing data to address new questions or use methods not previously
employed
meta-analysis integrates the outcome estimates from multiple studies to arrive at an overall or
summary judgement on an evaluation question
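As a rough illustration of the meta-analysis entry in the list above, the sketch below pools outcome estimates from several studies using inverse-variance weighting (a fixed-effect approach). The text does not say which pooling method is meant; this is just one common choice, and the function name and the numbers are made up for illustration.

```python
import math


def pooled_effect(estimates: list[float],
                  std_errors: list[float]) -> tuple[float, float]:
    """Fixed-effect, inverse-variance pooling of study estimates.

    Each study is weighted by 1 / SE^2, so more precise studies count
    for more. Returns (pooled estimate, standard error of the pooled
    estimate).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Two hypothetical studies of the same program outcome.
effect, se = pooled_effect([0.2, 0.4], [0.1, 0.2])
print(round(effect, 3), round(se, 3))
```

A fixed-effect model assumes every study estimates the same underlying effect; when programs or settings differ substantially, a random-effects model is usually the more defensible choice.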
Evaluation Questions and Methods
Evaluators ask many different kinds of questions and use a variety of
methods to address them. These are considered within the framework
of formative and summative evaluation as presented above.
In formative research the major questions and methodologies are:
What is the definition and scope of the problem or issue, or
what's the question?
Formulating and conceptualizing methods might be used including brainstorming, focus groups, nominal group
techniques, Delphi methods, brainwriting, stakeholder analysis, synectics, lateral thinking, input-output analysis,
and concept mapping.
Where is the problem and how big or serious is it?
The most common method used here is "needs assessment" which can include: analysis of existing data
sources, and the use of sample surveys, interviews of constituent populations, qualitative research, expert
testimony, and focus groups.
How should the program or technology be delivered to address
the problem?
Some of the methods already listed apply here, as do detailing methodologies like simulation techniques, or
multivariate methods like multiattribute utility theory or exploratory causal modeling; decision-making methods;
and project planning and implementation methods like flow charting, PERT/CPM, and project scheduling.
How well is the program or technology delivered?
Qualitative and quantitative monitoring techniques, the use of management information systems, and
implementation assessment would be appropriate methodologies here.
The questions and methods addressed under summative evaluation include:
What type of evaluation is feasible?
Evaluability assessment can be used here, as well as standard approaches for selecting an appropriate
evaluation design.
What was the effectiveness of the program or technology?
One would choose from observational and correlational methods for demonstrating whether desired effects
occurred, and quasi-experimental and experimental designs for determining whether observed effects can
reasonably be attributed to the intervention and not to other sources.
What is the net impact of the program?
Econometric methods for assessing cost effectiveness and cost/benefits would apply here, along with qualitative
methods that enable us to summarize the full range of intended and unintended impacts.
Clearly, this introduction is not meant to be exhaustive. Each of these
methods, and the many not mentioned, are supported by an extensive
methodological research literature. This is a formidable set of tools.
But the need to improve, update and adapt these methods to
changing circumstances means that methodological research and
development needs to have a major place in evaluation work.
Evaluation Models

You learned the basics of evaluation last week. This week we are
going to learn about some of the different evaluation approaches
or models or metaphors that different groups of evaluators tend
to endorse. I generally use the terms approaches, models, and
metaphors as synonyms.

The reading is titled "Chapter 4: Evaluation Models," which is from a
book by my (Burke Johnson's) major professor at the University of
Georgia. Here is the reference:

Payne, D.A. (1994). Designing educational project and program
evaluations: A practical overview based on research and
experience. Boston: Kluwer Academic Publishers.

The whole book is actually quite good, but we are only using one
chapter for our course.

In this chapter, Payne discusses four types of models:
1. Management Models
2. Judicial Models
3. Anthropological Models
4. Consumer Models

You might remember these four types using this mnemonic: MJAC.

Here is how Scriven defines models: A term loosely used to refer to
a conception or approach or sometimes even a method (e.g.,
naturalistic, goal-free) of doing evaluation. Models are to paradigms
as hypotheses are to theories, which means less general but with
some overlap.

Payne notes (p. 58) that his four metaphors may be helpful in leading
to your theory of evaluation. In fact, this is something I want you to
think about this semester: what is YOUR theory of evaluation? Note
Marvin Alkin's (1969) definition of program theory on p. 58 of Payne's
chapter and compare it with Will Shadish's definition of program
theory on page 33 in RFL. I suggest that you memorize Shadish's
definition of program theory. I am a strong advocate of evaluators
being aware of their evaluation theory.

In short, you may wish to pick one model as being of most
importance in your theory of evaluation. On the other hand, my
theory of evaluation is a needs-based or contingency theory of
evaluation. (By the way, I am probably most strongly influenced by
Will Shadish's evaluation writings.) In short, I like to select the model
that best fits the specific needs or situational characteristics of the
program evaluation I am conducting. Payne makes some similar
points in the last section of the chapter, titled
"Metaphor Selection: In Praise of Eclecticism."

Now I will make some comments about each of the four approaches
to evaluation discussed by David Payne. I will also add some thoughts
not included by Payne.

1. Management Models

The basic idea of the management approach is that the evaluator's
job is to provide information to management to help them in making
decisions about programs, products, etc. The evaluator's job is to
serve managers (or whoever the key decision makers are).

One very popular management model used today is Michael Patton's
Utilization-Focused Evaluation. (Note that Patton's model is not
discussed in Payne's chapter. You may want to examine the appendix
of RFL for pages where Patton's model is briefly discussed.) Basically,
Patton wants evaluators to provide information to primary intended
users, and not to even conduct an evaluation if it has little or no
potential for utilization. He wants evaluators to facilitate use as much
as possible. Patton's motto is to focus on intended use by intended
users. He recommends that evaluators work closely with primary
intended users so that their needs will be met. This requires focusing
on stakeholders' key questions, issues, and intended uses. It also
requires involving intended users in the interpretation of the
findings, and then disseminating those findings so that they can be
used. One should also follow up on actual use. It is helpful to develop
a utilization plan and to outline what the evaluator and primary users
must do to bring about the use of the evaluation findings. Ultimately,
evaluations should, according to Patton, be judged by their utility
and actual use. Patton's approach is discussed in detail in the
following book:

Patton, M.Q. (1997). Utilization-focused evaluation: The new
century text. Thousand Oaks, CA: Sage.

The first edition of Patton's Utilization-focused evaluation book was
published in 1978.

Another current giant in evaluation who fits into the
management-oriented evaluation camp is Joseph Wholey, but I will not outline his
theory here (see, for example, his 1979 book titled Evaluation:
Promise and Performance, or his 1983 book titled Evaluation and
Effective Public Management, or his edited 1994 book titled
Handbook of Practical Program Evaluation).

Now I will make a few comments on the only management model
discussed by Payne (i.e., the CIPP Model).

Daniel Stufflebeam's CIPP Model has been around for many years
(e.g., see Stufflebeam, et al. 1971), and it has been very popular in

The CIPP Model is a simple systems model applied to program
evaluation. A basic open system includes input, process, and output.
Stufflebeam added context, included input and process, and
relabeled output with the term product. Hence, CIPP stands for
context evaluation, input evaluation, process evaluation, and
product evaluation. These types are typically viewed as separate
forms of evaluation, but they can also be viewed as steps or stages in
a comprehensive evaluation.

Context evaluation includes examining and describing the context of
the program you are evaluating, conducting a needs and goals
assessment, determining the objectives of the program, and
determining whether the proposed objectives will be sufficiently
responsive to the identified needs. It helps in making program
planning decisions.

Input evaluation includes activities such as a description of the
program inputs and resources, a comparison of how the program
might perform compared to other programs, a prospective
benefit/cost assessment (i.e., decide whether you think the benefits
will outweigh the costs of the program, before the program is
actually implemented), an evaluation of the proposed design of the
program, and an examination of what alternative strategies and
procedures for the program should be considered and
recommended. In short, this type of evaluation examines what the
program plans on doing. It helps in making program structuring
decisions.

Process evaluation includes examining how a program is being
implemented, monitoring how the program is performing, auditing
the program to make sure it is following required legal and ethical
guidelines, and identifying defects in the procedural design or in the
implementation of the program. It is here that evaluators provide
information about what is actually occurring in the program.
Evaluators typically provide this kind of feedback to program
personnel because it can be helpful in making formative evaluation
decisions (i.e., decisions about how to modify or improve the
program). In general, process evaluation helps in making
implementing decisions.

Product evaluation includes determining and examining the general
and specific outcomes of the program (which requires using
impact or outcome assessment techniques), measuring anticipated
outcomes, attempting to identify unanticipated outcomes, assessing
the merit of the program, conducting a retrospective benefit/cost
assessment (to establish the actual worth or value of the program),
and/or conducting a cost effectiveness assessment (to determine if
the program is cost effective compared to other similar programs).
Product evaluation is very helpful in making summative evaluation
decisions (e.g., What is the merit and worth of the program? Should
the program be continued?)

(By the way, formative evaluation is conducted for the purpose of
improving an evaluation object (evaluand) and summative
evaluation is conducted for the purpose of accountability which
requires determining the overall effectiveness or merit and worth of
an evaluation object. Formative evaluation information tends to be
used by program administrators and staff members, whereas
summative evaluation information tends to be used by high level
administrators and policy makers to assist them in making funding or
program continuation decisions. As I mentioned earlier (in lecture
one), the terms formative and summative evaluation were coined by
Michael Scriven in the late 1960s.)

Thinking of the CIPP Model, input and process evaluation tend to be
very helpful for formative evaluation and product evaluation tends to
be especially helpful for summative evaluation. Note, however, that
the other parts of the CIPP Model can sometimes be used for
formative or summative evaluative decisions. For example, product
evaluation may lead to program improvements (i.e., formative), and
process evaluation may lead to documentation that the program has
met delivery requirements set by law (i.e., summative).

As you can see, the CIPP Evaluation Model is quite comprehensive,
and one would often not use every part of the CIPP Model in a single
evaluation. On the other hand, it would be fruitful for you to think
about a small program (e.g., a training program in a local
organization) where you would go through all four steps or parts of
the CIPP Model. (Again, there are two different ways to view the
CIPP model: first as four distinct kinds of evaluation and second as
steps or stages in a comprehensive evaluation model.) The CIPP
Model is, in general, quite useful in helping us to focus on some very
important evaluation questions and issues and to think about some
different types or stages of evaluation.

Interestingly, Stufflebeam no longer talks about the CIPP Model. He
now seems to refer to his approach as Decision/Accountability-
Oriented Evaluation (see Stufflebeam, 2001, in the Sage book titled
Evaluation Models). (By the way, I generally do not recommend
Stufflebeam's recent book titled Evaluation Models because he tends
to denigrate other useful approaches (in my opinion) while pushing
his own approach. In contrast, I advocate an eclectic approach to
evaluation or what Will Shadish calls needs-based evaluation;
needs-based evaluation is based on contingency theory because the type of
evaluation needed in a particular time and place is said to be
contingent upon many factors which must be determined and
considered by the evaluator.)

2. Judicial Models

Judicial or adversary-oriented evaluation is based on the judicial
metaphor. It is assumed here that the potential for evaluation bias
by a single evaluator cannot be ruled out, and, therefore, each side
should have a separate evaluator to make their case. For example,
one evaluator can examine and present the evidence for terminating
a program and another evaluator can examine and present the
evidence for continuing the program. A hearing of some sort is
conducted where each evaluator makes his or her case regarding the
evaluand. In a sense, this approach sets up a system of checks and
balances, by ensuring that all sides be heard, including alternative
explanations for the data. Obviously the quality of the different
evaluators must be equated for fairness. The ultimate decision is
made by some judge or arbiter who considers the arguments and the
evidence and then renders a decision.

One example that includes multiple experts is the so-called
blue-ribbon panel, where multiple experts of different backgrounds argue
the merits of some policy or program. Some committees also
operate, to some degree, along the lines of the judicial model.

As one set of authors put it, adversary evaluation has a built-in
metaevaluation (Worthen and Sanders, 1999). A metaevaluation is
simply an evaluation of an evaluation.

By showing the positive and negative aspects of a program,
considering alternative interpretations of the data, and examining
the strengths and weaknesses of the evaluation report
(metaevaluation), the adversary or judicial approach seems to have
some potential. On the other hand, it may lead to unnecessary
arguing, competition, and an indictment mentality. It can also be
quite expensive because of the requirement of multiple evaluators.
In general, formal judicial or adversary models are not often used in
program evaluation. It is, however, an interesting idea that may be
useful on occasion.

3. Anthropological Models

Payne includes under this heading the qualitative approaches to
program evaluation. For a review of qualitative research you can
review pages 17-21 and Chapter 11 in my research methods book
(Educational Research by Johnson and Christensen). (Remember that
IDE 510 or a very similar course is a prerequisite for IDE 660.) Briefly,
qualitative research tends to be exploratory, collect a lot of
descriptive data, and take an inductive approach to understanding
the world (i.e., looking at specifics and then trying to come up with
conclusions or generalizations about what is observed). Payne
points out that you may want to view the group of people involved in
a program as forming a unique culture that can be systematically

Payne treats several approaches as being very similar and
anthropological in nature, including responsive evaluation
(Robert Stake's model), goal-free evaluation (developed by Scriven
as a supplement to his other evaluation approach), and naturalistic
evaluation (which is somewhat attributable to Guba and Lincoln,
who wrote a 1985 book titled Naturalistic Inquiry). Again, what
all these approaches have in common is that they tend to rely on the
qualitative research paradigm.

In all of these approaches the evaluator enters the field and
observes what is going on in the program. Participant and
nonparticipant observation are commonly used. Additional data are
also regularly collected (e.g., focus groups, interviews,
questionnaires, and secondary or extant data), especially for the
purpose of triangulation.

The key to Scriven's goal-free evaluation is to have an evaluator
enter the field and try to learn about a program and its results
inductively and without being aware of the specific objectives of the
program. Note that Scriven's approach is useful as a supplement to
the more traditional goal-oriented evaluation. Goal-free evaluation is
done by a separate evaluator, who collects exploratory data to
supplement another evaluator's goal-oriented data.

Payne next lists several strengths of qualitative evaluation. This list is
from a nice book by Michael Patton (titled How to Use Qualitative
Methods in Evaluation). Qualitative methods tend to be useful for
describing program implementation, studying process, studying
participation, getting program participants' views or opinions about
program impact, and identifying program strengths and weaknesses.
Another strength is identifying unintended outcomes which may be
missed if you design a study only to measure certain specific

Next, Payne talks about Robert Stake's specific anthropological
model, which is called Responsive Evaluation. (By the way, Robert
(Bob) Stake also has a recent book on case study research which I
recommend you add to your library sometime (titled The Art of Case
Study Research, Sage Publications, 1997).) Stake uses the term
responsive because he wants evaluators to be flexible and
responsive to the concerns and issues of program stakeholders. He
also believes that qualitative methods provide the way to be the
most responsive. He uses a somewhat derogatory (I think) term to
refer to what he sees as the traditional evaluator. In particular, he
labels the traditional evaluation approach preordinate evaluation,
which means evaluation that relies only on formal plans and
measurement of pre-specified program objectives.

In explaining responsive evaluation, Stake says "an educational
evaluation is responsive evaluation if it orients more directly to
program activities than to program intents; responds to audience
requirements for information; and if the different value-perspectives
present are referred to in reporting the success and failure of the
program" (Stake, 1975).

Payne also shows Stake's events clock, which shows the key
evaluation activities and events, while stressing that they do not
have to be done in a predetermined or linear order. Flexibility is the
key. Go where the data and your emerging conclusions and
opportunities lead you. Ultimately, the responsive evaluator
prepares a narrative or case study report on what he or she finds,
although it is also essential that the responsive evaluator present
findings informally to different stakeholders during the conduct of
the study to increase their input, participation, buy-in, and use of
findings. As you can see, responsive evaluation is very much a
participatory evaluation approach.

On page 74 Payne lists some strengths and weaknesses of the
anthropological evaluation approach. He also gives a nice real-world
example of an evaluation using the responsive approach.

4. Consumer Models

The last approach discussed by Payne is the consumer approach. The
primary evaluation theorist behind this approach is Michael Scriven.
Obviously this approach is based on the consumer product
metaphor. In other words, perhaps evaluators can obtain some
useful evaluation ideas from the field of consumer product
evaluation (which is exemplified by the magazine Consumer Reports).
As Payne mentions, the consumer approach is primarily summative.
For example, when you read Consumer Reports, your goal is to learn
if a product is good or not and how well it stacks up against similar
products and whether you want to purchase it. In short, you are
looking at the merit and worth (absolute and relative) of a particular
product. Note, however, that it is much more difficult to evaluate a
social or educational program than it is to evaluate, for example, an
automobile or a coffee maker. With an automobile or a coffee
maker, you can easily measure its specifications and performance. A
social program is a much more complex package that includes many
elements and that requires an impact assessment using social
science research techniques to determine if the program works and
how it works.

Payne includes an excellent checklist (developed by Scriven, and
sometimes called an evaluation checklist) that you may want to use
when you are evaluating any type of evaluand (i.e., not just
consumer products).

As Payne points out, the consumer approach also holds some
promise for developing lists of programs that work, which can be
used by policy makers and others when developing or selecting
programs for specific problems. Payne also discusses the process of
how a program could get on to such a list.

Logic model
A logic model (also known as a logical framework, theory of change, or program matrix) is a tool
used most often by managers and evaluators of programs to evaluate the effectiveness of a
program. Logic models are usually a graphical depiction of the logical relationships between the
resources, activities, outputs and outcomes of a program.
While there are many ways in which logic models can be presented, the underlying purpose of
constructing a logic model is to assess the "if-then" (causal) relationships between the elements
of the program: if the resources are available for a program, then the activities can be
implemented; if the activities are implemented successfully, then certain outputs and outcomes
can be expected. Logic models are most often used in the evaluation stage of a program; they
can, however, also be used during planning and implementation.
In its simplest form, a logic model has four components:

Inputs: what resources go into a program (e.g. money, staff)
Activities: what activities the program undertakes (e.g. development of materials, training)
Outputs: what is produced through those activities (e.g. number of booklets produced,
workshops held, people trained)
Outcomes/impacts: the changes or benefits that result from the program (e.g. increased
skills/knowledge/confidence, leading in the longer term to promotion, new job)
Following the early development of the logic model in the 1970s by Carol Weiss, Joseph
Wholey and others, many refinements and variations have been added to the basic concept.
Many versions of logic models set out a series of outcomes/impacts, explaining in more detail the
logic of how an intervention contributes to intended or observed results. This will often include
distinguishing between short-term, medium-term and long-term results, and between direct and
indirect results.
Some logic models also include assumptions, which are beliefs the prospective grantees have
about the program, the people involved, and the context and the way the prospective grantees
think the program will work, and external factors, consisting of the environment in which the
program exists, including a variety of external factors that interact with and influence the program
University Cooperative Extension Programs in the US have developed a more elaborate logic
model, called the Program Action Logic Model, which includes the following steps:
Inputs (what we invest)
Activities (the actual tasks we do)
Participation (who we serve; customers & stakeholders)
Engagement (how those we serve engage with the activities)
Short Term (learning: awareness, knowledge, skills, motivations)
Medium Term (action: behavior, practice, decisions, policies)
Long Term (consequences: social, economic, environmental etc.)
In front of Inputs, there is a description of a Situation and Priorities. These are the considerations
that determine what Inputs will be needed.
The University of Wisconsin Extension offers a series of guidance documents on the use of
logic models. There is also an extensive bibliography of work on this program logic model.
By describing work in this way, managers have an easier way to define the work and measure
it. Performance measures can be drawn from any of the steps. One of the key insights of the
logic model is the importance of measuring final outcomes or results, because it is quite possible
to waste time and money (inputs), "spin the wheels" on work activities, or produce outputs
without achieving desired outcomes. It is these outcomes (impacts, long-term results) that are
the only justification for doing the work in the first place. For commercial organizations, outcomes
relate to profit. For not-for-profit or governmental organizations, outcomes relate to successful
achievement of mission or program goals.
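To make the point about outputs versus outcomes concrete, here is a minimal Python sketch of a logic model as a list of stages, each carrying its own performance measures. The stage names follow the four components above; the class name, function names, measure names, and all numbers are hypothetical and chosen only for illustration. The check flags the situation the paragraph warns about: outputs were produced, but no outcome has been observed.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One step of a logic model with its performance measures."""
    name: str
    measures: dict = field(default_factory=dict)  # measure name -> observed value


def build_example_model():
    # Hypothetical training program; every figure here is made up.
    return [
        Stage("inputs", {"budget_spent": 50000}),
        Stage("activities", {"workshops_planned": 10}),
        Stage("outputs", {"workshops_held": 10, "people_trained": 200}),
        Stage("outcomes", {"participants_with_new_skills": 0}),
    ]


def outputs_without_outcomes(model):
    """Return True if some output was produced but no outcome is observed."""
    by_name = {stage.name: stage for stage in model}
    has_outputs = any(v > 0 for v in by_name["outputs"].measures.values())
    has_outcomes = any(v > 0 for v in by_name["outcomes"].measures.values())
    return has_outputs and not has_outcomes


if __name__ == "__main__":
    model = build_example_model()
    print(outputs_without_outcomes(model))
```

Keeping measures attached to each stage reflects the statement that performance measures can be drawn from any of the steps; a real evaluation would of course need agreed indicators rather than a simple "greater than zero" test.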
Uses of the logic model
Program planning
One of the most important uses of the logic model is for program planning. Here it helps
managers to 'plan with the end in mind' (Stephen Covey), rather than just consider inputs (e.g.
budgets, employees) or just the tasks that must be done. In the past, program logic has been
justified by explaining the process from the perspective of an insider. Paul McCawley (no date)
outlines how this process was approached:
1. We invest this time/money so that we can generate this activity/product.
2. The activity/product is needed so people will learn how to do this.
3. People need to learn that so they can apply their knowledge to this practice.
4. When that practice is applied, the effect will be to change this condition.
5. When that condition changes, we will no longer be in this situation.
While logic models have been used in this way successfully, Millar et al. (1999) have suggested
that following the above sequence, from the inputs through to the outcomes, could limit one's
thinking to the existing activities, programs and research questions. Instead, by using the logic
model to focus on the intended outcomes of a particular program, the questions change from
"what is being done?" to "what needs to be done?" McCawley (no date) suggests that by using
this new reasoning, a logic model for a program can be built by asking the following questions in order:
1. What is the current situation that we intend to impact?
2. What will it look like when we achieve the desired situation or outcome?
3. What behaviors need to change for that outcome to be achieved?
4. What knowledge or skills do people need before the behavior will change?
5. What activities need to be performed to cause the necessary learning?
6. What resources will be required to achieve the desired outcome?
By placing the focus on ultimate outcomes or results, planners can think backwards through the
logic model to identify how best to achieve the desired results. Planners therefore need to
understand the difference between the categories of the logic model.
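The backward-planning idea can be sketched in a few lines of Python. The six questions are the ones listed above; the short keys, the function name, and the example answers are hypothetical and added only for illustration. The planner answers the questions starting from the situation and the desired outcome, and the sketch then reverses the answers into the order in which the program would actually unfold (resources first, outcome last).

```python
# Questions in the order a planner asks them (desired end state first).
BACKWARD_QUESTIONS = [
    ("situation", "What is the current situation that we intend to impact?"),
    ("outcome", "What will it look like when we achieve the desired situation or outcome?"),
    ("behaviors", "What behaviors need to change for that outcome to be achieved?"),
    ("knowledge", "What knowledge or skills do people need before the behavior will change?"),
    ("activities", "What activities need to be performed to cause the necessary learning?"),
    ("resources", "What resources will be required to achieve the desired outcome?"),
]


def forward_chain(answers):
    """Turn backward-planning answers into a forward chain.

    The situation is context rather than a step, so the chain runs
    resources -> activities -> knowledge -> behaviors -> outcome.
    """
    keys = [key for key, _question in BACKWARD_QUESTIONS]
    missing = [key for key in keys if key not in answers]
    if missing:
        raise ValueError("unanswered planning questions: " + ", ".join(missing))
    return [(key, answers[key]) for key in reversed(keys[1:])]


if __name__ == "__main__":
    # Hypothetical answers for a small workplace training program.
    example = {
        "situation": "staff rarely use the new reporting system",
        "outcome": "reports are filed on time every month",
        "behaviors": "staff enter data weekly",
        "knowledge": "staff know how to use the data-entry screens",
        "activities": "two hands-on training workshops",
        "resources": "a trainer, a computer lab, and staff time",
    }
    for key, answer in forward_chain(example):
        print(key + ": " + answer)
```

Raising an error on unanswered questions is a design choice of this sketch: it forces every category of the logic model to be filled in before the forward chain is read off.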
Performance evaluation
The logic model is often used in government or not-for-profit organizations, where the mission
and vision are not aimed at achieving a financial benefit. In such situations, where profit is not the
intended result, it may be difficult to monitor progress toward outcomes. A program logic model
provides such indicators, in terms of output and outcome measures of performance. It is
therefore important in these organizations to carefully specify the desired results, and consider
how to monitor them over time. Often, such as in education or social programs, the outcomes are
long-term and mission success is far in the future. In these cases, intermediate or shorter-term
outcomes may be identified that provide an indication of progress toward the ultimate long-term
Traditionally, government programs were described only in terms of their budgets. It is easy to
measure the amount of money spent on a program, but this is a poor indicator of mission
success. Likewise it is relatively easy to measure the amount of work done (e.g. number of
workers or number of years spent), but the workers may have just been 'spinning their wheels'
without getting very far in terms of ultimate results or outcomes. The production of outputs is a
better indicator that something was delivered to customers, but it is still possible that the output
did not really meet the customer's needs, was not used, etc. Therefore, the focus on results or
outcomes has become a mantra in government and not-for-profit programs.
The President's Management Agenda is an example of the increasing emphasis on results
in government management. It states:
"Government likes to begin things, to declare grand new programs and causes. But good
beginnings are not the measure of success. What matters in the end is completion. Performance."

However, although outcomes are used as the primary indicators of program success or failure,
they are still insufficient. Outcomes may easily be achieved through processes independent of
the program and an evaluation of those outcomes would suggest program success when in fact
external outputs were responsible for the outcomes (Rossi, Lipsey and Freeman, 2004). In this
respect, Rossi, Lipsey and Freeman (2004) suggest that a typical evaluation study should
concern itself with measuring how the process indicators (inputs and outputs) have had an effect
on the outcome indicators. A program logic model would need to be assessed or designed in
order for an evaluation of these standards to be possible. The logic model can and, indeed,
should be used in both formative (during the implementation to offer the chance to improve the
program) and summative (after the completion of the program) evaluations.
Program evaluation offers a way to understand and improve community health and
development practice using methods that are useful, feasible, proper, and accurate. The
framework described below is a practical, non-prescriptive tool that summarizes in a logical
order the important elements of program evaluation. It has two parts:
Steps in evaluation practice, and
Standards for "good" evaluation.

The six connected steps of the framework are actions that should be a part of any evaluation.
Although in practice the steps may be encountered out of order, it will usually make sense to
follow them in the recommended sequence. That's because earlier steps provide the
foundation for subsequent progress. Thus, decisions about how to carry out a given step
should not be finalized until prior steps have been thoroughly addressed.
However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's
unique context (for example, the program's history and organizational climate) is essential for
sound evaluation. The steps are intended to serve as starting points around which community
organizations can tailor an evaluation to best meet their needs.
1. Engage stakeholders
2. Describe the program
3. Focus the evaluation design
4. Gather credible evidence
5. Justify conclusions
6. Ensure use and share lessons learned
Understanding and adhering to these basic steps will improve most evaluation efforts.
The second part of the framework is a basic set of standards to assess the quality of
evaluation activities. There are 30 specific standards, organized into the following four
groups: utility, feasibility, propriety, and accuracy.
These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They
are recommended as the initial criteria by which to judge the quality of the program
evaluation efforts.
Stakeholders are people or organizations that have something to gain or lose from what will
be learned from an evaluation, and also in what will be done with that knowledge. Evaluation
cannot be done in isolation. Almost everything done in community health and development
work involves partnerships - alliances among different organizations, board members, those
affected by the problem, and others. Therefore, any serious effort to evaluate a program must
consider the different values held by the partners. Stakeholders must be part of the evaluation
to ensure that their unique perspectives are understood. When stakeholders are not
appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted.
However, if they are part of the process, people are likely to feel a good deal of ownership for
the evaluation process and results. They will probably want to develop it, defend it, and make
sure that the evaluation really works.
That's why this evaluation cycle begins by engaging stakeholders. Once involved, these
people will help to carry out each of the steps that follows.
People or organizations involved in program operations may include community
members, sponsors, collaborators, coalition partners, funding officials, administrators,
managers, and staff.
People or organizations served or affected by the program may include clients, family
members, neighborhood organizations, academic institutions, elected and appointed officials,
advocacy groups, and community residents. Individuals who are openly skeptical of or
antagonistic toward the program may also be important to involve. Opening an evaluation to
opposing perspectives and enlisting the help of potential program opponents can strengthen
the evaluation's credibility.
Likewise, individuals or groups who could be adversely or inadvertently affected by changes
arising from the evaluation have a right to be engaged. For example, it is important to include
those who would be affected if program services were expanded, altered, limited, or ended as
a result of the evaluation.
Primary intended users of the evaluation are the specific individuals who are in a position
to decide and/or do something with the results. They shouldn't be confused with primary
intended users of the program, although some of them should be involved in this group. In
fact, primary intended users should be a subset of all of the stakeholders who have been
identified. A successful evaluation will designate primary intended users, such as program
staff and funders, early in its development and maintain frequent interaction with them to be
sure that the evaluation specifically addresses their values and needs.
The amount and type of stakeholder involvement will be different for each program
evaluation. For instance, stakeholders can be directly involved in designing and conducting
the evaluation. They can be kept informed about progress of the evaluation through periodic
meetings, reports, and other means of communication.
It may be helpful, when working with a group such as this, to develop an explicit process to
share power and resolve conflicts. This may help avoid overemphasis of values held by any
specific stakeholder.
A program description is a summary of the intervention being evaluated. It should explain
what the program is trying to accomplish and how it tries to bring about those changes. The
description will also illustrate the program's core components and elements, its ability to
make changes, its stage of development, and how the program fits into the larger
organizational and community environment.
How a program is described sets the frame of reference for all future decisions about its
evaluation. For example, if a program is described as, "attempting to strengthen enforcement
of existing laws that discourage underage drinking," the evaluation might be very different
than if it is described as, "a program to reduce drunk driving by teens." Also, the description
allows members of the group to compare the program to other similar efforts, and it makes it
easier to figure out what parts of the program brought about what effects.
Moreover, different stakeholders may have different ideas about what the program is
supposed to achieve and why. For example, a program to reduce teen pregnancy may have
some members who believe this means only increasing access to contraceptives, and other
members who believe it means only focusing on abstinence.
Evaluations done without agreement on the program definition aren't likely to be very useful.
In many cases, the process of working with stakeholders to develop a clear and logical
program description will bring benefits long before data are available to measure program
Statement of need
A statement of need describes the problem, goal, or opportunity that the program addresses; it
also begins to imply what the program will do in response. Important features to note
regarding a program's need are: the nature of the problem or goal, who is affected, how big it
is, and whether (and how) it is changing.
Expectations are the program's intended results. They describe what the program has to
accomplish to be considered successful. For most programs, the accomplishments exist on a
continuum (first, we want to accomplish X... then, we want to do Y...). Therefore, they
should be organized by time ranging from specific (and immediate) to broad (and longer-
term) consequences. For example, a program's vision, mission, goals, and objectives, all
represent varying levels of specificity about a program's expectations.
Activities are everything the program does to bring about changes. Describing program
components and elements permits specific strategies and actions to be listed in logical
sequence. This also shows how different program activities, such as education and
enforcement, relate to one another. Describing program activities also provides an
opportunity to distinguish activities that are the direct responsibility of the program from
those that are conducted by related programs or partner organizations. Things outside of the
program that may affect its success, such as harsher laws punishing businesses that sell
alcohol to minors, can also be noted.
Resources include the time, talent, equipment, information, money, and other assets available
to conduct program activities. Reviewing the resources a program has tells a lot about the
amount and intensity of its services. It may also point out situations where there is a
mismatch between what the group wants to do and the resources available to carry out these
activities. Understanding program costs is a necessity to assess the cost-benefit ratio as part
of the evaluation.
Stage of development
A program's stage of development reflects its maturity. All community health and
development programs mature and change over time. People who conduct evaluations, as
well as those who use their findings, need to consider the dynamic nature of programs. For
example, a new program that just received its first grant may differ in many respects from
one that has been running for over a decade.
At least three phases of development are commonly recognized: planning, implementation,
and effects or outcomes. In the planning stage, program activities are untested and the goal of
evaluation is to refine plans as much as possible. In the implementation phase, program
activities are being field tested and modified; the goal of evaluation is to see what happens in
the "real world" and to improve operations. In the effects stage, enough time has passed for
the program's effects to emerge; the goal of evaluation is to identify and understand the
program's results, including those that were unintentional.
A description of the program's context considers the important features of the environment in
which the program operates. This includes understanding the area's history, geography,
politics, and social and economic conditions, and also what other organizations have done. A
realistic and responsive evaluation is sensitive to a broad range of potential influences on the
program. An understanding of the context lets users interpret findings accurately and assess
their generalizability. For example, a program to improve housing in an inner-city
neighborhood might have been a tremendous success, but would likely not work in a small
town on the other side of the country without significant adaptation.
Logic model
A logic model synthesizes the main program elements into a picture of how the program is
supposed to work. It makes explicit the sequence of events that are presumed to bring about
change. Often this logic is displayed in a flow-chart, map, or table to portray the sequence of
steps leading to program results.
Creating a logic model allows stakeholders to improve and focus program direction. It reveals
assumptions about conditions for program effectiveness and provides a frame of reference for
one or more evaluations of the program. A detailed logic model can also be a basis for
estimating the program's effect on endpoints that are not directly measured. For example, it
may be possible to estimate the rate of reduction in disease from a known number of persons
experiencing the intervention if there is prior knowledge about its effectiveness.
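As a worked illustration of that last example, the estimate can be written as a one-line calculation. This is only a sketch under strong assumptions: the function name, the parameter names, and all the numbers are hypothetical, and it assumes that the baseline risk and the intervention's effectiveness taken from prior studies apply unchanged to the people this program reached.

```python
def estimated_cases_averted(persons_reached, baseline_risk, relative_risk_reduction):
    """Estimate disease cases averted among people who received the intervention.

    persons_reached: known number of persons experiencing the intervention
    baseline_risk: assumed probability of disease without the intervention (0 to 1)
    relative_risk_reduction: assumed effectiveness from prior knowledge (0 to 1)
    """
    if persons_reached < 0:
        raise ValueError("persons_reached must not be negative")
    for name, value in (("baseline_risk", baseline_risk),
                        ("relative_risk_reduction", relative_risk_reduction)):
        if not 0 <= value <= 1:
            raise ValueError(name + " must be between 0 and 1")
    # Expected cases without the program, times the share the program prevents.
    return persons_reached * baseline_risk * relative_risk_reduction


if __name__ == "__main__":
    # Hypothetical figures: 2000 people reached, 10% baseline risk,
    # and an intervention assumed to cut that risk by a quarter.
    print(estimated_cases_averted(2000, 0.10, 0.25))
```

With these made-up figures the estimate is roughly 50 cases averted. The result is only as good as the prior evidence behind the two assumed rates, which is why the logic model needs to state such assumptions explicitly.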
The breadth and depth of a program description will vary for each program evaluation. And
so, many different activities may be part of developing that description. For instance, multiple
sources of information could be pulled together to construct a well-rounded description. The
accuracy of an existing program description could be confirmed through discussion with
stakeholders. Descriptions of what's going on could be checked against direct observation of
activities in the field. A narrow program description could be fleshed out by addressing
contextual factors (such as staff turnover, inadequate resources, political pressures, or strong
community participation) that may affect program performance.
By focusing the evaluation design, we mean doing advance planning about where the
evaluation is headed, and what steps it will take to get there. It isn't possible or useful for an
evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-
focused plan is a safeguard against using time and resources inefficiently.
Depending on what you want to learn, some types of evaluation will be better suited than
others. However, once data collection begins, it may be difficult or impossible to change
what you are doing, even if it becomes obvious that other methods would work better. A
thorough plan anticipates intended uses and creates an evaluation strategy with the greatest
chance to be useful, feasible, proper, and accurate.
Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for
the design, methods, and use of the evaluation. Taking time to articulate an overall purpose
will stop your organization from making uninformed decisions about how the evaluation
should be conducted and used.
There are at least four general purposes for which a community group might conduct
an evaluation:
To gain insight. This happens, for example, when deciding whether to use a new approach
(e.g., would a neighborhood watch program work for our community?) Knowledge from such
an evaluation will provide information about its practicality. For a developing program,
information from evaluations of similar programs can provide the insight needed to clarify
how its activities should be designed.
To improve how things get done. This is appropriate in the implementation stage when an
established program tries to describe what it has done. This information can be used to
describe program processes, to improve how the program operates, and to fine-tune the
overall strategy. Evaluations done for this purpose include efforts to improve the quality,
effectiveness, or efficiency of program activities.
To determine what the effects of the program are. Evaluations done for this purpose
examine the relationship between program activities and observed consequences. For
example, are more students finishing high school as a result of the program? Programs most
appropriate for this type of evaluation are mature programs that are able to state clearly what
happened and who it happened to. Such evaluations should provide evidence about what the
program's contribution was to reaching longer-term goals such as a decrease in child abuse or
crime in the area. This type of evaluation helps establish the accountability, and thus, the
credibility, of a program to funders and to the community.
To affect those who participate in it. The logic and reflection required of evaluation
participants can itself be a catalyst for self-directed change. And so, one of the purposes of
evaluating a program is for the process and results to have a positive influence. Such
influences may:
o Empower program participants (for example, being part of an evaluation can increase
community members' sense of control over the program);
o Supplement the program (for example, using a follow-up questionnaire can reinforce
the main messages of the program);
o Promote staff development (for example, by teaching staff how to collect, analyze,
and interpret evidence); or
o Contribute to organizational growth (for example, the evaluation may clarify how the
program relates to the organization's mission).
Users are the specific individuals who will receive evaluation findings. They will directly
experience the consequences of inevitable trade-offs in the evaluation process. For example, a
trade-off might be having a relatively modest evaluation to fit the budget with the outcome
that the evaluation results will be less certain than they would be for a full-scale evaluation.
Because they will be affected by these tradeoffs, intended users have a right to participate in
choosing a focus for the evaluation. An evaluation designed without adequate user
involvement in selecting the focus can become a misguided and irrelevant exercise. By
contrast, when users are encouraged to clarify intended uses, priority questions, and preferred
methods, the evaluation is more likely to focus on things that will inform (and influence)
future actions.
Uses describe what will be done with what is learned from the evaluation. There is a wide
range of potential uses for program evaluation. Generally speaking, the uses fall in the same
four categories as the purposes listed above: to gain insight, improve how things get done,
determine what the effects of the program are, and affect participants. The following list
gives examples of uses in each category.
o Assess needs and wants of community members
o Identify barriers to use of the program
o Learn how to best describe and measure program activities
o Refine plans for introducing a new practice
o Determine the extent to which plans were implemented
o Improve educational materials
o Enhance cultural competence
o Verify that participants' rights are protected
o Set priorities for staff training
o Make mid-course adjustments
o Clarify communication
o Determine if client satisfaction can be improved
o Compare costs to benefits
o Find out which participants benefit most from the program
o Mobilize community support for the program
o Assess skills development by program participants
o Compare changes in behavior over time
o Decide where to allocate new resources
o Document the level of success in accomplishing objectives
o Demonstrate that accountability requirements are fulfilled
o Use information from multiple evaluations to predict the likely effects of similar programs
o Reinforce messages of the program
o Stimulate dialogue and raise awareness about community issues
o Broaden consensus among partners about program goals
o Teach evaluation skills to staff and other stakeholders
o Gather success stories
o Support organizational change and improvement
The evaluation needs to answer specific questions. Drafting questions encourages
stakeholders to reveal what they believe the evaluation should answer. That is, what
questions are most important to stakeholders? The process of developing evaluation
questions further refines the focus of the evaluation.
The methods available for an evaluation are drawn from behavioral science and social
research and development. Three types of methods are commonly recognized. They are
experimental, quasi-experimental, and observational or case study designs. Experimental
designs use random assignment to compare the effect of an intervention between otherwise
equivalent groups (for example, comparing a randomly assigned group of students who took
part in an after-school reading program with those who didn't). Quasi-experimental methods
make comparisons between groups that aren't equivalent (e.g., program participants vs. those on a
waiting list) or use comparisons within a group over time, such as in an interrupted time
series in which the intervention may be introduced sequentially across different individuals,
groups, or contexts. Observational or case study methods use comparisons within a group to
describe and explain what happens (e.g., comparative case studies with multiple
No design is necessarily better than another. Evaluation methods should be selected because
they provide the appropriate information to answer stakeholders' questions, not because they
are familiar, easy, or popular. The choice of methods has implications for what will count as
evidence, how that evidence will be gathered, and what kind of claims can be made. Because
each method option has its own biases and limitations, evaluations that mix methods are
generally more robust.
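The random assignment that defines an experimental design can be sketched in a few lines. This is a minimal illustration only: the participant names, the even split, and the function name randomly_assign are assumptions made for the example, not part of the framework.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size.

    Returns (program_group, comparison_group). Because membership is decided
    by chance alone, the groups are expected to be equivalent on average.
    """
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical roster of 20 students eligible for an after-school reading program.
students = [f"student_{i}" for i in range(1, 21)]
program_group, comparison_group = randomly_assign(students, seed=42)
```

A quasi-experimental design would skip the shuffle and compare groups that already exist, such as participants and those on a waiting list, which is why its groups cannot be assumed equivalent.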
Over the course of an evaluation, methods may need to be revised or modified.
Circumstances that make a particular approach useful can change. For example, the intended
use of the evaluation could shift from discovering how to improve the program to helping
decide about whether the program should continue or not. Thus, methods may need to be
adapted or redesigned to keep the evaluation on track.
Agreements summarize the evaluation procedures and clarify everyone's roles and
responsibilities. An agreement describes how the evaluation activities will be implemented.
Elements of an agreement include statements about the intended purpose, users, uses, and
methods, as well as a summary of the deliverables, those responsible, a timeline, and budget.
The formality of the agreement depends upon the relationships that exist between those
involved. For example, it may take the form of a legal contract, a detailed protocol, or a
simple memorandum of understanding. Regardless of its formality, creating an explicit
agreement provides an opportunity to verify the mutual understanding needed for a successful
evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.
As you can see, focusing the evaluation design may involve many activities. For instance,
both supporters and skeptics of the program could be consulted to ensure that the proposed
evaluation questions are politically viable. A menu of potential evaluation uses appropriate
for the program's stage of development could be circulated among stakeholders to determine
which is most compelling. Interviews could be held with specific intended users to better
understand their information needs and timeline for action. Resource requirements could be
reduced when users are willing to employ more timely but less precise evaluation methods.
Credible evidence is the raw material of a good evaluation. The information learned should
be seen by stakeholders as believable, trustworthy, and relevant to answer their questions.
This requires thinking broadly about what counts as "evidence." Such decisions are always
situational; they depend on the question being posed and the motives for asking it. For some
questions, a stakeholder's standard for credibility could demand having the results of a
randomized experiment. For another question, a set of well-done, systematic observations,
such as interactions between an outreach worker and community residents, will have high
credibility. The difference depends on what kind of information the stakeholders want and the
situation in which it is gathered.
Context matters! In some situations, it may be necessary to consult evaluation specialists.
This may be especially true if concern for data quality is especially high. In other
circumstances, local people may offer the deepest insights. Regardless of their expertise,
however, those involved in an evaluation should strive to collect information that will convey
a credible, well-rounded picture of the program and its efforts.
Having credible evidence strengthens the evaluation results as well as the recommendations
that follow from them. Although all types of data have limitations, it is possible to improve
an evaluation's overall credibility. One way to do this is by using multiple procedures for
gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can
also enhance perceived credibility. When stakeholders help define questions and gather data,
they will be more likely to accept the evaluation's conclusions and to act on its recommendations.
Indicators translate general concepts about the program and its expected effects into specific,
measurable parts.
Examples of indicators include:
The program's capacity to deliver services
The participation rate
The level of client satisfaction
The amount of intervention exposure (how many people were exposed to the program, and for
how long they were exposed)
Changes in participant behavior
Changes in community conditions or norms
Changes in the environment (e.g., new programs, policies, or practices)
Longer-term changes in population health status (e.g., estimated teen pregnancy rate in the
Indicators should address the criteria that will be used to judge the program. That is, they
reflect the aspects of the program that are most meaningful to monitor. Several indicators are
usually needed to track the implementation and effects of a complex program or intervention.
One way to develop multiple indicators is to create a "balanced scorecard," which contains
indicators that are carefully selected to complement one another. According to this strategy,
program processes and effects are viewed from multiple perspectives using small groups of
related indicators. For instance, a balanced scorecard for a single program might include
indicators of how the program is being delivered; what participants think of the program;
what effects are observed; what goals were attained; and what changes are occurring in the
environment around the program.
Another approach to using multiple indicators is based on a program logic model, such as we
discussed earlier in the section. A logic model can be used as a template to define a full
spectrum of indicators along the pathway that leads from program activities to expected
effects. For each step in the model, qualitative and/or quantitative indicators could be developed.
Indicators can be broad-based and don't need to focus only on a program's long-term goals.
They can also address intermediary factors that influence program effectiveness, including
such intangible factors as service quality, community capacity, or inter-organizational
relations. Indicators for these and similar concepts can be created by systematically
identifying and then tracking markers of what is said or done when the concept is expressed.
In the course of an evaluation, indicators may need to be modified or new ones adopted.
Also, measuring program performance by tracking indicators is only one part of evaluation,
and shouldn't be confused as a basis for decision making in itself. There are definite perils to
using performance indicators as a substitute for completing the evaluation process and
reaching fully justified conclusions. For example, an indicator, such as a rising rate of
unemployment, may be falsely assumed to reflect a failing program when it may actually be
due to changing environmental conditions that are beyond the program's control.
Sources of evidence in an evaluation may be people, documents, or observations. More than
one source may be used to gather evidence for each indicator. In fact, selecting multiple
sources provides an opportunity to include different perspectives about the program and
enhances the evaluation's credibility. For instance, an inside perspective may be reflected by
internal documents and comments from staff or program managers; whereas clients and those
who do not support the program may provide different, but equally relevant perspectives.
Mixing these and other perspectives provides a more comprehensive view of the program or intervention.
The criteria used to select sources should be clearly stated so that users and other
stakeholders can interpret the evidence accurately and assess if it may be biased. In addition,
some sources provide information in narrative form (for example, a person's experience when
taking part in the program) and others are numerical (for example, how many people were
involved in the program). The integration of qualitative and quantitative information can
yield evidence that is more complete and more useful, thus meeting the needs and
expectations of a wider range of stakeholders.
Quality refers to the appropriateness and integrity of information gathered in an evaluation.
High-quality data are reliable and informative. They are easier to collect if the indicators have
been well defined. Other factors that affect quality may include instrument design, data
collection procedures, training of those involved in data collection, source selection, coding,
data management, and routine error checking. Obtaining quality data will entail tradeoffs
(e.g. breadth vs. depth); stakeholders should decide together what is most important to them.
Because all data have limitations, the intent of a practical evaluation is to strive for a level of
quality that meets the stakeholders' threshold for credibility.
Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to
estimate in advance the amount of information that will be required and to establish criteria to
decide when to stop collecting data - to know when enough is enough. Quantity affects the
level of confidence or precision users can have - how sure we are that what we've learned is
true. It also partly determines whether the evaluation will be able to detect effects. All
evidence collected should have a clear, anticipated use.
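The link between quantity and precision can be made concrete with a margin-of-error calculation for a survey percentage. This is a sketch under stated assumptions: it uses the normal approximation for a proportion from a simple random sample at roughly 95% confidence, and the sample sizes and the 15% figure are hypothetical.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); z = 1.96
    corresponds to roughly 95% confidence. Assumes a simple random sample.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical survey result: 15% of respondents report some behavior.
small_survey = margin_of_error(0.15, 100)   # about 0.07, i.e. +/- 7 points
large_survey = margin_of_error(0.15, 400)   # about 0.035, i.e. +/- 3.5 points
```

Under these assumptions, quadrupling the number of respondents only halves the margin of error, which is one reason to decide in advance how much evidence is enough.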
By logistics, we mean the methods, timing, and physical infrastructure for gathering and
handling evidence. People and organizations also have cultural preferences that dictate
acceptable ways of asking questions and collecting information, including who would be
perceived as an appropriate person to ask the questions. For example, some participants may
be unwilling to discuss their behavior with a stranger, whereas others are more at ease with
someone they don't know. Therefore, the techniques for gathering evidence in an evaluation
must be in keeping with the cultural norms of the community. Data collection procedures
should also ensure that confidentiality is protected.
The process of justifying conclusions recognizes that evidence in an evaluation does not
necessarily speak for itself. Evidence must be carefully considered from a number of different
stakeholders' perspectives to reach conclusions that are well-substantiated and justified.
Conclusions become justified when they are linked to the evidence gathered and judged
against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions
are justified in order to use the evaluation results with confidence.
Standards reflect the values held by stakeholders about the program. They provide the basis
to make program judgments. The use of explicit standards for judgment is fundamental to
sound evaluation. In practice, when stakeholders articulate and negotiate their values, these
become the standards to judge whether a given program's performance will, for instance, be
considered "successful," "adequate," or "unsuccessful."
Analysis and synthesis
Analysis and synthesis are methods to discover and summarize an evaluation's findings. They
are designed to detect patterns in evidence, either by isolating important findings (analysis) or
by combining different sources of information to reach a larger understanding (synthesis).
Mixed method evaluations require the separate analysis of each evidence element, as well as
a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given
body of evidence involves deciding how to organize, classify, compare, and display
information. These decisions are guided by the questions being asked, the types of data
available, and especially by input from stakeholders and primary intended users.
Interpretation is the effort to figure out what the findings mean. Uncovering facts about a
program's performance isn't enough to make conclusions. The facts must be interpreted to
understand their practical significance. For example, the statement "15% of the people in our area
witnessed a violent act last year" may be interpreted differently depending on the situation.
If 50% of community members reported witnessing a violent act in the previous year when
they were surveyed five years ago, the group can suggest that, while still a problem, things
are getting better in the community. However, if five years ago only 7% of those surveyed
said the same thing, community organizations may see this as a sign that they might want to
change what they are doing. In short, interpretations draw on information and perspectives
that stakeholders bring to the evaluation. They can be strengthened through active
participation or interaction with the data and preliminary explanations of what happened.
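The same finding reads differently against different baselines, and a few lines of code make that explicit. This is an illustrative sketch only: the function name and labels are assumptions, and it treats a lower percentage as better, as in the violence example.

```python
def interpret_change(current_pct, baseline_pct):
    """Label a finding by comparing it with an earlier baseline.

    Assumes the indicator is one where lower values are better
    (for example, the share of residents who witnessed violence).
    """
    change = current_pct - baseline_pct
    if change < 0:
        return "improving"
    if change > 0:
        return "worsening"
    return "unchanged"


# The same 15% finding, read against the two baselines from the example.
against_high_baseline = interpret_change(15, 50)
against_low_baseline = interpret_change(15, 7)
```

The code only supplies the comparison; deciding what the change means for the program still depends on the information and perspectives stakeholders bring.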
Judgments are statements about the merit, worth, or significance of the program. They are
formed by comparing the findings and their interpretations against one or more selected
standards. Because multiple standards can be applied to a given program, stakeholders may
reach different or even conflicting judgments. For instance, a program that increases its
outreach by 10% from the previous year may be judged positively by program managers,
based on standards of improved performance over time. Community members, however, may
feel that despite improvements, a minimum threshold of access to services has still not been
reached. Their judgment, based on standards of social equity, would therefore be negative.
Conflicting claims about a program's quality, value, or importance often indicate that
stakeholders are using different standards or values in making judgments. This type of
disagreement can be a catalyst to clarify values and to negotiate the appropriate basis (or
bases) on which the program should be judged.
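How one finding can yield conflicting judgments under different standards can be sketched as follows. All numbers here are hypothetical: the counts of people reached and the minimum access threshold are assumptions chosen to mirror the outreach example.

```python
def judge_by_improvement(previous_reach, current_reach):
    """Standard of improved performance over time: positive if reach grew."""
    return "positive" if current_reach > previous_reach else "negative"


def judge_by_threshold(current_reach, minimum_reach):
    """Standard of social equity: positive only if a minimum level of access is met."""
    return "positive" if current_reach >= minimum_reach else "negative"


# Hypothetical figures: outreach rose 10%, from 500 to 550 people,
# while community members consider 1,000 the minimum acceptable level.
managers_view = judge_by_improvement(500, 550)
community_view = judge_by_threshold(550, 1000)
```

Both judgments rest on the same finding; they differ only in the standard applied, which is what stakeholders would need to negotiate.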
Recommendations are actions to consider as a result of the evaluation. Forming
recommendations requires information beyond just what is necessary to form judgments. For
example, knowing that a program is able to increase the services available to battered women
doesn't necessarily translate into a recommendation to continue the effort, particularly when
there are competing priorities or other effective alternatives. Thus, recommendations about
what to do with a given intervention go beyond judgments about a specific program's
If recommendations aren't supported by enough evidence, or if they aren't in keeping with
stakeholders' values, they can undermine an evaluation's credibility. By contrast, an
evaluation can be strengthened by recommendations that anticipate and react to what users
will want to know.
Sharing draft recommendations
Soliciting reactions from multiple stakeholders
Presenting options instead of directive advice
Justifying conclusions in an evaluation is a process that involves different possible steps. For
instance, conclusions could be strengthened by searching for alternatives to the explanations
you have chosen, and then showing why those alternatives are unsupported by the evidence.
When there are different but equally well-supported conclusions, each could be presented
with a summary of its strengths and weaknesses. Techniques to analyze, synthesize, and
interpret findings might be agreed upon before data collection begins.
It is naive to assume that lessons learned in an evaluation will necessarily be used in decision
making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure
that the evaluation findings will be used appropriately. Preparing for their use involves
strategic thinking and continued vigilance in looking for opportunities to communicate and
influence. Both of these should begin in the earliest stages of the process and continue
throughout the evaluation.
Design refers to how the evaluation's questions, methods, and overall processes are
constructed. As discussed in the third step of this framework (focusing the evaluation design),
the evaluation should be organized from the start to achieve specific agreed-upon uses.
Having a clear purpose that is focused on the use of what is learned helps those who will
carry out the evaluation to know who will do what with the findings. Furthermore, the
process of creating a clear design will highlight ways that stakeholders, through their many
contributions, can improve the evaluation and facilitate the use of the results.
Preparation refers to the steps taken to get ready for the future uses of the evaluation
findings. The ability to translate new knowledge into appropriate action is a skill that can be
strengthened through practice. In fact, building this skill can itself be a useful benefit of the
evaluation. It is possible to prepare stakeholders for future use of the results by discussing
how potential findings might affect decision making.
For example, primary intended users and other stakeholders could be given a set of
hypothetical results and asked what decisions or actions they would make on the basis of this
new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and
that no action would be taken, then this is an early warning sign that the planned evaluation
should be modified. Preparing for use also gives stakeholders more time to explore both
positive and negative implications of potential results and to identify different options for
program improvement.
Feedback is the communication that occurs among everyone involved in the evaluation.
Giving and receiving feedback creates an atmosphere of trust among stakeholders; it keeps an
evaluation on track by keeping everyone informed about how the evaluation is proceeding.
Primary intended users and other stakeholders have a right to comment on evaluation
decisions. From a standpoint of ensuring use, stakeholder feedback is a necessary part of
every step in the evaluation. Obtaining valuable feedback can be encouraged by holding
discussions during each step of the evaluation and routinely sharing interim findings,
provisional interpretations, and draft reports.
Follow-up refers to the support that many users need during the evaluation and after they
receive evaluation findings. Because of the amount of effort required, reaching justified
conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be
necessary to remind users of the intended uses of what has been learned. Follow-up may also
be required to stop lessons learned from becoming lost or ignored in the process of making
complex or political decisions. To guard against such oversight, it may be helpful to have
someone involved in the evaluation serve as an advocate for the evaluation's findings during
the decision-making phase.
Facilitating the use of evaluation findings also carries with it the responsibility to prevent
misuse. Evaluation results are always bounded by the context in which the evaluation was
conducted. Some stakeholders, however, may be tempted to take results out of context or to
use them for purposes other than those for which they were developed. For instance,
overgeneralizing the results from a single case study to make decisions that affect all sites in a
national program is an example of misuse of a case study evaluation.
Similarly, program opponents may misuse results by overemphasizing negative findings
without giving proper credit for what has worked. Active follow-up can help to prevent these
and other forms of misuse by ensuring that evidence is only applied to the questions that were
the central focus of the evaluation.
Dissemination is the process of communicating the procedures or the lessons learned from an
evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other
elements of the evaluation, the reporting strategy should be discussed in advance with
intended users and other stakeholders. Planning effective communications also requires
considering the timing, style, tone, message source, vehicle, and format of information
products. Regardless of how communications are constructed, the goal for dissemination is to
achieve full disclosure and impartial reporting.
Along with the uses for evaluation findings, there are also uses that flow from the very
process of evaluating. These "process uses" should be encouraged. The people who take part
in an evaluation can experience profound changes in beliefs and behavior. For instance, an
evaluation challenges staff members to act differently in what they are doing, and to question
assumptions that connect program activities with intended effects.
Evaluation also prompts staff to clarify their understanding of the goals of the program. This
greater clarity, in turn, helps staff members to better function as a team focused on a common
end. In short, immersion in the logic, reasoning, and values of evaluation can have very
positive effects, such as basing decisions on systematic judgments instead of on unfounded assumptions.
Additional process uses for evaluation include:
By defining indicators, what really matters to stakeholders becomes clear
It helps make outcomes matter by changing the reinforcements connected with achieving
positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a
program that has shown a significant amount of community change and improvement.
There are standards to assess whether all of the parts of an evaluation are well-designed and
working to their greatest potential. The Joint Committee on Standards for Educational Evaluation
developed "The Program Evaluation Standards" for this purpose. These standards, designed
to assess evaluations of educational programs, are also relevant for programs and
interventions related to community health and development.
The program evaluation standards make it practical to conduct sound and fair evaluations.
They offer well-supported principles to follow when faced with having to make tradeoffs or
compromises. Attending to the standards can guard against an imbalanced evaluation, such as
one that is accurate and feasible, but isn't very useful or sensitive to the context. Another
example of an imbalanced evaluation is one that would be genuinely useful, but is impossible
to carry out.
The following standards can be applied while developing an evaluation design and
throughout the course of its implementation. Remember, the standards are written as guiding
principles, not as rigid rules to be followed in all situations.
The utility standards are:
Stakeholder Identification: People who are involved in (or will be affected by) the
evaluation should be identified, so that their needs can be addressed.
Evaluator Credibility: The people conducting the evaluation should be both trustworthy and
competent, so that the evaluation will be generally accepted as credible or believable.
Information Scope and Selection: Information collected should address pertinent questions
about the program, and it should be responsive to the needs and interests of clients and other
specified stakeholders.
Values Identification: The perspectives, procedures, and rationale used to interpret the
findings should be carefully described, so that the bases for judgments about merit and value
are clear.
Report Clarity: Evaluation reports should clearly describe the program being evaluated,
including its context, and the purposes, procedures, and findings of the evaluation. This will
help ensure that essential information is provided and easily understood.
Report Timeliness and Dissemination: Significant midcourse findings and evaluation
reports should be shared with intended users so that they can be used in a timely fashion.
Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that
encourage follow-through by stakeholders, so that the evaluation will be used.
The feasibility standards are to ensure that the evaluation makes sense - that the steps that are
planned are both viable and pragmatic.
The feasibility standards are:
Practical Procedures: The evaluation procedures should be practical, to keep disruption of
everyday activities to a minimum while needed information is obtained.
Political Viability: The evaluation should be planned and conducted with anticipation of the
different positions or interests of various groups. This should help in obtaining their
cooperation so that possible attempts by these groups to curtail evaluation operations or to
misuse the results can be avoided or counteracted.
Cost Effectiveness: The evaluation should be efficient and produce enough valuable
information that the resources used can be justified.
The propriety standards ensure that the evaluation is an ethical one, conducted with regard for
the rights and interests of those involved. The eight propriety standards follow.
Service Orientation: Evaluations should be designed to help organizations effectively serve
the needs of all of the targeted participants.
Formal Agreements: The responsibilities in an evaluation (what is to be done, how, by
whom, when) should be agreed to in writing, so that those involved are obligated to follow all
conditions of the agreement, or to formally renegotiate it.
Rights of Human Subjects: Evaluation should be designed and conducted to respect and
protect the rights and welfare of human subjects, that is, all participants in the study.
Human Interactions: Evaluators should respect basic human dignity and worth when
working with other people in an evaluation, so that participants don't feel threatened or harmed.
Complete and Fair Assessment: The evaluation should be complete and fair in its
examination, recording both strengths and weaknesses of the program being evaluated. This
allows strengths to be built upon and problem areas addressed.
Disclosure of Findings: The people working on the evaluation should ensure that all of the
evaluation findings, along with the limitations of the evaluation, are accessible to everyone
affected by the evaluation, and any others with expressed legal rights to receive the results.
Conflict of Interest: Conflict of interest should be dealt with openly and honestly, so that it
does not compromise the evaluation processes and results.
Fiscal Responsibility: The evaluator's use of resources should reflect sound accountability
procedures and otherwise be prudent and ethically responsible, so that expenditures are
accounted for and appropriate.
The accuracy standards ensure that the evaluation findings are considered correct.
There are 12 accuracy standards:
Program Documentation: The program should be described and documented clearly and
accurately, so that what is being evaluated is clearly identified.
Context Analysis: The context in which the program exists should be thoroughly examined
so that likely influences on the program can be identified.
Described Purposes and Procedures: The purposes and procedures of the evaluation should
be monitored and described in enough detail that they can be identified and assessed.
Defensible Information Sources: The sources of information used in a program evaluation
should be described in enough detail that the adequacy of the information can be assessed.
Valid Information: The information gathering procedures should be chosen or developed
and then implemented in such a way that they will assure that the interpretation arrived at is valid.
Reliable Information: The information gathering procedures should be chosen or developed
and then implemented so that they will assure that the information obtained is sufficiently reliable.
Systematic Information: The information from an evaluation should be systematically
reviewed and any errors found should be corrected.
Analysis of Quantitative Information: Quantitative information - data from observations or
surveys - in an evaluation should be appropriately and systematically analyzed so that
evaluation questions are effectively answered.
Analysis of Qualitative Information: Qualitative information - descriptive information from
interviews and other sources - in an evaluation should be appropriately and systematically
analyzed so that evaluation questions are effectively answered.
Justified Conclusions: The conclusions reached in an evaluation should be explicitly
justified, so that stakeholders can understand their worth.
Impartial Reporting: Reporting procedures should guard against the distortion caused by
personal feelings and biases of people involved in the evaluation, so that evaluation reports
fairly reflect the evaluation findings.
Metaevaluation: The evaluation itself should be evaluated against these and other pertinent
standards, so that it is appropriately guided and, on completion, stakeholders can closely
examine its strengths and weaknesses.
There is an ever-increasing agreement on the worth of evaluation; in fact, evaluation is often
required by funders and other constituents. So, community health and development
professionals can no longer question whether or not to evaluate their programs. Instead, the
appropriate questions are:
What is the best way to evaluate?
What are we learning from the evaluation?
How will we use what we learn to become more effective?
The framework for program evaluation helps answer these questions by guiding users to
select evaluation strategies that are useful, feasible, proper, and accurate.
Using this framework requires considerable skill in program evaluation. In most cases there
are multiple stakeholders to consider, the political context may be divisive, steps don't always
follow a logical order, and limited resources may make it difficult to take a preferred course
of action. An evaluator's challenge is to devise an optimal strategy, given the conditions she
is working under. An optimal strategy is one that accomplishes each step in the framework in
a way that takes into account the program context and is able to meet or exceed the relevant
standards.
This framework also makes it possible to respond to common concerns about program
evaluation. For instance, many evaluations are not undertaken because they are seen as being
too expensive. The cost of an evaluation, however, is relative; it depends upon the question
being asked and the level of certainty desired for the answer. A simple, low-cost evaluation
can deliver information valuable for understanding and improvement.
Rather than discounting evaluations as a time-consuming sideline, the framework encourages
evaluations that are timed strategically to provide necessary feedback. This makes it possible
to link evaluation closely with everyday practices.
Another concern centers on the perceived technical demands of designing and conducting an
evaluation. However, the practical approach endorsed by this framework focuses on
questions that can improve the program.
Finally, the prospect of evaluation troubles many staff members because they perceive
evaluation methods as punishing ("They just want to show what we're doing wrong."),
exclusionary ("Why aren't we part of it? We're the ones who know what's going on."), and
adversarial ("It's us against them."). The framework instead encourages an evaluation
approach that is designed to be helpful and engages all interested stakeholders in a process
that welcomes their participation.
What is program evaluation?
Program evaluation is the systematic assessment of the processes and/or outcomes of a program with the intent
of furthering its development and improvement. As such, it is a collaborative process in which evaluators work
closely with program staff to craft and implement an evaluation design that is responsive to the needs of the
program. For example, during program implementation, evaluators can provide formative evaluation findings so
that program staff can make immediate, data-based decisions about program implementation and delivery. In
addition, evaluators can, towards the end of a program or upon its completion, provide cumulative
and summative evaluation findings, often required by funding agencies and used to make decisions about
program continuation or expansion.
How is evaluation different than research?
Evaluators use many of the same qualitative and quantitative methodologies used by researchers in other fields.
Indeed, program evaluations are as rigorous and systematic in collecting data as traditional social research. That
being said, the primary purpose of evaluation is to provide timely and constructive information for decision-
making about particular programs, not to advance more wide-ranging knowledge or theory. Accordingly,
evaluation is typically more client-focused than traditional research, in that evaluators work closely with program
staff to create and carry out an evaluation plan that attends to the particular needs of their program.
How is evaluation different than assessment?
The primary difference between evaluation and assessment lies in the focus of examination.
Whereas evaluation serves to facilitate a program's development, implementation, and improvement by
examining its processes and/or outcomes, the purpose of an assessment is to determine an individual's or group's
performance by measuring their skill level on a variable of interest (e.g., reading comprehension, math or social
skills, to mention just a few). In line with this distinction (and quite commonly in evaluating educational programs,
where the intended outcome is often some specified level of academic achievement), assessment data may be
used in determining program impact and success.
How much does it cost?
The cost of an evaluation is entirely contingent upon the scope and nature of the evaluation activities and
measures requested. The National Science Foundation's rule of thumb about evaluation budgets is 10% of the
total grant amount. We at the OEA are committed to providing cost-effective evaluation plans that are both
responsive to the evaluative needs of a given program and also suitable to its budget. As a result, we have
worked in the past, and aspire to work in the future, with programs and projects representing a wide range of
financial plans.
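As a rough illustration of the rule of thumb above, the following minimal sketch computes an evaluation budget as a share of the total grant. The function name, the default share parameter, and the example grant amount are assumptions made for illustration; actual budgets depend on the scope of the evaluation.

```python
def evaluation_budget(total_grant: float, share: float = 0.10) -> float:
    """Return the amount to set aside for evaluation.

    The 10% default mirrors the rule of thumb quoted in the text;
    the function itself is a hypothetical helper, not an official tool.
    """
    if total_grant < 0:
        raise ValueError("total_grant must be non-negative")
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return total_grant * share


# Hypothetical grant of 250,000 (currency units are arbitrary).
budget = evaluation_budget(250_000)
print(f"Suggested evaluation budget: {budget:,.2f}")
```

The share is left as a parameter because the text describes 10% only as a rule of thumb, not a fixed requirement.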
My proposal requires an evaluation section. Can you help me with that?
Many federal agencies (e.g., NSF, NIH) require that proposals include information about how the effectiveness of
the proposed program will be evaluated. This section usually contains a brief description of possible metrics for
program outcomes and a plan for both formative and summative evaluation of the program. The program
evaluators at OEA will provide text for the evaluation section of your proposal free of charge, assuming that you
plan to work with OEA if and when your grant is funded. Depending on time and available resources, our staff will
also provide feedback on the grant as a whole and guidance on the development of your goals and outcomes.
The best way to begin is to contact us as early in the proposal process as possible.
My proposal is due very soon. Can you still help me?
Although we prefer being contacted well in advance of proposal deadlines, we also understand that project
timelines and planning processes may not always be ideal. For that reason, we will do our best to work with you
and your program even if the time period is limited. For all clients, we respectfully request that you contact us
before including our evaluation services and practices in grant proposals, even if this contact is initiated
immediately preceding a fast approaching deadline.

Definition of Program Evaluation
Evaluation is the systematic application of scientific methods to assess the design,
implementation, improvement or outcomes of a program (Rossi & Freeman, 1993;
Short, Hennessy, & Campbell, 1996). The term "program" may include any
organized action such as media campaigns, service provision, educational services,
public policies, research projects, etc. (Centers for Disease Control and Prevention
[CDC], 1999).
Purposes for Program Evaluation
Demonstrate program effectiveness to funders
Improve the implementation and effectiveness of programs
Better manage limited resources
Document program accomplishments
Justify current program funding
Support the need for increased levels of funding
Satisfy ethical responsibility to clients to demonstrate positive and negative
effects of program participation (Short, Hennessy, & Campbell, 1996).
Document program development and activities to help ensure successful
Program evaluations require funding, time and technical skills: requirements that
are often perceived as diverting limited program resources from clients. Program
staff are often concerned that evaluation activities will inhibit timely accessibility
to services or compromise the safety of clients. Evaluation can necessitate alliances
between historically separate community groups (e.g. academia, advocacy groups,
service providers; Short, Hennessy, & Campbell, 1996). Mutual misperceptions
regarding the goals and process of evaluation can result in adverse attitudes (CDC,
1999; Chalk & King, 1998).
Overcoming Barriers
Collaboration is the key to successful program evaluation. In evaluation
terminology, stakeholders are defined as entities or individuals that are affected by
the program and its evaluation (Rossi & Freeman, 1993; CDC, 1999). Involvement of
these stakeholders is an integral part of program evaluation. Stakeholders include
but are not limited to program staff, program clients, decision makers, and
evaluators. A participatory approach to evaluation based on respect for one
another's roles and equal partnership in the process overcomes barriers to a
mutually beneficial evaluation (Burt, Harrell, Newmark, Aron, & Jacobs, 1997;
Chalk & King, 1998). Identifying an evaluator with the necessary technical skills
as well as a collaborative approach to the process is integral. Programs have
several options for identifying an evaluator. Health departments, other state
agencies, local universities, evaluation associations and other programs can
provide recommendations. Additionally, several companies and university
departments providing these services can be located on the internet. Selecting an
evaluator entails finding an individual who has an understanding of the program
and funding requirements for evaluations, demonstrated experience, and
knowledge of the issue that the program is targeting (CDC, 1992).
Types of Evaluation
Various types of evaluation can be used to assess different aspects or stages of
program development. As terminology and definitions of evaluation types are not
uniform, an effort has been made to briefly introduce a number of types here.
Context Evaluation: Investigating how the program operates or will operate in a
particular social, political, physical and economic environment. This type of
evaluation could include a community needs or organizational assessment.
Sample question: What are the environmental barriers to accessing program services?
Formative Evaluation: Assessing needs that a new program should fulfill (Short, Hennessy, &
Campbell, 1996), examining the early stages of a program's development (Rossi &
Freeman, 1993), or testing a program on a small scale before broad dissemination
(Coyle, Boruch, & Turner, 1991). Sample question: Who is the intended audience
for the program?
Process Evaluation: Examining the implementation and
operation of program components. Sample question: Was the program
administered as planned?
Impact Evaluation: Investigating the magnitude of both
positive and negative changes produced by a program (Rossi & Freeman, 1993).
Some evaluators limit these changes to those occurring immediately (Green &
Kreuter, 1991). Sample question: Did participant knowledge change after
attending the program?
Outcome Evaluation: Assessing the short and long-term
results of a program. Sample question: What are the long-term positive effects of
program participation?
Performance or Program Monitoring: Similar to process evaluation, differing only by
providing regular updates of evaluation results to stakeholders rather than
summarizing results at the evaluation's conclusion (Rossi & Freeman, 1993; Burt,
Harrell, Newmark, Aron, & Jacobs, 1997).
Evaluation Standards and Designs
Evaluation should be incorporated during the initial stages of program
development. An initial step of the evaluation process is to describe the program in
detail. This collaborative activity can create a mutual understanding of the
program, the evaluation process, and program and evaluation terminology.
Developing a program description also helps ensure that program activities and
objectives are clearly defined and that the objectives can be measured. In general,
the evaluation should be feasible, useful, culturally competent, ethical and accurate
(CDC, 1999). Data should be collected over time using multiple instruments that
are valid, meaning they measure what they are supposed to measure, and reliable,
meaning they produce similar results consistently (Rossi & Freeman, 1993). The
use of qualitative as well as quantitative data can provide a more comprehensive
picture of the program. Evaluations of programs aimed at violence prevention
should also be particularly sensitive to issues of safety and confidentiality.
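The definition of reliability above (an instrument produces similar results consistently) can be illustrated with a test-retest check. This is only one possible approach and is not named by the source: the sketch below computes a Pearson correlation between two administrations of the same instrument. The scores are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_first, mean_second = mean(first), mean(second)
    covariance = sum((x - mean_first) * (y - mean_second)
                     for x, y in zip(first, second))
    spread_first = sqrt(sum((x - mean_first) ** 2 for x in first))
    spread_second = sqrt(sum((y - mean_second) ** 2 for y in second))
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariance / (spread_first * spread_second)


# Hypothetical scores from the same five participants, two weeks apart.
week_1 = [12, 15, 9, 20, 17]
week_3 = [13, 14, 10, 19, 18]
print(f"Test-retest correlation: {pearson_r(week_1, week_3):.2f}")
```

A value close to 1 would suggest the instrument ranks participants consistently across administrations; what counts as acceptable is a judgement the evaluation team has to make.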
Experimental designs are defined by the random assignment of individuals to a
group participating in the program or to a control group not receiving the program.
These ideal experimental conditions are not always practical or ethical under the "real
world" constraints of program delivery. A possible way to balance the need
for a comparison group with feasibility is the quasi-experimental design in which
an equivalent group (i.e. individuals receiving standard services) is compared to
the group participating in the target program. However, the use of this design may
introduce difficulties in attributing the causation of effects to the target program.
While non-experimental designs may be easiest to implement in a program setting
and provide a large quantity of data, drawing conclusions about program effects is
difficult.
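The random assignment that defines an experimental design can be sketched in a few lines. The following is a minimal illustration, assuming a simple even split between a program group and a control group; the participant labels and the function name are hypothetical.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants into a program group and a control group.

    A seed can be supplied so the assignment is reproducible and auditable.
    """
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"program": shuffled[:midpoint], "control": shuffled[midpoint:]}


# Ten hypothetical participant identifiers.
participants = [f"P{i:02d}" for i in range(1, 11)]
groups = assign_groups(participants, seed=42)
print("Program group:", groups["program"])
print("Control group:", groups["control"])
```

Because every participant has the same chance of landing in either group, differences between the groups at the outset are due to chance alone, which is what allows effects to be attributed to the program.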
Logic Models
Logic models are flowcharts that depict program components. These models can
include any number of program elements, showing the development of a program
from theory to activities and outcomes. Infrastructure, inputs, processes, and
outputs are often included. The process of developing logic models can serve to
clarify program elements and expectations for the stakeholders. By depicting the
sequence and logic of inputs, processes and outputs, logic models can help ensure
that the necessary data are collected to make credible statements of causality
(CDC, 1999).
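A logic model can also be written down as a small data structure before it is drawn as a flowchart. The sketch below is one hypothetical representation, using the inputs, processes and outputs named above; the class name and the example entries are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicModel:
    """A very small logic model: what goes in, what is done, what comes out."""
    inputs: List[str]
    processes: List[str]
    outputs: List[str]

    def as_flowchart(self) -> str:
        """Render the stages left to right as a one-line text flowchart."""
        stages = [("Inputs", self.inputs),
                  ("Processes", self.processes),
                  ("Outputs", self.outputs)]
        return " -> ".join(f"{name}: {', '.join(items)}" for name, items in stages)


# Hypothetical program used only to show the structure.
model = LogicModel(
    inputs=["funding", "trained staff"],
    processes=["weekly workshops"],
    outputs=["workshops delivered"],
)
print(model.as_flowchart())
```

Listing the stages explicitly makes it easier to check that some data collection is planned for each one.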
Communicating Evaluation Findings
Preparation, effective communication and timeliness are needed to ensure the utility
of evaluation findings. Questions that should be answered at the evaluation's
inception include: what will be communicated? to whom? by whom? and how?
The target audience must be identified and the report written to address their needs
including the use of non-technical language and a user-friendly format (National
Committee for Injury Prevention and Control, 1989). Policy makers, current and
potential funders, the media, current and potential clients, and members of the
community at large should be considered as possible audiences. Evaluation reports
describe the process as well as findings based on the data.

Summative Evaluation
Summative evaluation looks at the impact of an intervention on the target group.
This type of evaluation is arguably what project staff and funding bodies most often
think of as 'evaluation': finding out what the project achieved.
Summative evaluation can take place during the project implementation, but is
most often undertaken at the end of a project. As such, summative evaluation can
also be referred to as ex-post evaluation (meaning after the event).
Summative evaluation is often associated with more objective, quantitative
methods of data collection. Summative evaluation is linked to the evaluation
drivers of accountability. It is recommended to use a balance of both quantitative
and qualitative methods in order to get a better understanding of what your project
has achieved, and how or why this has occurred. Using qualitative methods of data
collection can also provide a good insight into unintended consequences and
lessons for improvement.
Summative evaluation is more outcome-focused than process-focused. It is
important to distinguish outcome from output. Summative evaluation is not about
stating that three workshops were held, with a total of fifty people attending
(outputs), but rather the result of these workshops, such as increased knowledge or
increased uptake of rainwater tanks (outcomes).
Why undertake a summative evaluation?
Here are some key reasons why you should undertake a summative evaluation:
Summative evaluation provides a means to find out whether your project
has reached its goals/objectives/outcomes.
Summative evaluation allows you to quantify the changes in resource use
attributable to your project so that you can track the impact of
your project.
Summative evaluation allows you to compare the impact of different
projects and make results-based decisions on future spending allocations
(taking into account unintended consequences).
Summative evaluation allows you to develop a better understanding of the
process of change, and to find out what works, what doesn't, and why. This
allows you to gather the knowledge to learn and improve future project
designs and implementation.

Categories of summative evaluation

Outcome Evaluation
When: Project implementation and post-project
Purpose: To assess whether the project has met its goals, whether there were any unintended
consequences, what was learned, and how to improve
Methods (quantitative and qualitative): meter reading; audits or counts; deemed savings;
footprint calculators; focus group; storytelling / Most Significant Change; outcome hierarchy
Some types of summative evaluation require the collection of baseline data in order
to provide before-and-after intervention figures. As such, it is important to factor
this into the evaluation design.
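The before-and-after comparison that baseline data makes possible is simple arithmetic. The following minimal sketch, with invented figures, computes the absolute and percentage change from a baseline measurement; the function name and the water-use example are assumptions for illustration.

```python
def change_from_baseline(baseline: float, follow_up: float):
    """Return (absolute change, percentage change) relative to the baseline."""
    if baseline == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    absolute = follow_up - baseline
    percent = 100 * absolute / baseline
    return absolute, percent


# Hypothetical household water use in litres per day, before and after a project.
absolute, percent = change_from_baseline(baseline=200, follow_up=170)
print(f"Change: {absolute:+.0f} litres/day ({percent:+.1f}%)")
```

Without the baseline figure, only the follow-up value would be known and no change could be reported, which is why baseline collection has to be planned before the intervention starts.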
It is considered good evaluation practice to include both formative and summative
evaluation.

Types of evaluation
Formative evaluation
Formative evaluation is about gathering information in order to plan, refine and improve a programme. It most
often takes place when a service or programme is being set up and can continue throughout the life of the
project. Its intent is to assess ongoing project activities and provide information to monitor and improve the
project. It is done at several points in the developmental life of a project and its activities. It is common for the
evaluator to participate in decision-making during this phase with an emphasis on quick feedback to support
necessary programme changes during early stages.
Types of questions answered in a formative evaluation include:
What is the need for the programme (i.e., needs assessment)?
What approach should the programme take?
What record keeping systems are needed to enable the programme to be evaluated?
Process evaluation
Process evaluation is about documenting the development and process of the programme, in order to assess
strengths and weaknesses and determine why outcomes occur.
Process evaluations describe and assess program materials and activities. Examination of materials is likely to
occur while programs are being developed, as a check on the appropriateness of the approach and procedures
that will be used in the programme.
Types of questions answered in a process evaluation include:
What happens in the programme?
To what extent is it being implemented as planned? What changes are being made?
What immediate improvements need to be made?
What is the participants' experience of the programme?
What is working well? What needs to be changed?
Are the resources for the programme adequate?
Is the programme reaching the intended audiences?
Impact evaluation and outcome evaluation
Impact evaluations look beyond the immediate results of policies, instruction, or services to identify longer-term
as well as unintended program effects. It may also examine what happens when several programmes operate in
Outcome evaluations study the immediate or direct effects of the program on participants. The scope of an
outcome evaluation can extend beyond knowledge or attitudes, however, to examine the immediate behavioural
effects of programmes.
Types of questions answered in an impact and/or outcome evaluation include:
How effective is the programme?
What has changed as a result of the programme?
Which audiences benefit from the programme? Which audiences do not?
Are there unintended outcomes of the programme? How significant are they?
National Science Foundation. (2002). User-Friendly Handbook for Project Evaluation. National Science
Waa, A., Holibar, F., and Spinola, C. (1998). Planning and doing programme evaluation: An introductory guide
for health promotion. Alcohol and Public Health Research Unit: Whariki Runanga, Wananga, Hauora me te
Paekaka, University of Auckland, New Zealand.
Selecting evaluation methods
The overall goal in selecting evaluation method(s) is to get the most useful information in the most cost effective
and realistic fashion. Consider the following questions when selecting your evaluation methods:
1. What information is needed?
2. Where is the information?
3. Who has the information?
4. What is the best way to get the information?
5. Will the methods get all of the needed information?
6. Will the audience find these methods non-intrusive and culturally appropriate?
7. How can the information be analysed?
8. What resources are available: how much money, time, and people?

Types of Evaluation

When you come to evaluate your project, you will need to focus on two aspects of your project. You
will need to look at firstly the activities, and secondly the effect your project has had. In evaluation
language this is known as process and impact/outcome evaluation.
Apart from process and impact evaluation, it is also useful to consider summative evaluation.
Process Evaluation
This involves judging the activities (or strategies) of your project. This often involves looking at what
has been done, who has been reached, and the quality of the activities. It involves seeking answers to
questions such as:
Has the project reached the appropriate people?
Are all the project's activities going to plan? If not, why not?
Were any changes made to the intended activities? If so, why?
Are materials, information, presentations of good quality?
Are the participants and other key people satisfied?
Impact/Outcome Evaluation
This involves judging the extent to which your project has had an effect on the changes you were
seeking. In other words, the extent to which your project has met its goal and objectives. Impact
evaluation judges how well the objectives were achieved and outcome evaluation involves judging
how well the goal has been achieved. It involves seeking answers to questions such as:
What progress has been made toward achieving the goal?
To what extent has the project met its objectives?
How effective has the project been at producing changes?
Are there any factors outside of the project that have contributed to (or prevented) the desired
change?
Has the project resulted in any unintended change?
Summative Evaluation
This is done at the end of the project and involves considering the project as a whole, from beginning
to 'end'. It is meant to summarise and inform decisions about whether to continue the project (or parts
of it) and whether it is valuable to expand it into other settings. It involves seeking answers to questions
such as:
What were the main benefits and disappointments?
What things helped and hindered the project?
In retrospect, what could have strengthened it?
What would you advise others embarking on something similar?
What aspects will be sustained and how?
Is it worth continuing in its current form? Why/why not?
What recommendations have emerged about where to from here?

Program evaluation
Program evaluation is a systematic method for collecting, analyzing, and using
information to answer questions about projects, policies and programs,
particularly about their effectiveness and efficiency. In both the
public and private sectors, stakeholders often want to know whether the programs
they are funding, implementing, voting for, receiving or objecting to are producing
the intended effect. While program evaluation first focuses on this question,
important considerations often include how much the program costs per
participant, how the program could be improved, whether the program is
worthwhile, whether there are better alternatives, whether there are unintended outcomes,
and whether the program goals are appropriate and useful. Evaluators help to
answer these questions, but the best way to answer the questions is for the
evaluation to be a joint project between evaluators and stakeholders.

The process of evaluation is considered to be a relatively recent phenomenon.
However, planned social evaluation has been documented as dating as far back as
2200 BC. Evaluation became particularly relevant in the U.S. in the 1960s during
the period of the Great Society social programs associated with
the Kennedy and Johnson administrations. Extraordinary sums were invested in
social programs, but the impacts of these investments were largely unknown.
Program evaluations can involve both quantitative and qualitative
methods of social research. People who do program evaluation come from many
different backgrounds, such as sociology, psychology, economics, and social work.
Some graduate schools also have specific training programs for program
evaluation.
Doing an evaluation
Program evaluation may be conducted at several stages during a program's
lifetime. Each of these stages raises different questions to be answered by the
evaluator, and correspondingly different evaluation approaches are needed. Rossi,
Lipsey and Freeman (2004) suggest the following kinds of assessment, which may
be appropriate at these different stages:
Assessment of the need for the program
Assessment of program design and logic/theory
Assessment of how the program is being implemented (i.e., is it being
implemented according to plan? Are the program's processes maximizing
possible outcomes?)
Assessment of the program's outcome or impact (i.e., what it has actually
achieved)
Assessment of the program's cost and efficiency
Assessing needs
A needs assessment examines the population that the program intends to target, to
see whether the need as conceptualized in the program actually exists in the
population; whether it is, in fact, a problem; and if so, how it might best be dealt
with. This includes identifying and diagnosing the actual problem the program is
trying to address, who or what is affected by the problem, how widespread the
problem is, and what are the measurable effects that are caused by the problem. For
example, for a housing program aimed at mitigating homelessness, a program
evaluator may want to find out how many people are homeless in a given
geographic area and what their demographics are. Rossi, Lipsey and Freeman
(2004) caution against undertaking an intervention without properly assessing the
need for one, because this might result in a great deal of wasted funds if the need
did not exist or was misconceived.
Needs assessment involves the processes or methods used by evaluators to describe
and diagnose social needs. This is essential for evaluators because they need to
identify whether programs are effective and they cannot do this unless they have
identified what the problem/need is. Programs that do not do a needs assessment
can have the illusion that they have eradicated the problem/need when in fact there
was no need in the first place. Needs assessment involves research and regular
consultation with community stakeholders and with the people that will benefit
from the project before the program can be developed and implemented. Hence it
should be a bottom-up approach. In this way potential problems can be realized
early because the process would have involved the community in identifying the
need and thereby allowed the opportunity to identify potential barriers.
The important task of a program evaluator is thus to: First, construct a precise
definition of what the problem is. Evaluators need to first identify the
problem/need. This is most effectively done by collaboratively including all
possible stakeholders, i.e., the community impacted by the potential problem, the
agents/actors working to address and resolve the problem, funders, etc. Securing
buy-in early on in the process reduces potential for push-back, miscommunication,
and incomplete information later on.
Second, assess the extent of the problem. Having clearly identified what the
problem is, evaluators need to then assess the extent of the problem. They need to
answer the 'where' and 'how big' questions. Evaluators need to work out where
the problem is located and how big it is. Pointing out that a problem exists is much
easier than having to specify where it is located and how rife it is. Rossi, Lipsey &
Freeman (2004) give the example that a person identifying some battered children
may be enough evidence to persuade one that child abuse exists. But indicating
how many children it affects and where it is located geographically and socially
would require knowledge about abused children, the characteristics of perpetrators
and the impact of the problem throughout the political authority in question.
This can be difficult considering that child abuse is not a public behavior, also
keeping in mind that estimates of the rates of private behavior are usually not
possible because of factors like unreported cases. In this case evaluators would
have to use data from several sources and apply different approaches in order to
estimate incidence rates. There are two more questions that need to be
answered: evaluators also need to answer the 'how' and 'what' questions. The
'how' question requires that evaluators determine how the need will be addressed.
Having identified the need and having familiarized oneself with the community,
evaluators should conduct a performance analysis to identify whether the proposed
plan in the program will actually be able to eliminate the need. The 'what' question
requires that evaluators conduct a task analysis to find out what the best way to
perform would be, for example, whether the job performance standards are set by
an organization or whether some governmental rules need to be considered when
undertaking the task.

Third, define and identify the target of interventions and accurately describe the
nature of the service needs of that population. It is important to know what or who
the target population is: it might be individuals, groups, communities, etc.
There are three units of the population: population at risk, population in need and
population in demand.

Population at risk: people with a significant probability of developing
the condition, e.g. the population at risk for birth control programs is women of
child-bearing age.
Population in need: people with the condition that the program seeks
to address, e.g. the population in need for a program that aims to provide
ARVs to HIV-positive people is people who are HIV positive.
Population in demand: that part of the population in need that acknowledges
having the need and is willing to take part in what the program has to
offer, e.g. not all HIV-positive people will be willing to take ARVs.
Being able to specify what/who the target is will assist in establishing appropriate
boundaries, so that interventions can correctly address the target population and be
feasible to apply.
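The three population units described above can be made concrete with a small classification sketch. The record fields, names and example people below are hypothetical; the only rule taken from the text is that the population in demand is the part of the population in need that is willing to take part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    at_risk: bool                  # significant probability of developing the condition
    has_condition: bool            # already has the condition the program addresses
    willing_to_participate: bool   # accepts the need and would take up the service


def target_units(people):
    """Group people into the three population units used in needs assessment."""
    return {
        "at_risk": [p.name for p in people if p.at_risk],
        "in_need": [p.name for p in people if p.has_condition],
        # In demand is a subset of in need, as described in the text.
        "in_demand": [p.name for p in people
                      if p.has_condition and p.willing_to_participate],
    }


# Hypothetical community members.
community = [
    Person("A", at_risk=True, has_condition=False, willing_to_participate=True),
    Person("B", at_risk=True, has_condition=True, willing_to_participate=True),
    Person("C", at_risk=True, has_condition=True, willing_to_participate=False),
]
for unit, names in target_units(community).items():
    print(unit, names)
```

Counting each unit separately shows why a program sized for the population in need may be larger than what the population in demand will actually use.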

There are four steps in conducting a needs assessment:

1. Perform a gap analysis
Evaluators need to compare the current situation to the desired or necessary
situation. The difference or the gap between the two situations will help
identify the need, purpose and aims of the program.
2. Identify priorities and importance
In the first step above, evaluators would have identified a number of
interventions that could potentially address the need e.g. training and
development, organization development etc. These must now be examined
in view of their significance to the program's goals and constraints. This
must be done by considering the following factors: cost effectiveness
(consider the budget of the program, assess cost/benefit ratio), executive
pressure (whether top management expects a solution) and population
(whether many key people are involved).
3. Identify causes of performance problems and/or opportunities
When the needs have been prioritized, the next step is to identify specific
problem areas within the need to be addressed, and also to assess the
skills of the people who will be carrying out the interventions.
4. Identify possible solutions and growth opportunities
Compare the consequences of implementing each intervention with the
consequences of not implementing it.
Needs analysis is hence a very crucial step in evaluating programs because the
effectiveness of a program cannot be assessed unless we know what the problem
was in the first place.
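The gap analysis and prioritization steps above can be sketched in code. The following is a minimal illustration in Python, not a prescribed method: the indicator names, the figures and the rule of ranking needs by relative gap are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str       # hypothetical indicator label
    current: float  # measured current situation
    desired: float  # desired or necessary situation

    @property
    def gap(self) -> float:
        # Step 1: the need is the difference between desired and current.
        return self.desired - self.current

    @property
    def relative_gap(self) -> float:
        # Gap as a share of the desired level, so that indicators
        # measured on different scales can be compared.
        return self.gap / self.desired


def prioritize(indicators):
    # Step 2 (simplified): rank needs by relative gap, largest first.
    # A real assessment would also weigh cost, executive pressure
    # and the population affected.
    return sorted(indicators, key=lambda i: i.relative_gap, reverse=True)


# Illustrative figures only.
indicators = [
    Indicator("staff trained (%)", current=40, desired=90),
    Indicator("clinics stocked (%)", current=75, desired=100),
    Indicator("clients reached per month", current=320, desired=400),
]

ranked = prioritize(indicators)
for ind in ranked:
    print(f"{ind.name}: gap={ind.gap:g}, relative gap={ind.relative_gap:.0%}")
```

Ranking by relative gap is only one possible rule; the factors listed under step 2 could be added as further weights.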
Assessing program theory
The program theory, also called a logic model or impact pathway, is an
assumption, implicit in the way the program is designed, about how the program's
actions are supposed to achieve the outcomes it intends. This 'logic model' is often
not stated explicitly by people who run programs; it is simply assumed, and so an
evaluator will need to draw out from the program staff how exactly the program is
supposed to achieve its aims and assess whether this logic is plausible. For
example, in an HIV prevention program, it may be assumed that educating people
about HIV/AIDS transmission, risk and safe sex practices will result in safer sex
being practiced. However, research in South Africa increasingly shows that in spite
of increased education and knowledge, people still often do not practice safe
sex. Therefore, the logic of a program which relies on education as a means to
get people to use condoms may be faulty. This is why it is important to read
research that has been done in the area. Explicating this logic can also reveal
unintended or unforeseen consequences of a program, both positive and negative.
The program theory drives the hypotheses to test for impact evaluation.
Developing a logic model can also build common understanding amongst program
staff and stakeholders about what the program is actually supposed to do and how
it is supposed to do it, which is often lacking (see Participatory impact pathways).
Rossi, Lipsey & Freeman (2004) suggest four approaches and procedures that can
be used to assess the program theory.
These approaches are discussed below.
Assessment in relation to social needs

This entails assessing the program theory by relating it to the needs of the target
population the program is intended to serve. If the program theory fails to address
the needs of the target population it will be rendered ineffective even if it is
well implemented.

Assessment of logic and plausibility

This form of assessment involves asking a panel of expert reviewers to critically
review the logic and plausibility of the assumptions and expectations inherent in
the program's design.
The review process is unstructured and open ended so as to
address certain issues on the program design. Rutman (1980), Smith (1989), and
Wholey (1994) suggested the questions listed below to assist with the review:

Are the program goals and objectives well defined?
Are the program goals and objectives feasible?
Is the change process presumed in the program theory feasible?
Are the procedures for identifying members of the target population,
delivering service to them, and sustaining that service through completion
well defined and sufficient?
Are the constituent components, activities, and functions of the program
well defined and sufficient?
Are the resources allocated to the program and its various activities adequate?
Assessment through comparison with research and practice

This form of assessment requires gaining information from research literature and
existing practices to assess various components of the program theory. The
evaluator can assess whether the program theory is congruent with research
evidence and practical experiences of programs with similar concepts.

Assessment via preliminary observation

This approach involves incorporating firsthand observations into the assessment
process as it provides a reality check on the concordance between the program
theory and the program itself.
The observations can focus on the attainability of
the outcomes, circumstances of the target population, and the plausibility of the
program activities and the supporting resources.

These different forms of assessment of program theory can be conducted to ensure
that the program theory is sound.
Assessing implementation
Process analysis looks beyond the theory of what the program is supposed to do
and instead evaluates how the program is being implemented. This evaluation
determines whether the components identified as critical to the success of the
program are being implemented. The evaluation determines whether target
populations are being reached, people are receiving the intended services, and staff are
adequately qualified. Process evaluation is an ongoing process in which repeated
measures may be used to evaluate whether the program is being implemented
effectively.
Assessing the impact (effectiveness)
The impact evaluation determines the causal effects of the program. This involves
trying to measure if the program has achieved its intended outcomes, i.e. program
outcomes.
Program Outcomes
An outcome is the state of the target population or the social conditions that a
program is expected to have changed.
Program outcomes are the observed
characteristics of the target population or social conditions, not of the program.
Thus the concept of an outcome does not necessarily mean that the program targets
have actually changed or that the program has caused them to change in any way.

There are two kinds of outcomes, namely outcome level and outcome change, also
associated with program effect.

Outcome level refers to the status of an outcome at some point in time.
Outcome change refers to the difference between outcome levels at
different points in time.
Program effect refers to that portion of an outcome change that can be
attributed uniquely to a program as opposed to the influence of some
other factor.
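
The three definitions above can be made concrete with a small worked example. The sketch below assumes, for illustration only, that a comparable group of non-participants is available and that its outcome change stands in for the influence of other factors (a difference-in-differences style estimate). The employment figures are invented.

```python
# Outcome levels: share employed, measured at two points in time.
# All figures are invented for illustration.
participants = {"before": 0.40, "after": 0.65}
comparison = {"before": 0.42, "after": 0.52}  # assumed similar non-participants


def outcome_change(levels):
    # Outcome change: difference between outcome levels over time.
    return levels["after"] - levels["before"]


change_participants = outcome_change(participants)
change_comparison = outcome_change(comparison)

# Program effect (estimate): the part of the participants' change that is
# not matched by the change seen in the comparison group.
program_effect = change_participants - change_comparison

print(f"outcome change, participants: {change_participants:+.2f}")
print(f"outcome change, comparison:   {change_comparison:+.2f}")
print(f"estimated program effect:     {program_effect:+.2f}")
```

The estimate is only as good as the assumption that the comparison group would have changed in the same way as the participants without the program.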
Measuring Program Outcomes
Outcome measurement is a matter of representing the circumstances defined as the
outcome by means of observable indicators that vary systematically with changes
or differences in those circumstances.
Outcome measurement is a systematic way
to assess the extent to which a program has achieved its intended outcomes.
According to Mouton (2009) measuring the impact of a program
means demonstrating or estimating the accumulated differentiated proximate and
emergent effect, some of which might be unintended and therefore unforeseen.

Outcome measurement serves to help you understand whether the program is
effective or not. It further helps you to clarify your understanding of your program.
But the most important reason for undertaking the effort is to understand the
impacts of your work on the people you serve.
With the information you collect,
you can determine which activities to continue and build upon, and which you need
to change in order to improve the effectiveness of the program.
This can involve using sophisticated statistical techniques in order to measure the
effect of the program and to find causal relationship between the program and the
various outcomes. More information about impact evaluation is found under the
heading 'Determining Causation'.
Assessing efficiency
Finally, cost-benefit or cost-effectiveness analysis assesses the efficiency of a
program. Evaluators outline the benefits and cost of the program for comparison.
An efficient program has a lower cost-benefit ratio.
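As a rough illustration of the comparison described above, the sketch below computes a cost-benefit ratio (taken here to mean cost divided by monetized benefit, which is an assumption about the convention) and a cost-effectiveness figure (cost per unit of outcome). The two programs and all their figures are hypothetical.

```python
def cost_benefit_ratio(total_cost, total_benefit):
    # Cost per unit of monetized benefit; lower means more efficient.
    if total_benefit <= 0:
        raise ValueError("total_benefit must be positive")
    return total_cost / total_benefit


def cost_effectiveness(total_cost, units_of_outcome):
    # Cost per unit of outcome (e.g. per job placement) when benefits
    # are not expressed in money.
    if units_of_outcome <= 0:
        raise ValueError("units_of_outcome must be positive")
    return total_cost / units_of_outcome


# Hypothetical programs; all figures are invented.
programs = {
    "Program A": {"cost": 200_000, "benefit": 500_000, "placements": 80},
    "Program B": {"cost": 150_000, "benefit": 250_000, "placements": 50},
}

for name, p in programs.items():
    cbr = cost_benefit_ratio(p["cost"], p["benefit"])
    cer = cost_effectiveness(p["cost"], p["placements"])
    print(f"{name}: cost-benefit ratio={cbr:.2f}, cost per placement={cer:,.0f}")
```

Note that some texts report the inverse (benefit divided by cost), in which case a higher ratio indicates the more efficient program.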
Determining causation
Perhaps the most difficult part of evaluation is determining whether the program
itself is causing the changes that are observed in the population it was aimed at.
Events or processes outside of the program may be the real cause of the observed
outcome (or the real prevention of the anticipated outcome).
Causation is difficult to determine. One main reason for this is self-selection
bias. People select themselves to participate in a program. For
example, in a job training program, some people decide to participate and others do
not. Those who do participate may differ from those who do not in important ways.
They may be more determined to find a job or have better support resources. These
characteristics may actually be causing the observed outcome of increased
employment, not the job training program.
Evaluations conducted with random assignment are able to make stronger
inferences about causation. Randomly assigning people to participate or not to
participate in the program reduces or eliminates self-selection bias. Thus, the
group of people who participate would likely be more comparable to the group
who did not participate.
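The contrast between self-selection and random assignment can be illustrated with a small simulation. This is a sketch under invented assumptions: each person has a "motivation" score between 0 and 1 that raises the chance of employment, the program has a true effect of 0.10, and under self-selection only the more motivated half takes part.

```python
import random


def employed(motivation, treated, rng):
    # Chance of finding a job rises with motivation; the program adds
    # an assumed true effect of 0.10.
    p = 0.2 + 0.4 * motivation + (0.10 if treated else 0.0)
    return rng.random() < p


def naive_difference(assign, n=50_000, seed=0):
    # Difference in employment rates between participants and
    # non-participants under a given assignment rule.
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # treated -> [employed, total]
    for _ in range(n):
        motivation = rng.random()
        treated = assign(motivation, rng)
        counts[treated][0] += employed(motivation, treated, rng)
        counts[treated][1] += 1
    rate = {t: e / total for t, (e, total) in counts.items()}
    return rate[True] - rate[False]


# Self-selection: only the more motivated half chooses to take part.
self_selected = naive_difference(lambda m, rng: m > 0.5)
# Random assignment: a coin flip decides, independent of motivation.
randomized = naive_difference(lambda m, rng: rng.random() < 0.5)

print(f"self-selected difference: {self_selected:.3f}")
print(f"randomized difference:    {randomized:.3f}")
```

Under these assumptions the self-selected comparison should overstate the program effect, because it also picks up the motivation gap between the groups, while the randomized comparison should land close to the assumed 0.10.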
However, since most programs cannot use random assignment, causation cannot be
determined. Impact analysis can still provide useful information. For example, the
outcomes of the program can be described. Thus the evaluation can describe that
people who participated in the program were more likely to experience a given
outcome than people who did not participate.
If the program is fairly large, and there are enough data, statistical analysis can be
used to make a reasonable case for the program by showing, for example, that
other causes are unlikely.
Reliability, validity and sensitivity in program evaluation
It is important to ensure that the instruments (for example, tests, questionnaires,
etc.) used in program evaluation are as reliable, valid and sensitive as possible.
According to Rossi et al. (2004, p. 222),
'a measure that is poorly chosen or
poorly conceived can completely undermine the worth of an impact assessment by
producing misleading estimates. Only if outcome measures are valid, reliable and
appropriately sensitive can impact assessments be regarded as credible'.
The reliability of a measurement instrument is the 'extent to which the measure
produces the same results when used repeatedly to measure the same thing' (Rossi
et al., 2004, p. 218).
The more reliable a measure is, the greater its statistical
power and the more credible its findings. If a measuring instrument is unreliable, it
may dilute and obscure the real effects of a program, and the program will 'appear
to be less effective than it actually is' (Rossi et al., 2004, p. 219).
Hence, it is
important to ensure the evaluation is as reliable as possible.
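One common way to put a number on reliability as defined above is a test-retest correlation: the same instrument is given twice to the same people and the two sets of scores are correlated. The sketch below is an illustration of that idea rather than a procedure taken from Rossi et al.; the scores are invented.

```python
from math import sqrt


def pearson_r(x, y):
    # Pearson correlation between two equally long lists of scores.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented scores for the same eight respondents, measured twice.
first_administration = [12, 15, 9, 20, 17, 11, 14, 18]
second_administration = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability: r = {r:.2f}")
```

A value close to 1 suggests the instrument gives consistent results on repeated use; what counts as acceptable depends on the instrument and the evaluation.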
The validity of a measurement instrument is 'the extent to which it measures what
it is intended to measure' (Rossi et al., 2004, p. 219).
This concept can be
difficult to accurately measure: in general use in evaluations, an instrument may be
deemed valid if accepted as valid by the stakeholders (stakeholders may include,
for example, funders, program administrators, et cetera).
The principal purpose of the evaluation process is to measure whether the program
has an effect on the social problem it seeks to redress; hence, the measurement
instrument must be sensitive enough to discern these potential changes (Rossi et
al., 2004).
A measurement instrument may be insensitive if it contains items
measuring outcomes which the program couldn't possibly affect, or if the
instrument was originally developed for applications to individuals (for example
standardized psychological measures) rather than to a group setting (Rossi et al.,
2004). These factors may result in 'noise' which may obscure any effect the
program may have had.
Only measures which adequately achieve the benchmarks of reliability, validity
and sensitivity can be said to be credible evaluations. It is the duty of evaluators to
produce credible evaluations, as their findings may have far reaching effects. A
discreditable evaluation which is unable to show that a program is achieving its
purpose when it is in fact creating positive change may cause the program to lose
its funding undeservedly.

Steps to Program Evaluation Framework
According to the Centers for Disease Control and Prevention (CDC) there are six steps to a
complete program evaluation. The steps described are: engage stakeholders,
describe the program, focus the evaluation design, gather credible evidence, justify
conclusions, and ensure use and share lessons learned.
These steps can happen
in a cycle framework to represent the continuing process of evaluation.
Methodological constraints and challenges
The shoestring approach
The shoestring evaluation approach is designed to assist evaluators operating
under limited budget, limited access or availability of data and limited turnaround
time, to conduct effective evaluations that are methodologically
rigorous (Bamberger, Rugh, Church & Fort, 2004).
This approach has responded
to the continued greater need for evaluation processes that are more rapid and
economical under difficult circumstances of budget, time constraints and limited
availability of data. However, it is not always possible to design an evaluation to
achieve the highest standards available. Many programs do not build an evaluation
procedure into their design or budget. Hence, many evaluation processes do not
begin until the program is already underway, which can result in time, budget or
data constraints for the evaluators, which in turn can affect the reliability, validity
or sensitivity of the evaluation. The shoestring approach helps to ensure that the
maximum possible methodological rigor is achieved under these constraints.
Budget constraints
Frequently, programs are faced with budget constraints because most original
projects do not include a budget to conduct an evaluation (Bamberger et al., 2004).
Therefore, this automatically results in evaluations being allocated smaller budgets
that are inadequate for a rigorous evaluation. Due to the budget constraints it might
be difficult to effectively apply the most appropriate methodological instruments.
These constraints may consequently affect the time available in which to do the
evaluation (Bamberger et al., 2004).
Budget constraints may be addressed by
simplifying the evaluation design, revising the sample size, exploring economical
data collection methods (such as using volunteers to collect data, shortening
surveys, or using focus groups and key informants) or looking for reliable
secondary data (Bamberger et al., 2004).
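
Revising the sample size involves a trade-off between cost and precision. The sketch below uses the standard margin-of-error formula for a proportion estimated from a simple random sample. The budgets, fixed cost, cost per interview and the 95% confidence level are assumptions chosen for illustration, not figures from Bamberger et al.

```python
from math import sqrt

Z_95 = 1.96  # z-value for a 95% confidence level


def affordable_sample(budget, fixed_cost, cost_per_interview):
    # Largest number of interviews the budget allows.
    return int((budget - fixed_cost) // cost_per_interview)


def margin_of_error(n, p=0.5):
    # Margin of error for an estimated proportion; p=0.5 is the
    # most conservative (widest) case.
    return Z_95 * sqrt(p * (1 - p) / n)


# Invented figures: compare a full and a reduced evaluation budget.
for budget in (20_000, 8_000):
    n = affordable_sample(budget, fixed_cost=2_000, cost_per_interview=15)
    print(f"budget {budget}: n={n}, margin of error=+/-{margin_of_error(n):.1%}")
```

Because the margin of error shrinks only with the square root of the sample size, a large cut in the number of interviews costs less precision than might be expected.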

Time constraints
Time constraints most commonly arise when the evaluator is
summoned to conduct an evaluation after a project is already underway, when they are
given limited time to do the evaluation compared to the life of the study, or when they
are not given enough time for adequate planning. Time constraints are particularly
problematic when the evaluator is not familiar with the area or country in which
the program is situated (Bamberger et al., 2004).
Time constraints can be
addressed by the methods listed under budget constraints as above, and also by
careful planning to ensure effective data collection and analysis within the limited
time available.
Data constraints
If the evaluation is initiated late in the program, there may be no baseline data on
the conditions of the target group before the intervention began (Bamberger et al.,
2004). Another possible cause of data constraints is data that have been
collected by program staff and contain systematic reporting biases or reflect poor record
keeping standards and are subsequently of little use (Bamberger et al.,
2004). Another source of data constraints may arise if the target group is
difficult to reach for data collection, for example homeless people, drug addicts,
migrant workers, et cetera (Bamberger et al., 2004).
Data constraints can be
addressed by reconstructing baseline data from secondary data or through the use
of multiple methods. Multiple methods, such as the combination of qualitative and
quantitative data, can increase validity through triangulation and save time and
money. Additionally, these constraints may be dealt with through careful planning
and consultation with program stakeholders. By clearly identifying and
understanding client needs ahead of the evaluation, costs and time of the evaluative
process can be streamlined and reduced, while still maintaining credibility.
All in all, time, monetary and data constraints can have negative implications on
the validity, reliability and transferability of the evaluation. The shoestring
approach has been created to assist evaluators to correct the limitations identified
above by identifying ways to reduce costs and time, reconstruct baseline data and
to ensure maximum quality under existing constraints (Bamberger et al., 2004).
Five-tiered approach
The five-tiered approach to evaluation further develops the strategies that the
shoestring approach to evaluation is based upon.
It was originally developed by
Jacobs (1988) as an alternative way to evaluate community-based programs and as
such was applied to a statewide child and family program in Massachusetts.
The five-tiered approach is offered as a conceptual framework for
matching evaluations more precisely to the characteristics of the programs
themselves, and to the particular resources and constraints inherent in each
evaluation context.
In other words, the five-tiered approach seeks to tailor the
evaluation to the specific needs of each evaluation context.
The earlier tiers (1-3) generate descriptive and process-oriented information while
the later tiers (4-5) determine both the short-term and the long-term effects of the
program. The five levels are organized as follows:
Tier 1: needs assessment (sometimes referred to as pre-implementation)

Tier 2: monitoring and accountability
Tier 3: quality review and program clarification (sometimes referred to as
understanding and refining)

Tier 4: achieving outcomes
Tier 5: establishing impact
For each tier, purpose(s) are identified, along with corresponding tasks that enable
the identified purpose of the tier to be achieved.
For example, the purpose of the
first tier, Needs assessment, would be to document a need for a program in a
community. The task for that tier would be to assess the community's needs and
assets by working with all relevant stakeholders.

While the tiers are structured for consecutive use, meaning that information
gathered in the earlier tiers is required for tasks on higher tiers, it acknowledges the
fluid nature of evaluation.
Therefore, it is possible to move from later tiers back
to preceding ones, or even to work in two tiers at the same time.
It is important
for program evaluators to note, however, that a program must be evaluated at the
appropriate level.

The five-tiered approach is said to be useful for family support programs which
emphasise community and participant empowerment. This is because it encourages
a participatory approach involving all stakeholders and it is through this process of
reflection that empowerment is achieved.

Methodological challenges presented by language and culture
The purpose of this section is to draw attention to some of the methodological
challenges and dilemmas evaluators are potentially faced with when conducting a
program evaluation in a developing country. In many developing countries the
major sponsors of evaluation are donor agencies from the developed world, and
these agencies require regular evaluation reports in order to maintain accountability
and control of resources, as well as generate evidence for the program's success or
failure. However, there are many hurdles and challenges which evaluators face
when attempting to implement an evaluation program which attempts to make use
of techniques and systems which are not developed within the context to which
they are applied.
Some of the issues include differences in culture, attitudes,
language and political process.

Culture is defined by Ebbutt (1998, p. 416) as a constellation of both written and
unwritten expectations, values, norms, rules, laws, artifacts, rituals and behaviors
that permeate a society and influence how people behave socially.
Culture can
influence many facets of the evaluation process, including data collection,
evaluation program implementation and the analysis and understanding of the
results of the evaluation.
In particular, instruments which are traditionally used
to collect data such as questionnaires and semi-structured interviews need to be
sensitive to differences in culture, if they were originally developed in a different
cultural context.
The understanding and meaning of constructs which the
evaluator is attempting to measure may not be shared between the evaluator and
the sample population and thus the transference of concepts is an important notion,
as this will influence the quality of the data collection carried out by evaluators as
well as the analysis and results generated by the data.

Language also plays an important part in the evaluation process, as language is tied
closely to culture.
Language can be a major barrier to communicating concepts
which the evaluator is trying to access, and translation is often required.
There are a multitude of problems with translation, including the loss of meaning as well
as the exaggeration or enhancement of meaning by translators.
For example,
terms which are contextually specific may not translate into another language with
the same weight or meaning. In particular, data collection instruments need to take
meaning into account, as subject matter that is not considered sensitive in one
context might prove to be sensitive in the context in which the evaluation
is taking place.
Thus, evaluators need to take into account two important
concepts when administering data collection tools: lexical equivalence and
conceptual equivalence.
Lexical equivalence asks the question: how does one
phrase a question in two languages using the same words? This is a difficult task to
accomplish, and the use of techniques such as back-translation may aid the evaluator
but may not result in perfect transference of meaning.
This leads to the next
point, conceptual equivalence. It is not a common occurrence for concepts to
transfer unambiguously from one culture to another.
Data collection instruments
which have not undergone adequate testing and piloting may therefore render
results which are not useful as the concepts which are measured by the instrument
may have taken on a different meaning and thus rendered the instrument unreliable
and invalid.

Thus, it can be seen that evaluators need to take into account the methodological
challenges created by differences in culture and language when attempting to
conduct a program evaluation in a developing country.
Utilization results
There are three conventional uses of evaluation results: persuasive
utilization, direct (instrumental) utilization, and conceptual utilization.
Persuasive utilization
Persuasive utilization is the enlistment of evaluation results in an effort to persuade
an audience to either support an agenda or to oppose it. Unless the 'persuader' is the
same person that ran the evaluation, this form of utilization is not of much interest
to evaluators as they often cannot foresee possible future efforts of persuasion.

Direct (instrumental) utilization
Evaluators often tailor their evaluations to produce results that can have a direct
influence in the improvement of the structure, or on the process, of a program. For
example, the evaluation of a novel educational intervention may produce results
that indicate no improvement in students' marks. This may be due to the
intervention not having a sound theoretical background, or it may be that the
intervention is not conducted as originally intended. The results of the evaluation
would hopefully cause the creators of the intervention to go back to the drawing
board to re-create the core structure of the intervention, or even change the
implementation processes.

Conceptual utilization
But even if evaluation results do not have a direct influence in the re-shaping of a
program, they may still be used to make people aware of the issues the program is
trying to address. Going back to the example of an evaluation of a novel
educational intervention, the results can also be used to inform educators and
students about the different barriers that may influence students' learning
difficulties. A number of studies on these barriers may then be initiated by this new
information.

Variables affecting utilization
There are five conditions that seem to affect the utility of evaluation results,
namely relevance, communication between the evaluators and the users of the
results, information processing by the users, the plausibility of the results, as well
as the level of involvement or advocacy of the users.

Guidelines for maximizing utilization
Quoted directly from Rossi et al. (2004, p. 416):

Evaluators must understand the cognitive styles of decisionmakers
Evaluation results must be timely and available when needed
Evaluations must respect stakeholders' program commitments
Utilization and dissemination plans should be part of the evaluation design
Evaluations should include an assessment of utilization
Internal versus external program evaluators
The choice of evaluator may be regarded as
equally important as the process of the evaluation. Evaluators may be internal
(persons associated with the program to be executed) or external (persons not
associated with any part of the execution or implementation of the program)
(Division for oversight services, 2004). The following provides a brief summary of
the advantages and disadvantages of internal and external evaluators, adapted from
the Division for oversight services (2004), which gives a more comprehensive list.
Internal evaluators
May have better overall knowledge of the program and possess informal
knowledge of the program
Less threatening as already familiar with staff
Less costly
May be less objective
May be more preoccupied with other activities of the program and not give
the evaluation complete attention
May not be adequately trained as an evaluator.
External evaluators
More objective about the process, offers new perspectives, different angles to
observe and critique the process
May be able to dedicate greater amount of time and attention to the
evaluation
May have greater evaluation expertise
May be more costly and require more time for the contract, monitoring,
negotiations etc.
May be unfamiliar with program staff and create anxiety about being
evaluated
May be unfamiliar with organization policies, certain constraints affecting
the program.
Three paradigms
Potter (2006) identifies and describes three broad paradigms within program
evaluation. The first, and probably most common, is the positivist approach, in
which evaluation can only occur where there are objective, observable and
measurable aspects of a program, requiring predominantly quantitative evidence.
The positivist approach includes evaluation dimensions such as needs assessment,
assessment of program theory, assessment of program process, impact assessment
and efficiency assessment (Rossi, Lipsey and Freeman, 2004).
A detailed
example of the positivist approach is a Public Policy
Institute of California report titled "Evaluating Academic Programs in California's
Community Colleges", in which the evaluators examine measurable activities (i.e.
enrollment data) and conduct quantitative assessments like factor analysis.

The second paradigm identified by Potter (2006) is that of interpretive approaches,
where it is argued that it is essential that the evaluator develops an understanding
of the perspective, experiences and expectations of all stakeholders. This would
lead to a better understanding of the various meanings and needs held by
stakeholders, which is crucial before one is able to make judgments about the merit
or value of a program. The evaluator's contact with the program is often over an
extended period of time and, although there is no standardized method,
observation, interviews and focus groups are commonly used. A report
commissioned by the World Bank details 8 approaches in which qualitative and
quantitative methods can be integrated and perhaps yield insights not achievable
through only one method.

Potter (2006) also identifies critical-emancipatory approaches to program
evaluation, which are largely based on action research for the purposes of social
transformation. This type of approach is much more ideological and often includes
a greater degree of social activism on the part of the evaluator. This approach
would be appropriate for qualitative and participative evaluations. Because of its
critical focus on societal power structures and its emphasis on participation and
empowerment, Potter argues this type of evaluation can be particularly useful in
developing countries.
Regardless of the paradigm which is used in any program evaluation, whether it be
positivist, interpretive or critical-emancipatory, it is essential to acknowledge that
evaluation takes place in specific socio-political contexts. Evaluation does not exist
in a vacuum and all evaluations, whether they are aware of it or not, are influenced
by socio-political factors. It is important to recognize that evaluations and the
findings which result from this kind of evaluation process can be used in favour of or
against particular ideological, social and political agendas (Weiss, 1999).
This is
especially true in an age when resources are limited and there is competition
between organizations for certain projects to be prioritised over others (Louw,

Empowerment evaluation
Empowerment evaluation makes use of evaluation concepts, techniques, and
findings to foster improvement and self-determination of a particular program
aimed at a specific target population/program participants. Empowerment
evaluation is value oriented towards getting program participants involved in
bringing about change in the programs they are targeted for. One of the main
focuses in empowerment evaluation is to incorporate the program participants in
the conducting of the evaluation process. This process is then often followed by
some sort of critical reflection of the program. In such cases, an external/outsider
evaluator serves as a consultant/coach/facilitator to the program participants and
seeks to understand the program from the perspective of the participants. Once a
clear understanding of the participants' perspective has been gained, appropriate
steps and strategies can be devised (with the valuable input of the participants) and
implemented in order to reach desired outcomes.
According to Fetterman (2002)
empowerment evaluation has three steps:
Establishing a mission
Taking stock
Planning for the future
Establishing a mission
The first step involves evaluators asking the program participants and staff
members (of the program) to define the mission of the program. Evaluators may
opt to carry this step out by bringing such parties together and asking them to
generate and discuss the mission of the program. The logic behind this approach is
to show each party that there may be divergent views of what the program mission
actually is.
Taking stock
Taking stock as the second step consists of two important tasks. The first task is
concerned with program participants and program staff generating a list of current
key activities that are crucial to the functioning of the program. The second task is
concerned with rating the identified key activities, also known as prioritization.
For example, each party member may be asked to rate each key activity on a scale
from 1 to 10, where 10 is the most important and 1 the least important. The role of
the evaluator during this task is to facilitate interactive discussion amongst
members in an attempt to establish some baseline of shared meaning and
understanding pertaining to the key activities. In addition, relevant documentation
(such as financial reports and curriculum information) may be brought into the
discussion when considering some of the key activities.
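As a rough illustration of the rating task, the sketch below averages each activity's 1-to-10 ratings and lists the activities from most to least important. The activity names, the ratings and the function name `prioritize` are hypothetical. Averaging is only one simple way to summarize the ratings and is not prescribed by the source; in practice the evaluator would use the numbers as a starting point for discussion.

```python
from statistics import mean


def prioritize(ratings):
    """Return (activity, average rating) pairs, highest average first.

    `ratings` maps each key activity to the list of 1-10 scores given
    by participants and staff (10 = most important, 1 = least).
    """
    for activity, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for {activity!r}")
        if any(not 1 <= s <= 10 for s in scores):
            raise ValueError(f"ratings for {activity!r} must be between 1 and 10")
    averages = {activity: mean(scores) for activity, scores in ratings.items()}
    # Sort by average rating so the most important activity comes first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings from three group members.
example_ratings = {
    "tutoring sessions": [9, 8, 10],
    "parent outreach": [6, 7, 5],
    "staff training": [8, 7, 9],
}

for activity, average in prioritize(example_ratings):
    print(f"{activity}: {average:.1f}")
```

Large disagreements between individual ratings of the same activity are worth raising in the group discussion, since an average alone hides them.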
Planning for the future
After prioritizing the key activities, the next step is to plan for the future. Here the
evaluator asks program participants and program staff how they would like to
improve the program in relation to the key activities listed. The objective is to
create a thread of coherence whereby the mission generated (step 1) guides the
stock take (step 2), which forms the basis for the plans for the future (step 3). Thus,
in planning for the future, specific goals are aligned with relevant key activities. In
addition, it is important for program participants and program staff to
identify possible forms of evidence (measurable indicators) which can be used to
monitor progress towards specific goals. Goals must be related to the program's
activities, talents, resources and scope of capability; in short, the goals formulated
must be realistic.
These three steps of empowerment evaluation produce the potential for a program
to run more effectively and more in touch with the needs of the target population.
Empowerment evaluation, as a process which is facilitated by a skilled evaluator,
equips as well as empowers participants by providing them with a 'new' way of
critically thinking and reflecting on programs. Furthermore, it empowers program
participants and staff to recognize their own capacity to bring about program
change through collective action.

Transformative Paradigm
The transformative paradigm is integral in incorporating social justice in
evaluation. Donna Mertens, a primary researcher in this field, states that the
transformative paradigm focuses primarily on the viewpoints of marginalized groups
and on interrogating systemic power structures through mixed methods to further
social justice and human rights. The transformative paradigm arose after
marginalized groups, who have historically been pushed to the side in evaluation,
began to collaborate with scholars to advocate for social justice and human rights
in evaluation. The transformative paradigm introduces many different paradigms
and lenses to the evaluation process, leading it to continually call into question the
evaluation process.
Both the American Evaluation Association and National Association of Social
Workers call attention to the ethical duty to possess cultural competence when
conducting evaluations. Cultural competence in evaluation can be broadly defined
as a systematic, responsive inquiry that is actively cognizant, understanding, and
appreciative of the cultural context in which the evaluation takes place; that frames
and articulates the epistemology of the evaluation endeavor; that employs culturally
and contextually appropriate methodology; and that uses stakeholder-generated,
interpretive means to arrive at the results and further use of the findings.
Health and evaluation leaders are careful to point out that cultural competence
cannot be determined by a simple checklist, but rather it is an attribute that
develops over time. The root of cultural competency in evaluation is a genuine
respect for communities being studied and openness to seek depth in understanding
different cultural contexts, practices and paradigms of thinking. This includes
being creative and flexible to capture different cultural contexts, and heightened
awareness of power differentials that exist in an evaluation context. Important
skills include the ability to build rapport across difference, gain the trust of the
community members, and self-reflect and recognize one's own biases.

The paradigm's axiology, ontology, epistemology, and methodology are reflective
of social justice practice in evaluation. These examples focus on addressing
inequalities and injustices in society by promoting inclusion and equality in human
rights.
Axiology (Values and Value Judgements)
The transformative paradigm's axiological assumption rests on four primary principles:
The importance of being culturally respectful
The promotion of social justice
The furtherance of human rights
Addressing inequities
Ontology (Reality)
Differences in perspectives on what is real are determined by diverse values and
life experiences. In turn these values and life experiences are often associated with
differences in access to privilege, based on such characteristics as disability,
gender, sexual identity, religion, race/ethnicity, national origins, political party,
income level, age, language, and immigration or refugee status.

Epistemology (Knowledge)
Knowledge is constructed within the context of power and privilege with
consequences attached to which version of knowledge is given
Knowledge is socially and historically located within a complex
cultural context.

Methodology (Systematic Inquiry)
Methodological decisions are aimed at determining the approach that will best
facilitate use of the process and findings to enhance social justice; identify the
systemic forces that support the status quo and those that will allow change to
happen; and acknowledge the need for a critical and reflexive relationship between
the evaluator and the stakeholders.

While operating through social justice, it is imperative to be able to view the world
through the lens of those who experience injustices. Critical Race Theory, Feminist
Theory, and Queer/LGBTQ Theory are frameworks for how we should think
about providing justice for marginalized groups. These lenses create an opportunity to
make each theory a priority in addressing inequality.
Critical Race Theory
Critical Race Theory (CRT) is an extension of critical theory that is focused on
inequities based on race and ethnicity. Daniel Solorzano describes the role of CRT
as providing a framework to investigate and make visible those systemic aspects of
society that allow the discriminatory and oppressive status quo of racism to

Feminist Theory
The essence of feminist theories is to expose the individual and institutional
practices that have denied access to women and other oppressed groups and have
ignored or devalued women

Queer/LGBTQ Theory
Queer/LGBTQ theorists question the heterosexist bias that pervades society in
terms of power over and discrimination toward sexual orientation minorities.
Because of the sensitivity of issues surrounding LGBTQ status, evaluators need to
be aware of safe ways to protect such individuals' identities and ensure that
discriminatory practices are brought to light in order to bring about a more just

Government requirements
Given the Federal budget deficit, the Obama Administration moved to apply an
"evidence-based approach" to government spending, including rigorous methods of
program evaluation. The President's 2011 Budget earmarked funding for 19
government program evaluations for agencies such as the Department of Education
and the United States Agency for International Development (USAID). An
inter-agency group delivers the goal of increasing transparency and accountability by
creating effective evaluation networks and drawing on best practices. A six-step
framework for conducting evaluation of public health programs, published by
the Centers for Disease Control and Prevention (CDC), initially increased the
emphasis on program evaluation of government programs in the US. The
framework is as follows:
1. Engage stakeholders.
2. Describe the program.
3. Focus the evaluation.
4. Gather credible evidence.
5. Justify conclusions.
6. Ensure use and share lessons learned.
CIPP Model of evaluation
History of the CIPP model
The CIPP model of evaluation was developed by Daniel Stufflebeam and
colleagues in the 1960s. CIPP is an acronym for Context, Input, Process and
Product. CIPP is an evaluation model that requires the evaluation
of context, input, process and product in judging a programme's value. CIPP is a
decision-focused approach to evaluation and emphasises the systematic provision
of information for programme management and operation.

CIPP model
The CIPP framework was developed as a means of linking evaluation with
programme decision-making. It aims to provide an analytic and rational basis for
programme decision-making, based on a cycle of planning, structuring,
implementing, and reviewing and revising decisions, each examined through a
different aspect of evaluation: context, input, process and product evaluation.

The CIPP model is an attempt to make evaluation directly relevant to the needs of
decision-makers during the phases and activities of a programme. The
context, input, process, and product (CIPP) evaluation model is recommended as
a framework to systematically guide the conception, design, implementation, and
assessment of service-learning projects, and provide feedback and judgment of the
project's effectiveness for continuous improvement.

Four aspects of CIPP evaluation
These aspects are context, inputs, process, and product. These four aspects of CIPP
evaluation assist a decision-maker to answer four basic questions:
What should we do?
This involves collecting and analysing needs assessment data to determine goals,
priorities and objectives. For example, a context evaluation of a literacy program
might involve an analysis of the existing objectives of the literacy programme,
literacy achievement test scores, staff concerns (general and particular), literacy
policies and plans and community concerns, perceptions or attitudes and needs.

How should we do it?
This involves the steps and resources needed to meet the new goals and objectives
and might include identifying successful external programs and materials as well
as gathering information.

Are we doing it as planned?
This provides decision-makers with information about how well the programme is
being implemented. By continuously monitoring the program, decision-makers
learn such things as how well it is following the plans and guidelines, conflicts
arising, staff support and morale, strengths and weaknesses of materials, delivery
and budgeting problems.

Did the programme work?
By measuring the actual outcomes and comparing them to the anticipated
outcomes, decision-makers are better able to decide if the program should be
continued, modified, or dropped altogether. This is the essence of product
evaluation.

Using CIPP in the different stages of the evaluation
The CIPP model is unique as an evaluation guide as it allows evaluators to evaluate
the program at different stages, namely: before the program commences by helping
evaluators to assess the need and at the end of the program to assess whether or not
the program had an effect.
The CIPP model allows you to ask formative questions at the beginning of the program,
then later gives you a guide for evaluating the program's impact by allowing
you to ask summative questions on all aspects of the program.
Context: What needs to be done? (formative) vs. Were important needs addressed? (summative)
Input: How should it be done? (formative) vs. Was a defensible design employed? (summative)
Process: Is it being done? (formative) vs. Was the design well executed? (summative)
Product: Is it succeeding? (formative) vs. Did the effort succeed? (summative)

What is assessment?
Adrian Tennant takes a look at what is meant by assessment. Many
people assume that assessment is simply another word for testing
but this article outlines its role as an important aspect of teaching
and learning.
When people see, or hear, the word 'assessment' they normally react in a fairly negative way. It might
be a deep sigh or a cry of 'Oh no!', but rarely will it be a smile or a cry of joy. Why is it that people feel
this way at the mention of assessment? I think the first problem is that people don't really understand
what is meant (or should be meant) by assessment. A second issue could be that they have had fairly
bad experiences in the past and this has an influence on them. And, thirdly, it could be that
assessment is often seen as a 'pass or fail' thing and nobody likes to fail.
So, what do we mean by assessment?
assessment, noun [U]
the process of making a judgement or forming an opinion, after considering something or someone
I think the most interesting thing here is the word 'process'. The purpose of most forms of assessment
in the English Language classroom should be to inform people of how much progress a student is
making. Assessment can take many different forms and does not need to be limited to tests and
exams. Here are two types of assessment:
1. Activity assessment
a) Did you like that activity?
b) Was that activity easy or difficult?
c) What was the hardest part of that?
d) Was the activity useful? How? Why?
2. Self-assessment
a) Now I can ...
b) I still need to work on ...
c) I've improved in ...
d) Today I learnt ...
e) In the test I got X and Y wrong. I'm going to study these for homework.
As you can see, the onus here is on the students to think about what they've done. Unlike tests which
are handed out, collected in and marked by a teacher and then handed back, these forms of
assessment are about the process of learning rather than only the product.
Does this mean that tests are not a valid form of assessment?
No, not at all. But they are not the only form of assessment. If students only think of assessment in
terms of a formal test or exam then it is likely that they will have negative feelings towards the idea of
assessment. It's also important to emphasize that assessment shouldn't be about how good or bad
someone is at a particular point in time; it should be about the progress they have made, the work
they've put in and the learning that has taken place. In other words, it should be about the process of
learning and not simply the results.
One thing that is quite useful to do with formal tests is to actually analyze the process as well as the
product. Here are a couple of ideas that can be used for this purpose:
1. After collecting in the test, hand out a blank copy to each student. Ask them to look at the test and a)
say how well they think they did on each particular question; b) say which questions were easy, ok,
difficult; and, c) say what score they think they got. Then, when you hand back the marked tests ask
them to compare their thoughts to the actual test, i.e. Did they get the questions right that they thought
they had?, etc.
2. After collecting in the test, hand out a blank copy to each student. Ask them to look at the test, choose
two questions and tell a partner how they worked out the answer.
When should assessment take place?
The simple answer is that it should take place at every stage of the learning process and that it should
be fairly frequent. Of course, there are many different forms of assessment. So, at the start of a
course some form of diagnostic assessment should take place to see how much students know. This
can then be used as a benchmark later on to see how much progress has been made.
Throughout a course various forms of assessment can be used, from homework, project work and
in-class activities to more formal tests. If you are required to give students a certain number of tests
each year, say three, then one thing you could do is give them five and tell them that only the best
three will be used. This kind of flexibility not only helps students be a little less worried but also takes
into account that people have bad days sometimes. In fact, we will see this idea of selection again
when we look at portfolios.
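As a small illustration of the 'best three of five' idea, the sketch below keeps a student's highest test scores and averages them. The function name `best_scores_average` and the sample scores are hypothetical, and averaging the retained scores is an assumption made for the example; the article only says that the best three will be used.

```python
import heapq


def best_scores_average(scores, keep=3):
    """Average the `keep` highest scores from a list of test scores."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    if len(scores) < keep:
        raise ValueError("need at least `keep` scores")
    # nlargest returns the `keep` highest scores, largest first.
    best = heapq.nlargest(keep, scores)
    return sum(best) / keep


# Hypothetical scores out of 100 for five tests; the third was a bad day.
scores = [62, 78, 45, 81, 70]
print(round(best_scores_average(scores), 1))
```

Because the two lowest scores are dropped, one bad day does not pull down the student's overall result.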
Helping students become comfortable
One of our first tasks as a teacher has got to be to help our students become more comfortable with
the idea of assessment. Because assessment often has a negative connotation and is equated with
tests, passing, failing and scores, this can be quite a challenge. But if we can make our students
understand that assessment is actually beneficial then it will make the whole process easier. Here are
a few simple ideas aimed at achieving this:
1. Talk about assessment with your students.
a) What is assessment?
b) Why do we assess students?
c) How are we going to assess them?
d) What are the criteria used? Are these criteria clear?
2. Get students involved in assessment.
a) Use self-assessment, i.e. 'Can do' statements.
b) Use peer assessment.
c) Get students to come up with assessment criteria / agree criteria with students.
d) Get students involved in picking or designing assessment tasks.
3. Make assessment part of the teaching and learning process.
a) If you can build in a form of assessment regularly, maybe even every lesson, then your students
will become used to it and therefore more comfortable.
b) Make sure you include the results of any assessment into your teaching. For example, if students
have a particular problem with an aspect of grammar then go back over the grammar in a lesson
making it clear that you are doing this because it was identified as a problem from the assessment. If
students can see that you actually take notice of the assessment, and not simply the score, it will
become more meaningful and positive for them.
We'll give more specific ideas in some of the subsequent articles. However, the key here is to make
students see assessment as part of the teaching and learning process that has a direct influence on
what is taught. If students understand that assessment is about the process and not simply about a
product (i.e. a score), then they will start to have a more positive attitude towards it.
And finally
In this series of articles we'll take a closer look at the following areas of assessment:
Diagnostic tests
'Can do' statements, self-assessment and peer assessment
Assessing skills
Assessing tasks and lessons
Preparing students for tests and exams
Assessing Young Learners

What's the difference between formative and summative evaluations?
In formative evaluation, programs or projects are typically assessed during their
development or early implementation to provide information about how best to
revise and modify for improvement. This type of evaluation often is helpful for
pilot projects and new programs, but can be used for progress monitoring of
ongoing programs. In summative evaluation, programs or projects are assessed at
the end of an operating cycle, and findings typically are used to help decide
whether a program should be adopted, continued, or modified for improvement.
Both evaluation methods are recommended for use, when possible, to provide
program staff with ongoing feedback for program modifications (formative) as
well as periodic review of long-term progress on major program goals and
objectives (summative), and to meet regular reporting requirements (e.g., for a
grantor, agency, or organizational manager).