


A peer-reviewed electronic journal. ISSN 1531-7714




Copyright 1990, Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the right to authorize third party reproduction of this article in print, electronic and database forms. Wiggins, Grant (1990). The case for authentic assessment. Practical Assessment, Research & Evaluation, 2(2). Retrieved February 27, 2013 from . This paper has been viewed 199,244 times since 11/13/1999.

The Case for Authentic Assessment.

Grant Wiggins

Mr. Wiggins, a researcher and consultant on school reform issues, is a widely known advocate of authentic assessment in education. This article is based on materials that he prepared for the California Assessment Program.

WHAT IS AUTHENTIC ASSESSMENT?

Assessment is authentic when we directly examine student performance on worthy intellectual tasks. Traditional assessment, by contrast, relies on indirect or proxy 'items'--efficient, simplistic substitutes from which we think valid inferences can be made about the student's performance at those valued challenges. Do we want to evaluate student problem-posing and problem-solving in mathematics? experimental research in science? speaking, listening, and facilitating a discussion? doing document-based historical inquiry? thoroughly revising a piece of imaginative writing until it "works" for the reader? Then let our assessment be built out of such exemplary intellectual challenges.

Further comparisons with traditional standardized tests will help to clarify what "authenticity" means when considering assessment design and use:

* Authentic assessments require students to be effective performers with acquired knowledge. Traditional tests tend to reveal only whether the student can recognize, recall or "plug in" what was learned out of context. This may be as problematic as inferring driving or teaching ability from written tests alone. (Note, therefore, that the debate is not "either-or": there may well be virtue in an array of local and state assessment instruments as befits the purpose of the measurement.)
* Authentic assessments present the student with the full array of tasks that mirror the priorities and challenges found in the best instructional activities: conducting research; writing, revising and discussing papers; providing an engaging oral analysis of a recent political event; collaborating with others on a debate, etc. Conventional tests are usually limited to paper-and-pencil, one-answer questions.
* Authentic assessments attend to whether the student can craft polished, thorough and justifiable answers, performances or products. Conventional tests typically only ask the student to select or write correct responses--irrespective of reasons. (There is rarely an adequate opportunity to plan, revise and substantiate responses on typical tests, even when there are open-ended questions.)
* As a result, authentic assessment achieves validity and reliability by emphasizing and standardizing the appropriate criteria for scoring such (varied) products; traditional testing standardizes objective "items" and, hence, the (one) right answer for each.
* "Test validity" should depend in part upon whether the test simulates real-world "tests" of ability. Validity on most multiple-choice tests is determined merely by matching items to the curriculum content (or through sophisticated correlations with other test results).
* Authentic tasks involve "ill-structured" challenges and roles that help students rehearse for the complex ambiguities of the "game" of adult and professional life. Traditional tests are more like drills, assessing static and too-often arbitrarily discrete or simplistic elements of those activities.

Beyond these technical considerations, the move to reform assessment is based upon the premise that assessment should primarily support the needs of learners. Thus, secretive tests composed of proxy items and scores that have no obvious meaning or usefulness undermine teachers' ability to improve instruction and students' ability to improve their performance. We rehearse for and teach to authentic tests--think of music and military training--without compromising validity. The best tests always teach students and teachers alike the kind of work that most matters; they are enabling and forward-looking, not just reflective of prior teaching. In many colleges and all professional settings the essential challenges are known in advance--the upcoming report, recital, Board presentation, legal case, book to write, etc. Traditional tests, by requiring complete secrecy for their validity, make it difficult for teachers and students to rehearse and gain the confidence that comes from knowing their performance obligations. (A known challenge also makes it possible to hold all students to higher standards.)

WHY DO WE NEED TO INVEST IN THESE LABOR-INTENSIVE FORMS OF ASSESSMENT?

While multiple-choice tests can be valid indicators or predictors of academic performance, too often our tests mislead students and teachers about the kinds of work that should be mastered. Norms are not standards; items are not real problems; right answers are not rationales. What most defenders of traditional tests fail to see is that it is the form, not the content of the test that is harmful to learning; demonstrations of the technical validity of standardized tests should not be the issue in the assessment reform debate. Students come to believe that learning is cramming; teachers come to believe that tests are after-the-fact, imposed nuisances composed of contrived questions--irrelevant to their intent and success. Both parties are led to believe that right answers matter more than habits of mind and the justification of one's approach and results.
A move toward more authentic tasks and outcomes thus improves teaching and learning: students have greater clarity about their obligations (and are asked to master more engaging tasks), and teachers can come to believe that assessment results are both meaningful and useful for improving instruction. If our aim is merely to monitor performance then conventional testing is probably adequate. If our aim is to improve performance across the board then the tests must be composed of exemplary tasks, criteria and standards.

WON'T AUTHENTIC ASSESSMENT BE TOO EXPENSIVE AND TIME-CONSUMING?

The costs are deceptive: while the scoring of judgment-based tasks seems expensive when compared to multiple-choice tests (about $2 per student vs. 1 cent), the gains to teacher professional development, local assessing, and student learning are many. As states like California and New York have found (with their writing and hands-on science tests), significant improvements occur locally in the teaching and assessing of writing and science when teachers become involved and invested in the scoring process. If costs prove prohibitive, sampling may well be the appropriate response--the strategy employed in California, Vermont and Connecticut in their new performance and portfolio assessment projects. Whether through a sampling of many writing genres, where each student gets one prompt only; or through sampling a small number of all student papers and school-wide portfolios; or through assessing only a small sample of students, valuable information is gained at a minimum cost.

And what have we gained by failing to adequately assess all the capacities and outcomes we profess to value simply because it is time-consuming, expensive, or labor-intensive? Most other countries routinely ask students to respond orally and in writing on their major tests--the same countries that outperform us on international comparisons. Money, time and training are routinely set aside to ensure that assessment is of high quality. They also correctly assume that high standards depend on the quality of day-to-day local assessment--further offsetting the apparent high cost of training teachers to score student work in regional or national assessments.

WILL THE PUBLIC HAVE ANY FAITH IN THE OBJECTIVITY AND RELIABILITY OF JUDGMENT-BASED SCORES?
We forget that numerous state and national testing programs with a high degree of credibility and integrity have for many years operated using human judges:

* the New York Regents exams, parts of which have included essay questions since their inception--and which are scored locally (while audited by the state);
* the Advanced Placement program, which uses open-ended questions and tasks, including not only essays on most tests but the performance-based tests in the Art Portfolio and Foreign Language exams;
* state-wide writing assessments in two dozen states where model papers, training of readers, papers read "blind" and procedures to prevent bias and drift gain adequate reliability;
* the National Assessment of Educational Progress (NAEP), the Congressionally-mandated assessment, which uses numerous open-ended test questions and writing prompts (and successfully piloted a hands-on test of science performance);
* newly-mandated performance-based and portfolio-based state-wide testing in Arizona, California, Connecticut, Kentucky, Maryland, and New York.

Though the scoring of standardized tests is not subject to significant error, the procedure by which items are chosen, and the manner in which norms or cut-scores are established is often quite subjective--and typically immune from public scrutiny and oversight.

Genuine accountability does not avoid human judgment. We monitor and improve judgment through training sessions, model performances used as exemplars, audit and oversight policies as well as through such basic procedures as having disinterested judges review student work "blind" to the name or experience of the student--as occurs routinely throughout the professional, athletic and artistic worlds in the judging of performance.

Authentic assessment also has the advantage of providing parents and community members with directly observable products and understandable evidence concerning their students' performance; the quality of student work is more discernible to laypersons than when we must rely on translations of talk about stanines and renorming.

Ultimately, as the researcher Lauren Resnick has put it, "What you assess is what you get; if you don't test it you won't get it." To improve student performance we must recognize that essential intellectual abilities are falling through the cracks of conventional testing.

ADDITIONAL READING

Archbald, D. & Newmann, F. (1989) "The Functions of Assessment and the Nature of Authentic Academic Achievement," in Berlak (ed.) Assessing Achievement: Toward the Development of a New Science of Educational Testing. Buffalo, NY: SUNY Press.

Frederiksen, J. & Collins, A. (1989) "A Systems Approach to Educational Testing," Educational Researcher, 18, 9 (December).

National Commission on Testing and Public Policy (1990) From Gatekeeper to Gateway: Transforming Testing in America. Chestnut Hill, MA: NCTPP, Boston College.

Wiggins, G. (1989) "A True Test: Toward More Authentic and Equitable Assessment," Phi Delta Kappan, 70, 9 (May).

Wolf, D. (1989) "Portfolio Assessment: Sampling Student Work," Educational Leadership 46, 7, pp. 35-39 (April).

-----

Authentic Assessment Resources


the ERIC Clearinghouse on Assessment and Evaluation

Recommendations for Teachers

Teachers who have begun to use alternative assessment in their classrooms are good sources for ideas and guidance. The following recommendations were made by teachers in Virginia after they spent six months developing and implementing alternative assessment activities in their classrooms.

1. Start small. Follow someone else's example in the beginning, or do one activity in combination with a traditional test.
2. Develop clear rubrics. Realize that developing an effective rubric (rating scale with several categories) for judging student products and performances is harder than carrying out the activity. Standards and expectations must be clear. Benchmarks for levels of performance are essential. Characteristics of typical student products and performances may be used to generate performance assessment rubrics and standards for the class.
3. Expect to use more time at first. Developing and evaluating alternative assessments and their rubrics requires additional time until you and your students become comfortable with the method.
4. Adapt existing curriculum. Plan assessment as you plan instruction, not as an afterthought.
5. Have a partner. Sharing ideas and experiences with a colleague is beneficial to teachers and to students.
6. Make a collection. Look for examples of alternative assessments or activities that could be modified for your students and keep a file readily accessible.
7. Assign a high value (grade) to the assessment. Students need to see the experience as being important and worth their time. Make expectations clear in advance.
8. Expect to learn by trial and error. Be willing to take risks and learn from mistakes, just as we expect students to do. The best assessments are developed over time and with repeated use.
9. Try peer assessment activities. Relieve yourself of some grading responsibilities and increase student evaluation skills and accountability by involving them in administering assessments.
10. Don't give up. If the first tries are not as successful as you had hoped, remember, this is new to the students, too. They can help you refine the process.

Once you have tried an alternative assessment, reflect and evaluate the activities. Ask yourself some questions. What worked? What needs modification? What would I do differently? Would I use this activity again? How did the students respond? Did the end results justify the time spent? Did students learn from the activity?

Virginia Education Association and the Appalachia Educational Laboratory (1992)
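
Recommendation 2 above describes a rubric as a rating scale with several categories, with benchmarks for each level of performance. As a rough illustration only, such a rubric could be written down as a small data structure so that every rating is checked against a defined benchmark. The sketch below is in Python; the criterion names, the benchmark wording, the four-point scale, and the `score_product` helper are all hypothetical and are not taken from the Virginia teachers' materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric category with a benchmark description for each level."""
    name: str
    benchmarks: dict  # level (int) -> benchmark descriptor (str)


# Hypothetical rubric for a piece of student writing (illustrative only).
WRITING_RUBRIC = [
    Criterion("organization", {
        1: "ideas are listed without a clear order",
        2: "some grouping of ideas, weak transitions",
        3: "clear sequence with minor lapses",
        4: "purposeful structure that guides the reader",
    }),
    Criterion("evidence", {
        1: "claims are unsupported",
        2: "some support, often off-topic",
        3: "relevant support for most claims",
        4: "well-chosen support for every claim",
    }),
    Criterion("revision", {
        1: "no visible changes between drafts",
        2: "surface corrections only",
        3: "some rethinking of content",
        4: "substantial rethinking in response to feedback",
    }),
]


def score_product(rubric, ratings):
    """Check a rater's levels against the rubric and return (levels, total).

    `ratings` maps each criterion name to the level the rater chose.
    A missing criterion or an undefined level raises ValueError.
    """
    levels = {}
    for criterion in rubric:
        if criterion.name not in ratings:
            raise ValueError(f"no rating given for {criterion.name!r}")
        level = ratings[criterion.name]
        if level not in criterion.benchmarks:
            raise ValueError(
                f"{level!r} is not a defined level for {criterion.name!r}"
            )
        levels[criterion.name] = level
    return levels, sum(levels.values())


levels, total = score_product(
    WRITING_RUBRIC, {"organization": 3, "evidence": 4, "revision": 2}
)
for name, level in levels.items():
    print(name, level)
print("total", total)
```

Keeping the benchmark wording next to each level is one possible design choice: the same structure that the rater scores against can be handed to students beforehand, which fits the advice to make expectations clear in advance.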


Prepared by Lawrence Rudner, ERIC/AE and Carol Boston, ACCESS ERIC

So, what's all the hoopla about? Federal commissions have endorsed performance assessment. It's been discussed on C-SPAN and in a number of books and articles. Full issues of major education journals, including Educational Leadership (April 1989 and May 1992) and Phi Delta Kappan (February 1993), have been devoted to performance assessment. A surprisingly large number of organizations are actively involved in developing components of a performance assessment system. Chances are good that one or more of your professional associations is in the middle of debating goals and standards right now.

Is this just the latest bandwagon? Another short-term fix? Probably not. The performance assessment movement encompasses much more than a technology for testing students. It requires examining the purposes of education, identifying skills we want students to master, and empowering teachers. Even without an assessment component, these activities can only be good for education. You can be certain they will have an impact on classrooms.

This article describes performance assessments, weighs their advantages and disadvantages as instructional tools and accountability measures, and offers suggestions to teachers and administrators who want to use performance assessments to improve teaching and learning.

Key Features of Performance Assessment

The Office of Technology Assessment (OTA) of the U.S. Congress (1992) provides a simple, yet insightful, definition of performance assessment: testing that requires a student to create an answer or a product that demonstrates his or her knowledge or skills. A wide variety of assessment techniques fall within this broad definition. Several are described in Table 1. One key feature of all performance assessments is that they require students to be active participants.
Rather than choosing from presented options, as in traditional multiple-choice tests, students are responsible for creating or constructing their responses. These may vary in complexity from writing short answers or essays to designing and conducting experiments or creating comprehensive portfolios. It is important to note that proponents of "authentic assessment" make distinctions among the various types of performance assessments, preferring those that have meaning and value in themselves to those that are meaningful primarily in an academic context. In a chemistry class, for example, students might be asked to identify the chemical composition of a premixed solution by applying tests for various properties, or they might take samples from local lakes and rivers and identify pollutants. Both assessments would be performance-based, but the one involving the real-world problem would be considered more authentic.

Testing has traditionally focused on whether students get the right answers; how they arrive at their answers has been considered important only during the test development. When students take a standardized mathematics test, for example, there is no way to distinguish among those who select the correct answer because they truly understand the problem, those who understand the problem but make a careless calculation mistake, and those who have no idea how to do the work but simply guess correctly. Performance assessments, on the other hand, require students to demonstrate knowledge or skills; therefore, the process by which they solve problems becomes important. To illustrate, if high school juniors are asked to demonstrate their understanding of interest rates by comparison shopping for a used-car loan and identifying the best deal in a report, a teacher can easily see if they understand the concept of interest, know how to calculate it, and perform mathematical operations accurately.

In performance assessment, items directly reflect intended outcomes. Whereas a traditional test might ask students about grammar rules, a performance assessment would have them demonstrate their understanding of English grammar by editing a poorly written passage. A traditional auto mechanics test might include questions about a front-end alignment; a performance assessment would have students do one.

Performance assessments can also measure skills that have not traditionally been measured in large groups of students: skills such as integrating knowledge across disciplines, contributing to the work of a group, and developing a plan of action when confronted with a novel situation. Grant Wiggins (1990) captures their potential nicely:

Do we want to evaluate student problem-posing and problem-solving in mathematics? Experimental research in science? Speaking, listening, and facilitating a discussion? Doing document-based historical inquiry? Thoroughly revising a piece of imaginative writing until it "works" for the reader? Then let our assessment be built out of such exemplary intellectual challenges.

What's Wrong With the Way We've Been Doing It?
Many tests used in state and local assessments, as well as the Scholastic Aptitude Test and the National Assessment of Educational Progress, have been criticized for failing to provide the information we need about students and their ability to meet specific curricular objectives. Critics contend that these tests, as currently formulated, often assess only a narrow range of the curriculum; focus on aptitudes, not specific curriculum objectives; and emphasize minimum competencies, thus creating little incentive for students to excel. Further, they yield results that are analyzed primarily on the national, state, and district levels rather than used to improve the performance of individual pupils or schools. The true measure of performance assessment must, however, lie in its ability to assess desired skills, not in the alleged inability of other forms of assessment.

Here We Go Again?

You might ask, "Is performance assessment really new?" Good classroom teachers have used projects and portfolios for years, preparing numerous activities requiring students to blend skills and insights across disciplines. Performance assessment has been particularly common in vocational education, the military, and business. ERIC has used "performance tests" as a descriptor since 1966.

What is new is the widespread interest in the potential of performance assessment. Many superintendents, state legislators, governors, and Washington officials see high-stakes performance tests as a means to motivate students to learn and schools to teach concepts and skills that are more in line with today's expectations. This perspective will be called the motivator viewpoint in this article. Many researchers, curriculum specialists, and teachers, on the other hand, see performance assessment as empowering teachers by providing them with better instructional tools and a new emphasis on teaching more relevant skills, a perspective that will be referred to as the empowerment viewpoint here. Proponents of both viewpoints agree on the need to change assessment methods but differ in their views about how assessment information should be used.

On the Value of Performance Assessments

Advocates of the motivator and empowerment viewpoints concur that performance assessments can form a solid foundation for improving schools and increasing what students know and can do. However, the two groups frame the advantages differently. Their positions are sketched here briefly, then developed more fully in the sections that follow.

The motivators emphasize that performance-based assessments, if instituted on a district, state, or national level, will allow us to monitor the effectiveness of schools and teachers and track students' progress toward achieving national educational goals (see "Standards, Assessments, and the National Education Goals" on pp. X X). According to the motivator viewpoint, performance assessments will make the educational system more accountable for results. Proponents expect them to do the following: prompt schools to focus on important, performance-based outcomes; provide sound data on achievement, not just aptitude; allow valid comparisons among schools, districts, and states; and yield results for every important level of the education system, from individual children to the nation as a whole.

Those in the empowerment camp, on the other hand, tend to focus on how performance assessments will improve teaching and learning at the classroom level. Instructional objectives in most subject areas are being redefined to include more practical applications and more emphasis on synthesis and integration of content and skills.
Performance assessments that are closely tied to this new curriculum can give teachers license to emphasize important skills that traditionally have not been measured. Performance assessments can also provide teachers with diagnostic information to help guide instruction. The outcomes-based education (OBE) movement supports instructional activities closely tied to performance assessment tasks. Under OBE, students who do not demonstrate the level of accomplishment their local communities and school districts have agreed upon receive additional instruction to bring them up to the level.

High-Stakes Performance Assessments as Motivators

One of the most historic events concerning education occurred in September 1989, when President George Bush held an education summit in Charlottesville, Virginia, with the nation's governors. Together, the participants hammered out six far-reaching national education goals, effectively acknowledging that education issues transcend state and local levels to affect the democratic and economic foundations of the entire country. In a closing statement, participants announced, "We unanimously agree that there is a need for the first time in this nation's history to have specific results-oriented goals. We recognize the need for ... accountability for outcome-related results."

Consensus is now building among state legislators, governors, members of Congress, Washington officials, and the general public regarding the desirability and feasibility of some sort of voluntary national assessment system linked with high national standards in such subject areas as mathematics, science, English, history, geography, and the arts. A number of professional organizations have received funding to coordinate the development of such standards (see sidebar on p. X). The groundbreaking work of the National Council of Teachers of Mathematics (NCTM) serves as a model for this process: NCTM developed its Standards in CB: date and is now developing curriculum frameworks and assessment guidelines to match it (see "From Standards to Assessment" on p. X).

The National Council on Education Standards and Testing (NCEST), an advisory group formed by Congress and the President in response to national and state interest in national standards and assessments, describes the motivational effect of a national system of assessments in its 1992 report, Raising Standards for American Education: National standards and a system of assessments are desirable and feasible mechanisms for raising expectations, revitalizing instruction, and rejuvenating educational reform efforts for all American schools and students (p. 8).

Envision, if you will, the enormous potential of an assessment that perfectly and equitably measures the right skills. NCEST believes that developing standards and high-quality assessments has "the potential to raise learning expectations at all levels of education, better target human and fiscal resources for educational improvement, and help meet the needs of an increasingly mobile population". This is a shared vision. At least a half-dozen groups have begun calling for a national assessment system or developing instrumentation during the past two years (see Calls for New Assessments).

According to NCEST, student standards must be "world class" and include the "specification of content--what students should know and be able to do--and the level of performance students are expected to attain" (p. 3). NCEST envisions standards that include substantive content together with complex problem-solving and higher-order thinking skills.
Such standards would reflect "high expectations--not expectations of minimal competency" (p. 13). NCEST believes in the motivational potential of these world-class standards, stating that they will "raise the ceiling for students who are currently above average" and "lift the floor for those students who now experience the least success in school" (p. 4).

Acknowledging that tests tend to influence curriculum, NCEST suggests that assessments should be developed to reflect the new high standards. Such assessments would not be immediately associated with high stakes. However, once issues of validity, reliability, and fairness have been resolved, these assessments "could be used for such high-stakes purposes as high school graduation, college admission, continuing education, or certification for employment. Assessments could also be used by states and localities as the basis for system accountability" (p. 27).

The U.S. already has one national assessment in place, the National Assessment of Educational Progress (NAEP). Since 1969, the U.S. Department of Education-sponsored NAEP has been used to assess what our nation's children know in a variety of curriculum areas, including mathematics, reading, science, writing, U.S. history, and geography. Historically, NAEP has been a multiple-choice test administered to random samples of fourth-, eighth-, and twelfth-graders in order to report on the educational progress of our nation as a whole. As interest in accountability has grown, NAEP has begun to conduct trial state-level assessments. NAEP is also increasing the number of performance-based tasks to better reflect what students can do (see "Performance-Based Aspects of the National Assessment of Educational Progress" on p. x).

The National Council on Education Standards and Testing envisions that large-scale sample assessments such as NAEP will be one component of a national system of assessments, to be coupled with assessments that can provide results for individual students. Supporters argue that a system of national assessments would improve education by giving parents and students more accurate, relevant, and comparable data and encouraging students to strive for world-class standards of achievement.

Critics of a national assessment system are equally visible. The National Education Association and other professional associations have argued that high-stakes national assessments will not improve schooling and could easily be harmful. They are particularly concerned that students with disabilities, students whose native language is not English, and students and teachers attending schools with minimal resources will be penalized under such a system. Fearing that a national assessment system might not be a good model and could short-circuit current reform efforts, the National Center for Fair and Open Testing, or FairTest, testified that the federal dollars would be better spent in support of state efforts.

Performance Assessment for Teacher Empowerment

An enormous amount of activity is taking place in the area of establishing national standards and a system of assessments. The assessments are expected to encompass performance-based tasks that call on students to demonstrate what they can do. They may well have strong accountability features and be used eventually to make high-stakes decisions. Should building principals and classroom teachers get excited about performance assessment now? Absolutely. Viewed in its larger context, performance assessment can play an important part in the school reform/restructuring movement: Performance assessment can be seen as a lever to promote the changes needed for the assessment to be maximally useful.
Among these changes are a redefinition of learning and a different conception of the place of assessment in the education process (Mitchell, 1992).

In order to implement performance assessment fully, administrators and teachers must have a clear picture of the skills they want students to master and a coherent plan for how students are going to master those skills. They need to consider how students learn and what instructional strategies are most likely to be effective. Finally, they need to be flexible in using assessment information for diagnostic purposes to help individual students achieve.

This level of reflection is consistent with the best practices in education. As Joan Herman, Pamela Aschbacher, and Lynn Winters note in their important book, A Practical Guide to Alternative Assessment (1992),

No longer is learning thought to be a one-way transmission from teacher to students, with the teacher as lecturer and the students as passive receptacles. Rather, meaningful instruction engages students actively in the learning process. Good teachers draw on and synthesize discipline-based knowledge, knowledge of student learning, and knowledge of child development. They use a variety of instructional strategies, from direct instruction to coaching, to involve their students in meaningful activities . . . and to achieve specific learning goals (p. 12).

Quality performance assessment is a key part of this vision because "good teachers constantly assess how their students are doing, gather evidence of problems and progress, and adjust their instructional plans accordingly" (p. 12). Properly implemented, performance assessment offers an opportunity to align curriculum and teaching efforts with the important skills we wish children to master. Cognitive learning theory, which emphasizes that knowledge is constructed and that learners vary, provides some insight into what an aligned curriculum might look like (see Implications from Learning Theory).

Developing Performance Assessment Tasks

What is performance-based assessment?

Performance-based assessment is an approach to monitoring students' progress in relation to identified learner outcomes. This method of assessment requires the student to create answers or products that demonstrate his or her knowledge or skills. It differs from traditional testing methods, which require a student to select a single correct answer or to fill in the blank.

What are the characteristics of an effective performance assessment task?

The Office of Technology Assessment of the U.S. Congress defines performance assessment as any form of testing that requires a student to create an answer or a product that demonstrates his or her knowledge or skills. According to Stephen K. Hess, Director of Criterion Referenced Evaluation and Testing for Frederick County Public Schools, the goal of effective performance assessment is to develop important tasks that are worthwhile and engaging for students, requiring the application of skills and knowledge learned prior to the assessment. Experts in the field emphasize that any effective performance assessment task should have the following design features:

* Students should be active participants, not passive selectors of the single right answer.
* Intended outcomes should be clearly identified and should guide the design of a performance task.
* Students should be expected to demonstrate mastery of those intended outcomes when responding to all facets of the task.
* Students must demonstrate their ability to apply their knowledge and skills to reality-based situations and scenarios.
* A clear, logical set of performance-based activities that students are expected to follow should be evident.
* A clearly presented set of criteria should be available to help judge the degree of proficiency in a student response.

What does current research in education and psychology suggest about the value of performance assessment? The linked page offers suggestions for aligning instruction and assessment. Each suggestion is based on a specific implication from Cognitive Learning Theory (CLT) research.

What process do I use to design a performance assessment task? The linked page gives a step-by-step procedure for designing performance assessment tasks. Each step includes guiding questions for teachers to think about as they work through the process.

How are performance assessment tasks scored? The linked page provides an overview of the process used in the Maryland School Performance Assessment Program for scoring student responses. It includes sample rubrics, rules, and keys, accompanied by an explanation of how each is used.
From the ERIC database

Creating Meaningful Performance Assessments. ERIC Digest E531.

Elliott, Stephen N.
Performance assessment is a viable alternative to norm-referenced tests. Teachers can use performance assessment to obtain a much richer and more complete picture of what students know and are able to do. DEFINING PERFORMANCE ASSESSMENT Defined by the U.S. Congress, Office of Technology Assessment (OTA) (1992), as "testing methods that require students to create an answer or product that demonstrates their knowledge and skills," performance assessment can take many forms including: *Conducting experiments. *Writing extended essays. *Doing mathematical computations.

Performance assessment is best understood as a continuum of assessment formats ranging from the simplest student-constructed responses to comprehensive demonstrations or collections of work over time. Whatever format, common features of performance assessment involve: 1. Students' construction rather than selection of a response. 2. Direct observation of student behavior on tasks resembling those commonly required for functioning in the world outside school. 3. Illumination of students' learning and thinking processes along with their answers (OTA, 1992). Performance assessments measure what is taught in the curriculum. There are two terms that are core to depicting performance assessment: 1. Performance: A student's active generation of a response that is observable either directly or indirectly via a permanent product. 2. Authentic: The nature of the task and context in which the assessment occurs is relevant and represents "real world" problems or issues. HOW DO YOU ADDRESS VALIDITY IN PERFORMANCE ASSESSMENTS? The validity of an assessment depends on the degree to which the interpretations and uses of assessment results are supported by empirical evidence and logical analysis. According to Baker and her associates (1993), there are five internal characteristics that valid performance assessments should exhibit: 1. Have meaning for students and teachers and motivate high performance. 2. Require the demonstration of complex cognition, applicable to important problem areas. 3. Exemplify current standards of content or subject matter quality. 4. Minimize the effects of ancillary skills that are irrelevant to the focus of assessment. 5. Possess explicit standards for rating or judgment. When considering the validity of a performance test, it is important to first consider how the test or instrument "behaves" given the content covered. Questions should be asked such as: *How does this test relate to other measures of a similar construct? 
*Can the measure predict future performances? *Does the assessment adequately cover the content domain? It is also important to review the intended effects of using the assessment instrument. Questions about the use of a test typically focus on the test's ability to reliably differentiate individuals into groups and guide the methods teachers use to teach the subject matter covered by the test. A word of caution: Unintended uses of assessments can have precarious effects. To prevent the misuse of assessments, the following questions should be considered: *Does use of the instrument result in discriminatory practices against various groups of individuals? *Is it used to evaluate others (e.g., parents or teachers) who are not directly assessed by the test?

PROVIDING EVIDENCE FOR THE RELIABILITY AND VALIDITY OF PERFORMANCE ASSESSMENT The technical qualities and scoring procedures of performance assessments must meet high standards for reliability and validity. To ensure that sufficient evidence exists for a measure, the following four issues should be addressed: 1. Assessment as a Curriculum Event. Externally mandated assessments that bear little, if any, resemblance to subject area domain and pedagogy cannot provide a valid or reliable indication of what a student knows and is able to do. The assessment should reflect what is taught and how it is taught. Making an assessment a curriculum event means reconceptualizing it as a series of theoretically and practically coherent learning activities that are structured in such a way that they lead to a single predetermined end. When planning for assessment as a curriculum event, the following factors should be considered: *The content of the instrument. *The length of activities required to complete the assessment. *The type of activities required to complete the assessment. *The number of items in the assessment instrument. *The scoring rubric. 2. Task Content Alignment with Curriculum. Content alignment between what is tested and what is taught is essential. What is taught should be linked to valued outcomes for students in the district. 3. Scoring and Subsequent Communications with Consumers. In large scale assessment systems, the scoring and interpretation of performance assessment instruments is akin to a criterion-referenced approach to testing. A student's performance is evaluated by a trained rater who compares the student's responses to multitrait descriptions of performances and then gives the student a single number corresponding to the description that best characterizes the performance. Students are compared directly to scoring criteria and only indirectly to each other. 
In the classroom, every student needs feedback when the purpose of performance assessment is diagnosis and monitoring of student progress. Students can be shown how to assess their own performances when: *The scoring criteria are well articulated. *Teachers are comfortable with having students share in their own evaluation process. 4. Linking and Comparing Results Over Time. Linking is a generic term that includes a variety of approaches to making results of one assessment comparable to those of another. Two appropriate and manageable approaches to linking in performance assessment include: *Statistical Moderation. This approach is used to compare performances across content areas for groups of students who have taken a test at the same point in time. *Social Moderation. This is a judgmental approach that is built on consensus of raters. The comparability of scores assigned depends substantially on the development of consensus among professionals. HOW CAN TEACHERS INFLUENCE STUDENTS' PERFORMANCES? Performance assessment is a promising method that is achievable in the classroom. In classrooms, teachers can use data gathered from performance assessment to guide instruction. Performance assessment should interact with instruction that precedes and follows an assessment task. When using performance assessments, students' performances can be positively influenced by:

1. Selecting assessment tasks that are clearly aligned or connected to what has been taught. 2. Sharing the scoring criteria for the assessment task with students prior to working on the task. 3. Providing students with clear statements of standards and/or several models of acceptable performances before they attempt a task. 4. Encouraging students to complete self-assessments of their performances. 5. Interpreting students' performances by comparing them to standards that are developmentally appropriate, as well as to other students' performances.

REFERENCES

Baker, E. L., O'Neil, H. F., Jr., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessments. American Psychologist, 48, 1210-1218.

U.S. Congress, Office of Technology Assessment. (1992, February). Testing in American schools: Asking the right questions. (OTA-SET-519). Washington, DC: U.S. Government Printing Office.

Derived from: Elliott, S. N. (1994). Creating Meaningful Performance Assessments: Fundamental Concepts. Reston, VA: The Council for Exceptional Children. Product #P5059.

ERIC Digests are in the public domain and may be freely reproduced and disseminated. This publication was prepared with funding from the National Library of Education (NLE), Office of Educational Research and Improvement, U.S. Department of Education, under contract no. RR93002005. The opinions expressed in this report do not necessarily reflect the positions or policies of NLE, OERI, or the Department of Education.

Title: Creating Meaningful Performance Assessments. ERIC Digest E531. Author: Elliott, Stephen N. Note: 3p.; Derived from "Creating Meaningful Performance Assessments: Fundamental Concepts," by Stephen N. Elliott; see ED 375 566. Publication Year: 1995 Document Type: Eric Product (071); Eric Digests (selected) (073) Target Audience: Practitioners ERIC Identifier: ED381985 Available from: Clearinghouse on Disabilities and Gifted Education, Council for Exceptional Children, 1920 Association Dr., Reston, VA 22091-1589 ($1 each, $5 minimum order prepaid). This document is available from the ERIC Document Reproduction Service. Descriptors: Definitions; Elementary Secondary Education; * Evaluation Methods; Guidelines; * Performance; * Student Evaluation; Test Reliability; Test Validity Identifiers: ERIC Digests; *Performance Based Evaluation

Copyright 2003, Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the right to authorize third party reproduction of this article in print, electronic and database forms. Moskal, Barbara M. (2003). Recommendations for developing classroom performance assessments and scoring rubrics. Practical Assessment, Research & Evaluation, 8(14). Retrieved February 27, 2013 from . This paper has been viewed 146,388 times since 5/29/2003.

Recommendations for Developing Classroom Performance Assessments and Scoring Rubrics

Barbara M. Moskal
Colorado School of Mines

This paper provides a set of recommendations for developing classroom performance assessments and scoring rubrics similar to the sets of recommendations for multiple choice tests provided in this journal by Frary (1995) and Kehoe (1995a, 1995b). The recommendations are divided into five categories: 1) Writing Goals and Objectives, 2) Developing Performance Assessments, 3) Developing Scoring Rubrics, 4) Administering Performance Assessments and 5) Scoring, Interpreting and Using Results. A broad literature base currently exists for each of these categories. This article draws from this base to provide a set of recommendations that guide the classroom teacher through the four phases of the classroom assessment process: planning, gathering, interpreting and using (Moskal, 2000a). Each section concludes with references for further reading.

Writing Goals and Objectives

Before a performance assessment or a scoring rubric is written or selected, the teacher should clearly identify the purpose of the activity. As is the case with any assessment, a clear statement of goals and objectives should be written to guide the development of both the performance assessment and the scoring rubric. "Goals" are broad statements of expected student outcomes and "objectives" divide the goals into observable behaviors (Rogers & Sando, 1996). Questions such as "What do I hope to learn about my students' knowledge or skills?," "What content, skills and knowledge should the activity be designed to assess?," and "What evidence do I need to evaluate the appropriate skills and knowledge?" can help in the identification of specific goals and objectives.

Recommendations for writing goals and objectives:

1. The statement of goals and accompanying objectives should provide a clear focus for both instruction and assessment. Another manner in which to phrase this recommendation is that the stated goals and objectives for the performance assessment should be clearly aligned with the goals and objectives of instruction. Ideally, a statement of goals and objectives is developed prior to the instructional activity and is used to guide both instruction and assessment.

2. Both goals and objectives should reflect knowledge and information that is worthwhile for students to learn. Both the instruction and the assessment of student learning are intentional acts and should be guided through planning. Goals and objectives provide a framework for the development of this plan. Given the critical relationship between goals and objectives and instruction and assessment, goals and objectives should reflect important learning outcomes.

3. The relationship between a given goal and the objectives that describe that goal should be apparent. Objectives lay the framework upon which a given goal is evaluated. Therefore, there should be a clear link between the statement of the goal and the objectives that define that goal.

4. All of the important aspects of the given goal should be reflected through the objectives. Once again, goals and objectives provide a framework for evaluating the attainment of a given goal. Therefore, the accompanying set of objectives should reflect the important aspects of the goal.

5. Objectives should describe measurable student outcomes. Since objectives provide the framework for evaluation, they need to be phrased in a manner that specifies the student behavior that will demonstrate the attainment of the larger goal.

6. Goals and objectives should be used to guide the selection of an appropriate assessment activity. When the goals and objectives are focused upon the recall of factual knowledge, a multiple choice or short response assessment may be more appropriate and efficient than a performance assessment. When the goals and objectives are focused upon complex learning outcomes, such as reasoning, communication, teamwork, etc., a performance assessment is likely to be appropriate (Perlman, 2002).

Writing goals and objectives, at first, appears to be simple. After all, this process primarily requires clearly defining the desired student outcomes. Yet many teachers initially have difficulty creating goals and objectives that can be used to guide instruction and that can be measured. An excellent resource that specifically focuses upon the "how to" of writing measurable objectives is a book by Gronlund (2000). Other authors have also addressed these issues in subsections of larger works (e.g., Airasian, 2000; 2001; Oosterhof, 1999).

Developing Performance Assessment

As the term suggests, performance assessments require a demonstration of students' skills or knowledge (Airasian, 2000; 2001; Brualdi, 1998; Perlman, 2002). Performance assessments can take on many different forms, which include written and oral demonstrations and activities that can be completed by either a group or an individual. A factor that distinguishes performance assessments from other extended response activities is that they require students to demonstrate the application of knowledge to a particular context (Brualdi, 1998; Wiggins, 1993). Through observation or analysis of a student's response, the teacher can determine what the student knows, what the student does not know and what misconceptions the student holds with respect to the purpose of the assessment.

Recommendations for developing performance assessments:

1. The selected performance should reflect a valued activity. According to Wiggins (1990), "The best tests always teach students and teachers alike the kind of work that most matters; they are enabling and forward-looking, not just reflective of prior teaching." He suggests the use of tasks that resemble the type of activities that are known to take place in the workforce (e.g., project reports and presentations, writing legal briefs, collecting, analyzing and using data to make and justify decisions). In other words, performance assessments allow students the opportunity to display their skills and knowledge in response to "real" situations (Airasian, 2000; 2001; Wiggins, 1993).

2. The completion of performance assessments should provide a valuable learning experience. Performance assessments require more time to administer than do other forms of assessment. The investment of this classroom time should result in a higher payoff. This payoff should include both an increase in the teacher's understanding of what students know and can do and an increase in the students' knowledge of the intended content and constructs.

3. The statement of goals and objectives should be clearly aligned with the measurable outcomes of the performance activity. Once the task has been selected, a list can be made of how the elements of the task map into the desired goals and objectives. If it is not apparent how the students' performance will be mapped into the desired goals and objectives, then adjustments may need to be made to the task or a new task may need to be selected.

4. The task should not examine extraneous or unintended variables. Examine the task and think about whether there are elements of the task that do not map directly into the goals and objectives. Is knowledge required in the completion of the task that is inconsistent with the purpose? Will lack of this knowledge interfere with or prevent the students from completing the task for reasons that are not consistent with the task's purpose? If such factors exist, changes may need to be made to the task or a new task may need to be selected.

5. Performance assessments should be fair and free from bias. The phrasing of the task should be carefully constructed in a manner that eliminates gender and ethnic stereotypes. Additionally, the task should not give an unfair advantage to a particular subset of students. For example, a task that is heavily weighted with baseball statistics may give an unfair advantage to the students who are baseball enthusiasts.

The recommendations provided above have been drawn from the broader literature base concerning the construction of performance assessments. The interested reader can acquire further details concerning the development process by consulting other articles that are available through this journal (e.g., Brualdi, 1998; Roeber, 1996; Wiggins, 1990) or books (e.g., Wiggins, 1993; 1998) that address this subject.

Developing Scoring Rubrics

Scoring rubrics are one method that may be used to evaluate students' responses to performance assessments. Two types of scoring rubrics are frequently discussed in the literature: analytic and holistic. Analytic scoring rubrics divide a performance into separate facets and each facet is evaluated using a separate scale. Holistic scoring rubrics use a single scale to evaluate the larger process. In holistic scoring rubrics, all of the facets that make up the task are evaluated in combination. The recommendations that follow are appropriate to both analytic and holistic scoring rubrics.
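The distinction between the two rubric types can be illustrated with a minimal Python sketch. The facet names, point scales, and weights below are hypothetical and chosen only for illustration; they do not come from the article, and combining facets into a weighted percentage is one possible convention among several.

```python
# Hypothetical facets for a written report. The names, scales, and weights
# are illustrative assumptions, not values taken from the article.
FACETS = {
    "reasoning": {"max_points": 4, "weight": 2.0},
    "communication": {"max_points": 4, "weight": 1.0},
    "accuracy": {"max_points": 4, "weight": 1.0},
}


def analytic_score(ratings, facets=FACETS):
    """Analytic rubric: rate each facet on its own scale, then combine."""
    if set(ratings) != set(facets):
        raise ValueError("ratings must cover exactly the rubric's facets")
    earned = 0.0
    possible = 0.0
    for name, spec in facets.items():
        points = ratings[name]
        if not 0 <= points <= spec["max_points"]:
            raise ValueError(f"{name}: {points} is outside 0..{spec['max_points']}")
        earned += spec["weight"] * points
        possible += spec["weight"] * spec["max_points"]
    # Keep the per-facet ratings so the report can show each one separately.
    return {"by_facet": dict(ratings), "percent": 100.0 * earned / possible}


def holistic_score(level, max_level=4):
    """Holistic rubric: one level on a single scale for the whole performance."""
    if not 0 <= level <= max_level:
        raise ValueError(f"level {level} is outside 0..{max_level}")
    return {"level": level, "percent": 100.0 * level / max_level}


if __name__ == "__main__":
    print(analytic_score({"reasoning": 3, "communication": 4, "accuracy": 2}))
    print(holistic_score(3))
```

The analytic result keeps the individual facet ratings alongside the combined figure, while the holistic result carries only the single level. If facets are weighted unequally, as "reasoning" is here, the reason for the weighting should be one the teacher can state.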

Recommendations for developing scoring rubrics:

1. The criteria set forth within a scoring rubric should be clearly aligned with the requirements of the task and the stated goals and objectives. As was discussed earlier, a list can be compiled that describes how the elements of the task map into the goals and objectives. This list can be extended to include how the criteria that are set forth in the scoring rubric map into both the elements of the task and the goals and objectives. Criteria that cannot be mapped directly back to both the task and the purpose should not be included in the scoring rubric.

2. The criteria set forth in scoring rubrics should be expressed in terms of observable behaviors or product characteristics. A teacher cannot evaluate an internal process unless this process is displayed in an external manner. For example, a teacher cannot look into students' heads and see their reasoning process. Instead, examining reasoning requires that the students explain their reasoning in written or oral form. The scoring criteria should be focused upon evaluating the written or oral display of the reasoning process.

3. Scoring rubrics should be written in specific and clear language that the students understand. One benefit of using scoring rubrics is that they provide students with a clear description of what is expected before they complete the assessment activity. If the language employed in a scoring rubric is too complex for the given students, this benefit is lost. Students should be able to understand the scoring criteria.

4. The number of points that are used in the scoring rubric should make sense. The points that are assigned to either an analytic or holistic scoring rubric should clearly reflect the value of the activity. On an analytic scoring rubric, if some facets are weighted differently than others, there should be a clear reason for these differences.

5. The separation between score levels should be clear. The scale used for a scoring rubric should reflect clear differences between the achievement levels. A scale that requires fine distinctions is likely to result in inconsistent scoring. A scoring rubric that has fewer categories and clear distinctions between these categories is preferable to a scoring rubric that has many categories and unclear distinctions between the categories.

6. The statement of the criteria should be fair and free from bias.
As was the case with the statement of the performance activity, the phrasing used in the description of the performance criteria should be carefully constructed in a manner that eliminates gender and ethnic stereotypes. Additionally, the criteria should not give an unfair advantage to a particular subset of students that is unrelated to the purpose of the task.

Greater detail concerning the development of scoring rubrics, both analytic and holistic, is immediately available through this journal. Mertler (2001) and Moskal (2000b) have both described the differences between analytic and holistic scoring rubrics and how to develop each type of rubric. Books have also been written or compiled (e.g., Arter & McTighe, 2001; Boston, 2002) that provide detailed examinations of the rubric development process and the different types of scoring rubrics.

Administering Performance Assessments

Once a performance assessment and its accompanying scoring rubric are developed, it is time to administer the assessment to students. The recommendations that follow are specifically developed to guide the administration process.

Recommendations for administering performance assessments:

1. Both written and oral explanations of tasks should be clear and concise and presented in language that the students understand. If the task is presented in written form, then the reading level of the students should be given careful consideration. Students should be given the opportunity to ask clarification questions before completing the task.

2. Appropriate tools need to be available to support the completion of the assessment activity. Depending on the activity, students may need access to library resources, computer programs, laboratories, calculators, or other tools. Before the task is administered, the teacher should determine what tools will be needed and ensure that these tools are available during the task administration.

3. Scoring rubrics should be discussed with the students before they complete the assessment activity. This allows the students to adjust their efforts in a manner that maximizes their performance. Teachers are often concerned that by giving the students the criteria in advance, all of the students will perform at the top level. In practice, this rarely (if ever) occurs.

The first two recommendations provided above are appropriate well beyond the use of performance assessments and scoring rubrics. These recommendations are consistent with the Standards of the American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999) with respect to assessment and evaluation. The final recommendation is consistent with prior articles that concern the development of scoring rubrics (Brualdi, 1998; Moskal & Leydens, 2000).

Scoring, Interpreting and Using Results

As was discussed earlier, a scoring rubric may be used to score student responses to performance assessments. This section provides recommendations for scoring, interpreting and using the results of performance assessments.

Recommendations for scoring, interpreting and using results of performance assessments:

1. Two independent raters should be able to acquire consistent scores using the categories described in the scoring rubric. If the categories of the scoring rubric are written clearly and concisely, then two raters should be able to score the same set of papers and acquire similar results.

2. A given rater should be able to acquire consistent scores across time using the scoring rubric. Knowledge of who a student is or the mood of a rater on a given day may impact the scoring process. Raters should frequently refer to the scoring rubric to ensure that they are not informally changing the criteria over time.

3. A set of anchor papers should be used to assist raters in the scoring process. Anchor papers are student papers that have been selected as examples of performances at the different levels of the scoring rubric. These papers provide a comparison set for raters as they score the student responses. Raters should frequently refer to these papers to ensure the consistency of scoring over time.

4. A set of anchor papers with students' names removed can be used to illustrate to both students and parents the different levels of the scoring rubric. Ambiguities within the rubric can often be clarified through the use of examples. Anchor papers with students' names removed can be used to clarify to both students and parents the expectations set forth through the scoring rubric.

5. The connection between the score or grade and the scoring rubric should be immediately apparent. If an analytic rubric is used, then the report should contain the scores for each analytic level. If a summary score or grade is provided, then an explanation should be included as to how the summary score or grade was determined. Both students and parents should be able to understand how the final grade or score is linked to the scoring criteria.

6. The results of the performance assessment should be used to improve instruction and the assessment process. What did the teacher learn from the student responses? How can this be used to improve future classroom instruction? What did the teacher learn about the performance assessment or the scoring rubric? How can these instruments be improved for future instruction? The information that is acquired through classroom assessment should be actively used to improve future instruction and assessment.
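One way to check the consistency between two raters described in the first recommendation is to compare their scores on the same set of papers. The sketch below is illustrative only: the article does not prescribe a statistic, so simple percent agreement and Cohen's kappa (a standard chance-corrected agreement measure) are used here as assumptions, and the function names and sample scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of papers on which two raters assigned the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of papers")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same level
    # if each assigned levels independently at their own observed rates.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical rubric levels (1 to 4) given by two raters to eight papers.
    first_rater = [4, 3, 3, 2, 1, 4, 2, 3]
    second_rater = [4, 3, 2, 2, 1, 4, 2, 4]
    print(percent_agreement(first_rater, second_rater))
    print(cohens_kappa(first_rater, second_rater))
```

The same functions can be applied to one rater's scores on the same papers at two different times, which corresponds to the second recommendation. Low agreement suggests that the rubric categories or the anchor papers need clarification.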

The first three recommendations concern the important concept of "rater reliability" or the consistency between scores. Moskal and Leydens (2000) examine the concept of rater reliability in an article that was previously published in this journal. A more comprehensive source that addresses both validity and reliability of scoring rubrics is a book by Arter and McTighe (2001), Scoring Rubrics in the Classroom: Using Performance Criteria for Assessing and Improving Student Performance. The American Educational Research Association, American Psychological Association and National Council on Measurement in Education (1999) also address these issues in their Standards document. For information concerning methods for converting rubric scores to grades, see "Converting Rubric Scores to Letter Grades" (Northwest Regional Educational Laboratory, 2002).

Conclusions

The purpose of this article is to provide a set of recommendations for the development of performance assessments and scoring rubrics. These recommendations can be used to guide a teacher through the four phases of classroom assessment: planning, gathering, interpreting and using. Extensive literature is available on each phase of the assessment process and this article addresses only a small sample of that work. The reader is encouraged to use the previously cited work as a starting place to better understand the use of performance assessments and scoring rubrics in the classroom. Additionally, books by Airasian (2000; 2001), Oosterhof (1999), Rudner and Schafer (2002), and Stiggins (1994) provide a more detailed look at the broader classroom assessment process.

Acknowledgments

This article was originally developed as part of a National Science Foundation (NSF) grant (EEC 0230702), Engineering Our World. The opinions and ideas expressed in this article are those of the author and not of the NSF.

References

Airasian, P.W. (2000). Assessment in the Classroom: A Concise Approach (2nd ed.). Boston: McGraw-Hill.
Airasian, P.W. (2001). Classroom Assessment: Concepts and Applications (4th ed.). Boston: McGraw-Hill.
American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Arter, J. & McTighe, J. (2001). Scoring Rubrics in the Classroom: Using Performance Criteria for Assessing and Improving Student Performance. Thousand Oaks, California: Corwin Press Inc.
Boston, C. (Ed.) (2002). Understanding Scoring Rubrics. University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.
Brualdi, A. (1998). "Implementing performance assessment in the classroom." Practical Assessment, Research & Evaluation, 6(2) [Online]. Available:
Gronlund, N.E. (2000). How to Write and Use Instructional Objectives (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Frary, R. B. (1995). "More multiple-choice item writing do's and don'ts." Practical Assessment, Research & Evaluation, 4(11) [Online]. Available:
Kehoe, J. (1995a). "Writing multiple-choice test items." Practical Assessment, Research & Evaluation, 4(9) [Online]. Available:
Kehoe, J. (1995b). "Basic item analysis for multiple-choice tests." Practical Assessment, Research & Evaluation, 4(10). Available online:
Mertler, C. A. (2001). "Designing scoring rubrics for your classroom." Practical Assessment, Research & Evaluation, 7(25). Available online:
Moskal, B. (2000a). "An assessment model for the mathematics classroom." Mathematics Teaching in the Middle School, 6(3), 192-194.
Moskal, B. (2000b). "Scoring rubrics: What, when and how?" Practical Assessment, Research & Evaluation, 7(3) [Online]. Available:
Moskal, B. & Leydens, J. (2000). "Scoring rubric development: Validity and reliability." Practical Assessment, Research & Evaluation, 7(10). Available online:
Northwest Regional Educational Laboratory (2002). "Converting rubric scores to letter grades." In C. Boston (Ed.), Understanding Scoring Rubrics (pp. 34-40). University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.
Oosterhof, A. (1999). Developing and Using Classroom Assessments (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Perlman, C. (2002). "An introduction to performance assessment scoring rubrics." In C. Boston (Ed.), Understanding Scoring Rubrics (pp. 5-13). University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.
Rogers, G. & Sando, J. (1996). Stepping Ahead: An Assessment Plan Development Guide. Terre Haute, Indiana: Rose-Hulman Institute of Technology.
Roeber, E.D. (1996). "Guidelines for the development and management of performance assessments." Practical Assessment, Research & Evaluation, 5(7). Available online:
Rudner, L.M. & Schafer, W.D. (Eds.) (2002). What Teachers Need to Know about Assessment. Washington, DC: National Education Association.
Stiggins, R. (1994). Student-Centered Classroom Assessment. New York: Macmillan Publishing Company.
Wiggins, G. (1998). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. San Francisco: Jossey-Bass Publishers.
Wiggins, G. (1993). Assessing Student Performances. San Francisco: Jossey-Bass Publishers.
Wiggins, G. (1990). "The case for authentic assessment." Practical Assessment, Research & Evaluation, 2(2). Available online:

Descriptors: *Rubrics; Scoring; *Student Evaluation; *Test Construction; *Evaluation Methods; Grades; Grading; *Scoring