
Terms Used in Educational Assessment

Ability and Aptitude. The terms ability and aptitude are closely related and often difficult to
distinguish from each other. Ability, the mental or physical capacity to perform at a given level,
is considered to be innate, therefore determined genetically. According to psychological theory,
it may be described as possession of one or more of the multiple areas of intelligence that have
been described by various theories and models. Aptitude may be described as the proclivity to
excel in the performance of specific tasks (as in, "she has a real aptitude for drawing").
Accountability in assessment refers to holding individuals or institutions responsible for the
outcomes of instruction. For example, you might hear or read that "students are accountable for
their school successes and/or failures," that "teachers (or parents) are accountable for the
performance of their students (or children)," or that "school principals are accountable for the
achievement of their schools."
Achievement is a measure of the quality and/or the quantity of the success one has in the
mastery of knowledge, skills, or understandings. References to academic achievement, for
example, usually involve performance in such areas as reading, mathematics, science, or social
studies.
Achievement test batteries. Many schools test students using an array of subtests, in a number
of academic content areas and at a variety of grade levels under a single overall test name. For
example, a particular "Test of Basic Skills" might involve subtests of mathematical skills,
language skills, and vocabulary.
Assessment involves the process of "taking stock" of, or understanding, an individual's
characteristics, status, or performance, and typically involves considering and interpreting
information from several sources of data. It might involve, for example, observations, interviews,
or other kinds of information. (Compare with evaluation and measurement.)
Authentic assessment refers to the evaluation of students' work on activities that approximate
realistic or real-life tasks and performances, rather than on their answers to traditional
paper-and-pencil tests. Authentic tasks typically require complex work, problem
solving, and integration of a variety of knowledge and skills brought to bear on a realistic task or
challenge. For example, students might use grocery store ads, a shopping list, and a fixed budget
to plan a purchase, as a realistic alternative to completing a group of arithmetic "column
addition" exercises on a worksheet.
Competency-based assessment. This phrase indicates that students will be evaluated against
some specific learning, behavior, or performance objective. This objective, and/or the level of
performance that represents "competency," is clearly established in the curriculum and represents
an expected level of expertise or mastery of skills or knowledge.
Criterion-referenced testing refers to evaluating students against an absolute standard of
achievement, rather than evaluating them in comparison with the performance of other students.
A standard of performance is set to represent a level of expertise or mastery of skills or
knowledge.
Derived scores or standard scores transform raw scores (the actual number of correct
responses) into values that allow us to compare one student's performance in relation to the
performance of others of the same age or grade, or to the highest possible score on a test.
Common standard scores are z-scores, T-scores, percentiles, and stanines. Derived or standard
scores are all computed by determining how far above or below the mean of all scores a student
scores, and then representing the results using a standard scale. [Editor's note: The article by
Gregory Machek and Jonathan Plucker in the December 2003 issue of PHP included a chart
illustrating many common derived or standard scores.]
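As a rough illustration of the "how far above or below the mean" idea, the sketch below converts raw scores to z-scores and T-scores. The group of five scores and the function names are hypothetical, and the use of the population standard deviation is an assumption; published tests rely on their own norm groups and procedures.

```python
from statistics import mean, pstdev

def z_score(raw, scores):
    """Standard deviations that `raw` lies above (+) or below (-) the group mean.

    Assumption: the population standard deviation of `scores` is the yardstick.
    """
    return (raw - mean(scores)) / pstdev(scores)

def t_score(raw, scores):
    """T-score: a z-score rescaled so the mean is 50 and one standard deviation is 10."""
    return 50 + 10 * z_score(raw, scores)

# Hypothetical raw scores for a small comparison group.
scores = [100, 90, 80, 80, 70]

for raw in scores:
    print(raw, round(z_score(raw, scores), 2), round(t_score(raw, scores), 1))
```

A raw score exactly at the group mean gets a z-score of 0 and a T-score of 50; scores above the mean get positive z-scores and T-scores above 50.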
Evaluation occurs when a judgment or determination of value (e.g., effective or ineffective, or
below, at, or above grade level) is placed on some performance.
Formative evaluation refers to any form of assessment, such as quizzes, tests, essays, projects,
interviews, or presentations, in which the goal is to give students feedback about their work
while it is in progress, to help students correct errors or missteps, or to improve the work along
the way to the final product. In contrast, the goal of summative evaluation is to make a judgment
about a final product or about the quality of performance at the end of an instructional unit or course.
Grade equivalent score. A grade equivalent score describes a student's performance on a test
in relation to a grade level and number of months during the year of that grade. (A score of 8.2,
for example, tells you that your child obtained the same score on a test that an average student in
the second month of the eighth grade would obtain.) Of course, if your child is in the fifth grade,
that's very good, but if your child is in the tenth grade, that's not so good!
High-stakes testing typically refers to major state or national standardized school achievement
tests administered periodically to students at various grade levels. The phrase "high stakes" is
used to signify that these test results carry a great deal of weight among school personnel,
government agencies, politicians, community leaders, and the general public. These test results
often are used to make important decisions about students, teachers, and their schools, such as
graduation, grade promotions or retentions, selection for highly competitive programs or schools,
or staffing and budget decisions.
Intelligence. Over many years, the concept of intelligence has had many definitions. Intelligence
has been defined, to cite several examples, as the ability to think conceptually, to solve problems,
to manipulate one's environment, or to develop expertise. Some theorists have proposed that
intelligence is mostly innate, inherited, or biologically-based, and others have argued equally
strongly that intelligence is influenced by one's environment. Issues regarding the nature and
breadth of intelligence continue to be topics of lively discussion among theorists and researchers
in several fields of study (including educational psychology, cognitive psychology, and
sociology, for example).
Learning objective. A learning objective is a specific statement that describes what the student
is to learn, understand, or be able to do as a result of a lesson or a series of lessons.

Learning outcome. A learning outcome represents what the student actually achieved as a result
of a lesson or a series of lessons. The success of lessons may be influenced by the students' prior
knowledge, their effort and attention, teaching methods, resources, and time. Learning outcomes
refer to the results of instruction, while learning objectives refer to the intended goals and
purposes of lessons.
Measurement is simply the process of assigning a number, or a score if you will, to some
performance or product. Examples would include grading a test or a homework assignment in
terms of number or percent of correct or incorrect responses.
Measures of central tendency are quantitative (numerical) ways to describe the middle of a
distribution of scores. Since most individuals in a given population tend to exhibit middle levels
of competence or presence of a characteristic, most people tend to earn scores that are near the
central portion of the normal curve (see definition, below). There are three common measures of
central tendency: mean, median, and mode. The mean refers to a numerical average of the
scores. It is obtained by adding all the scores and dividing their sum by the number of scores
(e.g., scores of 100, 90, 80, 80, and 70 result in a mean of 84). The median is simply the middle
score when all scores are placed in ranked order. The median in our example would be 80
because it is the third score counted in from either direction. The mode is the most often
occurring score. In our example, the mode is 80 since it occurs more often than any other score.
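The example above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# The five scores from the example above.
scores = [100, 90, 80, 80, 70]

mean_score = mean(scores)      # sum of the scores (420) divided by their count (5)
median_score = median(scores)  # middle score once the scores are ranked
mode_score = mode(scores)      # the score that occurs most often

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

Here the median and mode happen to coincide at 80 while the mean is 84, because the single score of 100 pulls the average upward.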
Minimum competency is a judgment of the lowest level of skill or knowledge a student must
have attained to be considered "competent" in that area. Minimum competency tests are often the
focus of broad national educational efforts to improve education. It is important to note,
especially for high-ability students, that minimum competencies do not represent an adequate
standard or expectation of performance, nor do they imply proficiency in, or mastery of, the
content or skill being tested.
Normal curve ("bell curve"). The normal or "bell" curve is a common way of representing the
distribution of scores for a particular competence or characteristic in a large population. Since
most individuals of any population would exhibit "average" competence or presence of a
characteristic, their scores appear in the middle area around the crest of the curve. Those who
exhibit exceptionally high or low competence or very great or very small presence of a
characteristic appear at either end of the curve's shape. [Editor's Note: The second part of this
series, in the December 2003 issue of PHP, also included a diagram of the normal curve.]
Norm-referenced testing (or norm-referenced assessment) refers to testing in which individuals'
results are compared to some larger group (such as a national or statewide sample of students).
Usually, "norm" or "normal" groups are those in which the students' scores are distributed in a
"normal" (or "bell-shaped") pattern. In these cases, an individual's performance is assessed in
relation to where his or her score would fall under the normal curve.
Objective test items require the student to select a specific response to a question that can be
graded as either correct or incorrect. They are easy to administer and score (and can often be
machine-scored). Common examples of objective test items include: true-false, multiple-choice,
and matching questions.

Online assessment is an assessment that is accessed on a computer via the Internet or a similar
computer network. The assessment or test is read online and the responses are given online by
selecting or checking a choice by clicking the mouse, typing a response, or perhaps even
touching the computer screen with a special "pen" or speaking a response aloud using voice
recognition technology. Online assessment may also be a vehicle for submitting a portfolio of
student performances or completed assignments for the teacher to evaluate.
Percentile ranks refer to an individual's standing in relation to the rest of the individuals in the
norm or comparison group (i.e., others who are taking the same test). If your child receives a
percentile rank of 90, it means that your child achieved a score equal to or better than 90 percent
of the rest of the group with whom he or she is being compared.
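A minimal sketch of that "equal to or better than" definition follows. The function name and the ten-score comparison group are hypothetical, and test publishers may use slightly different conventions (for example, counting tied scores differently).

```python
def percentile_rank(score, group_scores):
    """Percent of scores in the comparison group that `score` equals or exceeds.

    Assumption: `group_scores` includes every score in the norm group.
    """
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

# Hypothetical comparison group of ten scores.
group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percentile_rank(95, group))
```

In this group, a score of 95 equals or exceeds nine of the ten scores, which corresponds to a percentile rank of 90. Note that a percentile rank is not the same thing as the percent of items answered correctly.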
Performance assessment refers to a system of evaluating individuals' abilities or achievements
based on actual work or behavior. Performance assessment focuses on the student's ability to
apply what he or she has learned to a realistic task: a problem or situation that might be
encountered in real life.
Portfolios are collections of an individual's work. Some educators regard portfolio assessment as
a better method of observing and evaluating what learners truly know, understand, and can do
than are tests and homework exercises, for example. In typical classrooms that employ
portfolios, students keep their work (quizzes, test papers, creative writing, homework, book
reports, project reports, art projects, etc.) in large folders, boxes, electronic files, or other storage
containers. They may keep all their work or, as is more typical and recommended as best
practice, students (on their own or with their teachers' guidance) periodically select samples of
their work to illustrate their best performances across a variety of activities. Students and
teachers also may keep work samples of various degrees of achievement to illustrate growth in
ability over time or to help identify and illustrate particular weaknesses or disabilities that require
additional attention.
Power tests typically have no time limits or very generous time limits so that the individual has
sufficient time to answer all questions. On a power test, the goal is to measure how much the
individual can do without the pressure of time limits. (Compare with "speed tests.")
Profile. A student profile is often used to describe a student's characteristics and learning needs,
to help guide important educational decisions for a particular individual, or to guide
individualized instructional planning. It may contain many different kinds of data (including test
scores, observations, anecdotal records, samples of student work, or comments from cumulative
records) that describe the student, the circumstances that prompted creating the profile, questions
or problems requiring resolution, and suggestions for making desired decisions.
Range. The range of scores is the difference between the highest and lowest recorded scores. If
the lowest score is 28 and the highest is 98, then the range is 70.
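The same arithmetic as a brief sketch; the three middle scores are hypothetical, and only the lowest (28) and highest (98) come from the example above.

```python
# Lowest and highest scores match the example; the middle three are made up.
scores = [28, 45, 67, 82, 98]

# Range: highest recorded score minus lowest recorded score.
score_range = max(scores) - min(scores)

print(score_range)
```

Only the two extreme scores affect the range, so it says nothing about how the scores in between are spread out.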
Reliability refers to the degree of consistency or dependability of a test. A reliable test will
produce similar scores and distributions whenever it is given to similar populations. Thus, if a
student scored a 90 on an achievement test today, then, if the test is reliable, the student's score
would not differ substantially if the test were taken again another day. Reliability may also mean
that a student would earn similar scores on two different forms of a test, if tested at about the
same time.
Rubric. A rubric is a chart or plan that identifies criteria for evaluating a piece of a student's
work, be it an essay test, a paper, or some other student production. The rubric offers a
description of the qualities or characteristics of performance for several levels (such as:
beginning, intermediate, or advanced, or needs improvement, adequate, or outstanding) that the
teacher or other evaluator may assign. The best rubrics offer the clearest details for each category
of evaluation so that a student's products can be evaluated consistently. Rubrics may be
"analytic" or "holistic." An analytic rubric specifies all the components of a perfect response,
and point values are assigned to each component. While holistic scoring also identifies a model
or perfect answer, point values are not assigned to individual components. Thus, holistic or global scoring is more
subjective and may be less reliable than analytic scoring.
Speed tests are tests with specific time limits. Such a test rewards individuals who can work fast
to answer the test items. Students with disabilities may be exempt from time limits set for speed
tests. (Compare with power tests.)
Standardized tests are instruments that are administered, scored, and interpreted in the same,
pre-specified way by all users. There are detailed instructions or rules for how a test is
administered and scored. (One example of a well-known standardized test is the Scholastic
Aptitude Test or SAT.)
Standards-based. To put "standards-based" in front of such terms as instruction, assessment,
testing, measurement, evaluation and other terms typically means that whatever teachers teach
and students do in class is evaluated against specifically written and adopted standards, or goals
and objectives, of achievement, usually written and adopted at the state or national level.
Subjective tests. This term refers to the approach used to evaluate or score the student's response to a
writing prompt, an open-ended task or question, or a "free," unstructured response to a
short-answer or essay question. Unlike objective tests, in which the correct or incorrect answer
selection is easily and quickly obtained, subjective assessments present a more difficult
challenge to score and require considerably more time to read and to analyze carefully and
equitably.
Validity is a term that describes how well a test, or a test item, measures what it claims to
measure, accurately predicts a behavior, or accurately contributes to decision making about the
presence or absence of a characteristic.
Portion from: http://www.davidsongifted.org/db/Articles_id_10461.aspx
