Lomialagi C. Tauiliili
that measures an individual’s knowledge of a skill that has been taught within the classroom. The
process monitors students’ progress in learning. It gathers evidence of students’ knowledge for
the teacher to draw inferences from. Specifically, assessment plays a crucial role in a
student’s learning by measuring how students learn, their motivation to learn, and how teachers
instruct them. First, assessing how students learn refers to their proficiency in
mastering the instructional objective. Second, measuring students’ motivation to learn
examines their interests, values, and attitudes toward a given task. Lastly, an assessment
indicates how well the teacher delivered the instructional objective and procedures to the class.
In other words, students’ proficiency in the assessment task reflects the teacher’s
performance. However, the two most critical characteristics of a well-prepared assessment task
Reliability and validity of assessment provide important evidence for teachers to make
inferences about students’ knowledge, skills, or attitudes toward learning. The two terms may
sound similar, but they have different meanings. Reliability of assessment concerns the
consistency of an assessment. It is the degree to which the assessment produces
consistent results when measurements are made repeatedly. For instance, a student’s result on a
first test should correlate with his or her result on a second test. Validity, on the other hand,
concerns the accuracy of the student’s results. It is the degree to which the assessment measures
what it is intended to measure. For example, students are assessed on their comprehension
skills because the curricular aim and objective concern comprehension skills. Therefore, all
assessments need to be reliable and valid. Reliability and validity are the two most significant
RELIABILITY AND VALIDITY 3
concepts of assessment because reliability refers to the consistency of results, and validity
Reliability
To begin with, reliability of assessment refers to the consistency with which the assessment
measures what is to be measured. An assessment is consistent when students are assessed on
what they were taught in the classroom. For instance, a class learned about animal
cells through presentations and later had a presentation exam about animal cells. This assessment
yields consistent results because it demonstrates the alignment of the objective with
the assessment. Therefore, if the assessment is reliable, the students’ results should be reliable as
well. According to Popham (2008), “When you encounter the term reliability in any assessment
context, you should draw a mental equal sign between reliability and consistency, because
reliability refers to the consistency with which a test measures whatever it’s measuring” (p. 28).
Reliability of assessment consists of three types of evidence. The three types of reliability
Stability (Test-Retest)
Stability reliability is a measure of consistency obtained by administering the same test to the
same group of students on different occasions. The assessment is given to the students when they
are ready, and it is given twice. The correlation between the two sets of results
indicates the stability reliability of the assessment. As John (2015) stated,
reliability with a minimal error while a low correlation coefficient indicates low reliability with
maximal error” (p. 70). To elaborate on this point, the second test taken by an individual should
yield the same result as the first test. Otherwise, the students will be confronted with the
assessment, students remembering test answers, the teacher re-teaching the tested material, and
some students being absent on the day of the retest. According to Popham (2008), “Whether you use a
consistency over time, it is apparent you’ll need to test students twice in order to determine the
test stability” (p. 32). Therefore, stability-reliability focuses on the consistency of results when a
Alternate Form
The alternate form is the second type of reliability evidence. Alternate
reliability deals with the question of whether two or more allegedly equivalent test forms are, in
fact, equivalent” (p. 33). Specifically, the assessment is administered in two different forms
that measure the same objective. For instance, half of the class is tested on form
A of a critical thinking assessment and the other half is tested on form B. After the
first testing, the half who took form A take form B, and the half who took
form B take form A. The results of the two forms are then compared to check whether
both forms measure consistently. According to Alonzo et al. (2012), “A key feature of
assessments designed for progress monitoring is that alternate forms must be as equivalent as
possible to allow meaningful interpretation of student performance data across time” (p. 3). In
more detail, a high coefficient of equivalence indicates that both assessment tools measure
the same objective. A low coefficient of equivalence, however, indicates that the two
forms of assessment are not assessing the same goal. An assessment should be reliable to
Internal Consistency
The last type of reliability evidence is called internal consistency. Internal
consistency evaluates whether all items of the test are aligned with the objective. This form
of evidence is quite different from stability and alternate-form reliability. Internal consistency
does not focus on the students’ scores but on whether the items of the test are consistent with what
has been taught in the class. As Henson (2001) stated, “Internal consistency is the extent to
which a group of items measure the same construct, as evidenced by how well they vary
together, or interrelate” (as cited in Paulsen & BrckaLorenz, 2017, para. 1). To elaborate,
the items on the test are written according to the students’ behavior toward school. For example,
if the students are excelling at problem-solving, then the test items should correspond
to the students’ capabilities. In effect, internal consistency measures the reliability of the
assessment tool by examining the items students are assessed on. The three types of
reliability evidence are not interchangeable. Stability reliability involves giving the same
test to the same group of students at different times. Alternate-form reliability is the
comparability of two different forms that assess the same objective and are given to the same
group of students. Lastly, internal consistency measures the reliability of the assessment
by examining the items given to the same group of students.
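The three types of reliability evidence are usually summarized as coefficients. As a minimal
sketch, assuming made-up scores and hypothetical function names (pearson_r and cronbach_alpha,
neither of which comes from the sources cited in this paper), the Python code below computes a
correlation coefficient of the kind used for test-retest or alternate-form results, and
Cronbach's alpha, one common index of how well test items vary together.

```python
from statistics import mean, variance


def pearson_r(first_scores, second_scores):
    """Correlation between two score lists from the same students.

    Used here for stability (test vs. retest) or alternate-form
    (form A vs. form B) evidence.
    """
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x, mean_y = mean(first_scores), mean(second_scores)
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_scores, second_scores))
    ss_x = sum((x - mean_x) ** 2 for x in first_scores)
    ss_y = sum((y - mean_y) ** 2 for y in second_scores)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per student, one column per item."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 students")
    # Variance of each item (column) and of each student's total score.
    item_variances = [variance(column) for column in zip(*item_scores)]
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores must vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical scores for five students (illustration only).
    first_test = [70, 82, 91, 65, 78]
    second_test = [72, 80, 93, 63, 79]
    print("test-retest correlation:", round(pearson_r(first_test, second_test), 3))

    # Hypothetical right/wrong (1/0) results on a four-item quiz.
    quiz = [
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]
    print("Cronbach's alpha:", round(cronbach_alpha(quiz), 3))
```

A coefficient close to 1 suggests consistent results, while a coefficient close to 0 suggests
that the two administrations, the two forms, or the items do not agree.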
Validity
the performance standards in which the teacher measures the student’s proficiency towards the
curricular aim, or how well the student mastered the content standard. According to Charles
Assessment validation is a quality review process to check that the assessment tools
produced valid, reliable, sufficient, current and authentic evidence for assessors to make
reasonable judgements as to whether the requirements of the units and training product
have been met and that assessment judgments are consistently applied. (p. 3)
Specifically, the validity component of assessment focuses on the teacher’s valid interpretations
about students, which help the teacher make sound instructional decisions. For instance,
have three types of validity evidence that can reinforce a dispute. The three types of validity
Content Related
Content-related evidence of validity concerns whether the content of the assessment represents
the content of the educational standards. If the examination provides too few items related
to the curricular aim, then the examination is weak in content-related evidence of validity. For
example, an algebra examination with strong content-related evidence covers the full range of
content in the curricular aim. Two approaches gather content-related evidence of validity.
As Yaghmaie (2003) stated, “For content validity two judgment are necessary: the measurable
extent of each item for defining the traits and the set of items that represents all aspects of the
traits” (p. 25). The first approach is called developmental care. It ensures that the test items
represent the appropriate skills and knowledge. For example, if a teacher wants to develop an
assessment with content-related validity, she needs to make sure that the items on the test are
aligned with the curricular aim. If students learned about problem-solving, then the test items
should be about problem-solving. If the teacher wants to confirm that the assessment is
content-related, the assessment should go through the second approach, external reviews. External
reviewers are people from outside the classroom who rate the appropriateness of the test items.
Such people can include other teachers, administrators, staff, parents, and students. They
review the test using a guideline form from the teacher and make suggestions
for the test. The content-related evidence of validity ensures that the educational assessment is
Criterion Related
predict the student’s outcomes. An example of criterion-related evidence of validity is using a
standards-based test to predict the student’s performance at a higher educational level. According
procedures, should always be used to make better education decisions” (p. 62). Specifically, the
relationship between the predictor test and the criterion helps the teacher make inferences about
improving teaching or learning. If students failed the assessment because of low motivation,
then the teacher should promote activities that boost students’ motivation before
they take the test. Criterion-related evidence is also known as predictive validity because it
predicts a criterion such as students’ scores, grades, GPA, and other future outcomes.
Construct Related
evidence to identify if the measurements of the hypothetical construct are accurate. The evidence
provides feedback on how the collection of a student’s assessment proposes if the test measures
for educational tests is typically gathered by way of a series of studies, not one whopper “it-
evidence is intervention studies. Intervention studies test the prediction that students will
respond differently to the assessment after receiving a lesson. An example of this approach is
having students take a pretest, having the teacher teach based on the students’ results, and
then having students take a post-test. The second approach is known as different population studies.
Different population studies predict that different populations will earn different scores. For
instance, a class will produce different results on an oral presentation test in
English because many of the students are English language learners. Therefore, construct-related
evidence supports the fact that students score differently because of their status. Lastly,
related-measure studies examine the relationship between students’ scores on two different
assessments that measure the same goal. This investigation hypothesizes that if an individual
mastered a particular skill on a reading exam in class, the same individual’s results should
correlate with his or her score for the same skill on a different assessment tool. Therefore, all three
validity.
Conclusion
All educational assessments should be reliable and valid. The two concepts are important
Popham (2008), “If convincing evidence is gathered that a test is permitting valid score-based
inferences, we can be assured the test is also yielding reasonably reliable scores” (p. 66).
Explicitly, the reliability of assessment leads to the accuracy and precision of student’s
learning, the students’ results will show the same consistency. Reliability of assessment
appears in three types of evidence. The three types of reliability evidence ensure that the
curricular aim and the assessment are aligned so that students’ scores yield consistent results.
The validity of assessment, on the other hand, concerns the accuracy of measurement between the
curricular aim and the assessment given to students. Validity of assessment consists of three
Moreover, an assessment can be reliable but not valid. However, an assessment cannot be
valid unless it is reliable. In more detail, an assessment can provide consistent results, which
makes it reliable, but it is not valid if it is not measuring what it is intended to measure.
According to John (2015), “If a measuring instrument measures what it purports to measure,
such instrument can be said to be valid because it has turned out to be a suitable measure” (p.
72). “For example, a broken clock in a room is reliable because the clock shows the same time
whenever the student goes to class. However, a working clock is reliable and valid because it
shows the logic and accurate time” (M. Langkilde, class discussion, August 2019). To
elaborate, the broken clock is reliable because it displays the same time, but it is not valid
because it does not give the correct time. Therefore, reliability and validity are crucial concepts
in educational assessment because they help teachers make a generalized judgment about a student’s level of
References
Alonzo, J., Lai, C., Anderson, D., Park, B. J., & Tindal, G. (2012). An examination of test-retest,
alternate form reliability, and generalization theory study of the easyCBM passage
Charles Darwin University. (2019). VET assessment validation procedures. 3, 1-9. Retrieved
from https://www.cdu.edu.au/governance/doclibrary/pro-136.pdf
John, A. C. (2015). Reliability and validity: A sine qua non for fair assessment of undergraduate
https://files.eric.ed.gov/fulltext/EJ1086092.pdf
Retrieved from
http://fsse.indiana.edu/pdf/pp/2017/FSSE17_Internal_Consistency_Reliability.pdf
Popham, W. J. (2008). Classroom assessment: What teachers need to know (5th ed.). Boston,
Yaghmaie, F. (2003). Content validity and its estimation. Journal of Medical Education, 3(1),
https://www.researchgate.net/publication/277034169_Content_validity_and_its_estimatin