Running head: RELIABILITY AND VALIDITY 1
Reliability and Validity
Lomialagi C. Tauiliili
American Samoa Community College

Reliability and Validity
An assessment evaluates what a student has learned. It is a process
that measures an individual’s knowledge of a skill that has been taught within the classroom. The
process monitors students’ progress in learning and gathers information about their knowledge
from which the teacher can draw inferences. Specifically, assessment plays a crucial role in
student learning by measuring how students learn, their motivation to learn, and how teachers
instruct them. Assessing how students learn refers to their proficiency in
mastering the instructional objective. Measuring students’ motivation to learn
examines their interests, values, and attitudes towards a given task. Lastly, an assessment
indicates how well the teacher delivered the instructional objective and procedures to the class.
In other words, the students’ proficiency in the assessment task reflects the teacher’s
performance. The two most critical characteristics of a well-prepared assessment task
are its validity and its reliability.
Reliability and validity of assessment provide important evidence for teachers to make
inferences about students’ knowledge, skills, or attitudes towards learning. The two terms may
sound similar, but they have different meanings. Reliability of assessment concerns the
consistency of an assessment, that is, the degree to which the assessment produces
consistent results when measurements are made repeatedly. For instance, a student’s result on a
first test should correlate with his or her result on a second test. Validity, on the other hand,
is the accuracy of students’ results. It concerns the extent to which the assessment measures
what it is intended to measure. For example, students are assessed on their comprehension
skills because the curricular aim and objective concern comprehension skills. Therefore, all
assessments need to be reliable and valid. Reliability and validity are the two most significant
concepts of assessment because reliability refers to the consistency of results, and validity
refers to the accuracy of results.
Reliability
To begin with, reliability of assessment refers to the consistency with which the assessment
measures what it is meant to measure. An assessment supports consistent results when students
are assessed on what they were taught in the classroom. For instance, a class learned about
animal cells through presentations and later took a presentation exam about animal cells. The
assessment yields consistent results because it is aligned with the objective. Therefore, if
the assessment is reliable, the students’ results should be reliable as
well. According to Popham (2008), “When you encounter the term reliability in any assessment
context, you should draw a mental equal sign between reliability and consistency, because
reliability refers to the consistency with which a test measures whatever it’s measuring” (p. 28).
Reliability of assessment is supported by three types of evidence: stability, alternate form,
and internal consistency.
Stability (Test-Retest)
The first type of reliability evidence is stability reliability, also known as test-retest
reliability. Stability reliability is a measure of consistency obtained by administering the
same test to the same group of students on two different occasions. The assessment is given
when the students are ready, and it is given twice. The relationship between the two sets of
scores indicates the stability of the assessment. As John (2015) stated,
“Reliability is usually expressed numerically as a coefficient. A high coefficient indicates high
reliability with a minimal error while a low correlation coefficient indicates low reliability with
maximal error” (p. 70). In other words, the second test taken by an individual should
yield about the same result as the first test. Otherwise, the test-retest results will vary.
Sources of such variation include changes made to the assessment, students remembering test
answers, the teacher re-teaching the tested material, and students being absent on the day of
the retest. According to Popham (2008), “Whether you use a
correlational approach or a classification-consistency approach to the determination of a test’s
consistency over time, it is apparent you’ll need to test students twice in order to determine the
test stability” (p. 32). Therefore, stability reliability focuses on the consistency of results
when the same test is given to the same students on two different occasions.
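The two approaches named in the Popham quotation above can be sketched in a few lines of Python. This is only an illustration: the scores, the cut score of 70, and the function names are hypothetical and do not come from any of the cited sources.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    return cov / sqrt(var_x * var_y)


def classification_consistency(first, second, cut_score):
    """Share of students placed in the same pass/fail category both times."""
    same = sum((x >= cut_score) == (y >= cut_score) for x, y in zip(first, second))
    return same / len(first)


# Hypothetical scores for six students on the same test, given twice.
test_scores = [72, 85, 90, 64, 78, 55]
retest_scores = [75, 83, 92, 60, 80, 58]

stability = pearson_r(test_scores, retest_scores)
agreement = classification_consistency(test_scores, retest_scores, cut_score=70)
print(f"test-retest coefficient: {stability:.2f}")
print(f"classification consistency: {agreement:.2f}")
```

A coefficient close to 1 suggests that students kept roughly the same standing on both occasions, while the classification figure reports how many students received the same pass or fail decision each time.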
Alternate Form
The alternate form is the second type of reliability evidence. Alternate-form
reliability is a measure of consistency obtained by administering different forms of an
assessment to the same group of students. As Popham (2008) proposed, “Alternate-form
reliability deals with the question of whether two or more allegedly equivalent test forms are, in
fact, equivalent” (p. 33). Specifically, the assessment is distributed in two different forms
that measure the same objective. For instance, half of the class is tested on form
A of a critical thinking assessment and the other half is tested on form B. After the
first testing, the half who took form A take form B, and the half who took
form B take form A. The scores on the two forms are then compared to check whether
both forms measure consistently. According to Alonzo et al. (2012), “A key feature of
assessments designed for progress monitoring is that alternate forms must be as equivalent as
possible to allow meaningful interpretation of student performance data across time” (p. 3). A
high coefficient of equivalence indicates that both forms measure
the same objective. A low coefficient of equivalence indicates that the two
forms are not assessing the same goal. An assessment should be reliable in order to
measure all students’ learning abilities.
Internal Consistency
The last type of reliability evidence is internal consistency. Internal
consistency concerns whether all items of the test are aligned with the objective. This form
of evidence differs from stability and alternate-form reliability. Internal consistency
focuses less on students’ overall scores than on whether the items of the test are consistent
with what has been taught in the class. As Henson (2001) stated, “Internal consistency is the
extent to which a group of items measure the same construct, as evidenced by how well they vary
together, or interrelate” (as cited in Paulsen & BrckaLorenz, 2017, para. 1). In other words,
the items on the test are written according to the students’ behavior toward school. For example,
if the students excel at problem-solving, then the test items should correspond
to the students’ capabilities. In effect, internal consistency measures the reliability of the
assessment tool by examining the items on which students are assessed. The three types of
reliability evidence are not interchangeable. Stability reliability involves giving the same
test to the same group of students at different times. Alternate-form reliability is the
comparability of two different forms that assess the same objective and are given to the same
group of students. Lastly, internal consistency gauges the reliability of the assessment
by examining the items given to the same group of students.
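One widely used index of how well items “vary together” is Cronbach’s alpha. The sources cited above do not name it in the quoted passages, so its use here is an assumption made for illustration, and the item scores below are hypothetical. A minimal sketch:

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of students and columns of items."""
    k = len(item_scores[0])  # number of items on the test
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each student's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: five students (rows), four items scored 0-5 (columns).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Unlike the other two types of evidence, this calculation needs only one administration of one form, because it compares the items with each other rather than comparing two sets of total scores.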
Validity
The validity of an assessment is the accuracy of its results. Validity is regarded
as the most significant component of assessment. The validity of an assessment is judged against
the performance standards by which the teacher measures the student’s proficiency towards the
curricular aim, or how well the student mastered the content standard. According to Charles
Darwin University (2019),
Assessment validation is a quality review process to check that the assessment tools
produced valid, reliable, sufficient, current and authentic evidence for assessors to make
reasonable judgements as to whether the requirements of the units and training product
have been met and that assessment judgments are consistently applied. (p. 3)
Specifically, the validity component of assessment focuses on the teacher’s valid interpretations
about students, which help the teacher make sound instructional decisions. For instance,
a collection of a student’s assessments supports a valid estimate of the student’s
performance on an educational standard. The validity of a classroom assessment
is supported by three types of validity evidence that can strengthen such an interpretation:
content-related, criterion-related, and construct-related evidence.
Content Related
Content-related evidence of validity concerns whether the content of the assessment represents
the content of the educational standards. If the examination provides too few items related
to the curricular aim, then the examination is weak in content-related evidence of validity. For
example, a student’s algebra examination should cover the full range of content in the
curricular aim. Two approaches gather content-related evidence of validity.
As Yaghmaie (2003) stated, “For content validity two judgment are necessary: the measurable
extent of each item for defining the traits and the set of items that represents all aspects of the
traits” (p. 25). The first approach is called developmental care. It ensures that the test items
represent the appropriate skills and knowledge. For example, if a teacher wants to develop an
assessment with content-related validity, she needs to make sure that the curricular aim is
aligned with the items on the test. If students have learned about problem-solving, then the test
items should be about problem-solving. If the teacher wants to confirm that the assessment is
content-related, the assessment should be submitted to the second approach, external review.
External reviewers are people from outside the classroom who rate the appropriateness of the
test items. Such people can include other teachers, administrators, staff, parents, and students.
They review the test using guidelines provided by the teacher and make suggestions
for the test. Content-related evidence of validity ensures that the educational assessment is
aligned with the objective, standard, and skills taught to the students.
Criterion Related
Another form of validity evidence is criterion-related evidence of
validity. Criterion-related evidence concerns using an educational assessment to
predict students’ outcomes. An example of criterion-related evidence is using a
standards-based test to predict students’ performance at a higher educational level. According
to Popham (2008), “Test results, on predictor tests as well as on educational assessment
procedures, should always be used to make better education decisions” (p. 62). Specifically, the
relationship between the predictor test and the criterion supports inferences about how to
improve teaching or learning. If students fail the assessment because of low motivation,
then the teacher should promote activities that boost students’ motivation before they
take the test. Criterion-related evidence is also known as predictive validity because it
predicts a criterion such as students’ scores, grades, GPA, and other future outcomes.
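The relationship between a predictor test and a criterion can be illustrated with a simple least-squares line. The placement scores, the GPA values, and the function name below are hypothetical and serve only to show the idea. Small prediction errors would count as evidence that the predictor test supports inferences about the criterion.

```python
def fit_line(predictor, criterion):
    """Least-squares slope and intercept for predicting the criterion."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: placement-test scores and later first-semester GPA.
placement = [50, 60, 70, 80, 90]
gpa = [2.0, 2.4, 2.9, 3.3, 3.9]

slope, intercept = fit_line(placement, gpa)

# Average gap between the predicted and the actual criterion values.
errors = [abs(slope * x + intercept - y) for x, y in zip(placement, gpa)]
mean_abs_error = sum(errors) / len(errors)

# Predicted criterion for a new student who scores 75 on the placement test.
predicted = slope * 75 + intercept
print(f"mean absolute error: {mean_abs_error:.2f}")
print(f"predicted GPA for a score of 75: {predicted:.2f}")
```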
Construct Related
Lastly, construct-related evidence of validity involves gathering empirical
evidence to determine whether the measurement of a hypothetical construct is accurate. The
evidence indicates whether the test, across a collection of students’ results, measures
what it needs to measure. As Popham (2008) proposed, “Construct-related evidence of validity
for educational tests is typically gathered by way of a series of studies, not one whooper
“it-settles-the-issue-once-and-for-all investigation” (p. 63). The first kind of
construct-related evidence comes from intervention studies. Intervention studies test the
prediction that students will respond differently to the assessment after receiving a lesson.
An example of this approach is having students take a pretest, then having the teacher teach
based on the students’ results, and then having students take a posttest. The second kind comes
from different population studies. Different population studies test the prediction that
different populations will earn different scores. For instance, a class will produce different
results on an oral presentation test in English if many of the students are English language
learners. Construct-related evidence therefore supports the expectation that students score
differently because of their status. Lastly, related-measures studies examine the relationship
between students’ scores on two different assessments that measure the same goal. This kind of
investigation hypothesizes that if an individual mastered a reading exam in class on a
particular skill, the same individual’s results should correlate with his or her score for the
same skill on a different assessment tool. Together, all three strategies gather information
that serves as construct-related evidence of the accuracy of an assessment.
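The first two kinds of study described above each reduce to a simple comparison. The sketch below uses hypothetical scores and variable names. It shows the size of the expected differences only, not a full statistical test.

```python
from statistics import mean

# Intervention study: hypothetical scores for the same five students
# before and after the lesson.
pretest = [40, 55, 60, 45, 50]
posttest = [65, 70, 80, 60, 75]
gains = [after - before for before, after in zip(pretest, posttest)]
mean_gain = mean(gains)

# Different population study: hypothetical oral-presentation scores
# for two groups that are expected to score differently.
english_learners = [58, 62, 65, 60]
fluent_speakers = [78, 82, 75, 85]
group_gap = mean(fluent_speakers) - mean(english_learners)

print(f"mean gain after instruction: {mean_gain}")
print(f"gap between group means: {group_gap}")
```

If the gain and the gap point in the predicted directions, the results are consistent with the claim that the test measures the intended construct. A related-measures study would instead correlate scores from two different assessments of the same skill.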
Conclusion
All educational assessments should be reliable and valid. The two concepts are important
within educational assessment because they reduce the bias of an assessment. According to
Popham (2008), “If convincing evidence is gathered that a test is permitting valid scored-based
inferences, we can be assured the test is also yielding reasonably reliable scores” (p. 66).
Specifically, the reliability of an assessment contributes to the accuracy and precision of the
measurement of students. When an educational assessment provides a reliable measurement of
student learning, students’ results will show the same consistency. Reliability of assessment
is supported by three types of evidence. These ensure that the curricular aim
and the assessment are aligned so that students’ scores are consistent. The validity
of assessment, on the other hand, concerns the accuracy of the match between the
curricular aim and the assessment given to students. Validity of assessment is supported by the
three most common types of evidence. Content-related, criterion-related, and construct-related
evidence illustrate the accuracy of an assessment-based inference. Reliability and
validity work hand in hand to produce coherent and accurate results.
Moreover, an assessment can be reliable but not valid. However, an assessment cannot be
valid unless it is reliable. That is, an assessment can provide consistent results, which makes
it reliable, but it is not valid if it is not measuring what it is intended to measure.
According to John (2015), “If a measuring instrument measures what it purports to measure,
such instrument can be said to be valid because it has turned out to be a suitable measure” (p.
72). “For example, a broken clock in a room is reliable because the clock shows the same time
whenever the student goes to class. However, a working clock is reliable and valid because it
shows the logic and accurate time” (M. Langkilde, class discussion, August 2019). In other
words, the broken clock is reliable because it always displays the same time, but it is not
valid because it does not give the correct time. Therefore, reliability and validity are crucial
concepts in educational assessment because they help teachers make sound judgments about a
student’s level of learning and eliminate biased and unfair penalization.

References
Alonzo, J., Lai, C., Anderson, D., Park, B. J., & Tindal, G. (2012). An examination of test-retest,
alternate form reliability, and generalization theory study of the easyCBM passage
reading fluency assessments: Behavioral Research and Teaching. University of Oregon.
Retrieved from https://dibels.uoregon.edu/docs/techreports/TechRpt1219.pdf
Charles Darwin University. (2019). VET assessment validation procedures. 3, 1-9. Retrieved
from https://www.cdu.edu.au/governance/doclibrary/pro-136.pdf
John, A. C. (2015). Reliability and validity: A sine qua non for fair assessment of undergraduate
technical and vocational education projects in Nigerian Universities. Journal of
Education and Practice, 6(34), 68-75. Retrieved from
https://files.eric.ed.gov/fulltext/EJ1086092.pdf
Paulsen, J., & BrckaLorenz, A. (2017). Internal consistency. FSSE Psychometric Portfolio.
Retrieved from
http://fsse.indiana.edu/pdf/pp/2017/FSSE17_Internal_Consistency_Reliability.pdf
Popham, W. J. (2008). Classroom assessment: What teachers need to know (5th ed.). Boston,
MA: Pearson Education, Inc.
Yaghmaie, F. (2003). Content validity and its estimation. Journal of Medical Education, 3(1),
25-27. Retrieved from
https://www.researchgate.net/publication/277034169_Content_validity_and_its_estimatin
