

Reliability and Validity

Lomialagi C. Tauiliili

American Samoa Community College



Reliability and Validity

An assessment evaluates a student's learning. It is a process that measures an individual's knowledge of a skill that has been taught in the classroom. The process monitors students' progress in learning and gathers evidence of their knowledge from which the teacher can make inferences. Explicitly, assessment plays a crucial role in a student's learning by measuring how students learn, their motivation to learn, and how teachers instruct them. First, assessing how students learn refers to students' proficiency in mastering the instructional objective. Second, measuring students' motivation to learn examines their interests, values, and attitudes toward a given task. Lastly, an assessment reflects how well the teacher delivered the instructional objective and procedures to the class; in other words, students' proficiency on the assessment task reflects the teacher's performance. The two most critical characteristics of a well-prepared assessment task, however, are its validity and reliability.

Reliability and validity of assessment provide important evidence for teachers to make inferences about students' knowledge, skills, or attitudes toward learning. The two terms may sound similar, but they carry different meanings. Reliability stresses the consistency of an assessment: it is the degree to which the assessment produces consistent results when measurements are made repeatedly. For instance, the result of a student's first test should correlate with the result of his or her second test. Validity, on the other hand, concerns the accuracy of the student's results: it is the degree to which the assessment measures what it is intended to measure. For example, students are assessed on their comprehension skills because the curricular aim and objective address comprehension skills. All assessments, therefore, need to be reliable and valid. Reliability and validity are the two most significant concepts of assessment because reliability refers to the consistency of results, and validity emphasizes the accuracy of results.

Reliability

To begin with, reliability of assessment refers to the consistency with which the assessment measures what is to be measured. An assessment is consistent when students are assessed on what they were taught in the classroom. For instance, a class that learned about animal cells through presentations later had a presentation exam about animal cells; the assessment yields consistent results because the objective is aligned with the assessment. Therefore, if the assessment is reliable, the students' results should be reliable as well. According to Popham (2008), "When you encounter the term reliability in any assessment context, you should draw a mental equal sign between reliability and consistency, because reliability refers to the consistency with which a test measures whatever it's measuring" (p. 28). Reliability of assessment consists of three types of evidence: stability, alternate form, and internal consistency.

Stability (Test-Retest)

The first type of reliability evidence is known as stability reliability, or test-retest reliability. Stability reliability is a measure of consistency obtained by administering the same test to the same group of students on two different occasions. The assessment is given when the students are ready, and it is given twice; the relationship between the two sets of scores indicates the stability of the assessment. As John (2015) stated, "Reliability is usually expressed numerically as a coefficient. A high coefficient indicates high reliability with a minimal error while a low correlation coefficient indicates low reliability with maximal error" (p. 70). To elaborate on this point, the second test taken by an individual should yield roughly the same result as the first test. Otherwise, the results will vary between the two administrations; such variation can come from genuine improvement between testings, students remembering test answers, the teacher re-teaching the tested material, or students being absent on the day of the retest. According to Popham (2008), "Whether you use a correlational approach or a classification-consistency approach to the determination of a test's consistency over time, it is apparent you'll need to test students twice in order to determine the test stability" (p. 32). Therefore, stability reliability focuses on the consistency of results when the same test is given twice to the same students at different times.
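
To make the coefficient John (2015) describes concrete, the short sketch below computes a test-retest correlation in Python. The scores are hypothetical, invented for illustration; a coefficient near 1.0 would indicate stable measurement.

    # A minimal sketch of a test-retest (stability) coefficient, using
    # hypothetical scores for eight students tested on two occasions.
    import statistics

    first_test = [78, 85, 62, 90, 71, 88, 67, 95]    # first administration
    second_test = [80, 83, 65, 92, 70, 86, 64, 97]   # same students, later date

    # Pearson r (statistics.correlation requires Python 3.10+)
    r = statistics.correlation(first_test, second_test)
    print(f"Test-retest reliability coefficient: r = {r:.2f}")

Running this on the hypothetical scores yields a coefficient close to 1.0, the kind of high value John (2015) associates with minimal error.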

Alternate Form

Alternate-form reliability is the second type of reliability evidence. It is a measure of consistency obtained by administering two different forms of an assessment to the same group of students. As Popham (2008) proposed, "Alternate-form reliability deals with the question of whether two or more allegedly equivalent test forms are, in fact, equivalent" (p. 33). Explicitly, the assessment is distributed in two different forms that measure the same objective. For instance, half of the class is tested on Form A of a critical thinking assessment and the other half on Form B; after the first testing, the students who took Form A take Form B, and the students who took Form B take Form A. The scores on the two forms are then compared to check whether both measurements are consistent. According to Alonzo et al. (2012), "A key feature of assessments designed for progress monitoring is that alternate forms must be as equivalent as possible to allow meaningful interpretation of student performance data across time" (p. 3). A high coefficient of equivalence shows that both forms measure the same skills, whereas a low coefficient shows that the two forms are not assessing the same goal. An assessment should be reliable so it can measure all students' learning.
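
The coefficient of equivalence can be computed the same way as a test-retest coefficient, but across forms rather than across occasions. The sketch below is a minimal illustration with hypothetical scores from the counterbalanced design described above; comparing the form means is a simple extra check that the two forms are of similar difficulty.

    # A minimal sketch of a coefficient of equivalence, using hypothetical
    # scores: after counterbalancing, every student has a score on each form.
    import statistics

    form_a = [72, 88, 64, 91, 79, 85]   # hypothetical Form A scores
    form_b = [70, 90, 61, 93, 77, 88]   # the same students' Form B scores

    equivalence = statistics.correlation(form_a, form_b)
    mean_gap = statistics.mean(form_a) - statistics.mean(form_b)

    print(f"Coefficient of equivalence: r = {equivalence:.2f}")
    print(f"Difference in form means: {mean_gap:+.1f} points")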

Internal Consistency

The last type of reliability evidence is called internal consistency. Internal consistency evaluates whether all items of the test are aligned to the same objective. This form of evidence is quite different from stability and alternate-form reliability: it does not focus on students' scores across administrations but on whether the test items are consistent with one another and with what has been taught in class. As Henson (2001) stated, "Internal consistency is the extent to which a group of items measure the same construct, as evidenced by how well they vary together, or interrelate" (as cited in Paulsen & BrckaLorenz, 2017, para. 1). To elaborate, the items on the test should be written according to the students' instruction; for example, if students excel at problem-solving, then the test items should correspond to that capability. Effectively, internal consistency measures the reliability of the assessment tool by examining the items on which students are assessed. Explicitly, the three types of reliability evidence are not interchangeable. Stability reliability involves giving the same test to the same group of students at different times. Alternate-form reliability is the comparability of two different forms that assess the same skills and are given to the same group of students. Lastly, internal consistency measures the reliability of the assessment by examining the set of items given to the same group of students.
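
One standard statistic for the "vary together" idea in the Henson quotation is Cronbach's alpha. The sources cited above describe the concept rather than the formula, so the sketch below is an illustration of my own with hypothetical right/wrong item data.

    # A minimal sketch of Cronbach's alpha, a common internal-consistency
    # coefficient. Rows are students, columns are items scored 0 or 1;
    # all data are hypothetical.
    from statistics import variance

    item_scores = [
        [1, 1, 0, 1, 1],   # student 1's responses to five items
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 0, 1, 1],
        [0, 1, 0, 0, 0],
    ]

    k = len(item_scores[0])                                  # number of items
    items = list(zip(*item_scores))                          # one tuple per item
    item_var_sum = sum(variance(item) for item in items)     # per-item variances
    total_var = variance([sum(row) for row in item_scores])  # variance of totals

    alpha = (k / (k - 1)) * (1 - item_var_sum / total_var)
    print(f"Cronbach's alpha = {alpha:.2f}")

Items that rise and fall together across students shrink the ratio of item variance to total-score variance, pushing alpha toward 1.0; unrelated items pull it toward zero.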

Validity

The validity of an assessment is the accuracy of its results, and it is often identified as the most significant characteristic of an assessment. The validity of an assessment rests on the performance standards by which the teacher measures the student's proficiency toward the curricular aim, or how well the student has mastered the content standard. According to Charles Darwin University (2019),

    Assessment validation is a quality review process to check that the assessment tools produced valid, reliable, sufficient, current and authentic evidence for assessors to make reasonable judgements as to whether the requirements of the units and training product have been met and that assessment judgments are consistently applied. (p. 3)

Explicitly, the validity component of assessment focuses on the teacher's ability to make valid interpretations about students, which helps in making sound instructional decisions. For instance, a collection of a student's assessments supports a valid estimate of that student's performance on an educational standard. Three types of validity evidence can support the validation of a classroom assessment: content-related, criterion-related, and construct-related evidence.

Content-Related

Content-related evidence of validity stresses that the content of the assessment represents the content of the educational standards. If the examination contains too few items related to the curricular aim, it provides weak content-related evidence of validity; for example, a student's algebra examination should cover the full range of content in the curricular aim. Two approaches gather content-related evidence of validity. As Yaghmaie (2003) stated, "For content validity two judgment are necessary: the measurable extent of each item for defining the traits and the set of items that represents all aspects of the traits" (p. 25). The first approach is called developmental care. It ensures that the test items present appropriate skills and knowledge. For example, if a teacher wants to develop a content-valid assessment, she needs to make sure the curricular aim is aligned with the items on the test: if students learned about problem-solving, then the test items should be about problem-solving. The second approach is external review. External reviewers are people from outside the classroom, such as other teachers, administrators, staff, parents, and students, who rate the appropriateness of the test items. These reviewers evaluate the test using guidelines from the teacher and make suggestions for improving it. Content-related evidence of validity ensures that the educational assessment is aligned to the objective, standard, and skills taught to the students.
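
External reviewers' ratings can also be summarized quantitatively. The sketch below shows one common summary, an item-level content validity index (the proportion of reviewers who rate an item as relevant); the ratings, the 1-4 scale, and the cutoff are illustrative assumptions rather than anything prescribed by the sources cited here.

    # A minimal sketch of an item content validity index (I-CVI): the share
    # of external reviewers rating each item as relevant (3 or 4 on a
    # hypothetical 1-4 relevance scale).
    ratings = {            # item -> one rating per reviewer
        "item_1": [4, 4, 3, 4],
        "item_2": [2, 3, 2, 1],   # weakly relevant item
        "item_3": [4, 3, 4, 4],
    }

    for item, scores in ratings.items():
        relevant = sum(1 for s in scores if s >= 3)
        cvi = relevant / len(scores)
        note = "" if cvi >= 0.78 else "  <- revise or drop"  # one common cutoff
        print(f"{item}: I-CVI = {cvi:.2f}{note}")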

Criterion-Related

Another form of validity evidence is criterion-related evidence of validity. Criterion-related evidence centers on using an educational assessment to predict students' outcomes. An example is using a standards-based test to predict a student's performance at a higher educational level. According to Popham (2008), "Test results, on predictor tests as well as on educational assessment procedures, should always be used to make better education decisions" (p. 62). Explicitly, the relationship between the predictor test and the criterion helps teachers make inferences about improving teaching or learning. If students failed the assessment because of low motivation, then the teacher should provide activities that boost motivation before the test. Criterion-related evidence is also known as predictive validity because it predicts the student's criterion, such as scores, grades, GPA, and other future outcomes.
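
The strength of that prediction is usually expressed as a correlation between the predictor test and the later criterion. The sketch below is a minimal illustration; the scores and GPAs are hypothetical values invented for the example.

    # A minimal sketch of a criterion-related (predictive) validity
    # coefficient: predictor test scores versus a later criterion (GPA).
    import statistics

    predictor_scores = [520, 610, 480, 700, 560, 640, 500, 670]  # hypothetical
    later_gpa = [2.8, 3.2, 2.5, 3.8, 3.0, 3.4, 2.9, 3.6]         # hypothetical

    validity_coefficient = statistics.correlation(predictor_scores, later_gpa)
    print(f"Predictive validity coefficient: r = {validity_coefficient:.2f}")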

Construct-Related

Lastly, construct-related evidence of validity involves gathering empirical evidence to determine whether measurements of a hypothetical construct are accurate. The evidence indicates whether the collection of a student's assessments shows that the test measures what it needs to measure. As Popham (2008) proposed, "Construct-related evidence of validity for educational tests is typically gathered by way of a series of studies, not one whopper 'it-settles-the-issue-once-and-for-all' investigation" (p. 63). The first source of construct-related evidence is intervention studies. Intervention studies predict that students will respond differently to the assessment after a lesson has been delivered; for example, students take a pretest, the teacher teaches based on the students' results, and the students then take a post-test. The second source is differential-population studies, which predict that different populations of students will produce different scores. For instance, a class will produce different results on an oral presentation test in English if many of the students are English language learners; the construct-related evidence supports the expectation that students score differently because of their status. Lastly, related-measures studies examine the relationship between students' scores on two different assessments that measure the same goal. This investigation hypothesizes that if an individual masters a reading exam on a particular skill in class, that individual's results should correlate with his or her score for the same skill on a different assessment tool. Therefore, all three strategies gather information that supports the accuracy of an assessment, that is, construct-related evidence of validity.
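
As a concrete illustration of the first kind of study, the sketch below compares hypothetical pretest and post-test means. The numbers are invented, and a real intervention study would be only one entry in the series of studies Popham (2008) describes.

    # A minimal sketch of an intervention study: if the test measures the
    # taught construct, scores should rise from pretest to post-test.
    import statistics

    pretest = [45, 52, 38, 60, 47, 55]    # hypothetical scores before the lesson
    posttest = [68, 74, 59, 82, 70, 77]   # the same students after the lesson

    gain = statistics.mean(posttest) - statistics.mean(pretest)
    print(f"Mean gain after instruction: {gain:+.1f} points")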

Conclusion

All educational assessments should be reliable and valid. The two concepts are important within educational assessment because they help eliminate bias in an assessment. According to Popham (2008), "If convincing evidence is gathered that a test is permitting valid score-based inferences, we can be assured the test is also yielding reasonably reliable scores" (p. 66). Explicitly, the reliability of assessment leads to accuracy and precision in measuring students. When an educational assessment provides a reliable measurement of student learning, students' results will show the same consistency. Reliability of assessment appears in three types of evidence, and all three ensure that the curricular aim and the assessment are aligned so that students' scores are consistent. The validity of assessment, on the other hand, concerns the accuracy of the measurement between the curricular aim and the assessment given to students. Validity likewise consists of three common types of evidence: content-related, criterion-related, and construct-related evidence together illustrate the accuracy of an assessment-based inference. Reliability and validity work hand in hand to produce coherent and accurate results.

Moreover, an assessment can be reliable but not valid; however, an assessment cannot be valid unless it is reliable. In other words, an assessment can provide consistent results, making it reliable, and still not be valid if it is not measuring what it is intended to measure. According to John (2015), "If a measuring instrument measures what it purports to measure, such instrument can be said to be valid because it has turned out to be a suitable measure" (p. 72). "For example, a broken clock in a room is reliable because the clock shows the same time whenever the student goes to class. However, a working clock is reliable and valid because it shows the logic and accurate time" (M. Langkilde, class discussion, August 2019). To elaborate, the broken clock is reliable because it displays the same time each day, but it is not valid because it does not give the correct time. Therefore, reliability and validity are crucial concepts in educational assessment because they help teachers make generalized judgments about a student's level of learning and eliminate bias and unfair penalization.



References

Alonzo, J., Lai, C., Anderson, D., Park, B. J., & Tindal, G. (2012). An examination of test-retest, alternate form reliability, and generalization theory study of the easyCBM passage reading fluency assessments. Behavioral Research and Teaching, University of Oregon. Retrieved from https://dibels.uoregon.edu/docs/techreports/TechRpt1219.pdf

Charles Darwin University. (2019). VET assessment validation procedures (pp. 1-9). Retrieved from https://www.cdu.edu.au/governance/doclibrary/pro-136.pdf

John, A. C. (2015). Reliability and validity: A sine qua non for fair assessment of undergraduate technical and vocational education projects in Nigerian universities. Journal of Education and Practice, 6(34), 68-75. Retrieved from https://files.eric.ed.gov/fulltext/EJ1086092.pdf

Paulsen, J., & BrckaLorenz, A. (2017). Internal consistency. FSSE Psychometric Portfolio. Retrieved from http://fsse.indiana.edu/pdf/pp/2017/FSSE17_Internal_Consistency_Reliability.pdf

Popham, W. J. (2008). Classroom assessment: What teachers need to know (5th ed.). Boston, MA: Pearson Education.

Yaghmaie, F. (2003). Content validity and its estimation. Journal of Medical Education, 3(1), 25-27. Retrieved from https://www.researchgate.net/publication/277034169_Content_validity_and_its_estimatin
