Chapter 4 Of Tests and Testing

Some Assumptions about Psychological Testing and Assessment


Assumption 1: Psychological Traits and States Exist
- Trait: any distinguishable, relatively enduring way in which one individual varies from another
- State: also distinguishes one person from another but is relatively less enduring
- The existence of a trait or state is inferred from observing a sample of behavior: by direct observation, or by analysis of self-report statements or pencil-and-paper test answers
- The term psychological trait covers a wide range of possible characteristics. How do traits exist?
o (Per the textbook) A psychological trait exists only as a construct: an informed, scientific concept developed or constructed to describe or explain behavior. We can infer its existence from overt behavior: an observable action or the product of an observable action, including test- or assessment-related responses
o A trait is not expected to be manifested in behavior 100% of the time; whether it is manifested depends on the strength of the trait in the individual and on the nature of the situation
o The context within which behavior occurs also plays a role in helping select appropriate trait terms for observed behavior
o Traits and states are used to refer to ways in which one individual varies from another. Assessors make such comparisons with respect to the hypothetical average person, and also among people who, because of their membership in some group, are decidedly not average; the choice of reference group can therefore influence one's conclusions or judgments
Assumption 2: Psychological Traits and States Can Be Quantified and Measured
- The specific traits and states to be measured and quantified need to be carefully defined
- People in general have many different ways of looking at and defining the same phenomenon
- Once the trait, state, or other construct has been defined, the test developer considers the types of item content that would provide insight into it: from a universe of behaviors presumed to be indicative of the targeted trait, there is a world of possible items that can be written to gauge the strength of that trait in testtakers
- Weighing the comparative value of a test's items comes about as the result of a complex interplay among many factors, including technical considerations, the way a construct has been defined for the purposes of the test, and the value society attaches to the behaviors evaluated
- The test score is presumed to represent the strength of the targeted ability, trait, or state and is frequently based on cumulative scoring: the more the testtaker responds in a particular direction, as keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait
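Cumulative scoring, as described above, amounts to counting keyed responses. The sketch below assumes a hypothetical 5-item true/false scale with one point per response in the keyed direction; the key, the item count, and the function name `cumulative_score` are invented for illustration and do not come from any real test manual.

```python
# Hypothetical 5-item true/false scale: KEY[i] is the keyed (trait-consistent) answer to item i.
KEY = [True, False, True, True, False]


def cumulative_score(responses: list[bool], key: list[bool]) -> int:
    """Add one point for every response that matches the keyed direction."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for r, k in zip(responses, key) if r == k)


if __name__ == "__main__":
    testtaker_a = [True, False, True, False, False]  # 4 responses in the keyed direction
    testtaker_b = [False, True, True, False, True]   # 1 response in the keyed direction
    print(cumulative_score(testtaker_a, KEY))  # 4
    print(cumulative_score(testtaker_b, KEY))  # 1
```

Testtaker A responds in the keyed direction more often, so A is presumed to be higher on the targeted trait.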
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
- Patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders
- The tasks in some tests mimic the actual behaviors that the test user is attempting to understand
- Validity may also be questioned on grounds related to the interpretation of resulting test scores
- Questions concerning the validity of a particular test may be raised at every stage in the life of a test
Other Considerations
- Trained examiners can administer, score, and interpret a good test with a minimum of difficulty
- A good test is a useful test: one that yields actionable results that will ultimately benefit individual testtakers or society at large
Everyday Psychometrics: Putting Tests to the Test
- Why Use This Particular Instrument or Method?
o Answers can be found in published sources of information (test catalogues, test manuals, published test reviews) as well as in unpublished sources (correspondence with test developers and publishers and with colleagues who have used the same or similar tests)
- Are There Any Published Guidelines for the Use of This Test?
o Sometimes a published guideline for the use of a particular test will list other
measurement tools that should also be used along with it
o Published guidelines and research may also provide useful information regarding how likely the use of a particular test or measurement technique is to meet the Daubert or other standards set by courts
- Is This Instrument Reliable?
o Answering this requires careful reading of the test's manual and of published research on the test, test reviews, and related sources
- Is This Instrument Valid?
o Answering this starts with careful reading of the test's manual as well as published research on the test, test reviews, and related sources
o The need for multiple sources of data on which to base an opinion stems not only from the ethical mandates published in the form of guidelines from professional associations but also from the practical demands of meeting a burden of proof in court
- Is This Instrument Cost-Effective?
o E.g., in World Wars I and II many soldiers had to be screened, so group tests had greater utility than individual tests
- What Inferences May Reasonably Be Made from This Test Score, and How Generalizable Are the Findings?
o In evaluating a test, it is critical to consider the inferences that may reasonably be made as a result of administering that test
o The people used to help develop a test have a great effect on the generalizability of findings
o Another issue regarding the generalizability of findings concerns how a test was administered
Norms
o Equipercentile method: the equivalency of scores on different tests is calculated with reference to corresponding percentile scores
o Scores must be obtained on the same sample: each member of the sample took both tests, and the equivalency tables were then calculated on the basis of these data
- Subgroup norms
o Subgroup norms: a normative sample can be segmented (subdivided) by any of the criteria initially used in selecting subjects for the sample
o The test manual or a supplement to it might report normative information by each of these subgroups
- Local norms
o Local norms: provide normative information with respect to the local population's performance on some test
o Developed by test users themselves
o Some test users use abbreviated forms of existing tests, which require new norms
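The norm types above all come down to locating a score within some reference distribution. The sketch below assumes the mid-rank percentile convention (ties count half) and invented data; the function names are hypothetical. The equipercentile lookup is deliberately simple and unsmoothed, whereas operational equating tables normally use smoothing and interpolation.

```python
# Hypothetical helpers and data for illustration only; not from any test manual.

def percentile_rank(score, reference):
    """Percent of reference scores below `score`, counting ties as half (mid-rank convention)."""
    below = sum(1 for s in reference if s < score)
    ties = sum(1 for s in reference if s == score)
    return 100.0 * (below + 0.5 * ties) / len(reference)


def equipercentile_equivalent(x_score, test_x, test_y):
    """Return the lowest Test Y score whose percentile rank reaches that of x_score on Test X.

    Both lists must come from the SAME sample (everyone took both tests);
    only the equal length is checked here.
    """
    if len(test_x) != len(test_y):
        raise ValueError("need paired scores from one sample")
    target = percentile_rank(x_score, test_x)
    for y in sorted(set(test_y)):
        if percentile_rank(y, test_y) >= target:
            return y
    return max(test_y)


def subgroup_percentile_rank(score, group, records):
    """Percentile rank against one subgroup only; a local population works the same way."""
    reference = [s for g, s in records if g == group]
    return percentile_rank(score, reference)


if __name__ == "__main__":
    # The same five people took both tests (hypothetical scores).
    test_x = [10, 12, 14, 16, 18]
    test_y = [20, 25, 30, 35, 40]
    print(equipercentile_equivalent(14, test_x, test_y))  # 30

    records = [("A", 10), ("A", 20), ("B", 30), ("B", 40)]
    print(percentile_rank(20, [s for _, s in records]))  # 37.5 (whole normative sample)
    print(subgroup_percentile_rank(20, "A", records))    # 75.0 (subgroup A only)
```

The same raw score of 20 sits at the 37.5th percentile of the whole sample but at the 75th percentile of subgroup A, which is why the choice of norm group matters.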
Fixed Reference Group Scoring Systems
- Fixed reference group scoring system: another type of aid in providing a context for interpretation
- The distribution of scores obtained on the test from one group of testtakers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test
- Example: for the SAT, a 1990 fixed reference group of 2 million testtakers was immortalized as a standard to be used in the conversion of raw scores on future administrations of the test. If Max answered 50 items correctly in 1990 and Mia also answered 50 items correctly in 2008, they do not automatically receive the same score; the raw score is converted according to the difficulty level of the form taken
- Fixed reference group scores are most typically interpreted by local decision-making bodies with respect to local norms
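To make the Max/Mia example concrete, here is a heavily simplified sketch. It swaps in mean-sigma linear equating (assuming two randomly equivalent 2008 groups, one taking the new form and one the reference form) to put Mia's raw score on the reference form's metric, then expresses the result relative to the fixed 1990 reference group. The data, the 500/100 reporting scale, and the function names are hypothetical assumptions, and this is not the procedure actually used for the SAT.

```python
from statistics import mean, pstdev

# Hypothetical data: all raw scores are invented and the 500/100 scale is an assumption.
REF_GROUP_1990 = [40, 60]  # fixed reference group, reference form
NEW_FORM_2008 = [35, 55]   # 2008 group A, new (harder) form
REF_FORM_2008 = [42, 62]   # 2008 group B (assumed equivalent to A), reference form


def linear_equate(raw_new, new_form_scores, ref_form_scores):
    """Mean-sigma linear equating: put a new-form raw score on the reference form's raw metric.

    Assumes the two score lists come from randomly equivalent groups.
    """
    z = (raw_new - mean(new_form_scores)) / pstdev(new_form_scores)
    return mean(ref_form_scores) + z * pstdev(ref_form_scores)


def scale_score(raw_on_ref_form, ref_group_scores, scale_mean=500, scale_sd=100):
    """Express a reference-form raw score relative to the FIXED reference group's distribution."""
    z = (raw_on_ref_form - mean(ref_group_scores)) / pstdev(ref_group_scores)
    return scale_mean + scale_sd * z


if __name__ == "__main__":
    max_1990 = scale_score(50, REF_GROUP_1990)
    mia_2008 = scale_score(linear_equate(50, NEW_FORM_2008, REF_FORM_2008), REF_GROUP_1990)
    print(round(max_1990), round(mia_2008))  # 500 570
```

Because Mia's form was harder, her 50 correct answers translate to 57 on the reference form and therefore to a higher scale score than Max's 50.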
Norm-Referenced versus Criterion-Referenced Evaluation
- Criterion: a standard on which a judgment or decision may be based
- Criterion-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard
- The criterion in criterion-referenced assessments typically derives from the values or standards of an individual or organization (e.g., a practical driving test)
- Domain- or content-referenced testing and assessment (another term for criterion-referenced testing and assessment)
- In norm-referenced interpretation of test data, the focus is on how an individual performed relative to other people who took the test
- In criterion-referenced interpretation of test data, the focus is on the testtaker's own performance (what the testtaker can or cannot do), which is why such tests are also called mastery tests
- The criterion-referenced approach is popular in computer-assisted education programs: mastery of segments of materials is assessed before the program user can proceed to the next level
- Criticism: potentially important information about an individual's performance relative to other testtakers is lost; although this approach may have value with respect to the assessment of mastery of basic knowledge, skills, or both, it has little or no meaningful application at the upper end of the knowledge/skill continuum, where brilliance and superior abilities are recognizable in norm-referenced interpretations
- All testing is ultimately normative, even if the scores are as seemingly criterion-referenced as pass-fail: even in a pass-fail score there is an inherent acknowledgment of a continuum of abilities, and at some point in that continuum a dichotomizing cutoff point has been applied
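The contrast between the two kinds of interpretation can be sketched in a few lines. The norm group, the cutoff of 80, and the function names are hypothetical; the percentile rank uses the mid-rank convention (ties count half) as an assumption.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced: percent of the norm group scoring below `score` (ties count half)."""
    below = sum(1 for s in norm_group if s < score)
    ties = sum(1 for s in norm_group if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_group)


def mastery_decision(score, cutoff):
    """Criterion-referenced: compare the score with a fixed standard, ignoring other testtakers."""
    return "pass" if score >= cutoff else "fail"


if __name__ == "__main__":
    norm_group = list(range(55, 101, 5))  # hypothetical norm group: 55, 60, ..., 100
    CUTOFF = 80                           # hypothetical mastery standard
    for score in (82, 98):
        print(score, mastery_decision(score, CUTOFF), percentile_rank(score, norm_group))
    # 82 pass 60.0
    # 98 pass 90.0
```

Both testtakers simply "pass" under the criterion-referenced reading, while the norm-referenced percentile ranks (60 vs. 90) preserve the difference at the upper end of the continuum that the criticism above refers to.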

- When raw scores are converted into standard scores, either a linear transformation or a nonlinear transformation is used
- Linear transformation: retains a direct numerical relationship to the original raw score; the magnitude of differences between such standard scores exactly parallels the differences between corresponding raw scores
- Nonlinear transformation: needed when the data are not normally distributed but must be compared with normal distributions; the resulting standard score does not necessarily have a direct numerical relationship to the original raw score, because the raw score has been normalized
o Normalized standard scores
Normalizing a distribution: stretching the skewed curve into the shape of a normal curve and creating a corresponding scale (normalized standard score scale) of standard scores
Desirable for purposes of comparability: a standard score on one test can be compared with a standard score on another test only if both have the same distribution (both normal, not one normal and the other skewed)
It is usually better to adjust the test with respect to difficulty level or other relevant variables so that a normal curve results in the end than to normalize a skewed curve afterwards. Normalization is justified only if the test sample is large and representative and the failure to obtain normally distributed scores is attributable to the measuring instrument
Conversion rules from raw scores to standard scores: z = (raw score - mean) / SD; T = 10z + 50; IQ = 15z + 100 (deviation IQ scale)
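The conversion rules and the linear/nonlinear distinction can be sketched as follows. The linear part applies z = (raw - mean) / SD with the population SD, then T = 10z + 50 and IQ = 15z + 100. The normalizing part is one common approach, assumed here rather than taken from the notes: convert each raw score to a mid-rank percentile rank, then take the z-value with that cumulative proportion under the standard normal curve (`statistics.NormalDist.inv_cdf`). Function names and data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def z_scores(raw):
    """Linear: z = (raw - mean) / SD, so differences between z-scores parallel raw differences."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]


def t_score(z):
    return 10 * z + 50   # T = 10z + 50


def iq_score(z):
    return 15 * z + 100  # deviation IQ = 15z + 100


def normalized_z(raw):
    """Nonlinear: raw -> mid-rank percentile rank -> z at that cumulative proportion of N(0, 1)."""
    n = len(raw)
    std_normal = NormalDist()  # mean 0, SD 1
    result = []
    for x in raw:
        below = sum(1 for y in raw if y < x)
        ties = sum(1 for y in raw if y == x)
        p = (below + 0.5 * ties) / n  # always strictly between 0 and 1
        result.append(std_normal.inv_cdf(p))
    return result


if __name__ == "__main__":
    z = z_scores([45, 55])
    print(z, [t_score(v) for v in z], [iq_score(v) for v in z])
    # [-1.0, 1.0] [40.0, 60.0] [85.0, 115.0]

    skewed = [1, 2, 50]          # hypothetical, strongly right-skewed
    print(z_scores(skewed))      # linear: the 2 -> 50 gap stays much larger than 1 -> 2
    print(normalized_z(skewed))  # nonlinear: evenly spaced, middle score maps to 0.0
```

The linear z-scores keep the raw-score spacing, while the normalized scores discard it in favor of rank order, which is exactly why a normalized standard score no longer has a direct numerical relationship to the raw score.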