
PSYCHOLOGICAL ASSESSMENT
RENZ LOUIS T. MONTANO, RPM
1. Administration of psychological tests in order to
obtain a score is specifically classified under _.
a. Psychological Assessment
b. Psychological Testing
c. Standardization
d. Norming
2. The use of various techniques such as intake interview,
testing, and observation to obtain a sample of behavior
is classified under _.
a. Psychological Assessment
b. Psychological Testing
c. Standardization
d. Norming
Testing vs. Assessment

Objective
Testing: To obtain some gauge (score)
Assessment: To answer a referral question, solve a problem, arrive at a decision

Process
Testing: Individual or group; very objective (Nomothetic)
Assessment: Individualized; involves subjectivity (Idiographic)

Role of evaluator
Testing: The tester is not key to the process
Assessment: The assessor is key to the process (selecting test, drawing conclusions)

Skill of evaluator
Testing: Technician-like skills
Assessment: Educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data

Outcome
Testing: Test score/s
Assessment: Integration from many sources of data to answer the referral question
3. A specific stimulus to which a person responds overtly
a. Psychological test
b. Norm
c. Criterion
d. Item
4. Which of the following is not under typical-
performance type of tests?
a. 16PF
b. EPPS
c. WAIS
d. MBTI
Types of tests
Maximal ability tests
1. Achievement tests – past learning (ex: exam)
2. Intelligence tests – mental abilities (reasoning)
3. Aptitude tests – ability to learn a certain skill
(screening test for applicants)
Types of tests
Typical Performance Tests
1. Personality tests – traits, characteristics (objective
and projective)
2. Interest tests – preference
3. Attitude tests – degree of agreement or
disagreement
5. Test sophistication or experience with tests
significantly affects test performance because practiced
test takers _
a. have advantage over first time test takers
b. can overcome feeling of strangeness because they
have developed self-confidence
c. are familiar with the common item types and
practice with the use of objective answer sheets
d. all of the above are advantages of practiced test
takers
6. Which of the following can be used in addressing the
issue of test sophistication?
a. a short orientation and practice sessions
b. preparing the test materials ahead of time
c. training the test administrators
d. reading the test manual
7. The test manual should provide the essential
information required for administering, scoring and
evaluating a particular test. This is primarily the
responsibility of the _.
a. test user
b. test developer
c. test publisher
d. test taker
8. Which of the following is TRUE according to the studies
of Lawrence (1962) and Paul and Eriksen (1964)?
a. Slight amount of anxiety is detrimental to
performance
b. Slight amount of anxiety is beneficial to
performance
c. Both small and large amount of anxiety are
beneficial to performance
d. Both small and large amount of anxiety are
detrimental to performance
9. The MOST important single requirement for good
testing procedure is _ and only in this way uniformity
can be assured.
a. Advanced preparation of the examiner
b. Orienting the test takers
c. Preparing the test materials
d. Objective scoring
10. Early experimental psychologists such as Wundt were
primarily concerned with _ in studying individuals in
their laboratories.
a. Intelligence Quotient
b. Personality
c. Sensory abilities
d. Emotions
11. It was in this setting that the first group intelligence
test was developed
a. Experiments led by Wilhelm Wundt in Leipzig,
Germany
b. Screening American recruits in World War I led by
Robert Yerkes
c. Identifying seriously disturbed individuals who
would be disqualified for military service led by Robert
Woodworth
d. Creation of the Binet-Simon test of 1905
12. The following are characteristics of the normal
distribution except
a. The curve is symmetrical
b. Asymptotic
c. Skewed either negatively or positively
d. The mean, median and the mode are the same
13. In a negatively skewed distribution, we can say that
a. Mode > Median > Mean
b. Mean > Median > Mode
c. Mean = Median = Mode
d. The highest point represents the mean
14. A sample is representative of a population. If a
researcher aims for normally distributed data, it is
important to _.
a. Have a small sample
b. Have a homogeneous sample
c. Use a test with homogeneously difficult items
d. Have a large sample
Sample – representative of the population
Homogeneous sample – participants are
very similar to each other (all are college
students, all are from Manila)
Heterogeneous sample – variety of
participants
15. If a test has an item difficulty of .85, it means that
the test is _ and we can expect a _ distribution.
a. Easy; normal
b. Easy; Negatively skewed
c. Moderate; Negatively skewed
d. Difficult; Positively skewed
Item difficulty index – indicates the percentage of test
takers who got a correct answer on an item. (.75
difficulty = 75% got correct)
The higher the difficulty index, the easier the item is

.3 - .7 – ideal difficulty
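The difficulty index described above is a simple proportion. A minimal Python sketch, assuming each response to an item is coded 1 (correct) or 0 (incorrect); the function name `item_difficulty` and the data are illustrative, not from the slides:

```python
def item_difficulty(responses):
    """Proportion of test takers who answered the item correctly."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)


# Hypothetical data: 75 of 100 test takers answered the item correctly.
responses = [1] * 75 + [0] * 25
p = item_difficulty(responses)
print(p)  # 0.75, i.e. a relatively easy item
```

A higher value means more test takers passed the item, which is why a higher index indicates an easier item.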
16. Mr. Natividad administered a 100-item test. Item
analysis showed that those who answered item
number 25 correctly also obtained a high total score, while those
who answered it incorrectly obtained a low total score. We can say
that item number 25
a. has high item difficulty
b. is biased
c. discriminates well between the good and the poor
performers
d. all of the above are TRUE about item number 25
Item discrimination index – the
capacity of an item to distinguish
between good and poor performers
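One common way to quantify discrimination is to compare the proportion passing the item in the highest-scoring group with the proportion passing in the lowest-scoring group. The sketch below assumes that approach with upper and lower groups of 27% each; the cutoff, the function name, and the data are assumptions of this example and are not given in the slides.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-group pass rate minus lower-group pass rate for one item.

    item_correct: 1/0 per test taker for the item of interest.
    total_scores: total test score per test taker (same order).
    fraction: share of test takers in each extreme group (assumed 27%).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Indices of test takers ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for 10 test takers.
item = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
totals = [90, 85, 80, 40, 35, 30, 70, 45, 75, 50]
print(discrimination_index(item, totals))  # 1.0: high scorers pass, low scorers fail
```

Values near +1 indicate that the item separates good and poor performers well; values near 0 indicate that it does not.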
17. Mr. Natividad administered a 100-item test. Item
analysis showed that only 23 out of his 100 students
answered item number 15 correctly. We can say that
item number 15 has _ item difficulty and is _ for the test
takers.
a. .23; Difficult
b. .23; Easy
c. .77; Difficult
d. .77; Easy
18. Which of the following is TRUE about raw scores?
a. It indicates the relative standing of an individual in
comparison with other test takers
b. It allows for comparison with test scores from
different tests
c. It is meaningless unless accompanied with
interpretation
d. It shows how many test takers an individual
outperformed
19. If a student obtains a score of 25 on a quiz and
his classmates’ average score is 20 (SD = 3), his z-score
will be _.
a. 1.67
b. 2.55
c. -1.67
d. -2.55
Z = (25 - 20) / 3 = 1.67
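Areas under the normal curve, from the tail toward the mean (memory aids):
0.13% (POINT 13)
2.14% (VALENTINE’S DAY, 2.14)
13.59%
34.13% (3–4–1–3)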
20. If a test taker obtains a z-score of -1, assuming that
there is a normal distribution, we can also say that
a. He outperformed 16% of the total number of test
takers
b. His score is below the mean
c. 84% of the test takers performed better than him
d. All of the above are TRUE
Std. Score    Mean    Std. Deviation
Z-Score       0       1
T-Score       50      10
Stanine       5       2
Sten          5.5     2
IQ            100     15
21. One limitation with the use of percentile is that
a. It implies a zero raw score and a perfect raw score
b. There is inequality in its units especially at
extremes of the distribution
c. It is meaningless unless accompanied with
interpretation
d. It is limited to single digits
22. A person who obtains a T-score of 60 also
obtains
a. A Z-score of -2
b. A Stanine of 7
c. A percentile rank of 50
d. An IQ of 130
CONVERSION OF Z-SCORES TO OTHER STANDARD SCORES

Formula: Mean + SD (Z-score)

Z-score to T-score (Z = 2): 50 + 10(2) = 70
Z-score to Stanine (Z = 2): 5 + 2(2) = 9
Z-score to Sten (Z = 2): 5.5 + 2(2) = 9.5
Z-score to Deviation IQ (Z = 2): 100 + 15(2) = 130


EXERCISE
PART 1: CONVERSION FROM RAW
SCORE TO Z-SCORE

FORMULA:
Z = (RAW SCORE – MEAN) / STD. DEV.
1.1. A student obtained a score of 40 in a test. His
classmates averaged 35 with a standard deviation
of 4. His Z-score is _.

(40 - 35) / 4 = 1.25

1.2. A student obtained a score of 18 in a test. Her
classmates averaged 23 with a standard deviation
of 3. Her Z-score is _.

(18 - 23) / 3 = -1.67
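The two exercises above can be checked with a short Python sketch. The function name `z_score` is illustrative; the inputs are the values from exercises 1.1 and 1.2.

```python
def z_score(raw, mean, sd):
    """Z = (raw score - mean) / standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


print(z_score(40, 35, 4))            # 1.25  (exercise 1.1)
print(round(z_score(18, 23, 3), 2))  # -1.67 (exercise 1.2)
```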
PART 2: CONVERSION FROM Z-SCORE TO OTHER STANDARD SCORES

FORMULA: MEAN + SD (Z-SCORE)


CONVERSION OF Z-SCORES TO OTHER STANDARD SCORES

Convert 1.25 to T-Score: 50 + 10 (1.25) = 62.5
Convert 1.25 to Stanine: 5 + 2 (1.25) = 7.5
Convert 1.25 to Sten: 5.5 + 2 (1.25) = 8
Convert 1.25 to IQ: 100 + 15 (1.25) = 118.75

Convert -1.67 to T-Score: 50 + 10 (-1.67) = 33.3
Convert -1.67 to Stanine: 5 + 2 (-1.67) = 1.66
Convert -1.67 to Sten: 5.5 + 2 (-1.67) = 2.16
Convert -1.67 to IQ: 100 + 15 (-1.67) = 74.95
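The conversions above all apply the same formula, Mean + SD (Z-score), with a different mean and standard deviation per scale. A minimal Python sketch, assuming the means and standard deviations from the standard score table; the names `SCALES` and `from_z` are illustrative. Like the slides, it leaves stanines and stens unrounded, although in practice they are usually reported as whole numbers.

```python
# (mean, standard deviation) for each standard score scale, as in the table.
SCALES = {
    "Z": (0, 1),
    "T": (50, 10),
    "Stanine": (5, 2),
    "Sten": (5.5, 2),
    "IQ": (100, 15),
}


def from_z(z, scale):
    """Convert a z-score to another standard score: mean + sd * z."""
    mean, sd = SCALES[scale]
    return mean + sd * z


for scale in ("T", "Stanine", "Sten", "IQ"):
    print(scale, from_z(1.25, scale), round(from_z(-1.67, scale), 2))
```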
23. The difference in cumulative percentage between 1 SD above
the mean and 2 SD below the mean is approximately
a. 95.20
b. 50.00
c. 16.84
d. 81.86
.13 + 2.14 = 2.27
50 + 34.13 = 84.13
84.13 – 2.27 = 81.86
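The same figure can be obtained from the normal cumulative distribution function instead of adding tabled areas. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Cumulative percentage of cases at or below each point.
below_plus_1_sd = standard_normal.cdf(1) * 100
below_minus_2_sd = standard_normal.cdf(-2) * 100

area_between = below_plus_1_sd - below_minus_2_sd
print(round(area_between, 2))  # approximately 81.86
```

The cumulative percentage at -2 SD computed this way is about 2.28, slightly different from the 2.27 on the slide, because the slide adds rounded table values.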
24. If time spent reviewing and test scores has a
correlation coefficient of .77, we can say that
a. Time spent reviewing predicts high test scores
b. As length of review increases, test scores tend to
increase as well
c. Time spent reviewing predicts low test scores
d. As length of review increases, test scores decreases
25. The reliability coefficient of a test is .88. It means that
a. 88% of the variance in scores is attributable to true
differences while 12% is attributable to error
b. 12% of the variance in scores is attributable to true
differences while 88% is attributable to error
c. 88% is the true score while 12% is the obtained score
d. 12% is the true score while 88% is the obtained score
Error - Discrepancies between true ability and
measurement of ability constitute errors of
measurement.
• Not mistakes
• there will always be some inaccuracy in our
measurements
• Find the magnitude of such errors and to develop ways
to minimize them
• Reliable = free from error
Random error – source of error in measuring a targeted
variable caused by unpredictable fluctuations and
inconsistencies of other variables in the measurement
process.

Systematic error – source of error in measuring a
variable that is typically constant or proportionate to
what is presumed to be the true value of the variable
being measured.
Classical Test Theory

X = T + E
X = Observed Score, T = True Score, E = Error

Error (E) can either be positive or negative.
• If E is positive, the Obtained Score (X) will be higher than
the True Score (T);
• if E is negative, then X will be lower than T.
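The model X = T + E can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: true scores and errors are drawn independently from normal distributions with variances of 88 and 12 (values chosen here to mirror the .88 reliability coefficient in item 25), and reliability is estimated as true-score variance divided by observed-score variance. All names and numbers are hypothetical.

```python
import random
import statistics


def simulate_reliability(n=5000, true_var=88.0, error_var=12.0, seed=0):
    """Simulate X = T + E and return var(T) / var(X)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, true_var ** 0.5) for _ in range(n)]
    # Random error: mean zero, so it can be positive or negative.
    errors = [rng.gauss(0, error_var ** 0.5) for _ in range(n)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


reliability = simulate_reliability()
print(round(reliability, 2))  # close to 0.88, varying slightly with the sample
```

Because error adds variance to the observed scores, a larger error variance lowers the ratio, which matches the statement that higher reliability means lower error.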
Possible Sources of Error
1. Item selection
2. Test Administration
3. Test Scoring
26. Dr. Darren would like to correlate the scores of his students
in the statistics final exam and scores on the diagnostic exam
given before the course began. It would be best to use _.
a. Pearson-r
b. Spearman rho
c. Spearman brown formula
d. ANOVA
e. Linear regression
27. Coefficient of stability, temporal stability and carryover
effect are associated with what type of reliability?
a. Parallel forms reliability
b. Test-retest reliability
c. Internal consistency
d. Inter-rater reliability
Reliability

•Consistency of test scores (dependability)


•The larger the number of items, the higher
the reliability
•The higher the reliability, the lower the error
Types of Reliability

1. Test – retest
2. Parallel or Alternate Forms
3. Internal Consistency: Split-half, Coefficient Alpha,
Kuder-Richardson 20
4. Inter-rater, Inter-scorer, Inter-judge
Test-retest Reliability

•Consistency of scores across time


•Time or Temporal Sampling, Coefficient of
Stability
•Carryover Effect, Practice Effect
28. One advantage of alternate-form reliability over test-retest
reliability aside from minimizing carryover effect is that it also
a. shows the intercorrelation between the items
b. shows agreement between raters
c. shows consistency of response to different item samples
d. all of the above
Test-retest Reliability

•Consistency of scores across time


•Time or Temporal Sampling, Coefficient of
Stability
•Used in measuring constructs that are stable
over time
Test-retest Reliability

• Carryover effect – occurs when the first testing session
influences the results of the second session and this can
affect the test-retest reliability of a psychological
measure.
• Practice effect – a type of carryover effect wherein
the scores on the second test administration are higher
than they were on the first.
If the results of the first and second administrations have a
low correlation, it might mean that:
• The test has poor reliability
• A major change had occurred on the subjects between
the first and second administration.
• A combination of low reliability and major change
have occurred.
Alternate Forms Reliability

• It is also known as item sampling reliability or parallel forms
reliability since it compares two equivalent forms of a test that
measure the same attribute to make sure that the items indeed
assess a specific characteristic.

• The error variance in this case represents fluctuations in
performance from one set of items to another, but not
fluctuations over time.
Factors Affecting Alternate Forms Reliability

• Number of items
• Must cover same content
• Range and level of difficulty must be equal
• Instructions
• time limits
• illustrative examples
• format
29. In alternate-form reliability, the second form of the test can be
administered immediately or after a delay. If the test user decides to
use immediate succession, then variance in scores represents
a. fluctuations in performance from one set of items to
another
b. fluctuations over time
c. fluctuations in the interpretation of different raters
d. low internal consistency
30. Which of the following is TRUE about alternate-forms?
a. Alternate-forms reliability eliminates practice effect
b. Alternate-forms reliability reduces practice effect
c. Alternate-forms reliability is not vulnerable to practice
effect
d. Alternate-forms of test are always parallel with the first
set
31. In the _ method of reliability, the reliability coefficient of test
scores is compromised because the total number of items has
been reduced. To address this, _ can be used to compute the
reliability coefficient
a. Split-half; KR 20
b. Split-half; Spearman-Brown formula
c. Alternate forms; Pearson r
d. Alternate forms; KR 20
Internal Consistency

This model of reliability measures the internal
consistency of the test which is the degree to which
each test item measures the same construct. It is
simply the intercorrelations among the items.
Split-Half Method

Splitting the items on a questionnaire or test in
half, computing a separate score for each half, and
then calculating the degree of consistency between
the two scores for a group of participants
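The split-half procedure, followed by the Spearman-Brown correction mentioned in item 31, can be sketched as follows. This is a minimal illustration that assumes an odd-even split of the items (one of several possible ways to halve a test); the function names and data are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores):
    """scores: one list of item scores per participant."""
    odd_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# A half-test correlation of .50 corresponds to about .67 at full length.
print(round(spearman_brown(0.5), 2))  # 0.67
```

The correction is needed because each half contains only half the items, and shorter tests tend to be less reliable.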
Cronbach’s Alpha
• Average of all split halves.
• Items are not in right or wrong format (Likert Scale)
• Ex: Personality Tests, Attitude Tests

Kuder-Richardson 20
• The statistics used for calculating the reliability of a test in
which the items are dichotomous or scored as 0 or 1.
• Tests with right or wrong format.
• Ex: Achievement Tests
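Both coefficients compare the sum of the item variances with the variance of the total scores. The sketch below assumes the usual textbook formulas, alpha = k/(k-1) × (1 - Σ item variances / total variance) and KR-20 = k/(k-1) × (1 - Σpq / total variance), computed with population variances. The function names and the small data set are hypothetical.

```python
import statistics


def cronbach_alpha(scores):
    """scores: one list of item scores per participant (any numeric format)."""
    k = len(scores[0])
    item_vars = [statistics.pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """scores: one list of 0/1 item scores per participant (dichotomous items)."""
    n, k = len(scores), len(scores[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Hypothetical right/wrong data: 5 participants, 4 items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2), round(cronbach_alpha(data), 2))  # 0.8 0.8
```

For dichotomous items the variance of an item equals pq, so the two functions agree on right/wrong data; alpha simply extends the same idea to items with more than two score values.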
32. Homogeneity or similarity of items strongly affects the
reliability of a test. When a test has similar items or covers only
one construct, the test is said to be _ and tends to have _
reliability
a. Heterogeneous; higher
b. Heterogeneous; lower
c. Homogeneous; higher
d. Homogeneous; lower
33. Ms. Cortez administered a math exam to section 1 which only
contains items about statistics. In section 2, she administered a math
exam with items from algebra, statistics, and geometry. Which of the
following is true?
a. Section 1 will get a higher score because their exam is highly
homogenous
b. Section 2 will get a higher score because their exam is highly
heterogenous
c. Section 1 will get a higher score because their exam is highly
homogenous
d. Section 2 will get a higher score because their exam is highly
heterogenous
34. Construct underrepresentation and passage of time
are sources of error in which type of reliability?
a. Test-retest
b. Alternate-forms (immediate)
c. Alternate-forms (delayed)
d. Internal consistency
35. Tests designed for clinical purposes such as projective
tests are most vulnerable to which type of error?
a. Content sampling
b. Time sampling
c. Content heterogeneity
d. Interscorer differences
Interrater Reliability

Kappa statistic - assesses the level of agreement
among raters on a nominal scale.

▪Cohen’s Kappa – used to know the agreement
among 2 raters
▪Fleiss’ Kappa – used to know the agreement
among 3 or more raters.
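Cohen's kappa for two raters can be sketched as follows, assuming the standard definition kappa = (observed agreement - chance agreement) / (1 - chance agreement). The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters on nominal categories, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Undefined (division by zero) if both raters use a single identical category.
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses assigned by two raters to the same four cases.
rater_1 = ["A", "A", "B", "B"]
rater_2 = ["A", "B", "B", "B"]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```

Here the raters agree on 75% of the cases, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.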
36. Kuder-Richardson 20 can be used in which of the
following?
a. Dichotomous tests
b. Test with right or wrong answers
c. Personality tests with no right or wrong answers
d. Both a and b
CRONBACH’S ALPHA vs. KUDER-RICHARDSON 20

Cronbach’s Alpha
Multiple-scored tests
Tests with no right or wrong answers
Polytomous format
Ex: Personality Tests, Attitude Tests, Interest Tests

Kuder-Richardson 20
Single-scored tests
Tests with right or wrong answers
Dichotomous format
Ex: Aptitude Tests, Achievement Tests, Intelligence Tests
37. Which of the following is FALSE about speed tests?
a. The distribution of scores will most likely be
negatively skewed
b. It contains items with increasing difficulty
c. It uses homogenously easy items
d. Very small amount of time is allotted per item
SPEED TESTS: Timed; homogeneously easy items
POWER TESTS: No time limit; items with increasing difficulty
Perfect score cannot be attained in both
38. Jen obtained a score of 37 in a mental ability test
with an SEM of 5. We can say that there is a 68% chance
her true score lies between
a. 27 – 47
b. 42 - 47
c. 32 – 42
d. 20 – 40
Score scale: 27  32  37  42  47

There is a 68% chance that the TRUE SCORE falls between 32 and 42 (37 ± 1 SEM)

There is a 95% chance that the TRUE SCORE falls between 27 and 47 (37 ± 2 SEM)
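The two bands above follow from adding and subtracting multiples of the SEM. A minimal Python sketch, assuming the slide's convention of 1 SEM for roughly 68% and 2 SEM for roughly 95%; the function name `true_score_band` is illustrative.

```python
def true_score_band(obtained, sem, n_sem=1):
    """Return (lower, upper) limits: obtained score +/- n_sem * SEM."""
    return (obtained - n_sem * sem, obtained + n_sem * sem)


print(true_score_band(37, 5, n_sem=1))  # (32, 42), about 68%
print(true_score_band(37, 5, n_sem=2))  # (27, 47), about 95%
```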
39. Criterion-referenced testing is associated with the
following EXCEPT
a. Content-referenced
b. Domain-referenced
c. Mastery testing
d. Norm-referenced
40. The result of the Ten Item Personality Inventory (TIPI)
shows that Ana is conscientious but in real life, she is
laid back. We can say that TIPI is not _.
a. Valid
b. Reliable
c. Standardized
d. Biased
41. Which of the following can be done to increase the
reliability of a test
a. Increasing the number of items
b. Use factor analysis to determine subscales or
dimensions then compute for the validity of each
c. Eliminating items that do not represent the
construct
d. All of the above
42. The board of psychology follows a table of
specification as a guide to determine which topics will be
covered in the licensure exam. This method ensures the _
validity of the exam.
a. Face
b. Content
c. Criterion
d. Construct
• The degree to which the measurement procedure
measures the variable that it claims to measure (strength
and usefulness).
• Gives evidence for inferences made about a test score.
• Basically, it is the agreement between a test score or
measure and the characteristic it is believed to measure.

• Content
• Criterion Related
1. Concurrent
2. Predictive
3. Contrasted Groups
• Construct
1. Convergent
2. Discriminant/Divergent
Content Validity - concerned with the
extent to which the test is
representative of a defined body of
content consisting of topics and
processes.

Content validation is not done by
statistical analysis but by the
inspection of items.
A panel of experts can review the test
items and rate them in terms of how
closely they match the objective or
domain specification.
43. Norming is concerned with representation of the _
while content validity is concerned with representation of
the _
a. Population; Domain
b. Domain; Population
c. Sample; Population
d. Population; Sample
44. If a school uses the entrance exam scores obtained by
their own students, then the school is using _ norms.
a. Global
b. National
c. Subgroup
d. Local
45. Content-validity will be MOST useful in which type of
test?
a. Projective Tests
b. Norm-referenced
c. Domain-referenced
d. Standardized
46. A job analysis was conducted to come up
with a screening test whose items closely resemble
actual job activities. It aims to measure if the applicant
has the skills and knowledge required to perform the job.
This is MOST concerned with _ validity.
a. Face
b. Content
c. Criterion
d. Construct
47. A company requests for feedback from its adult
clients by asking them to fill out a form. The form
contains smiley faces ranging from disappointed to
satisfied which the clients find childish. This is MOST
concerned with _ validity.
a. Face
b. Content
c. Criterion
d. Construct
• Face validity – is the simplest and least scientific form
of validity and it is demonstrated when the face value
or superficial appearance of a measurement measures
what it is supposed to measure.
• Item seems to be reasonably related to the perceived
purpose of the test.
• Often used to motivate test takers because they can
see that the test is relevant.
48. The Personal Data Sheet by Woodworth aimed to
screen army recruits by measuring emotional
disturbance. The aim is to screen out applicants who are
more likely to develop emotional disorders. This is MOST
concerned with _ validity.
a. Content
b. Predictive
c. Concurrent
d. Construct
Concurrent Validity
• Both the test scores and the criterion measures are
obtained at present.

Predictive Validity
• Test scores may be obtained at one time and the
criterion measure may be obtained in the future after
an intervening event.
49. A supervisor was able to know that an employee
obtained a high score in the aptitude test used by the
company during his application. Because of this, the
supervisor gives the employee high scores in performance
appraisal. This is an example of
a. Criterion Contamination
b. Construct underrepresentation
c. Establishing rapport
d. Non-uniformity of procedures
50. A school psychometrician correlated the aptitude test
scores of graduating students with the general weighted
average available at the time the test results were
interpreted. This is MOST concerned with _ validity.
a. Content
b. Predictive
c. Concurrent
d. Construct
51. A depression test was administered to patients
diagnosed with major depressive disorder and to people
that never met the criteria for MDD. The patients yielded
a higher score on the test. This is evidence of _
validity.
a. Face
b. Content
c. Criterion
d. Construct
52. Mental ability tests such as the Stanford-Binet and
other preschool tests employed the concept of age
differentiation. Since mental abilities are expected to
increase with age, it is argued that test scores should
also increase. Such is evidence of _ validity.
a. Face
b. Content
c. Criterion
d. Construct
• Construct – An informed scientific idea developed or
hypothesized to describe or explain a behavior;
something built by mental synthesis.
• Unobservable, presupposed traits; something that the
researcher thought to have either high or low
correlation with other variables.
• Construct validity is like proving a theory through
evidence and statistical analysis.
1. Test is homogenous, measuring a single construct.
2. Test score increases or decreases as a function of age,
passage of time, or experimental manipulation.
3. Pretest, posttest differences
4. Test scores differ between groups.
5. Test scores correlate with scores on other tests in
accordance with what is predicted.
53. Correlating a newly established test with a previously
established, psychometrically sound test can provide evidence
of both criterion and construct validity. Although both use correlation
between the two tests, what differentiates the two types of
validity is that the aim for criterion validity evidence is a _
correlation, while for construct validity it is a _ correlation.
a. Low; High
b. High; Low
c. High; Moderate
d. Moderate; High
54. Obtaining a very high correlation coefficient between a
newly and a previously established test as evidence of construct
validity is
a. Desirable, because it shows a very high correspondence
between the new test and the established test
b. Desirable, because it shows that the items represent the
domain that the test aims to measure
c. Undesirable, because it indicates redundancy of test items
d. Undesirable, because it defeats the purpose of
establishing a new test
55. A group of students constructed a Pagkamasayahin scale
and correlated it with foreign extraversion scale and playfulness
scale. Positive correlation between the pagkamasayahin scale
and the two foreign scales is evidence of _ validity
a. Convergent
b. Divergent
c. Content
d. Criterion
56. A group of students constructed a Pagkamasayahin scale
and correlated it with foreign rumination on sadness scale and
negative affect scale. Negative correlation between the
pagkamasayahin scale and the two foreign scales is evidence
of _ validity.
a. Convergent
b. Divergent
c. Content
d. Criterion
57. Monica administered a verbal ability test and a spatial
ability test. Low to zero correlation between the two scales is
evidence of _ validity.
a. Convergent
b. Divergent
c. Content
d. Criterion
58. An HR used an achievement test to serve as pretest and posttest in a
3-day training program. To prove that the test is construct valid, the HR
should
a. Aim to construct or use a test that represents all the skills and
knowledge taught in the training program
b. Aim for increase in scores from pretest to posttest
c. Aim for a test that correlates highly with performance appraisal
scores available at the moment
d. Aim for a test that will predict who will be able to employ the
skills taught in training in the actual job in the future.
59. Construct validity relies heavily on _
a. The extent to which a test measures a theoretical construct or trait
b. Representation of the entirety of the construct or domain
c. Correlation with a criterion
d. Making sure that the test appears to measure what it purports to
60. In most instances, this statistical treatment is used in validating tests.
a. Pearson-r
b. ANOVA
c. Coefficient Alpha
d. Kuder-Richardson 20
61. Which of the following can be obtained in single administration?
a. Test-retest
b. Alternate forms delayed
c. Time sampling
d. Kuder-Richardson 20
62. If there is disagreement among raters in interpreting an individual’s
score, then we can expect a low _
a. Test-retest reliability
b. Content validity
c. Inter-scorer reliability
d. Internal consistency
63. If there is disagreement among experts in judging whether an item in
a newly constructed test represents the construct, then we can expect a low
_
a. Test-retest reliability
b. Content validity
c. Inter-scorer reliability
d. Internal consistency
64. One downside of using the split-half method of reliability is
that the number of items is reduced by half and the reliability
is compromised. To address this issue, one can use _ to estimate
the reliability of the test as a whole.
a. Spearman Rho Formula
b. Spearman Brown Formula
c. Pearson-r
d. Kuder-Richardson 20
65. A group of students constructed a Pagkamasayahin scale
and used factor analysis to be able to group similar items. They
named the group of similar items or subscales as
Pagkamapagbiro, Pagkamadaldal, and Pagiging Positibo. In
this example, Pagkamasayahin is the _ while the subscales are
the _.
a. Dimension; Constructs
b. Criterion; Dimensions
c. Construct; Dimensions
d. Dimension; Criteria
66. A student would like to adapt a resilience questionnaire that
they can use in their thesis. He would use the scale to study
resilience among Filipino adults. To make sure that the
questionnaire is valid to his respondents, he must consider
a. If the operational definition of resilience in his thesis and
the definition of resilience in the scale are similar, if not the same.
b. If the test items can be understood by the respondents
c. If the test items are applicable to Filipinos
d. All of the above
67. If the coefficient of determination is .88, then it means that
a. 88% of the variance in Y is attributable to X
b. 88% of the variance in X is attributable to Y
c. As X increases, Y also increases
d. As X increases, Y decreases
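The coefficient of determination is the square of the correlation coefficient. As a small illustration, the sketch below squares the r = .77 from item 24; pairing these two items is this example's own choice, and the function name is hypothetical.

```python
def coefficient_of_determination(r):
    """Proportion of variance in one variable accounted for by the other (r squared)."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


# With r = .77, about 59% of the variance is shared between the two variables.
print(round(coefficient_of_determination(0.77), 2))  # 0.59
```

Squaring removes the sign, so the coefficient of determination describes the amount of shared variance but not the direction of the relationship.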
68. This type of study is a conglomeration of previous studies
wherein the researcher conducts an analysis of results obtained
in earlier studies to determine the effect size.
a. Factor-analysis
b. Meta-analysis
c. Descriptive
d. Experimental
Meta-analysis - is a statistical
procedure in which the results of
numerous studies are averaged in a
single, overall investigation.
69. The most recurrent number in the distribution is the _
a. Mean
b. Median
c. Mode
d. Standard Deviation
70. To establish a cause-and-effect relationship between two
variables one should use _ design.
a. Correlational
b. Causal
c. Comparative
d. Experimental
71. Qualitative item analysis is usually done after _
a. Pilot testing
b. First trial run
c. Writing the first draft of test items
d. Revising the test items
72. Which of the following is TRUE?
a. A longer test is always more valid and reliable
than a shorter one
b. A shorter test may be more valid and reliable
than the original longer instrument
c. Shortening a test always compromises its
psychometric soundness
d. Length of tests has nothing to do with
reliability and validity
73. When all test takers fail an item, we can say
the following EXCEPT
a. It is an ideal item and it must be retained
b. It does not show individual differences
c. It does not affect the variability of scores
d. It is as undesirable as an item that every test
taker passes
74. The optimal item difficulty of a 4-choice
multiple-choice item is _.
a. .600
b. .750
c. .800
d. .625
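A common textbook rule places the optimal difficulty of a multiple-choice item halfway between the chance success rate and 1.0. The sketch below assumes that rule, which the slides do not state; the function name is illustrative.

```python
def optimal_item_difficulty(n_options):
    """Midpoint between the probability of guessing correctly and 1.0."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    chance = 1 / n_options
    return (chance + 1.0) / 2


print(optimal_item_difficulty(4))  # 0.625 for a four-option item
print(optimal_item_difficulty(2))  # 0.75 for a true/false item
```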
75. If a test is difficult for ALL the test takers, we
can also say that
a. The test floor is too low
b. The test fails to discriminate at the higher end
of the distribution
c. It lacks difficult items
d. The scores pile up at the lower end of the
distribution
76. Applying the concept of item difficulty to
testing purpose, one should
a. Always aim for difficulty clustering around .50
b. Aim for item difficulty that suits the purpose of
the test
c. Create a test with homogenously difficult items
d. Create a test with homogenously easy items
77. Selo obtained a raw score of 15 out of 25 in the
counseling quiz while he is in the 88th percentile in the
group dynamics quiz. Which of the following is TRUE?
a. He did better in the counseling quiz than in the
group dynamics quiz
b. He did better in the group dynamics quiz than in the
counseling quiz
c. He did equally well in both quizzes
d. Nothing, we cannot compare the two scores
78. Joseph obtained a z-score of 1.5 while Athena
obtained a T-score of 67. We can say that
a. Athena did better than Joseph
b. Joseph did better than Athena
c. Athena and Joseph obtained the same score
d. Nothing, we cannot compare the two scores
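A z-score and a T-score can be compared once both are on the same scale. The sketch below assumes the usual convention that T-scores have a mean of 50 and a standard deviation of 10; the helper names are hypothetical.

```python
def z_to_t(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, SD 10)."""
    return 50 + 10 * z


def t_to_z(t: float) -> float:
    """Convert a T-score (mean 50, SD 10) back to a z-score."""
    return (t - 50) / 10


joseph_z = 1.5
athena_z = t_to_z(67)  # Athena's T-score of 67 expressed as a z-score

print(f"Joseph: z = {joseph_z:.1f}, T = {z_to_t(joseph_z):.0f}")
print(f"Athena: z = {athena_z:.1f}, T = 67")
print("Athena scored higher" if athena_z > joseph_z else "Joseph scored higher or tied")
```

The comparison only makes sense if both scores were standardized against the same reference group.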
79. Qualitative item analysis is to _ while quantitative
item analysis is to _
a. Content Validity; Face Validity
b. Criterion Validity; Construct Validity
c. Content Validity; Item Difficulty and Discrimination
d. Internal Consistency; Test-retest Reliability
80. In choosing test items, if one wishes to be effective in
predicting actual performance, one should
a. Choose items that correlate highly with the external
criterion
b. Choose items that correlate highly with the total score
c. Choose items that have very high internal consistency
d. Choose items that have high agreement among experts
81. _ is the diminished utility of an assessment tool for
distinguishing test takers at the high end of the ability or trait
being measured
a. Hawthorne effect
b. Aunt Fanny effect
c. Floor effect
d. Ceiling effect
82. In a/an _, the test is organized into subtests by category of
items
a. Age scale
b. Point scale
c. Likert scale
d. Guttman scale
Guttman Scale
Attitude towards Cheating
__ I will allow my classmate to copy my assignment
__ I will allow my classmate to copy my answers in a quiz
__ I will allow my classmate to copy my answers in a
major exam
83. Dr. Della would like to know the most typical diagnosis of
patients coming into the psychiatric emergency room. It would be
best to use the _ because the data are on the _ scale.
a. Mode; Nominal
b. Mode; Ordinal
c. Median; Interval
d. Mean; Ratio
84. A drug test shows that an individual is positive for drug
use. But in reality, the person never used any substances.
This is an example of
a. True positive
b. True negative
c. False positive
d. False negative
True Positive – You said it is there, and it really is
True Negative – You said it is not there, and it really is not
False Positive (Type I Error) – You said it is there, but it really is not
False Negative (Type II Error) – You said it is not there, but it really is
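The four outcomes above can be written as a small lookup. This is only an illustrative sketch; the function and argument names are hypothetical.

```python
def classify_outcome(test_says_present: bool, really_present: bool) -> str:
    """Label a test result by comparing what the test said with reality."""
    if test_says_present and really_present:
        return "True positive"
    if not test_says_present and not really_present:
        return "True negative"
    if test_says_present:
        # The test said "present" but nothing is really there
        return "False positive (Type I error)"
    # The test said "absent" but the condition is really there
    return "False negative (Type II error)"


# A drug test comes back positive for someone who never used drugs
print(classify_outcome(test_says_present=True, really_present=False))
```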
85. A statistician accepted a false Ho. This is an example of
a. True positive
b. True negative
c. False positive
d. False negative
                    Reality
Decision            Ho is true          Ho is false
Accept Ho           Correct Decision    Type II Error
Reject Ho           Type I Error        Correct Decision
86. Typical ability tests (ex: personality, attitude, interest) are to
_ while maximal ability tests (ex: intelligence, aptitude,
achievement) are to _.
a. KR 20; Coefficient Alpha
b. Coefficient Alpha; KR 20
c. Coefficient Alpha; Interrater
d. KR 20; Pearson r
87. Individual intelligence tests such as WAIS and SB5 are
classified under
a. Level A
b. Level B
c. Level C
d. Level D
Level A • Achievement Tests
        • Specialized Aptitude Tests (skill-based)
Level B • Group Intelligence Tests
        • Personality Tests
Level C • Projective Tests
        • Individual Intelligence Tests
        • Diagnostic Tests
88. This type of scale has little to no reliance on language and
sometimes involves manipulation of objects or apparatus.
a. Verbal scales
b. Age scales
c. Point scales
d. Performance scales
89. One can obtain the coefficient of determination between 2
variables by squaring its correlation coefficient. It is TRUE that
a. The larger the correlation coefficient, the larger the
coefficient of determination
b. The larger the correlation coefficient, the smaller the
coefficient of determination
c. The correlation coefficient and the coefficient of
determination are always equal
d. None of the above
90. A group of seminar delegates were administered a pretest and
a posttest to assess learning. To know if there is a difference
between pretest and posttest scores, one should use _.
a. Pearson r
b. Linear Regression
c. T-test for dependent samples
d. T-test for independent samples
91. The most challenging part of using parallel-forms reliability
is _.
a. Minimizing the effect of passage of time
b. Creating an alternate form that is equivalent to the first
c. Minimizing the heterogeneity of items
d. The disagreement between the scorers
92. If a client discloses information about wanting to harm
another person, should you warn the potential victim?
a. No, one should wait until the client takes action
b. No, information shared by the client should not be
disclosed regardless of any reason
c. Yes, one is free to disclose any information shared by the
client
d. Yes, practitioners have a duty to warn when another person
may be involved in potential harm or danger
93. All are examples of test bias EXCEPT
a. Filipinos tend to get a lower score on an American
intelligence test because of failure to comprehend test items
b. Only certain number of students were able to get a correct
answer on difficult items
c. Males were able to answer more items on an ability test
than females
d. Whites were hired more often than any other race because
of high test scores
94. If one wishes to measure how much an individual prefers
taking a particular program in college, a/an _ test must be used
a. Interest
b. Aptitude
c. Personality
d. Intelligence
95. Aptitude is to _ while achievement is to _
a. Content validity; Concurrent validity
b. Predictive validity; Face validity
c. Predictive validity; Content validity
d. Construct validity; Content validity
96. Testing is _ while assessment is _
a. Nomothetic; Idiographic
b. Idiographic; Nomothetic
c. Subjective; Objective
d. Varied; Simple
97. When a test is valid for one group but is no longer
valid when used on another group
a. Tracking
b. Drift
c. Validity Shrinkage
d. Co-norming
98. Factors such as test anxiety and testing conditions that affect
test takers' scores are considered
a. Construct irrelevant variance
b. Random Error
c. Systematic Error
d. Both a and b
99. The tendency of a test taker to respond TRUE or AGREE to all
statements, even when this contradicts previous answers
a. Acquiescence
b. Faking good
c. Faking bad
d. Guessing
100. A registered psychometrician is responsible for
a. Keeping confidentiality of test data
b. Ensuring fair and unbiased testing, interpretation and
evaluation
c. Proper and competent use of tests
d. All of the above
101. This is the hypothesis being tested by the statistics
a. Research hypothesis
b. Null hypothesis
c. Alternative hypothesis
d. Directional hypothesis
Null hypothesis – hypothesis that there is no relationship
between variables in a population

Research/Alternative hypothesis – hypothesis that there
is a relationship between variables in a population
Alpha Level (significance level)
Probability value that states the
likelihood that a statistical result
occurred by chance.
*It is more difficult to reject the null hypothesis in a
two-tailed study because it requires a larger statistic to
reject
Two types of research hypothesis

Two-tailed (non-directional) hypothesis – hypothesis that
does NOT state the nature of relationship between the
variables in the population.

One-tailed (directional) hypothesis – hypothesis that states
the nature of relationship between the variables in the
population.
102. A researcher would like to know if the self-reported
happiness of one section of graduate school students differs from
that of the entire graduate school population. However, the only data
that the researcher has is the data of the section and not the
entire graduate school. In this scenario, it would be best to use _
a. Independent samples t-test
b. ANOVA
c. Z-test
d. One sample t-test
e. Linear regression
Z-test
Parametric statistical tool that allows us to compare a
sample to the population from which it was drawn when
the population parameters are known.

Parameters needed: Population mean, Population
standard deviation
One-sample t-test
Parametric statistical tool that allows us to learn whether
a sample differs from the population from which it was
drawn when the population SD is unknown.

Parameter needed: Population mean (the sample standard
deviation stands in for the unknown population
standard deviation)
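The difference between the two tools shows up in the denominator of the test statistic. The sketch below uses made-up happiness ratings and made-up population values purely for illustration; the function names are hypothetical.

```python
import math
import statistics


def z_statistic(sample, pop_mean, pop_sd):
    """Z-test: the population SD is known and used directly."""
    n = len(sample)
    return (statistics.mean(sample) - pop_mean) / (pop_sd / math.sqrt(n))


def one_sample_t_statistic(sample, pop_mean):
    """One-sample t-test: the population SD is unknown, so the
    sample SD (n - 1 in the denominator) is used instead."""
    n = len(sample)
    sample_sd = statistics.stdev(sample)
    return (statistics.mean(sample) - pop_mean) / (sample_sd / math.sqrt(n))


ratings = [6, 7, 8, 7, 9, 8, 7, 8]  # hypothetical section data
z = z_statistic(ratings, pop_mean=7.0, pop_sd=1.0)  # hypothetical parameters
t = one_sample_t_statistic(ratings, pop_mean=7.0)

print(f"z = {z:.2f}, t = {t:.2f} with df = {len(ratings) - 1}")
```

The resulting statistic is then compared with a critical value from the normal distribution (z) or from the t distribution with n - 1 degrees of freedom (t).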
Independent Samples t-test
Used to compare the means of two mutually exclusive
groups of people.
Used in a between-subjects design – each participant is
assigned to only one group.
103. Maybelle would like to determine if there is a difference in
the amount of donation made by participants in the gratitude
group compared to the control group. It would be best to use the
_
a. Independent samples t-test
b. ANOVA
c. Z-test
d. One sample t-test
e. Linear regression
104. The effect size is the amount of variability in the dependent
variable that can be traced to the independent variable. Which
of the following does not show the effect size?
a. R²
b. r
c. Coefficient of determination
d. Cohen’s d
To get the R² or the coefficient of determination,
just square the correlation coefficient or r

If r = .6 then R² = .36
36% of the variance in y is accounted
for by x
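The same arithmetic for a few correlation values, as a minimal sketch (the function name is hypothetical):

```python
def coefficient_of_determination(r: float) -> float:
    """Square the correlation coefficient to get the shared variance."""
    return r ** 2


for r in (0.6, 0.8, 0.9):
    r2 = coefficient_of_determination(r)
    # 1 - r2 is the proportion of variance left unexplained
    print(f"r = {r:.2f}  R2 = {r2:.2f}  explained = {r2:.0%}  unexplained = {1 - r2:.0%}")
```

Because the value is squared, a negative correlation of the same size gives the same coefficient of determination.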
105. Students enrolled in a review program will significantly
perform better than students that are not enrolled. This is an
example of _
a. Null hypothesis
b. One-tailed hypothesis
c. Two-tailed hypothesis
d. Both a and c
106. There is a correlation of .90 between the number of hours
studying and the scores in the licensure exam. We can say that
a. 19 percent of the variability in the scores can be accounted for by
the number of hours spent studying
b. 10 percent of the variability in the scores can be accounted for by
the number of hours spent studying
c. 90 percent of the variability in the scores can be accounted for by
the number of hours spent studying
d. 81 percent of the variability in the scores can be accounted for by
the number of hours spent studying
107. When the coefficient of determination is .64, we can say that
the remaining .36 can be attributed to _.
a. Error
b. Chance
c. Both of these
d. None of these
108. If you are going to correlate being a psychology student or not
to the scores in the general psychology final exam, it would be best
to use _.
a. Point Biserial r
b. Biserial r
c. Tetrachoric r
d. Pearson r
e. Spearman rho
109. A researcher obtained a correlation coefficient of .98 between
grit and conscientiousness. The two variables are considered to be
different constructs. This result is _.
a. Desirable, because it shows very high similarity between the two
constructs
b. Undesirable, because it is suggesting that the two variables,
although treated differently, may be in fact the same
c. Desirable, because it clearly shows that grit predicts
conscientiousness
d. Undesirable, because the correlation coefficient is positive
110. Privileged communication is _.
a. More legal than ethical
b. More ethical than legal
c. Equally legal and ethical
d. None of the above
111. Doc Manuel took an intelligence test that was developed
and normed in the year 2000. The interpretation of his raw
score is 50th percentile. In the year 2010, the test was re-
normed. He took the same test and obtained the same score.
Due to the Flynn effect, we can say that Doc Manuel’s score in
2010 is
a. Higher than 50th percentile
b. Lower than 50th percentile
c. Still in the 50th percentile
d. None of the above
112. Which of the following is TRUE about testing and
assessment?
a. Testing and Assessment require equal competency
b. Testing is more varied in approach compared to
assessment
c. Assessment is used to arrive at a decision
d. Assessment can be administered either individually or by
group
113. Which of the following is the objective of collegiate
entrance exams?
a. To screen among college applicants who would most likely
finish the program
b. To measure previous learning from elementary, high
school and senior high school
c. To find the most suitable program for the individual
d. To know if the individual can think rationally and act
purposefully and deal effectively with the environment
114. This measure of internal consistency focuses on the
degree of differences between scores on items of a test.
a. Cronbach’s Alpha
b. Average Proportional Distance
c. Kuder-Richardson 20
d. Split-half
115. One weakness of Cronbach's Alpha is
a. It tends to be affected by the subjectivity between the
raters
b. It compromises the reliability of the test after being split
into halves
c. It tends to be higher when a measure has more than 25
items
d. It tends to be lower when a measure has more than 25
items
116. A school would like to know if its entrance exam is a
good predictor of collegiate success by correlating entrance
exam scores and the students’ GWA. However, the school only
keeps the scores of successful applicants and disposes of the scores
of unsuccessful ones. The statistical treatment would not yield
ideal results because of _.
a. Test bias
b. Restriction of range
c. Unreliable entrance exam
d. Extraneous variables
Range restriction
117. Reliability is a function of _
a. The test, not the score
b. The score, not the test
c. Both the test and the score
d. None of the above
118. _ is a subprocess of _
A. psychological testing; observation
B. psychological testing; interviewing
C. interviewing; psychological assessment
D. psychological assessment; observation
119. In which way are psychological tests better than other
assessment techniques?
A. are cheaper
B. have norms
C. have face validity
D. are suitable for measuring all psychological constructs
120. Psychological testing is _____ as part of psychological
assessment
A. rarely used
B. always used
C. over-used
D. used, if appropriate
121. Before administering a psychological test, a
psychologist should ensure that
A. the test has local norms
B. the test does not have any copyright restrictions
C. the test has been reviewed in the Mental
Measurements Yearbook
D. the test is appropriate for use with the particular
client in terms of his/her demographics
122. A psychological report should
A. Directly and adequately answer the referral
question
B. Report the raw scores obtained by the individual in
different tests administered
C. Use technical terms
D. Be read only by the client
123. Which of the following statements is correct?
A. ethics is the same as morality
B. unlike laws, codes of ethics are readily amended
C. ethics is something that cannot be taught
D. psychologists who are not members of the
Psychological Association of the Philippines are
not bound by its code of ethics
124. The proportion of observed score variance
attributable to random error is known as
A. the reliability coefficient
B. the coefficient of nondetermination
C. the error coefficient
D. one minus the reliability coefficient
125. The domain sampling model proposes that
A. items in a test are a random sample from a
population of possible items
B. the only items possible have been used in the test
C. items have been sampled without replacement
D. the majority of items have the same content
126. Which of the following procedures does not yield an
estimate of the reliability of a test?
A. correlating the total of all even-numbered items with the
total of all odd-numbered items
B. correlating the total of items in the first half of the test
with the total of items in the second half of the test
C. correlating each item with the total score on the test
D. finding the average of the correlation of each item with
every other item
127. The concept of ‘domain sampling’ in the psychometric
theory of reliability refers to
A. sampling persons from the population with whom a test
may be used
B. sampling items from the population of possible items that
could be used in a test
C. sampling tests from the population of tests available to
measure a construct
D. sampling methods from the population that could be used
to construct a test
128. When a test has a high coefficient
alpha, it indicates that
A. the test has high generalizability
B. scores on the test are stable
C. the test has high internal consistency
D. the test has only one factor
129. In determining predictive validity we need to
have
A. a highly select group with respect to the
construct being assessed
B. a way of judging the appropriateness of the
content of the test items
C. another test of the same construct
D. a criterion relevant to performance on the test
but external to it
130. To show some evidence of construct validity
a test of moral development should
A. show differences between older and younger
children
B. show stability over the life span
C. show higher scores for adolescents than
adults
D. be unrelated to age trends
131. The following are responsibilities of a psychometrician
when referring a client to another professional. Which of the
following is incorrect?
a. Ensure that the recipient of referral is competent in
providing service
b. Ensure that the referral is consented by the client
c. Assess the adequacy of the client’s consent
d. Turn over all responsibility to the recipient of the referral
prior to the beginning of the service
132. Cattell’s T-data (Test data) include
responses to ambiguous stimuli. He referred to
this type of test as _.
a. Projective tests
b. Objective tests
c. Unstructured tests
d. 16PF
133. A person responded TRUE to both items:
1. I prefer to be alone during weekends
2. I would rather go to a party than stay at home
This is an example of:
a. Faking Good
b. Acquiescence
c. Faking Bad
d. Central Tendency Error
134. If the responses to a test use frequency options such as
never, rarely, sometimes, often, then it would be
suitable to use
a. Cronbach’s alpha
b. Kuder-Richardson 20
c. Spearman Brown Formula
d. Average Proportional Distance
135. Psychological tests
a. pertain only to overt behavior.
b. always have right or wrong answers.
c. do not attempt to measure traits.
d. measure characteristics of human behavior.
136. Tests that measure an individual's typical
behavior are called
a. ability tests.
b. personality tests.
c. intelligence tests.
d. group tests.
137. Structured personality tests
a. require you to produce something spontaneously.
b. require you to choose between two or more
alternative responses.
c. involve an ambiguous test stimulus about which
the response is structured.
d. involve an ambiguous test response.
138. If a particular test has been shown to
accurately predict success in a particular job, then
the test is said to be
a. valid.
b. structured.
c. ambiguous.
d. reliable.
139. A numerical ability test has face validity when
a. It covers prior learning in mathematics
b. It contains items with numbers, equations and
numerical problem solving
c. It is able to predict a test taker’s potential in
learning mathematics
d. It has items that correlate well with each other
140. Which of the following assumptions of psychological
assessment is correct?
a. Assessment procedures are essentially error free
b. One source of information is enough for the
assessment process
c. Psychological constructs can be measured
d. There is only one way to measure a construct
141. A person obtains a score that is equivalent to one
standard deviation above the mean. Assuming that the
distribution is normal, then his score is equivalent to the
following EXCEPT
a. Z-score of +1
b. 75th percentile
c. T-score of 60
d. Stanine of 7
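The equivalents of a given z-score can be checked numerically. This sketch assumes a normal distribution, T-scores with mean 50 and SD 10, and the common approximation stanine = 2z + 5 rounded to the nearest whole number and kept within 1 to 9; the function name is hypothetical.

```python
from statistics import NormalDist


def standard_score_equivalents(z: float):
    """Return (percentile, T-score, stanine) for a z-score."""
    percentile = NormalDist().cdf(z) * 100      # area below z, in percent
    t_score = 50 + 10 * z                       # mean 50, SD 10
    stanine = min(9, max(1, round(2 * z + 5)))  # mean 5, SD 2, limited to 1-9
    return percentile, t_score, stanine


percentile, t_score, stanine = standard_score_equivalents(1.0)
print(f"z = +1: percentile = {percentile:.0f}, T = {t_score:.0f}, stanine = {stanine}")
```

One standard deviation above the mean falls at about the 84th percentile, not the 75th.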
142.