
1. Apply formative assessment to identify
the achievement of learning outcomes
2. Identify suitable teaching and
learning techniques

Analysis of teaching methods
Evaluation
• Remediation/correction and
reinforcement/enrichment
…but have you answered the questions all
learners need to know?

 Where do I need to go?


 Why should I go there?
 How will I get there?
 How will I know when I’ve arrived?
Common Test Types And Characteristics

True–False / Yes–No
  Advantages: Easy to construct
  Disadvantages: Can be ambiguous; can reinforce incorrect information; enables guessing
  Best utilized: To measure recall and comprehension of facts

Multiple Choice
  Advantages: Easy to score and statistically analyse; can be constructed to measure analysis and synthesis of information
  Disadvantages: Difficult to construct; students may answer using unintentionally hidden clues
  Best utilized: To measure comprehension; to measure higher cognitive skills

Matching
  Advantages: Popular with students; can be constructed to include a broad range of information
  Disadvantages: Difficult to construct; enables students to answer by process of elimination
  Best utilized: To measure comprehension by comparing information

Short Answer / Open-Ended
  Advantages: Easy to construct; adaptable to specific subject content
  Disadvantages: Difficult to score, as more than one answer may be correct
  Best utilized: To measure recall of facts and specific knowledge

Fill in the Blank
  Advantages: Can be more focused and easily scored
  Disadvantages: Difficult to score when more than one answer may be correct
  Best utilized: To measure recall of facts and specific knowledge

Essay
  Advantages: Easy to construct; enables students to demonstrate a broad knowledge base
  Disadvantages: Scoring is quite time-consuming
  Best utilized: To measure application and higher cognitive skills
Selected-response test
  Characteristics: Objective; choose among alternatives; assesses foundational knowledge
  Advantages: Efficiency
  Disadvantages: Focus on verbatim memorization

Short-answer test
  Characteristics: Objective; asked to supply from memory; assesses foundational knowledge
  Advantages: Relatively easy to write; allows for breadth
  Disadvantages: Focus on verbatim memorization

Essay test
  Characteristics: Asked to discuss one or more related ideas according to certain criteria
  Advantages: Assesses higher-level abilities
  Disadvantages: Lack of consistency of grading
 Written test
- Selected-response tests
- Short-answer tests
- Essay tests
 Performance tests
- Direct writing assessments
- Portfolios
- Exhibitions
- Demonstrations
1. Learning as a quantitative increase in knowledge.
Learning is acquiring information or “knowing a
lot”.
2. Learning as memorising. Learning is storing
information that can be reproduced.
3. Learning as acquiring facts, skills and methods that
can be retained and used as necessary.
4. Learning as making sense or abstracting meaning.
Learning involves relating parts of the subject
matter to each other and to the real world.
5. Learning as interpreting and understanding reality
in a different way. Learning involves
comprehending the world by re-interpreting
knowledge.
Type  Category  Description

I  Teacher-Focused
   A  Teaching as transmitting concepts of the syllabus
   B  Teaching as transmitting the teacher’s knowledge

II  Student-Focused
   C  Teaching as helping students acquire concepts of the syllabus
   D  Teaching as helping students acquire the teacher’s knowledge
“[Teaching] is a transfer of knowledge from
somebody who accumulates certain amount
of knowledge to people who are recipient[s]
of the knowledge” (Professor of Medicine)

* Focus on transfer of information
* Students’ prior knowledge not considered
* Students are passive recipients
“I don’t give [students] recipes. I expect them to understand the
concepts behind what they’re doing and I expect them to learn so
they are able to do the experiments on their own. And you know,
problem solve and design experiments later. So I guess I’m always
teaching so they can function on their own as scientist. But they are
expected to know, the concepts and not just how you do it, but why
you do it, I guess.” (Preventative Medicine)
 Teaching viewed as a facilitative process, not a
matter of transmission
 Focus of teaching on helping students discover
knowledge that he as an expert already holds
 Students’ ability to understand the relationships
among provided concepts is a valued component
of learning & professional development
The goal of teaching is to bring about changes in the student so that
they can better understand the given area, so it’s about changing the
conceptual representations of students such that they can go out and
not only be able to receive, go out on their own and recover more
information outside of the classroom but also generate their own
questions. (Teacher of Linguistics)

 Teaching helps students develop & change their
conceptions of subject matter
 Does not expect students to adopt his own
worldview, but to create their own perspective on
subject material
Figure 1. Kirkpatrick’s Four Levels of Evaluation:
Reactions, Learning, Behavior, Results
1- Reactions: Measures how students have reacted to the
training – program evaluation sheets
2- Learning: Measures what students have learned from the
training – individual pre- and post-tests for comparison
3- Behavior: Measures whether what was learned is being
applied in real life – observations and feedback from
others
4- Results: Measures whether the application of learning in
class is achieving results – difficult to measure

* Each successive level of evaluation builds upon the evaluations
of the previous level. Each successive level of evaluation
adds precision to the measure of effectiveness but
requires more time-consuming analysis and increased
costs.
Level 4: Does it matter? Does it advance
strategy?
Level 3: Are they doing it (objectives)
consistently and appropriately?
Level 2: Can they do it (objectives)? Do they
show the skills and abilities?
Level 1: Did they like the experience?
Satisfaction? Use? Repeat use?
• Student Records
- portfolio
- report cards
- info cards
- anecdotal notes
 Interest Inventory
 Observations
 KWL (focus on K and W)
 Class discussions
 Observational Checklist
 Anecdotal Notes
 Class work
 Conference Notes
(writing/reading)
 Questioning
Traditional
* paper/pencil test

Alternative
* Projects
* Portfolios
* Presentations
 Alternative Response
 Matching
 Multiple Choice
• Stem contains a declarative statement

• Response choices (true/false, yes/no,
right/wrong, correct/incorrect, fact/opinion,
agree/disagree)

• Measures the ability to correctly identify the
correctness of statements of fact, definitions
of terms, or statements of principles (simple
learning outcomes)
T F 1. The green coloring material in a
plant leaf is called chlorophyll.

Y N 2. Is 50% of 38 more than 18?

T F 3. The Constitution of the United States is
the highest law of the country.

T F 4. The earth is a planet.

T F O 5. There are intelligent life forms
on other planets.
• If you are simply doing T/F, stay away
from opinion statements
• Keep the stem clear and concise (avoid
complex sentences)
• Do not use subjective words such as
frequently, most, some, few, usually, often,
etc.
• Do not use absolute terms such as always,
never, all, none, or only
• Avoid the use of negative terms : no, not
• Keep a balanced response set
 Efficient
 Easy to construct
 Provides for a wide sampling of material

 Limited to measuring at the knowledge level
 Susceptibility to guessing
 Measures simple associations/knowledge level
 Student is given a stem to match with the correct
response
Write the letter for the term in Column B that matches the
description in Column A. Each term is used only once.

Column A                                          Column B
___ A number divisible by itself and one          A. Prime number
___ A symbolic representation of a whole number   B. Irrational number
___ A number that can be represented by a ratio   C. Numeral
    of whole numbers
___ A positive or negative whole number           D. Integer
                                                  E. Rational number
Read each example. Then write the name of the literary
technique beside the example. A literary technique may be used
more than once.

Personification   Simile   Metaphor   Alliteration

____ The flame of Nadia’s newfound knowledge burned inside her

____ The kitten studied the ball of clay carefully, taking stock of its shape and
size, trying to decide whether it was going to attack him

____ He helped himself to a hefty helping of hash brown potatoes

____ Her apartment was like the city dump

____ Juan felt as light as air, filled with joy at the news

____ The stump sat upright, looking down over the clear-cut valley with disdain
 Advantages: compact, easy form; easy to
construct; easy to score

 Limitations: only measures basic recall;
sometimes difficult to have enough items to
develop a homogeneous set
 Measures simple to
complex learning
goals/objectives
 A typical item includes a
stem and a list of
distracters
• Stem must be clear and concise, the reader should know
what the answer should be without looking at the
responses
• Do not leave a dangling stem – use a blank or convert to a
question
• Blanks should be left near the end of the stem
• Convert stems to questions when possible
• Avoid negatives (not, no, except, least, etc.)
• Only one correct answer! Avoid “A and B”, “B and C”, “B and
A but not C”, “All of the above,” “None of the above”
• Responses should be grammatically correct, approximately
the same length
• All distracters should be plausible
• Avoid clues in the stem
 Who was the main character of the story?
a. Grandpa Jones
b. Grandma Jones
c. Cousin Ralph
d. Anabel Jones
 Mary had tickets to the movies. Each ticket
cost 6 dollars.
What was the total cost of the tickets?
A. 12 dollars
B. 18 dollars
C. 24 dollars
D. 30 dollars
• Advantages : Measures a variety of learning
outcomes (knowledge – application), specific
item – eliminates vagueness, forces student
to know what is correct, greater reliability
compared to alternative-response items, can
identify misconceptions
• Limitations : Does not move beyond
application phase, writing appropriate
distracters can be difficult
 Completion
 Short – answer
 Essay
 Explanations
 Writing Prompt
• Short-answer items use a direct question or
task
• Completion items consist of an incomplete
statement
• Used for measuring a wide variety of simple
learning outcomes: knowledge of
terminology, facts, principles, methods or
procedures; interpretation of data; solving
numerical problems
• What is the name of the man who invented
the steamboat?
__________ __________
(first name) (last name)

 What device is used to detect whether an
electric charge is positive or negative? _______

 Name three organs in the digestive system.
__________ __________ _________

 The name of the man who invented the
steamboat is _____________

 A member of the United States Senate is
elected to a term of ___________ years
 Easy to construct
 Student must supply the answer, limiting
guessing

 Not suitable for measuring complex learning
outcomes (knowledge/comprehension)
 Difficulty of scoring (partial answers)
 Do not start the item with a blank
 Provide enough clues to lead to the correct
answer
 Do not include too many blanks (at most two
blanks)
 Blanks for answers should be equal in length
 If the answer is expressed in numerical units,
indicate the type of answer: ___ lb. ___ oz.
 Response elicits one or more paragraphs
from a student
 Measures student’s ability to synthesize,
evaluate, and compose
 Two types: restricted response and
extended response
Restricted Response
 Limits the form and content of response
Example: In a paragraph, describe two
functions of the digestive system. (6 points)

Extended Response
• Few boundaries
• More extensive response
• May want to limit length (“use no more
than two pages”)
 Nurture concise responses; convey to students
clear expectations for the response
- structure questions: “name two”, “list three”,
“in a paragraph…”
 Provide a value to each item
- “2 points” versus “5 points”
 Proofread your items carefully
 Is the language grade-level appropriate?
 Is the layout grade-level appropriate?
 Did you provide a space for name and date?
 Check for alignment with instructional
objectives!
 Make sure each item measures an instructional
objective
 The assessment should consist of more than
one item per objective
 Evaluates the quality of each item
 Rationale : the quality of items determines
the quality of test (i.e., reliability & validity)
 May suggest ways of improving the
measurement of a test
 Can help with understanding why certain
tests predict some criteria but not others
•When analyzing the test items, we have several
questions about the performance of each item.
Some of these questions include:
•Are the items congruent with the test objectives?
•Are the items valid? Do they measure what they are
supposed to measure?
•Are the items reliable? Do they measure consistently?
•How long does it take an examinee to complete each
item?
•Which items are most difficult to answer correctly?
•Which items are easy?
•Are there any poor-performing items that need to be
discarded?
 Three major types:
1. Assess quality of the distracters
2. Assess difficulty of the items
3. Assess how well an item differentiates between
high and low performers
 To select the best available items for the final
form of the test.
 To identify structural or content defects in
the items.
 To detect learning difficulties of the class as a
whole
 To identify the areas of weakness of students
in need of remediation
1. Examination of the difficulty level of the
items,
2. Determination of the discriminating power
of each item, and
3. Examination of the effectiveness of
distractors in multiple-choice or
matching items.
Index of difficulty is the percentage of students
answering each item in the test correctly.
Index of discrimination refers to the percentage of
high-scoring individuals responding correctly
versus the number of low-scoring individuals
responding correctly to an item.
This numeric index indicates how effectively an item
differentiates between the students who did well
and those who did poorly on the test.
1. Arrange test scores from highest to lowest.
2. Get one-third of the papers from the
highest scores and another third from the
lowest scores.
3. Record separately the number of times
each alternative was chosen by the
students in both groups.
4. Add the number of correct answers to
each item made by the combined upper and
lower groups.
5. Compute the index of difficulty for each
item, using the formula:

IDF = (NRC / TS) × 100

where IDF = index of difficulty
      NRC = number of students responding correctly to an item
      TS  = total number of students in the upper and lower groups
6. Compute the index of discrimination,
using the formula:

IDN = (CU − CL) / NSG

where IDN = index of discrimination
      CU  = number of correct responses in the upper group
      CL  = number of correct responses in the lower group
      NSG = number of students per group
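The two formulas above can be sketched in code; the group sizes and response counts below are hypothetical.

```python
# A minimal sketch of the two item-analysis formulas above.
# The counts in the example are hypothetical.

def index_of_difficulty(correct_upper, correct_lower, group_size):
    """IDF = (NRC / TS) * 100, where TS counts both groups together."""
    nrc = correct_upper + correct_lower          # NRC
    ts = 2 * group_size                          # TS (upper + lower)
    return 100 * nrc / ts

def index_of_discrimination(correct_upper, correct_lower, group_size):
    """IDN = (CU - CL) / NSG."""
    return (correct_upper - correct_lower) / group_size

# Example: 18 of 20 upper-group and 6 of 20 lower-group students
# answered an item correctly.
idf = index_of_difficulty(18, 6, 20)       # (24/40)*100 = 60.0
idn = index_of_discrimination(18, 6, 20)   # (18-6)/20 = 0.6
```

By the interpretation tables given later, this hypothetical item would be of average difficulty with strong positive discriminating power.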
The difficulty index of a test item tells a
teacher about the comprehension of or
performance on the material or task contained
in an item.
For an item to be considered a good item, its
difficulty index should be 50%. An item
with 50% difficulty index is neither easy nor
difficult.
If an item has a difficulty index of 67.5%, this
means that it is 67.5% easy and 32.5%
difficult.
Information on the index of difficulty of an
item can help a teacher decide whether a
test should be revised, retained or
modified.
Range        Difficulty Level
20 & below   Very difficult
21–40        Difficult
41–60        Average
61–80        Easy
81 & above   Very easy
 The Index of Discrimination tells a teacher the
degree to which a test item differentiates the high
achievers from the low achievers in his class. A test
item may have positive or negative discriminating
power.
 An item has a positive discriminating power when
more students from the upper group got the right
answer than those from the lower group.
 When more students from the lower group got the
correct answer on an item than those from the
upper group, the item has a negative
discriminating power.
There are instances when an item has zero
discriminating power – when an equal number
of students from the upper and lower groups got
the right answer to a test item.
In the given example, item 5 has the highest
discriminating power. This means that it
can differentiate high and low achievers.
Range         Verbal Description
.40 & above   Very Good Item
.30–.39       Good Item
.20–.29       Fair Item
.09–.19       Poor Item
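The two interpretation tables above can be encoded as simple lookup functions. The cutoffs are taken directly from the tables; the fallback label for indices below .09 is an assumption, since the table does not cover that range.

```python
# Lookup functions for the difficulty and discrimination ranges above.

def difficulty_level(idf):
    """Verbal difficulty level for an index of difficulty (0-100)."""
    if idf <= 20:
        return "Very difficult"
    if idf <= 40:
        return "Difficult"
    if idf <= 60:
        return "Average"
    if idf <= 80:
        return "Easy"
    return "Very easy"

def discrimination_quality(idn):
    """Verbal description for an index of discrimination."""
    if idn >= 0.40:
        return "Very Good Item"
    if idn >= 0.30:
        return "Good Item"
    if idn >= 0.20:
        return "Fair Item"
    if idn >= 0.09:
        return "Poor Item"
    return "Below .09 (not covered by the table)"  # assumed fallback

print(difficulty_level(55))          # Average
print(discrimination_quality(0.35))  # Good Item
```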
A test item can be retained when its level of
difficulty is average and discriminating
power is positive.
It has to be rejected when it is either easy/very
easy or difficult/very difficult and its
discriminating power is negative or zero.
An item can be modified when its difficulty
level is average and its discrimination index
is negative.
An ideal item is one that all students in the
upper group answer correctly and all
students in the lower group answer
wrongly, with the responses of the lower
group evenly distributed among
the incorrect alternatives.
 Encourage teachers to undertake an item analysis
as often as practical
 Allowing for accumulated data to be used to make
item analysis more reliable
 Providing for a wider choice of item format and
objectives
 Facilitating the revision of items
 Accumulating a large pool of items as to allow for
some items to be shared with the students for
study purposes.
 It cannot be used for essay items.
 Teachers must be cautious about the damage
that may be done to the table of
specifications when items not meeting the
criteria are deleted from the test. These
items should be rewritten or replaced.
 Generally, student who did well on the
exam should select the correct answer to
any given item on the exam.
 The Discrimination Index distinguishes, for
each item, between the performance of
students who did well and those who did poorly.
 For each item, subtract the number in the
lower group who answered correctly from
the number of students in the upper group
who answered correctly.
 Divide the result by the number of students
in one group.
 The Discrimination Index is listed in decimal
format and ranges between -1 and 1.
           Number of correct answers in group
Item no.   Upper 1/4   Lower 1/4   Item Discrimination Index
1          90          20          0.7
2          80          70          0.1
3          100         0           1.0
4          100         100         0.0
5          50          50          0.0
6          20          60          -0.4
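The Discrimination Index column of the table above can be reproduced with the IDN formula; this sketch assumes each quarter contains 100 students, since the correct-answer counts run from 0 to 100.

```python
# Reproducing the worked table above, assuming a group size of 100
# per quarter (the counts in the table run from 0 to 100).

def discrimination_index(correct_upper, correct_lower, group_size=100):
    return (correct_upper - correct_lower) / group_size

counts = {1: (90, 20), 2: (80, 70), 3: (100, 0),
          4: (100, 100), 5: (50, 50), 6: (20, 60)}

for item, (upper, lower) in counts.items():
    print(item, discrimination_index(upper, lower))
# Item 3 discriminates perfectly (1.0); items 4 and 5 do not
# discriminate at all (0.0); item 6 is negative (-0.4) and
# should be reviewed.
```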
 Use the following table as a guideline to
determine whether an item (or its
corresponding instruction) should be
considered for revision.
Item Discrimination (D)   Item Difficulty
                          High     Medium   Low
D ≤ 0%                    review   review   review
0% < D < 30%              ok       review   ok
D ≥ 30%                   ok       ok       ok
First question of item analysis: how many
people choose each response?
If there is only one best response, then all other
response options are distracters.
Example from an in-class assignment (N = 35):
Which method has the best internal consistency?
a) Projective test      1
b) Peer ratings         1
c) Forced choice       21
d) Differences n.s.    12
 A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right
2. People who do not know the item will have responses equally
distributed across the wrong answers.

 It is not desirable to have one of the distracters chosen more
often than the correct answer.

 This result indicates a potential problem with the question.
The distracter may be too similar to the correct answer and/or
there may be something in either the stem or the
alternatives that is misleading.
 Calculate the number of people expected to choose each
of the distracters. If responses were random, the same
number would be expected for each wrong response (Figure 10-1):

# of persons expected    N answering incorrectly    14
to choose a distracter = ----------------------- = ---- = 4.7
                          number of distracters      3
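A quick sketch of the expected-count calculation, using the N = 35 example above (14 incorrect answers spread over 3 distracters):

```python
# Expected number of examinees per distracter under random guessing:
# wrong answers should spread evenly across the distracters.

def expected_per_distracter(n_incorrect, n_distracters):
    return n_incorrect / n_distracters

expected = expected_per_distracter(14, 3)
print(round(expected, 1))  # 4.7
```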
When the number of persons choosing a distracter
significantly exceeds the number expected, there
are 2 possibilities:

1. It is possible that the choice reflects partial
knowledge
2. The item is a poorly worded trick question

 Unpopular distracters may lower item and test
difficulty because they are easily eliminated
 Extremely popular distracters are likely to lower the
reliability and validity of the test
 Compare the performance of the highest-
and lowest-scoring 25% of the students on
the distracter options (i.e., the incorrect
answers presented on the exam)
 Fewer of the top performers should choose
each of the distracters as their answer
compared to the bottom performers.
Item 1                        A    B    C    D    E   Omit
% of students in upper 1/4    25   0    0    0    0    0
% of students in middle       15   10   10   10   5    0
% of students in lower 1/4    5    5    5    10   0    0

Item 2                        A    B    C    D    E   Omit
% of students in upper 1/4    0    5    5    15   0    0
% of students in middle       0    10   15   5    20   0
% of students in lower 1/4    0    5    10   0    10   0
 What is the purpose of a good distracter?

 Which distracters should you consider


throwing out?
 Review the sample report.
 Identify any exam items that may require
revision.
 For each identified item, list your observations
and hypotheses about the nature of the
problem.
Multiple Choice Exam Strategies
- improve odds by eliminating 1 or more
infeasible or unlikely answer options

Descriptive Exam Strategies
- brain dumping
- part marks
- consideration for perfect answers to
questions that were not asked

Depends on the number
of answer options per question
and the number of questions!
Percent Pass (≥ 50%) by Chance

Number of
Questions    2 choice   3 choice   4 choice   5 choice
1            50         33         25         20
2            75         56         44         36
4            69         41         26         18
6            66         32         17         10
10           62         21         8          3
20           59         9.2        1.4        .3
50           56         1          .01        .0004
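The table can be reproduced from the binomial distribution; this sketch treats "pass" as getting at least half the questions right by blind guessing.

```python
# Chance of passing (at least half correct) by random guessing,
# computed from the binomial distribution.
from math import comb

def percent_pass_by_chance(n_questions, n_choices):
    p = 1 / n_choices
    need = -(-n_questions // 2)            # ceiling of n/2
    prob = sum(comb(n_questions, k) * p**k * (1 - p)**(n_questions - k)
               for k in range(need, n_questions + 1))
    return 100 * prob

print(round(percent_pass_by_chance(1, 2)))   # 50
print(round(percent_pass_by_chance(10, 4)))  # 8
print(round(percent_pass_by_chance(2, 3)))   # 56
```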
Negative Marking…
- elimination strategy reduces odds of the
wrong-answer penalty
- subtracting a percentage of the number
of wrong answers obtained from the final
grade
- e.g., give a grade of 4 for a correct answer and a
score of –1 for a wrong answer on a 4-choice
question
- a score of less than zero is possible
- students hate negative marking
- negative marking is not practised in
descriptive examinations
- a poor substitute for a test that is too short
with too few answer options
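Under the +4/–1 scheme described above, blind guessing on a 4-choice item still has a small positive expected value; a sketch, using the scoring values from the slide:

```python
# Expected per-question score under random guessing with negative
# marking (+4 for a correct answer, -1 for a wrong one, per the slide).

def expected_guess_score(n_choices, reward=4.0, penalty=-1.0):
    p = 1 / n_choices
    return p * reward + (1 - p) * penalty

print(expected_guess_score(4))  # 0.25: blind guessing still gains a little
print(expected_guess_score(3))  # eliminating one option raises it further
```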
Educational Measurement
and Evaluation

Myrna E. Lahoylahoy, Ph.D.


 Process of quantifying an individual’s achievement,
personality, attitudes, habits and skills
 Quantitative appraisal of observable
phenomena
 Process of assigning symbols to dimensions of
phenomena
 An operation performed on the physical world by
an observer
 Process by which information about the
attributes or characteristics of things is
determined and differentiated
 Qualitative aspect of determining the outcomes of
learning
 Process of ranking with respect to attributes or traits
 Appraising the extent of learning
 Judging the effectiveness of educational experiences
 Interpreting and analyzing changes in behavior
 Describing accurately the quantity and quality of things
 Summing up the results of measurements or tests, giving
meaning based on value judgments
 Systematic process of determining the extent to which
instructional objectives are achieved
 Considering evidence in the light of value standards
and in terms of the particular situations and goals which
the group of individuals is striving to attain
 TESTING – a technique of obtaining
information needed for evaluation purposes

◦ Tests, quizzes, and measuring instruments are devices
used to obtain such information
1. INSTRUCTIONAL
a) principal (basic) purpose
- to determine what knowledge, skills,
abilities, habits and attitudes have been
acquired
- to determine what progress or extent of
learning has been attained
- to determine strengths, weaknesses,
difficulties and needs of students
1. Evaluation assesses or makes an appraisal of
- educational objectives, programs, curricula,
instructional materials, facilities
- teachers
- learners
- public relations of the school
- achievement scores of the learners
2. Evaluation conducts research
 Evaluation should be
1. Based on clearly stated objectives
2. Comprehensive
3. Cooperative
4. Used Judiciously
5. Continuous and integral part of the
teaching-learning process
1. Diagnostic Evaluation – detects pupils’
learning difficulties which somehow are not
revealed by formative tests. It is more
comprehensive and specific.
2. Formative Evaluation – provides feedback
regarding the student’s performance in
attaining instructional objectives. It identifies
learning errors that need to be corrected
and it provides information to make
instruction more effective.
3. Placement Evaluation – defines a student’s
entry behaviors. It determines the knowledge and
skills he possesses which are necessary at the
beginning of instruction.
4. Summative Evaluation – determines the
extent to which objectives of instruction have
been attained and is used for assigning
grades/marks and to provide feedback to
students.
1. VALIDITY
content, concurrent, predictive, construct
2. RELIABILITY
adequacy, objectivity, testing condition,
test administration procedures
3. USABILITY
(practicality) ease in administration,
scoring, interpretation and application, low
cost, proper mechanical make-up
Content validity – face validity or logical
validity; used in evaluating achievement tests
Concurrent validity – test agrees with or
correlates with a criterion (e.g., entrance
examination)
Predictive validity – degree of accuracy with
which a test foretells the activity it intends to predict
Construct validity – agreement of the test
with a theoretical construct or trait (e.g., IQ)
 Methods of estimating reliability
1. Test–retest method (uses Spearman rank
correlation coefficient)
2. Parallel forms/alternate forms (paired
observations are correlated)
3. Split-half method (odd–even halves,
computed using the Spearman–Brown formula)
4. Internal-consistency method (Kuder–
Richardson formula 20)
5. Scorer reliability method (two examiners
independently score a set of test papers, then
their scores are correlated)
1. Standard Tests
a) Psychological tests – intelligence tests,
aptitude tests, personality (rating scale)
tests, vocational and professional interest
inventories
b) Educational tests
2. Teacher-made tests
planning, preparing, reproducing,
administering, scoring, evaluating,
interpreting
Norm-Referenced Tests
It compares a student’s performance with that of other
students in the class.
It uses the normal curve in distributing grades of
students by placing them either above or below
the mean.
The teacher’s main concern is the variability of the
scores.
The more variable the scores are, the better, because it
can determine how one individual differs from the
others.
Uses percentiles and standard scores.
It tends to be of average difficulty.
 Measures of central
Tendency
Mean, Median, Mode
 Measures of Variability
Range, Quartile Deviation, Standard
Deviation
MODE – the crude or inspectional average
measure. It is the most frequently occurring score.
It is the poorest measure of central tendency.
Advantage: the mode is always a real value since it
does not fall on zero. It is simple to
approximate by observation for small cases. It
does not necessitate arrangement of values.
Disadvantage: it is not rigidly defined and is
inapplicable to irregular distributions.
What is the mode of these scores?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
MEDIAN – the score that divides the distribution
into halves. It is sometimes called the counting
average.
Advantage: it is the best measure when the
distribution is irregular or skewed. It can be located
in an open-ended distribution or when the data are
incomplete (e.g., only 80% of the cases are reported).
Disadvantage: it necessitates arranging items
according to size before it can be computed.
What is the median?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
MEAN – the most widely used and familiar
average. The most reliable and most stable
of all measures of central tendency.
Advantage: it is the best measure for regular
distributions.
Disadvantage: it is affected by extreme values.
What is the mean?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
 It is the most important and the best measure
of variability of test scores.
 A small standard deviation means that the
group has small variability, i.e., is relatively
homogeneous.
 It is used with mean.
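The three averages for the sample scores used above, plus the standard deviation, can be checked with the standard library:

```python
# Mode, median, mean, and standard deviation of the sample scores
# used in the slides.
from statistics import mean, median, mode, pstdev

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

print(mode(scores))     # 75   (most frequent score)
print(median(scores))   # 75.0 (middle of the sorted scores)
print(mean(scores))     # 75.8
print(pstdev(scores))   # population standard deviation, ~6.58
```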
Letter grade B
  Criterion-referenced: Very Good or Proficient; complete
  knowledge of most content, skills; mastery of most objectives
  Norm-referenced: Very Good; performs above the average of the class
  Self-referenced: Very Good; some improvement on most or all the objectives

Letter grade C
  Criterion-referenced: Acceptable or basic; command of only the
  basic content skills; mastery of some objectives
  Norm-referenced: Average; performs at the class average
  Self-referenced: Acceptable; some improvement on some of the objectives

Letter grade D
  Criterion-referenced: Lacking; little knowledge of most content;
  mastery of only a few objectives
  Norm-referenced: Poor; below the class average
  Self-referenced: Lacking; minimal progress on most objectives

Letter grade F
  Criterion-referenced: Unsatisfactory; lacks knowledge of content;
  no mastery of objectives
  Norm-referenced: Unsatisfactory; far below the class average;
  among the worst in the class
  Self-referenced: Unsatisfactory; no improvement on any objectives
- What meaning should each grade symbol carry?
- What should “failure” mean?
- What elements or performances should be
incorporated?
- How should the grades in a class be distributed?
- What should the components be like that go into
a final grade?
- What method should be used to assign grades?
- Should borderline cases be reviewed?
- What other factors can influence the philosophy
of grading?
 Grade: A symbol that represents the degree to
which students have met a set of well-defined
instructional objectives.
 Absolute Grading: Absolute grading, or criterion-
referenced grading, consists of comparisons
between a student’s performance and some
previously defined criteria. Thus, students are
not compared to other students. When using
absolute grading, one must be careful in
designing the criteria that will be used to determine
the students’ grades.
 Relative Grading:
- relative grading, or norm-referenced grading
- consists of comparisons between a student and
others in the same class, the norm group
- those who perform better than most other
students are assigned the higher grades
- if using the normal curve in relative grading,
then 3.6% of the students should be assigned As,
23.8% Bs, 45.2% Cs, 23.8% Ds, and 3.6% Fs
- emphasizes competition among group members
and does not accurately reflect any objective level of
achievement
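Rank-based relative grading with the percentages above can be sketched as follows; the class scores are hypothetical, and in small classes rounding can leave some quotas (such as the 3.6% of As) empty.

```python
# Normal-curve relative grading: assign grades by class rank using
# the quotas above (3.6% A, 23.8% B, 45.2% C, 23.8% D, 3.6% F).
# Scores are hypothetical.

QUOTAS = ((0.036, "A"), (0.238, "B"), (0.452, "C"),
          (0.238, "D"), (0.036, "F"))

def curve_grades(scores):
    """Return {position: grade}, with the best scores graded first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=True)
    grades, start = {}, 0
    for frac, letter in QUOTAS:
        count = round(frac * len(scores))
        for i in order[start:start + count]:
            grades[i] = letter
        start += count
    for i in order[start:]:          # rounding leftovers: lowest band
        grades[i] = QUOTAS[-1][1]
    return grades

scores = [55, 92, 78, 64, 71, 85, 60, 74, 88, 69]
print(curve_grades(scores))
```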
 Growth Grading: (self-referenced grading)
- consists of comparisons between a
student’s performance and their perceived
ability/capability
- overachievers would be assigned higher
grades, while underachievers would be
assigned lower grades
- growth grading, while de-emphasizing
competition, tends to produce invalid
grades relative to achievement levels
 Advantages
 Easy to use
 Easy to interpret (theoretically)
 Concise

• Disadvantages
 Meaning of a grade may vary widely
 Does not address strengths & weaknesses
 K-2 students may feel threatened by them
Advantages
- Easy to use.
- Easy to interpret (theoretically)
- Concise
- More continuous than Letter Grades
- May be combined with Letter Grades
Disadvantages
- Meaning of grade may vary widely
- Does not address strengths & weaknesses.
- K-2 students may feel threatened by them
- Meaning may need to be explained/interpreted.
 Advantages
- less emotional for younger students.
-can encourage risk taking for students
that may not want to take the course for a
grade
 Disadvantages
- Less reliable than a continuous measure
- Does not contain much information relative to a
student’s achievement.
 Advantages
- results in a detailed list of student
achievements.
- may be combined with other measures.
 Disadvantages
- may become too detailed to easily
comprehend.
-Difficult for record keeping
 Advantages
- Involves a personal discussion of
achievement.
- May be used as a formative, ongoing
measure
 Disadvantages
- Teachers need to be skilled in discussion and
offering positive and negative feedback.
- Time consuming.
- Some students may feel threatened.
- Difficult for record keeping.
 Advantages
- Involves personal discussion of achievement and
may alleviate misunderstanding.
- Teacher can show samples of work and the rationale for
assessment.
- May improve relations with parents.
• Disadvantages
- Teachers need to be skilled in discussion and
offering positive and negative feedback
- Time consuming
- May provoke parent-teacher anxiety
- May be inconvenient for parents
- Difficult for record keeping
 Advantages
- Most useful as an additional form of communication.
 Disadvantages
- short letters may not adequately communicate
a student’s achievement.
- require good writing skills
-time consuming.
 Discuss with students (and parents when appropriate) the basis of all grading, and all grading procedures, at the beginning of the course/school year.
 Grades should reflect, and be based on, students' level of achievement, using only those assessments that validly measure achievement.
 Grades should reflect, and be based on, a composite of several valid assessments.
 When combining several valid assessments,
each assessment should be appropriately
weighted
 An appropriate type of grading framework
should be adopted, given the ultimate use of
the grade.
 All borderline grades should be re-evaluated
based on a careful examination of all
achievement evidence
 Emphasize fair grading and scoring.
 Grade relative to specific learning objectives.
 Base grades primarily on current
performance.
 Provide accurate, timely and helpful feedback.
 Use a sufficient number of assessments.
 Don’t lower grades due to misbehavior or attendance.
 Use professional judgment.
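The guideline that each assessment should be appropriately weighted when combined can be sketched in code. This is a minimal illustration; the assessment names and weights below are hypothetical, not taken from the source.

```python
# Minimal sketch of a weighted composite grade.
# Assessment names and weights are illustrative assumptions.

def composite_score(scores, weights):
    """Weighted average of assessment scores, each on a 0-100 scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same assessments")
    total_weight = sum(weights.values())
    return sum(scores[a] * weights[a] for a in scores) / total_weight

scores  = {"quizzes": 82, "midterm": 74, "project": 90, "final": 78}
weights = {"quizzes": 0.20, "midterm": 0.25, "project": 0.25, "final": 0.30}
print(round(composite_score(scores, weights), 1))  # → 80.8
```

Dividing by the total weight means the weights need not sum exactly to 1, which keeps the composite valid if one assessment is dropped.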
 Harmful to a student’s psyche.
 Do not motivate, but may provide a disincentive.
 Mastery may not be the purpose of the activity, or 100% performance may be necessary.
 Performance may be necessary to determine
acquisition of skill (e.g.,piano, computer)
 Written activities do not emphasize oral
communication which may be a more
functional skill
 There are vast differences in grading
practices between teachers and schools.
 Most schools lack a standardized and
codified grading policy.
 A grade, a simple symbol, is incapable of
conveying the complexity of a student’s
achievement.
 Grading is not always valued by teachers and
thus often suffers from carelessness.
 Teachers often use grading as a form of discipline and motivation, rather than as an assessment report.
 Select content
 Develop an instructional strategy
 Develop and select instructional materials
 Construct tests and other instruments for assessing and evaluating
 Improve yourself as a teacher, and the overall program
 Learning outcomes Formula
 Bloom’s Taxonomy
 Characteristics of Good Learning Outcomes
 Learning Outcomes Exercise
 Write your Learning Outcomes
5 Questions for Instructional Design

1. What do you want the student to be able to do? (outcome)
2. What does the student need to know in order to do this well? (curriculum)
3. What activity will facilitate the learning? (pedagogy)
4. How will the student demonstrate the learning? (assessment)
5. How will I know the student has done this well? (criteria)
This question asks you to develop the outcome.

For example:
Students identify, consult, and evaluate reference books appropriate to the topic in order to locate background information and statistics.
 Bad Outcome
- Use Illiad and Texshare in order to access
materials not available at UT Arlington
Library.
 Good Outcome
- Utilize retrieval services in order to obtain
materials not owned by UT Arlington library.
 Bad Outcome
- Students will construct bibliographies and in-text references using discipline-appropriate styles in order to contribute to academic discourse in their discipline.
 Good Outcome
- Construct bibliographies and in-text references using discipline-appropriate styles in order to correctly attribute others’ work and ideas.
 We’re taking a friend camping for the first
time (not roughing it too much).
 What do they need to know?
 We’ll concentrate on how to build a fire
 Why do we want our friend to be able to
properly build a fire?
 Now let’s write the learning outcome
 What is our verb (use Bloom’s)
 Why?
A test is reliable when it yields consistent results. To establish reliability, researchers use different procedures:

1. Split-half Reliability: Dividing the test into two equal halves and assessing how consistent the scores are.
2. Reliability using different tests: Using different forms of the test to measure consistency between them.
3. Test-Retest Reliability: Using the same test on two occasions to measure consistency.
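The split-half procedure can be illustrated with a short script. The item-score matrix below is invented for the example; the Spearman-Brown correction, 2r / (1 + r), projects the correlation between the two halves up to a full-length-test estimate.

```python
# Sketch of split-half reliability with the Spearman-Brown correction.
# `item_scores` is hypothetical data: rows are students, columns are items.

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    # Split into odd- and even-numbered items, total each half per student.
    odd  = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r = pearson(odd, even)
    return 2 * r / (1 + r)   # Spearman-Brown: full-test estimate

item_scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(item_scores), 2))  # → 0.93
```

An odd/even split is preferred over first-half/second-half because fatigue and item ordering would otherwise bias the two halves differently.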
Reliability of a test does not ensure validity. Validity of a test refers to what the test is supposed to measure or predict.

1. Content Validity: Refers to the extent a test measures your definition of the construct.
2. Criterion-related Validity: The relationship between scores on a test and an independent measure of what the test is supposed to measure.
   1. Predictive Validity: Refers to the function of a test in predicting a particular behavior or trait. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession.
   2. Convergent Validity: We examine the degree to which the operationalization is similar to (converges on) other operationalizations. For example, we might correlate the scores on our test with scores on other tests that purport to measure basic math ability, where high correlations would be evidence of convergent validity.
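A convergent-validity check of the kind described above reduces to a correlation. In this sketch, the scores on a hypothetical new math test are correlated with scores from an established test of the same construct; all data are invented for illustration.

```python
# Sketch of a convergent-validity check: correlate our new test with an
# established measure of the same construct. Data are illustrative only.

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

new_test    = [55, 62, 70, 48, 80, 66]   # scores on our new test
established = [50, 60, 72, 45, 78, 70]   # an accepted measure of the construct

r = pearson(new_test, established)
print(round(r, 2))  # a value near 1 is evidence of convergent validity
```

The same computation with a test of an unrelated construct should yield a low correlation (discriminant evidence), which completes the validity argument.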
Test score distribution
Test score distribution (average group)
Test score distribution (poor group)
Test score distribution (good group)
 Assessment – The process of measuring
something with the purpose of assigning a
numerical value.
 Scoring – The procedure of assigning a numerical value to an assessment task.
 Evaluation – The process of determining the
worth of something in relation to established
benchmarks using assessment information.
 Formative – for performance enhancement
 Summative – for judging final performance or achievement
 Formal – quizzes, tests, essays, lab reports, etc.
 Informal – active questioning during and at
end of class
 Traditional – tests, quizzes, homework, lab
reports, teacher
 Alternative – PBL’s, presentations, essays,
book reviews, peers
 Alternative to what? Paper & pencil exams
 Alternatives:
-lab work / research projects
-portfolios
-presentations
-research paper
-essays
-self-assessment / peer assessment
-lab practical
-classroom “clickers” or responder pads
 Rube Goldberg projects
 Bridge building / rocketry / mousetrap cars
 Writing a computer program
 Research project
 Term paper
 Create web page
 Create movie
 Role playing
 Building models
 Academic competitions
 Quick-fire questions
 Minute paper
 1) What did you learn today?
 2) What questions do you have?
 Directed paraphrasing (explain a concept to a
particular audience)
 The “muddiest” point (what is it about the
topic that remains unclear to you?)
 The National Science Education Standards draft (1994) states, “Authentic assessment exercises require students to apply scientific information and reasoning to situations like those they will encounter in the world outside the classroom, as well as situations that approximate how scientists do their work.”
 Validity – is the test assessing what’s intended?
- Are test items based on stated objectives?
- Are test items properly constructed?
 Difficulty – are questions too hard? (e.g., 30% to 70% of students should answer a given item correctly)
 Discriminability – is performance on individual test items positively correlated with overall student performance? (e.g., only the best students do well on the most difficult questions)
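Difficulty and discriminability can both be computed from an item-by-student score table. The sketch below uses the proportion-correct difficulty index and a simple upper-minus-lower discrimination index; the response data are invented for illustration.

```python
# Illustrative item analysis: difficulty (proportion answering correctly)
# and an upper-lower discrimination index. Data are invented.

responses = [  # rows: students, columns: items (1 = correct, 0 = incorrect)
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

totals = [sum(row) for row in responses]
order = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
half = len(order) // 2
upper, lower = order[:half], order[half:]   # top and bottom scorers

results = {}
for item in range(len(responses[0])):
    col = [row[item] for row in responses]
    p = sum(col) / len(col)   # difficulty: the 30%-70% target range above
    d = (sum(col[i] for i in upper) - sum(col[i] for i in lower)) / half
    results[item] = (round(p, 2), round(d, 2))
    print(f"item {item}: difficulty={p:.2f}, discrimination={d:.2f}")
```

An item with high discrimination (like item 1 here) is answered correctly mostly by the stronger students; a discrimination near zero or negative flags an item worth rewriting.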
 Based on a predetermined set of criteria.
 For instance,
- 90% and up = A
- 80% to 89.99% = B
- 70% to 79.99% = C
- 60% to 69.99% = D
- 59.99% and below = F
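The cutoff scale above maps directly to a small function, a sketch of how a criterion-referenced grade is mechanically assigned:

```python
# The criterion-referenced cutoffs above, expressed as a function.

def letter_grade(percent):
    if percent >= 90: return "A"
    if percent >= 80: return "B"
    if percent >= 70: return "C"
    if percent >= 60: return "D"
    return "F"

print(letter_grade(84.5))  # → B
```

Because each grade depends only on the fixed criteria, a student's grade is unaffected by how classmates perform.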
 Pros:
- Sets minimum performance expectations.
- Demonstrates what students can and cannot do in relation to important content-area standards (e.g., ILS).
 Cons:
- Sometimes it’s hard to know just where to set boundary conditions.
- Lack of comparison data with other students and/or schools.
 Based upon the assumption of a standard
normal (Gaussian) distribution with n>30.
 Employs the z score:
- A = top 10% (z>+1.28)
- B = next 20% (+0.53<z<+1.28)
- C = central 40% (-0.53<z<+0.53)
- D = next 20% (-1.28<z<-0.53)
- F = bottom 10% (z<-1.28)
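The z-score scheme above can be sketched as follows; the raw scores are invented for illustration, and the population standard deviation is used as the slide's n>30 assumption suggests.

```python
# Sketch of norm-referenced grading via z scores. Raw scores are invented.

def norm_grade(z):
    if z > 1.28:  return "A"   # top ~10%
    if z > 0.53:  return "B"   # next ~20%
    if z > -0.53: return "C"   # central ~40%
    if z > -1.28: return "D"   # next ~20%
    return "F"                 # bottom ~10%

scores = [45, 58, 62, 67, 70, 71, 74, 78, 83, 95]
mean = sum(scores) / len(scores)
sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
grades = [norm_grade((s - mean) / sd) for s in scores]
print(list(zip(scores, grades)))
```

Note the contrast with the criterion-referenced approach: here a 45 earns an F and a 95 an A only because of where they fall relative to the group mean, not because of any fixed cutoff.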
 Pros:
- Ensures a “spread” between top and bottom of the class for clear grade setting.
- Shows student performance relative to the group.
 Cons:
- In a group with great performance, some students are still ensured an “F”.
- Top and bottom performances can sometimes be very close.
- Dispenses with absolute criteria for performance.
- Being above average does not necessarily imply “A” performance.
 Norm-Referenced:
- Ensures a competitive classroom atmosphere
- Assumes a standard normal distribution
- Small-group statistics are a problem
- Assumes “this” class is like all others

 Criterion-Referenced:
- Allows for a cooperative classroom atmosphere
- No assumptions about the form of the distribution
- Small-group statistics are not a problem
- Difficult to know just where to set criteria