Test Scores and Teacher Selection

TEACHERS COLLEGE, COLUMBIA UNIVERSITY
TEST SCORES AND TEACHER SELECTION
AN EMPIRICAL ANALYSIS FOR TURKEY
M. ALPER DINCER
4/26/2011
[In 2002 in Turkey, a decentralized model of teacher hiring was replaced with a teacher selection
model which operates through centralized testing. This study evaluates the impact of this new
teacher selection policy on mathematics and science test scores of 8th graders. The findings show
that a 0.17 standard deviation increase in test scores can be attributed to the new teacher
selection policy and the estimated impact is much higher for below median achievers and
students with female teachers. The findings also provide evidence exhibiting that the new teacher
selection policy assigns more teachers to relatively poor schools and classrooms.]
1. Introduction: test scores
The primary and secondary education systems in Turkey have been undergoing a
restructuring since the late 1990s in response to swift developments in the formation of its
economy and the demographics of its young population. One of the main goals of this
restructuring is to increase the quality of learning outcomes in Turkey (Aksit, 2007).
Thus it is important to investigate empirically whether these reform efforts achieve the
intended outcomes or not.
The Trends in International Mathematics and Science Study (TIMSS) and Program for
International Student Assessment (PISA) periodically measure the student achievement
on an international scale and assemble information about students, their families and
schools. Thus with the help of these projects it is possible to track student achievement in
participating countries and make cross-country comparisons. Therefore these projects
provide the necessary data in order to analyze the trend of learning outcomes in Turkey.
A representative sample of students in the 8th grade, which is the final grade of mandatory
schooling in Turkey, participated in TIMSS 1999 and 2007. The average mathematics and
science scores of students in Turkey in 1999 were 429 and 433 whereas the international
average scores were 487 and 488, respectively. Similarly the average mathematics and
science scores of students in Turkey in 2007 were 433 and 454 whereas the international
average scores were 488 and 500, respectively. Thus students in Turkey performed
below the international average in both cycles. The following table gives the
percentages of students reaching the TIMSS international benchmarks:

Table 1: Percentages of students reaching the TIMSS international benchmarks
Subject      Year  Advanced  High  Intermediate  Low
Mathematics  1999  1         6     20            38
Mathematics  2007  5         10    18            26
Science      1999  1         5     19            37
Science      2007  3         13    24            31
Source: (Martin et al., 2001a), (Martin et al., 2001b), (Martin, Mullis, Foy, & Olson,
2008a), (Martin, Mullis, Foy, & Olson, 2008b)
As a cautionary note, it should be stated that these percentages are not directly
comparable between 1999 and 2007 for Turkey (Martin, et al., 2008a, 2008b). However
these figures present the same pattern in mathematics and science for the students in
Turkey. There are more students at the advanced and high international benchmark levels
and fewer students at the low international benchmark level.¹
PISA offers more definitive information about the trend of learning outcomes of students
in Turkey. Similar to TIMSS, PISA measures the reading, mathematics and science test
scores of a sample of students which is representative of the 15-year-old student population in
each participating country. Turkey participated in PISA in 2003, 2006 and 2009. The
trend in the mathematics score is comparable between 2003 and 2009 and the trend in the
science score is comparable between 2006 and 2009 (OECD, 2010).
According to PISA results the average mathematics score of 15-year-old students in Turkey
increased by 22 points (more than 0.2 standard deviation) and the average science score of
15-year-old students in Turkey increased by 30 points (approximately 0.3 standard
deviation) (OECD, 2010).
¹ For the description of these benchmark proficiency levels please see (Martin, et al., 2008b) and
(Martin, et al., 2008a).
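The conversion from score points to standard deviation units quoted above can be checked with a short sketch. It assumes a scale standard deviation of 100 points, the value to which the PISA scales were originally standardized; the constant and function names are illustrative and do not come from the OECD report.

```python
# Assumption: PISA scales were standardized to a standard deviation of about 100 points.
PISA_SCALE_SD = 100.0


def points_to_sd(points, scale_sd=PISA_SCALE_SD):
    """Express a score-point change in standard-deviation units."""
    return points / scale_sd


math_gain_sd = points_to_sd(22)     # mathematics, 2003 to 2009
science_gain_sd = points_to_sd(30)  # science, 2006 to 2009
print(f"mathematics: {math_gain_sd:.2f} SD, science: {science_gain_sd:.2f} SD")
```

Under this assumption the 22-point and 30-point gains correspond to 0.22 and 0.30 standard deviation, in line with the figures in the text.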
PISA data also show in which segment of the student body these improvements
occurred. The percentage of students who fall below proficiency Level 2 decreased
from 52 to 42 in mathematics, and in science the same percentage dropped from 47 to 30.
On the other hand the percentages of top performers did not show any increase or
decrease between the respective periods (Figure 1).
These figures on the trend of the average student achievement in mathematics and
science in Turkey highlight at least three important facts. First, in the period following
1999, average student achievement in mathematics and science has been increasing for the
student population which is either in grade 8 or 15 years old. Second, this increase in
average student achievement is not homogeneous. Indeed it is much more pronounced at the
lower end of the student achievement distribution in these subjects. Third, these
improvements in average student achievement in Turkey are not due to inflation in test
score scales; the average performance of students in Turkey is converging to international
benchmarks as defined by either TIMSS or PISA. This convergence is fairly rapid, at
least according to the measure PISA provides.
These facts immediately raise several questions: Are these changes in student
achievement related to the restructuring of the education system in Turkey? If yes, which
aspects of the reform initiative in Turkey led to higher learning outcomes in
mathematics and science? Is it possible to identify the channels through which the policy
intervention leads to increases in student achievement? This study attempts to offer some
candidate answers to these questions.

Figure 1: PISA indicators
[Figure not reproduced. Panels, by country: score point change in science performance between 2006 and 2009; score point change in mathematics performance between 2003 and 2009; percentage of students below proficiency Level 2 in science (2006 and 2009) and in mathematics (2003 and 2009).]
Source: (OECD, 2010)
2. Possible explanations
OECD (2010) stresses the role of the Basic Education Programme (BEP) in increasing
learning outcomes in Turkey. This World Bank-supported programme defined the
framework for the education reform initiative in Turkey according to Law No. 4306.²
With this legislation the Ministry of National Education (MONE) aimed to expand
primary school education, improve the quality of education and overall
student outcomes, close the performance gap between boys and girls, provide equal
opportunities, match the performance indicators of the European Union, develop
school libraries, increase the efficiency of the education system, ensure that qualified
personnel were employed, integrate information and communication technologies into
the education system and create local learning centers, based in schools, that are open
to everyone.³
In response to these efforts the attendance rate in the eight-year primary education system
soared from 85 to 100 percent. Similarly, the attendance rate in the pre-primary education
system increased from 10 to 25 percent. These increases led to an expansion of the
education system by 3.5 million pupils. These quantitative expansions of the education
system were accompanied by qualitative improvements: during the same period average
class size was reduced from approximately 40 to 30, conditions were improved in all
rural schools and computer laboratories were established in every primary school.
Lastly, the cost of the BEP exceeded the equivalent of USD 11 billion (OECD, 2010).
² http://mevzuat.meb.gov.tr/html/24.html
³ http://www.meb.gov.tr/Stats/Apk2002/502.htm
OECD (2010) as well as MONE also highlight the importance of the recent curriculum
change in mathematics and science (TTKB, 2008): New curricula were launched in the
2006-2007 school year, starting from the 6th grade. Similarly, mathematics and language
curricula were also updated, and starting from the 9th grade in the 2008-2009 school year a
new science curriculum was in force. According to the Board of Education (TTKB)
the aim of this change was to update the content of school education as well as to change
the teaching philosophy and culture within schools.
Although the new curricula are the preferred explanation of MONE and some other
research institutions in Turkey⁴ for the increased learning outcomes, the connection is not
clear and there are several problems with this specific explanation: First, given that TIMSS
covers the period between 1999 and 2007, the new curricula cannot explain
the improvement in learning outcomes which is evident in the TIMSS data. Second, average
achievement in mathematics in PISA is not comparable between 2006 and 2009.
Therefore the timing of the inception of the new curricula and the increase in average
mathematics achievement in Turkey do not overlap. Third, the students who were subject
to the curriculum change in science are 9th graders, who constitute only a portion of the
PISA 2009 sample in Turkey; moreover they experienced the new curriculum for only two
semesters. It is not clear whether these students could drive a 0.3 standard deviation
increase in the average student achievement in science between 2006 and 2009.
As mentioned earlier, one of the targets of the BEP was to ensure that qualified personnel
were employed. In line with this goal the teacher selection policy in Turkey was changed in
2002, which might have affected teacher quality in public primary and secondary
institutions. In the following I present a brief review of teacher quality and then
describe the nature of this policy intervention.
⁴ http://www.setav.org/public/HaberDetay.aspx?Dil=tr&hid=57559&q=pisa-yi-dogru-okumak,
http://www.tepav.org.tr/upload/files/1292255907-8.PISA_2009_Sonuclarina_Iliskin_Bir_Degerlendirme.pdf
3. Why is teacher quality important? How is teacher quality measured?
Learning outcomes are affected by many factors, including: students’ ability, potential,
enthusiasm and behavior; school management, resources and atmosphere; curriculum and
content; and teacher ability, preparation, attitudes and practices. Schools and classrooms
are elaborate and dynamic environments, and identifying the education production function
and the underlying technology continues to be a major challenge of educational research.
This problem has many aspects, ranging from research design and methodology to data
availability. Usually researchers are forced to use measures which are only partial
indicators of learning and in many cases it is not possible to apply the relevant
methodologies. Therefore the results, interpretations and policy implications of such
studies are regularly questioned.
Keeping this caveat in mind, some general inferences can be drawn from the body of
research on the determinants of learning. First, out-of-school factors such as student ability,
motivation, parental characteristics, neighborhood and socioeconomic status are the
strongest predictors of learning and it is not easy to change these factors through policy
intervention in the short run.
Second, among the factors which are open to policy influence teacher quality is the most
important school input affecting learning. Santiago (2002), Schacter and Thum (2004)
and Eide, Goldhaber and Brewer (2004) present extensive and detailed reviews of this
line of research.
Differences in teacher quality may lead to substantial differences in student
achievement. In order to understand the relative significance of teacher quality Rivkin et
al. (2005) analyze a unique matched panel data set from the UTD Texas Schools Project
which allows them to identify teacher quality based on student performance. They
conclude that the contribution of a ten-student reduction in class size to learning is less
than that of a one standard deviation increase in teacher quality.
In another study, Rockoff (2004) analyzes a 10-year panel data set of test scores and teacher
assignments to understand how much teachers affect learning. The panel structure allows
him to focus on differences in the performance of the same student with different teachers
and to separate the variation in teacher quality from the variation in students’
characteristics. His analysis shows that variation in teacher quality explains 23 percent of
the variation in test scores which is potentially open to policy influence.
Third, teacher characteristics such as qualifications, teaching experience and teacher
education do not exhibit consistently clear and strong effects on student achievement:
Hanushek (2002, 2003) reviews the studies focusing on the United States and concludes that
overall there are no systematic effects of characteristics such as teacher education or
teacher experience. Thus it is a challenging task to identify the components which
characterize the quality of teachers.
In the same reviews Hanushek (2002, 2003) also highlights that there is convincingly
strong support for the effects of teachers’ academic ability as measured by teacher test
scores. In line with Hanushek’s inference the National Council on Teacher Quality (NCTQ)
(2004) reports that a teacher’s academic aptitude has a clear, measurable effect on learning
and that this finding is robust and consistent. The same report emphasizes that a teacher’s
literacy ability as measured by standardized tests has a larger impact on learning than
any other measurable teacher characteristic. Thus a broad conclusion emerges from
research connecting teacher quality to teachers’ test scores: Teachers’ test scores may be
a good measure of teacher quality if these tests measure academic aptitude.
Interestingly, there are some studies from Turkey which are in line with these findings.
Several studies which analyze PISA 2006 data for Turkey show that being
taught by teachers who passed rigorous testing procedures is associated with higher test
scores (Alacaci & Erbas, 2010; Dincer & Uysal, 2010).
The literature leads to two main conclusions in this respect: First, teacher quality is an
essential ingredient of education production and it is open to policy influence. Second,
screening teachers with tests which measure academic ability may lead to an increase
in teacher quality.
4. Basic characteristics of the teacher labor market in Turkey
The main characteristic of the teacher labor market in Turkey is the excess supply of
teachers. As of 2010, approximately 327 thousand teachers are waiting to be employed by the
public sector and the number of applicants is three to four times higher than the number
of open teaching positions (Figure 2). This army of inactive teachers represents a
significant population given that the number of employed teachers in the public sector is
680 thousand. MONE also predicts that the optimal number of employed teachers in the
public education system is 717 thousand.⁵ Under these circumstances the gap between the
supply of and the demand for teachers widens cumulatively.
⁵ http://icden.meb.gov.tr/digeryaziler/MEB_ic_denetim_faaliyet_raporu_2009.pdf
As of 2010, MONE demanded 782 mathematics teachers and it received 2798
applications. For science these figures are 861 and 3546,⁶ respectively. The gap is
evident to a similar degree in every subject; thus excess supply is not specific to some of the
subjects.
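As a quick check of the "three to four times" statement, the applicant-to-opening ratios implied by the 2010 figures quoted above can be computed as follows. The dictionary and variable names are illustrative only.

```python
# Open positions and applications in 2010, as quoted in the text.
openings = {"mathematics": 782, "science": 861}
applicants = {"mathematics": 2798, "science": 3546}

# Applicants per open position, by subject.
ratios = {subject: applicants[subject] / openings[subject] for subject in openings}

for subject, ratio in ratios.items():
    print(f"{subject}: {ratio:.2f} applicants per open position")
```

The ratios are roughly 3.6 for mathematics and 4.1 for science.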
Figure 2: The number of open positions and applicants by subject
[Bar chart not reproduced. Series: number of open positions and number of applicants. Subjects: Math, Science and Tech, Physics, Biology, Chemistry. Vertical axis: 0 to 4000.]
Source: Author’s own calculations from http://personel.meb.gov.tr/ana_sayfa.asp
One candidate explanation for this excess supply would be very attractive
teacher salaries. However, teacher salaries in Turkey are not attractive at all.
In the public sector the starting salary of a teacher is around USD 14,000 and it does not
improve much with experience (Figure 3). The salary of a teacher with 15 years of
experience is around USD 16,000 (OECD, 2009).
Dolton and Marcenaro Gutierrez (2011) present a cross-country analysis of teacher pay and
performance by taking the relative earnings distribution in each country into account.
Their analysis confirms that teacher salaries are not especially attractive in Turkey
and that the salary-experience profile is flat (Figure 4).
⁶ http://personel.meb.gov.tr/ana_sayfa.asp
Figure 3: Ratio of salary after 15 years of experience to GDP per capita
[Bar chart by country not reproduced. The countries shown include Turkey and the OECD average.]
Source: (OECD, 2009)
Figure 4: Average teacher wage-experience profile in Turkey
Source: (Dolton & Marcenaro Gutierrez, 2011)
Therefore neither the starting salaries nor the expectation of relatively higher salaries in the
teaching profession can explain the excess supply in the teacher labor market in Turkey.
Another important feature of the teacher labor market is that all public servants in
Turkey are protected by law and unions, and job separation is a very unlikely event.
As a result the teaching profession offers substantial job security, and given the presence of
very high chronic unemployment rates individuals value job security heavily. One study
(Caner & Okten, 2010) analyzes the college major choice decision in a risk and return
framework using university entrance exam data from Turkey and shows that individuals
are very sensitive to risk when choosing a career.
It should also be noted that total enrollment in education faculties in Turkey has increased
steadily over time: Total enrollment increased from 33 thousand in 2007 to 45 thousand
in 2008 and 54 thousand in 2009, and MONE expands the teaching force by
approximately 40 thousand each year.⁷
Thus a combination of an intense demand for job security and increased quotas of
education faculties may provide a more sensible explanation for the excess supply in the
teacher labor market in Turkey.
5. Legal framework of teacher selection in Turkey
There are three main legal sources which regulate the hiring of teachers in Turkey. First,
teachers working in the public sector are subject to Law No. 657. This law has defined the
rights as well as the legal obligations of public servants since 1965. Second, the regulation of
the tests concerning the assignments of public servant candidates has described the testing
procedure for public servant posts since 2002. Third, MONE’s regulation of teacher
assignment and replacement explains how the testing procedure and test results apply to
the teacher selection process. The current version of this regulation was legislated in 2010 and
it has changed many times in the past according to the needs of MONE.
⁷ http://www.ogretmenportali.net/HaberGoster/228716e4-64bf-4b55-bb17-fc0ee89baf38/atanmayan-ogretmen-ordusu-buyuyor.aspx
The regulation of the tests concerning the assignments of public servant candidates
forms a turning point in teacher selection because it brought about a radical change in
teacher selection policy in Turkey.
In the teacher selection system before the legislation of this regulation, i.e. prior to 2002, any
eligible teacher candidate was able to apply to any available position announced by
MONE. The applications were processed in provincial offices of MONE and then the
final decision was made by the headquarters of MONE in the capital, Ankara (Figure 5).
Figure 5: A presentation of teacher selection system before 2002
This system was a cause of concern in MONE as well as in the State Planning Organization
(SPO) (SPO, 1989). One of the main issues of the pre-2002 system highlighted by
MONE was a constant imbalance of the teacher population across regions. According to the
Research and Development department of MONE, one preliminary report of the 1993
National Education Assembly stressed that more than 10 percent of teachers employed by
MONE in urban areas did not teach a single class. Another issue documented in MONE’s
records on the pre-2002 system was that political pressures and interventions
damaged the fairness and equality principles in teacher employment and caused unrest
among teachers (EARGED, 1995). Indeed it was publicly well known that having
connections in provincial offices as well as in the capital was essential to get hired. Thus
nepotism was a general worry about this selection process.
Following the legislation of the above-mentioned testing regulation, the Center of Student
Selection and Placement (OSYM) launched a central examination process which is
known as the Public Servant Selection Examination (KPSS). This exam has two sessions:
In the first session the teacher candidates have to answer 120 multiple choice questions
about Turkish, Mathematics, History, Citizenship, General Culture and Geography in 180
minutes. In the second session the teacher candidates have to answer 120 multiple choice
questions about educational psychology, educational programs and teaching, and
educational guidance in 180 minutes. Then applicants are assigned to teaching positions
centrally by MONE according to their test scores in the central examination and their
ranked list of preferred teaching positions (Figure 6). OSYM conducts the exam annually
and if a teacher candidate fails to be placed in a teaching position then s/he has to take the
exam again in the following year.
Figure 6: A hypothetical presentation of teacher selection after 2002
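One simple way to read the allocation rule described above is as a score-ordered assignment: candidates are processed from the highest to the lowest test score and each receives the highest-ranked post on his or her list that still has an opening. The sketch below illustrates this reading. It is an assumption rather than MONE's documented procedure, and the function name, candidate labels, scores and capacities are all hypothetical.

```python
def assign_positions(candidates, capacities):
    """Assign candidates to posts in descending order of test score.

    candidates: list of (name, score, ranked_preferences) tuples
    capacities: dict mapping post -> number of open positions
    Returns a dict mapping name -> assigned post, or None if unplaced.
    """
    remaining = dict(capacities)  # copy so the caller's dict is not modified
    placement = {}
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for name, _score, preferences in ranked:
        placement[name] = None
        for post in preferences:
            if remaining.get(post, 0) > 0:
                remaining[post] -= 1
                placement[name] = post
                break
    return placement


# Hypothetical candidates: (label, exam score, ranked list of preferred posts)
candidates = [
    ("A", 82.5, ["Ankara", "Izmir"]),
    ("B", 90.0, ["Ankara", "Van"]),
    ("C", 75.0, ["Ankara", "Izmir"]),
    ("D", 60.0, ["Ankara"]),
]
capacities = {"Ankara": 1, "Izmir": 1, "Van": 1}
placement = assign_positions(candidates, capacities)
print(placement)
```

In this toy example the unplaced candidates (C and D) would have to retake the exam the following year, while the post that nobody reaches (Van) stays open.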
In this teacher selection system it is not possible to game the hiring process and it is also
not possible to leverage nepotism in order to get a teaching position. Thus it is reasonable
to claim that the central examination and the allocation of teaching positions based on test
scores address the problem of lack of fairness. However two questions remain to be
answered: Does the new system ensure that qualified teachers are employed? Does
this system have an impact on the regional imbalance of the teacher population? The first
question is critical because this was one of the main goals of the BEP. The second question is
critical because this was a chronic problem of the education system (EARGED, 1995; SPO,
1989).
6. Data
In order to answer these research questions I employed the TIMSS 1999⁸ and TIMSS 2007⁹
data sets for Turkey. These data sets have some very important qualities which render
them very suitable for analyzing the questions of interest.
First, as mentioned earlier, these projects assess a representative set of 8th graders in the
participating countries. The 8th grade is the final grade of primary education in Turkey and
thus students in the sample should have spent at least a couple of years in their current
institutions.
Second, it is possible to link teachers to students in the same classroom which makes
these data sets especially attractive for this analysis.
Third, the TIMSS project conducts four questionnaires, i.e. student, school, mathematics
teacher and science teacher questionnaires. The student and teacher questionnaires
contain extensive information about demographic and socioeconomic characteristics of
students and teachers. In addition, the school questionnaire contains information on
school location, resources and governance.
⁸ http://timss.bc.edu/timss1999.html
⁹ http://timss.bc.edu/timss2007/index.html
Fourth, the information collected in 1999 and 2007 is comparable to a certain extent. The
questionnaires in 1999 and 2007 do not overlap extensively; however most of the
essential information is available in both data sets.
Fifth and most importantly, the policy change which is evaluated in this
study falls between 1999 and 2007, the years in which Turkey participated in TIMSS.
This gives me a reasonable number of observations that are subject to the policy
change, which was launched in 2002.
Lastly, teacher experience is reported in single years (1, 2, 3, etc.) rather than in
categories (0-4, 5-8, etc.). This distinction is crucial for this analysis because the
data on teacher experience in TIMSS allow me to define the treatment and control groups
with respect to the inception date of the policy change.
7. Methodology and empirical analysis
For the empirical analysis, first, I merged the student, school and teacher data sets within
each cycle and then pooled the 1999 and 2007 TIMSS data sets. Then I defined the
treatment group as the students whose teachers have four or fewer years of experience. This
assumption is necessary because I do not observe whether the teachers were selected via
the central examination or not. Thus I claim that this definition of the treatment group
approximates the ideal case.
The justification of this assumption is based on the timing of the TIMSS application and
the central examination. The first central examination in Turkey was conducted in July
2002; OSYM announced the test scores in August 2002¹⁰ and MONE distributed the
teaching posts based on the announced test scores in September, October and November
2002.¹¹ On the other hand, the TIMSS 2007 application in Turkey was conducted in
April, May and June 2007 (Olson, Martin, Mullis, & Arora, 2008). Thus a teacher who
was selected with the first central examination would have been assigned to the post as early
as September 2002 and the same teacher would have answered the TIMSS teacher
questionnaire as late as June 2007. According to this hypothetical example this teacher
would not yet have five years of experience at the time of the TIMSS application. Therefore the
treatment group is defined as above.
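The timing argument above can be verified with a small date calculation. The sketch assumes the first day of September 2002 and the last day of June 2007 as the extreme dates; the exact days are not given in the sources and are chosen only to make the window as long as possible.

```python
from datetime import date


def completed_years(start, end):
    """Number of full years elapsed between two dates."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1  # the anniversary in the final year has not been reached yet
    return years


# Assumed extreme dates (the days of the month are hypothetical).
earliest_assignment = date(2002, 9, 1)     # first posts after the July 2002 exam
latest_questionnaire = date(2007, 6, 30)   # end of TIMSS 2007 fieldwork in Turkey

max_experience = completed_years(earliest_assignment, latest_questionnaire)
print(max_experience)
```

Even in this most favorable case a teacher hired through the first central examination has completed only four years of teaching, which motivates the cutoff of four or fewer years of experience.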
However this is an imperfect measure of selection via the central examination: First, teacher
turnover leads to measurement error, because it is possible to quit and return to teaching.
This may especially be an issue for female teachers who may substitute child raising for
teaching for a couple of years. Second, OSYM conducted another central
examination which is known as the Central Elimination Examination for Institutions (KMS)
in 2001.¹² KMS was different from KPSS and it is not clear how many teaching posts
were distributed based on KMS scores or whether KMS scores were the sole
determinant of the teacher assignments. This issue may also lead to measurement error.
Keeping these shortcomings in mind I compared the difference in average
student achievement between treatment and control groups in 1999 and 2007 with a basic
differences-in-differences approach. The main assumption of this approach is that the
change in mean test scores that the control group experiences over time reflects the same
change that the treatment group would have experienced had they not been exposed to the
treatment. Another important assumption of the differences-in-differences approach is that
unobserved characteristics have the same distribution across time points and across
treatment groups. I will discuss the validity of these assumptions in the subsequent
sections.
¹⁰ http://www.osym.gov.tr/belge/1-6128/2002-sinavlari.html
¹¹ http://personel.meb.gov.tr/sayfa_goster.asp?ID=207
¹² http://www.osym.gov.tr/belge/1-12485/2001-sinavlari.html
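The logic of the basic differences-in-differences comparison can be illustrated with the four cell means (treatment and control group, 1999 and 2007). The sketch below uses made-up scores purely for illustration; they are not TIMSS data, and the function and variable names are hypothetical. Control variables and sampling weights are omitted.

```python
from statistics import mean


def did_estimate(rows):
    """Differences-in-differences estimate from the four cell means.

    rows: iterable of (cycle, treated, score) with cycle in {1999, 2007}
    and treated equal to 1 if the teacher has four or fewer years of experience.
    """
    cells = {}
    for cycle, treated, score in rows:
        cells.setdefault((cycle, treated), []).append(score)
    m = {key: mean(values) for key, values in cells.items()}
    gap_1999 = m[(1999, 1)] - m[(1999, 0)]  # gap before the policy change
    gap_2007 = m[(2007, 1)] - m[(2007, 0)]  # gap after the policy change
    return gap_2007 - gap_1999


# Illustrative, made-up scores (NOT TIMSS data).
toy_rows = [
    (1999, 0, 440.0), (1999, 0, 450.0),
    (1999, 1, 420.0), (1999, 1, 430.0),
    (2007, 0, 445.0), (2007, 0, 455.0),
    (2007, 1, 440.0), (2007, 1, 450.0),
]
effect = did_estimate(toy_rows)
print(effect)
```

In the toy data the gap between students of inexperienced and experienced teachers narrows from -20 to -5 points, so the estimate is 15 points. Under the common-trend assumption stated above, this narrowing would be attributed to the policy.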
For the differences-in-differences analysis I have estimated the following regression
models:
Table 2: Difference-in-Differences estimations

In these regression models the dependent variable is either the
mathematics or the science test score. However it should be mentioned that TIMSS does not
provide point estimates of mathematics and science test scores; instead TIMSS gives five
plausible values of mathematics and science ability. For the sake of simplicity I averaged
the five plausible values for each subject and then used the averaged plausible values as
the measure of the subject test score. The TIMSS 2007 Technical Report highlights that
taking the average of the plausible values will not yield suitable estimates of individual
student scores (Olson, et al., 2008). In this analysis I repeated some of the estimations
with the plausible values and then compared the point estimates and the standard errors of the
population parameter of interest, i.e. the difference-in-differences coefficient. In all cases
the point estimates were very close to each other and the standard errors were slightly larger,
which did not affect the statistical significance levels.
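A common way to work with plausible values is to run the estimation once per plausible value and combine the results with the standard multiple-imputation rules. The sketch below illustrates that combination step. It is an assumption about how such a check could be done, not a description of the exact procedure used here, and the input numbers are made up.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value estimates with multiple-imputation rules.

    estimates: coefficient estimated separately with each plausible value
    sampling_variances: squared standard error from each of those estimations
    Returns (combined point estimate, combined standard error).
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (divisor m - 1)
    total_variance = within + (1 + 1 / m) * between
    return point, total_variance ** 0.5


# Made-up coefficients and variances for five plausible values (illustration only).
estimates = [16.0, 17.0, 18.0, 17.0, 17.0]
sampling_variances = [25.0, 25.0, 25.0, 25.0, 25.0]
point, se = combine_plausible_values(estimates, sampling_variances)
print(point, se)
```

The combined standard error exceeds the one obtained from a single estimation because it adds the variation across plausible values, which is consistent with the slightly larger standard errors reported above.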
In these regression models one indicator variable stands for the TIMSS cycle (1999 and 2007)
and another defines the treatment, which equals 1 if the subject teacher has four
or fewer years of experience. Observed information regarding teachers, students, classes
and schools enters the regression models as control variables (Table 3).
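For orientation, a generic difference-in-differences specification consistent with the description above can be written as follows. The notation is assumed for illustration and is not necessarily that of Table 2: $y_{ict}$ is the test score of student $i$ in class $c$ in cycle $t$, $\mathrm{Cycle}_t$ equals 1 for 2007, $\mathrm{Treat}_{ct}$ equals 1 if the subject teacher has four or fewer years of experience, and $X_{ict}$ collects the control variables.

```latex
y_{ict} = \beta_0 + \beta_1 \,\mathrm{Cycle}_t + \beta_2 \,\mathrm{Treat}_{ct}
        + \delta \,(\mathrm{Cycle}_t \times \mathrm{Treat}_{ct})
        + X_{ict}'\gamma + \varepsilon_{ict}
```

In this form $\delta$, the coefficient on the interaction term, is the difference-in-differences parameter of interest.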
The list of control variables was constructed within the data limitations. The
variables available in the TIMSS 1999 and 2007 data sets do not overlap to a significant
degree and in some cases, although the necessary variables are available in both data sets,
the scales of measurement are different. For example this was a serious issue for the
school location variable. All in all I experimented with every variable which is available
in both data sets. The number of missing observations also had some impact on the list
of control variables.
Table 3: List of Control Variables
Teacher characteristics   Class characteristics                   Student characteristics    School resources
Sex                       Diversity in academic ability           Sex                        An indicator for school resources
Age                       Diversity in socioeconomic background   Age                        Location
Subject degree            Presence of disruptive students         Parental education
Experience                Class size                              # books at home
                          Instructional time                      Computer at home
                                                                  Language spoken at home
Following the difference-in-differences analysis of mathematics and science achievement
I utilized another aspect of the data structure: the treatment variable varies by subject.
This means that the same student may have a mathematics teacher who has four or fewer years
of experience whereas his or her science teacher may have more than four years of
experience (or vice versa). Given that both the mathematics and science test scores are
observed for each student, this structure allows me to employ individual fixed effects. For
that purpose I pooled the mathematics and science data sets and incorporated student fixed
effects into the regression models defined in Table 2. This approach allowed me to relax
one of the assumptions associated with the difference-in-differences approach. After adding
student fixed effects to the model I do not have to assume that unobserved student and
school characteristics have the same distribution across time points and across treatment
groups. However, I still have to assume that unobserved class characteristics have the same
distribution across time points and across treatment groups (Table 4). Lastly, it should be
mentioned that other studies employ very similar identification strategies, such as Lavy
(2010). In that study the researcher establishes a causal link between instructional time
and student achievement by making use of the within-individual variation in the test scores
and the within-subject variation in instructional time. In essence the identification
strategy I employ is identical to the approach of Lavy (2010), with the one exception that
I embed it in a difference-in-differences framework (Table 4).
Table 4: Fixed effects and difference-in-differences estimations
Although this identification strategy allows me to relax some of the assumptions of the
difference-in-differences approach, it has its own shortcomings. First, it automatically
leads to a reduction in the sample size, and this problem becomes more pronounced in the
sub-group analysis. Second, it is not possible to decompose the effect into two parts,
i.e. learning gains in mathematics and learning gains in science.
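With two observations per student (mathematics and science), including student fixed effects
is equivalent to demeaning every variable within student, which removes anything that is
constant across the two subjects for a given student. The sketch below illustrates this on
simulated data in which less able students are more likely to be taught by a treated
teacher. The variable names, the single regressor and the data-generating process are
assumptions for illustration, and the sketch omits the cycle interaction and the controls of
the actual models.

```python
import numpy as np

def within_student_slope(score, treat, student_id):
    """Slope of score on treat after demeaning both variables within student."""
    _, idx = np.unique(student_id, return_inverse=True)
    counts = np.bincount(idx)

    def demean(values):
        means = np.bincount(idx, weights=values) / counts
        return values - means[idx]

    y, x = demean(score), demean(treat)
    return (x @ y) / (x @ x)

# Simulated data (hypothetical): two rows per student, one per subject.
rng = np.random.default_rng(1)
n_students = 500
student_id = np.repeat(np.arange(n_students), 2)
ability = np.repeat(rng.normal(0, 80, n_students), 2)     # constant within student
prob_treated = 1 / (1 + np.exp(ability / 80))             # lower ability, higher chance
treat = (rng.random(2 * n_students) < prob_treated).astype(float)
true_effect = 15.0
score = 450 + ability + true_effect * treat + rng.normal(0, 10, 2 * n_students)

naive = np.polyfit(treat, score, 1)[0]                    # ignores student ability
fixed_effects = within_student_slope(score, treat, student_id)
```

In this simulation the naive slope is pulled below zero by the sorting of students, whereas
the within-student estimate stays close to the effect of 15 points that was built in. Only
students whose two teachers differ in treatment status contribute to the within-student
estimate, which is the source of the sample size reduction noted above.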
8. Findings
The following table gives the estimated values of the coefficient of interest under the
different specifications described in Table 2, together with sub-group estimates of this
coefficient. The analysis has been conducted separately for mathematics and science test
scores (Table 5).

Table 5: Estimation results of difference-in-differences
Mathematics
         Whole sample                   Female teacher sample          Male teacher sample            Below median achievers sample  Above median achievers sample
         Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²
Model 1  -17.86 [13.94] 0.04            -23.53 [22.02] 0.07            -15.50 [17.44] 0.03            -2.38 [6.32] 0.03              -11.90 [8.50] 0.05
Model 2  -28.15* [16.46] 0.08           -25.76 [20.89] 0.19            -20.10 [21.03] 0.06            -4.34 [7.47] 0.04              -18.43* [10.30] 0.08
Model 3  -27.12 [17.26] 0.10            -10.76 [21.23] 0.26            -28.35 [24.38] 0.09            -2.80 [7.12] 0.05              -16.59 [11.18] 0.10
Model 4  -14.19 [14.02] 0.27            -7.01 [18.09] 0.38             -10.20 [20.20] 0.24            -2.40 [6.82] 0.10              -8.38 [9.29] 0.21
Model 5  -0.61 [13.56] 0.30             6.45 [17.84] 0.40              0.55 [19.41] 0.26              0.22 [6.66] 0.11               -0.26 [9.14] 0.23
Obs.     6,750                          2,757                          3,993                          3,354                          3,396
Science
Model 1  -15.49 [12.30] 0.07            -42.53** [16.51] 0.10          4.36 [17.24] 0.05              -3.41 [6.31] 0.01              -8.46 [6.25] 0.12
Model 2  -17.42 [12.34] 0.09            -28.14 [17.63] 0.12            -2.40 [17.70] 0.08             -5.99 [5.76] 0.03              -5.87 [6.94] 0.14
Model 3  -17.85 [13.71] 0.14            -15.22 [21.73] 0.18            -12.49 [17.03] 0.17            -4.83 [6.05] 0.04              -4.79 [7.31] 0.18
Model 4  -6.89 [10.63] 0.31             -4.98 [17.31] 0.35             -5.00 [13.05] 0.31             -2.70 [5.23] 0.10              -2.21 [6.52] 0.27
Model 5  -6.29 [10.56] 0.31             -11.90 [18.50] 0.36            -2.73 [12.71] 0.31             -4.15 [5.32] 0.11              -1.38 [6.53] 0.27
Obs.     7,085                          3,131                          3,954                          3,536                          3,549
Robust standard errors in brackets clustered at the class level, *** p<0.01, ** p<0.05, * p<0.1
The results in Table 5 draw attention to several important issues. First, the standard
errors are very large: among the 50 point estimates of the treatment effect only three are
statistically different from zero at the 10 percent significance level or better. Second,
almost all of the point estimates have a negative sign. Third, the point estimates are not
stable. In Model 1, without any control variables, the point estimates are negative and
large; however, the addition of teacher, class, student and school characteristics to the
regression model moves this negative treatment effect towards zero. In some sub-groups the
addition of these control variables also leads to sign changes. A closer look at the female
teacher and male teacher sub-groups shows that this problem is much more severe in the
female teacher sub-group. All in all, the difference-in-differences analysis does not
provide any information about the possible impact of the treatment on student learning:
because of the very large standard errors the treatment effect may be negative, zero or
positive. However, the analysis also shows that observed class, student and school
characteristics do not have the same distribution across time points and across treatment
groups, given that the point estimates are unstable and change signs. It is therefore also
very likely that unobserved class, student and school characteristics do not have the same
distribution across time points and across treatment groups, which would violate the
assumptions underlying the difference-in-differences approach. This may also be a sign of
differential assignment of teachers with four or fewer years of experience to classrooms
between 1999 and 2007. In the following I incorporate student fixed effects into the
regression models in order to take into account factors at the student and school levels
(Table 6). Teacher and class characteristics, however, vary between the subjects; thus the
regressions contain controls for observed teacher and class characteristics.
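To make the point about precision concrete, an approximate 95 percent confidence interval
can be computed from a coefficient and its standard error in Table 5. The sketch below uses
the whole-sample Model 5 estimate for mathematics and a normal approximation (critical
value 1.96). The normal approximation is an assumption; the exact interval depends on the
degrees of freedom implied by clustering at the class level.

```python
def confidence_interval(coef, std_err, critical_value=1.96):
    """Approximate confidence interval under a normal approximation."""
    half_width = critical_value * std_err
    return coef - half_width, coef + half_width

# Table 5, mathematics, whole sample, Model 5
lower, upper = confidence_interval(-0.61, 13.56)
print(f"[{lower:.2f}, {upper:.2f}]")  # [-27.19, 25.97]
```

The interval extends roughly 27 points on either side of zero, so it is compatible with
sizeable negative effects, no effect and sizeable positive effects alike.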

Table 6: Estimation results of student fixed effects and difference-in-differences
Mathematics & science scores combined
         Whole sample                   Female teacher sample          Male teacher sample            Below median achievers sample  Above median achievers sample
         Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²
Model 1  3.68 [10.62] 0.01              15.10 [12.92] 0.04             7.11 [10.85] 0.03              3.94 [12.59] 0.02              2.94 [8.44] 0.00
Model 2  4.28 [9.36] 0.09               16.07 [12.61] 0.10             8.59 [13.17] 0.14              -0.60 [12.49] 0.20             2.42 [8.13] 0.03
Model 3  14.77** [6.89] 0.22            41.56** [18.32] 0.30           17.63 [14.13] 0.23             20.67*** [5.32] 0.52           6.17 [9.91] 0.06
Obs.     4619                           612                            1166                           2959                           1675
Table 7: Alternative treatment definitions
Math & science scores combined – Model 5
            Whole sample                   Female sample                  Male sample                    Below median achievers sample  Above median achievers sample
            Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²          Coef [Std Err] Adj R²
5-8 years   -4.63 [8.18] 0.22              6.64 [19.58] 0.30              0.55 [23.93] 0.23              -14.89** [6.64] 0.51           -0.29 [8.58] 0.06
9-20 years  -0.55 [7.01] 0.23              -16.63 [16.99] 0.31            1.53 [10.58] 0.27              -13.69* [8.04] 0.53            6.92 [5.85] 0.06
20+ years   -7.34 [8.32] 0.22              4.23 [17.16] 0.30              -33.28** [14.63] 0.24          12.04 [9.98] 0.51              -7.36 [6.14] 0.06
The results in Table 6 contrast with the results in Table 5. The standard errors are
generally smaller; more interestingly, with one exception all of the point estimates of the
treatment effect are positive. The point estimates are not sensitive to the addition of
teacher characteristics to the regression; however, they are very sensitive to the addition
of class characteristics. According to Model 3, i.e. after controlling for teacher and
class characteristics, the impact of the treatment is estimated precisely for the whole,
female teacher and below median achievers samples.
The standard deviation of the dependent variable in the whole sample is 89. Thus the
impact of the policy change in 2002 on student achievement is around 0.17 standard
deviations. However, the sub-group analysis shows that this impact is channeled mostly
through female teachers. The estimated treatment effect in the female teacher sample is
2.8 times the estimate for the whole sample, whereas in the male teacher sample the impact
is not precisely estimated. Another important inference which can be drawn from Table 6 is
that below median achievers benefit more from the new teacher selection policy than above
median achievers. Thus the treatment effect is concentrated on below median achievers.
Lastly, the sensitivity of the point estimates to the addition of class characteristics is
in line with the findings in Table 5. This may be due to the within-school (between
classroom) differential assignment of teachers with four or fewer years of experience to
classrooms between 1999 and 2007.
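The arithmetic behind these two figures can be reproduced from the Model 3 estimates in
Table 6 and the reported standard deviation of 89. The short sketch below is only a check of
the reported numbers; the variable names are illustrative.

```python
SD_SCORE = 89.0          # standard deviation of the dependent variable, whole sample
whole_sample = 14.77     # Table 6, Model 3, whole sample
female_teacher = 41.56   # Table 6, Model 3, female teacher sample

effect_in_sd = whole_sample / SD_SCORE
female_to_whole_ratio = female_teacher / whole_sample

print(f"{effect_in_sd:.2f}")            # 0.17
print(f"{female_to_whole_ratio:.1f}")   # 2.8
```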
The findings in Table 6 provide evidence in favor of a positive and moderately large
treatment effect. Thus it may be claimed that, in the context of Turkey, teacher selection
with centralized testing may lead to higher learning outcomes compared to a decentralized
recruitment system. However, there may be other underlying reasons which can potentially
explain the findings in Table 6. For example, there may be a secular increase in the
quality of education faculties in Turkey. If this is the case, the estimated impact may be
due to the quality increase in education faculties instead of the new teacher selection
policy. In the same line of thought, more and more high school students with higher ability
may be opting for education faculties, so that the ability distribution of the pool of
teacher candidates shifts over time. However, if these arguments are true I should expect
to detect positive estimates of the treatment effect for other segments of teachers as
well. In order to test these arguments I divided the sample of teachers who have more than
four years of experience into three parts such that the sizes of the subsamples are equal.
These segments are 5-8, 9-20 and 20+ years of experience. These categories defined the
alternative treatment variable in each case, and I repeated the individual fixed effects
exercise with the full model, which includes teacher and class characteristics as controls.
In Table 7 none of the point estimates is both statistically significant and positive;
additionally, the statistically insignificant point estimates are small when compared with
the positive point estimates in Table 6. Thus I failed to detect any positive treatment
effect with the alternative treatment definitions. Therefore it is more likely that the
estimated impact is due to the new selection policy rather than a secular increase in the
quality of education faculties or of the student body.
9. Conclusion
These findings are suggestive in nature and are not suitable for making causal inferences.
Combining individual fixed effects with difference-in-differences allows for a relatively
precise estimate of the treatment effect. The remaining problem with this approach is the
lack of a complete set of classroom characteristics: the point estimates are sensitive to
the classroom characteristics, and unobserved classroom characteristics may bias the
estimate. However, the analysis as a whole suggests that the likely direction of this bias
is downward.
The findings also provide a reasonable explanation for the trend in TIMSS and PISA
results. First, since the analyzed period precedes the curriculum reform in Turkey, the
findings cannot be attributed to the curriculum reform. Second, the findings show an impact
concentrated on below median achievers and no impact for above median achievers. This is in
line with what we observe across PISA cycles for students in Turkey.
The findings are also in accordance with the literature on teacher quality. As mentioned
earlier, a teacher's academic ability is one of the most robust indicators of a teacher's
effectiveness (Hanushek, 2002, 2003; NCTQ, 2004). Basturk (2008) shows that test scores on
the college entrance exam are highly predictive of the KPSS test score. Therefore it is
reasonable to interpret success in the KPSS as an indication of higher academic ability.
Lastly, the following table depicts the degree of differential assignment of teachers to
schools and classrooms. The table can be interpreted as evidence that MONE attempts to
ensure a more balanced distribution of teachers across resource rich and resource poor
regions. As mentioned earlier, MONE as well as SPO were concerned about the imbalance of
the teaching force across regions (Table 8).
After the introduction of the central examination the newly hired teaching force became
much more female, and the new teachers were assigned to classrooms which were much more
diverse in terms of socioeconomic background and had fewer resources for instruction. The
students in these classrooms were more likely to speak Turkish only sometimes (rather than
always), had fewer books at home, and their parents were more likely to have less than
lower secondary education.
Table 8: Differential teacher assignment between 1999 and 2007

                                        1999                2007
                                        TREAT=0   TREAT=1   TREAT=0   TREAT=1
Teacher's sex (%)
  Female                                41        40        35        67
  Male                                  59        60        65        33
Wide range of backgrounds in class (%)
  not at all                            12        6         16        28
  a little                              49        48        35        20
  quite a lot                           31        36        38        23
  a great deal                          8         10        11        29
Resources for math instruction (%)
  low                                   32        27        19        31
  medium                                65        66        72        65
  high                                  4         7         9         5
Language at home
  Always Turkish                        93        84        94        78
  Sometimes Turkish                     6         14        6         20
  Never Turkish                         1         2         1         2
# books at home
  0-10                                  20        27        20        37
  11-25                                 36        40        36        41
  26-100                                29        21        27        15
  101-200                               9         5         10        5
  200+                                  6         6         7         2
Parental education
  University Degree                     10        5         10        2
  Completed Post-Secondary              21        13        4         2
  Completed Secondary                   68        80        72        71
  Less Than Lower-Secondary             2         2         13        22
  Do Not Know                           0         0         1         2
References
Aksit, N. (2007). Educational reform in Turkey. International Journal of Educational
Development, 27(2), 129-137.
Alacaci, C., & Erbas, A.K. (2010). Unpacking the inequality among Turkish schools:
Findings from PISA 2006. International Journal of Educational Development,
30(2), 182-192.
Basturk, R. (2008). Predictive validity of the science and technology pre-service teachers’
civil servant selection examination. Elementary Education Online, 7(2), 323-332.
Caner, A., & Okten, C. (2010). Risk and career choice: Evidence from Turkey.
Economics of Education Review, 29(6), 1060-1075.
Dincer, M.A., & Uysal, G. (2010). The determinants of student achievement in Turkey.
International Journal of Educational Development, 30(6), 7.
Dolton, P., & Marcenaro Gutierrez, O.D. (2011). If you pay peanuts do you get
monkeys? A cross country analysis of teacher pay and pupil performance.
Economic Policy, 26(65), 5-55.
EARGED. (1995). Ogretim Yukunun Analizi. Ankara: MONE.
Eide, E., Goldhaber, D., & Brewer, D. (2004). The teacher labour market and teacher
quality. Oxford Review of Economic Policy, 20(2), 230.
Hanushek, E.A. (2002). Publicly provided education: National Bureau of Economic
Research Cambridge, Mass., USA.
Hanushek, E.A. (2003). The failure of input-based schooling policies. The Economic
Journal, 113(485), F64-F98.

Lavy, V. (2010). Do Differences in School’s Instruction Time Explain International
Achievement Gaps in Math, Science, and Reading? Evidence from Developed
and Developing Countries: National Bureau of Economic Research.
Martin, M.O., Mullis, I.V.S., Foy, P., & Olson, J.F. (2008a). TIMSS 2007: International
Mathematics Report: Findings from IEA's Trends in International Mathematics
and Science Study at the Fourth and Eighth Grades: IEA TIMSS & PIRLS
International Study Center, Lynch School of Education, Boston College.
Martin, M.O., Mullis, I.V.S., Foy, P., & Olson, J.F. (2008b). TIMSS 2007: International
Science Report: Findings from IEA's Trends in International Mathematics and
Science Study at the Fourth and Eighth Grades: IEA TIMSS & PIRLS
International Study Center, Lynch School of Education, Boston College.
Martin, M.O., Mullis, I.V.S., O’Connor, K.M., Chrostowski, S.J., Gregory, K.D., Smith,
T.A., & Garden, R.A. (2001a). Mathematics benchmarking report: TIMSS
1999—Eighth grade. Chestnut Hill, MA: International Study Center.
Martin, M.O., Mullis, I.V.S., O’Connor, K.M., Chrostowski, S.J., Gregory, K.D., Smith,
T.A., & Garden, R.A. (2001b). Science benchmarking report: TIMSS 1999—
Eighth grade. Chestnut Hill, MA: International Study Center, Lynch School of
Education, Boston College.
NCTQ. (2004). Increasing the Odds How Good Policies Can Yield Better Teachers:
NCTQ.
OECD. (2009). Education at a Glance 2009: OECD Indicators: Organization for
Economic Cooperation and Development.
OECD. (2010). PISA 2009 Results: Learning Trends: OECD.

Olson, J.F., Martin, M.O., Mullis, I.V.S., & Arora, A. (2008). TIMSS 2007: Technical
Report: International Association for the Evaluation of Educational Achievement.
Rivkin, S.G., Hanushek, E.A., & Kain, J.F. (2005). Teachers, schools, and academic
achievement. Econometrica, 73(2), 417-458.
Rockoff, J.E. (2004). The impact of individual teachers on student achievement:
Evidence from panel data. The American Economic Review, 94(2), 247-252.
Santiago, P. (2002). Teacher demand and supply: Improving teaching quality and
addressing teacher shortages. OECD Education Working Papers.
Schacter, J., & Thum, Y.M. (2004). Paying for high-and low-quality teaching. Economics
of Education Review, 23(4), 411-430.
SPO. (1989). Altinci bes yillik kalkinma plani 1990-1994. Ankara: SPO.
TTKB. (2008). İlköğretim Matematik Dersi 6–8 Sınıflar Öğretim Programı ve Kılavuzu
(Teaching Syllabus and Curriculum Guidebook for Elementary School Mathematics
Course: Grades 6 to 8). Ankara: Ministry of National Education (MONE).
