
Res Sci Educ (2009) 39:595–624

DOI 10.1007/s11165-008-9108-7

The Roles of Substantive and Procedural Understanding in Open-Ended Science Investigations: Using Fuzzy Set Qualitative Comparative Analysis to Compare Two Different Tasks

Judith Glaesser & Richard Gott & Ros Roberts & Barry Cooper

Published online: 6 December 2008


© Springer Science + Business Media B.V. 2008

Abstract We examine the respective roles of substantive understanding (i.e., understanding of factual knowledge, concepts, laws and theories) and procedural understanding (an
understanding of ideas about evidence; concepts such as reliability and validity,
measurement and calibration, data collection, measurement error, the ability to interpret
evidence and the like) required to carry out an open-ended science investigation. Our
chosen method of analysis is Charles Ragin’s Fuzzy Set Qualitative Comparative Analysis
which we introduce in the paper. Comparing the performance of undergraduate students on
two investigation tasks which differ with regard to the amount of substantive content, we
demonstrate that both substantive understanding and an understanding of ideas about
evidence are jointly involved in carrying out such tasks competently. It might be expected
that substantive knowledge is less important when carrying out an investigation with little
substantive demand. However, we find that the contribution of substantive understanding
and an understanding of ideas about evidence is remarkably similar for both tasks. We
discuss possible reasons for our findings.

Keywords Fuzzy set qualitative comparative analysis · Open-ended investigations · Ideas about evidence · Procedural knowledge · Undergraduate students

J. Glaesser (*) · R. Gott · R. Roberts · B. Cooper
School of Education, Durham University, Leazes Road, Durham DH1 1TA, UK
e-mail: Judith.Glaesser@durham.ac.uk
R. Gott
e-mail: Richard.Gott@durham.ac.uk
R. Roberts
e-mail: Rosalyn.Roberts@durham.ac.uk
B. Cooper
e-mail: Barry.Cooper@durham.ac.uk

Introduction

In a recent paper (Glaesser et al. forthcoming) we explored the relative effects of procedural
and substantive understanding in undergraduate students’ ability to perform an open-ended
science investigation. In the interest of space, we shall review that paper very briefly here
but refer the reader there for more of the background to the research. For the investigation
task we used there we found, using crisp QCA (Qualitative Comparative Analysis, see
Ragin 1987), that both substantive understanding of science and procedural understanding
(which we define below) contribute to the performance on an open-ended investigation: to
conduct the investigation competently, the joint presence of substantive and procedural
understanding and prior attainment was a sufficient condition, whereas the presence of at
least one of them was a necessary condition. These two different types of understanding
were closely related, that is, we did not find many students who had one but not the other.
In this paper we shall develop this theme in two directions: we shall widen the scope of the investigations to include a second example, and we shall compare the two tasks using fuzzy set QCA (Ragin 2000), which is similar in principle to crisp set QCA but which, we shall argue, allows for a more subtle comparison between task outcomes of different overall facility.

Theoretical Background and Rationale for this Research

Recent curriculum frameworks around the world have reflected science as more than a body
of facts (see for example Duschl et al. 2006; Qualifications and Curriculum Authority (QCA) undated; Curriculum Council, Western Australia 1998). Students are now expected
also to engage with the central role of evidence in science.
Open-ended investigations enable them to do this. Open-ended investigations are those in
which students are unaware of any ‘correct’ answer, where there are many different routes to a
valid solution and where different sources of uncertainty lead to variations in repeated data so
that students reflect and modify their practice in the light of the evidence they have collected.
The evidence produced, then, is messy rather than the necessarily cleaned up version common
in practical work contrived to illustrate ideas to students. Open-ended investigations play an
important role in ‘inquiry’ based approaches to science curricula across the world (Abd-El-
Khalick et al. 2004). They reflect ‘science as practice’ and are important not just for understanding the practice of scientific investigation but also for providing a context for understanding why science needs empirical evidence (Duschl et al. 2006).
In this small-scale research we seek to explore, empirically, the importance of students’ understanding for conducting open-ended investigations. The findings may help to inform curricula which aim to develop students’ proficiency in investigations, specifically regarding what to teach.

What Type of Understanding is Important?

Our view is constructivist in that understanding requires the learner to construct meaning
from ideas. In science the traditional curriculum emphasis in classroom teaching has largely
been on the content knowledge ideas of the subject matter such as mechanics in physics or
genetics in biology, to name but two. In other words, we are dealing with the familiar
factual knowledge, concepts, laws and theories of science. This is traditionally known as
substantive or conceptual understanding.

The facts, concepts, laws and theories that contribute to this substantive understanding
are, of course, themselves supported by empirical evidence or are subject to investigation.
Science, therefore, encompasses more than just an understanding of the familiar factual
knowledge, concepts, laws and theories. Traditional substantive understanding alone is not
sufficient to describe the ideas of science. Another key curriculum component is concerned
with the procedures of science. Yet this component is often down-played in curriculum
documents, at least in terms of the number of pages devoted to its specification. The
procedural component is often represented in curricula in terms of behavioural objectives
(Duschl et al. 2006) and is implicit in textbooks (Roberts and Gott 2000); an emphasis on performance rather than on understanding per se.
The term ‘procedural knowledge’ is found in a number of areas (Star 2000), not just in
science, and implies ‘knowing how to proceed’; in effect, in science, a synthesis of manual
skills, ideas about evidence, tacit understanding from doing and the substantive content
knowledge ideas relevant to the context. In science education in the UK and in our work,
the term procedural understanding has been used to describe the understanding of ideas
about evidence, which underpin an understanding of how to proceed. The term procedural
understanding has been used to distinguish ideas about evidence from other more
traditional substantive ideas. We have argued that a lack of ideas about evidence prevents
students from exhibiting an understanding of how to proceed.
This research questions whether understanding the traditional substantive content
knowledge ideas of science is sufficient for students to be proficient in an open-ended
science investigation. Or are other factors necessary or sufficient conditions?
The specification and teaching of the traditional substantive ideas of science is not the
focus of this paper. That is not to down-play its importance in science education. Our focus
is on the procedural component of the curriculum.
The procedural component of the curriculum has been conceptualised and articulated
differently in the literature by researchers with different research agendas. We will briefly
characterise these in a later section and position the rationale for this research. But first we
need to delimit what we are discussing.

Other Perspectives

Piagetian Schema We start with extensive research that has been strongly influenced by
Piagetian psychology (Inhelder and Piaget 1958; see for instance Adey 1992; Klahr and
Nigam 2004; Kuhn et al. 1988; Schauble 1996; Toth et al. 2000).
The development of ‘higher order thinking skills’ is thought to be important in helping students better understand science; for example, the development of strategies about the control of variables and causal relationships. The focus of such work is on the development of
appropriate schemas in pupils so that they can become formal reasoners. Psychology-
informed research has shown that students can be explicitly taught to develop this
understanding; it is not just a ‘skill’ that develops only with practice (cf. the ‘skills’
perspective developed in the next section). Klahr and Nigam (2004) demonstrated the effects of explicit teaching, which seemed to develop understanding that lasted much longer than that of students who were just left to practise. Others have shown similar results (see Chen and
Klahr 1999; Kuhn and Dean 2005; Shayer and Adey 1992 and Toth et al. 2000, for instance).
However, in interventions from this psychological perspective, what is taught, such as the control of variables strategy, is seen not so much as part of science per se (which would legitimise it within curriculum structures), but as an ‘additional’ psychological component necessary for developing understanding in science (Jones and Gott 1998). In the UK, for instance, in the

Piagetian-based CASE (cognitive acceleration through science education) programme’s interventions, what was taught was selected from the psychological perspective of developing
formal reasoning (Shayer and Adey 1992). The interventions, which arguably addressed some
of the procedural component of a science curriculum, were seen more as ways of developing
underlying cognitive structures so that the learner can better understand substantive science.
Jones and Gott (1998) contrasted this with the ‘understanding ideas about evidence’
perspective which underpins this research and is expanded below: the crucial difference is that
from the ‘understanding ideas about evidence’ perspective, the ideas of evidence are
perceived to be important elements of science, and as such can be selected to construct
curricula, can be taught and can be assessed. Thus, although research from a Piagetian
perspective has much to inform the procedural component in science education and there is
overlap in what is taught, the ‘understanding ideas about evidence’ perspective focuses on
ideas integral to science and covers, as a consequence, more ideas from science than does the
psychology-focused development of schemas.

Inquiry The word ‘inquiry’ is prominent in literature about the procedural component. For
instance, an inquiry (or inquiry task) can be similar to an open-ended investigation, as we have
described above. Inquiry is also used as a term for more psychology-focused processes akin to
‘scientific thinking’. Inquiry-based curricula employ an inquiry-based teaching and learning
approach. But even as a pedagogical approach the word can imply different things; from simple
‘discovery learning’ (often contrasted with direct instruction) to planned learning progressions
which include more explicit teaching in the sequence of lessons (Duschl et al. 2006). In the UK
the inquiry approach was typified by Nuffield curricula (Jenkins 1979) and emphasised
learning substantive ideas through practical work, and the US view is not dissimilar (see for
instance Duschl et al. 2006). Within nearly all the uses of the term inquiry is the view, implicit
or explicit, that both the substantive and procedural components of the curriculum are
inseparable; the two components are addressed together in inquiry with the resultant emphasis
in practice being largely on the substantive ideas. This is in contrast to the argument we
develop in this paper where we have clearly distinguished the substantive and procedural
components of the curriculum to structure the discussion. Inevitably there is an overlap
between our work and research framed in terms of inquiry but the focus is different.

Nature of Science Research and curriculum developments concerning epistemology and the
nature of science (see for instance, Hart et al. 2000; Osborne et al. 2003; Qualifications and
Curriculum Authority 2004; Sandoval 2005) address the ways in which science, and more
particularly scientists, work. The emphasis is largely epistemological and sociological. We
are not concerned with these emphases here, although there is a lot of common ground. The
procedural component of the curriculum is important in the nature of science but much of
the research comes from a more philosophical perspective. The ideas about evidence
(expanded in the next section), we would argue, are a sub-set of the ideas involved in such
an understanding of the Nature of Science.

The Procedural Component of the Curriculum

The procedural component of the curriculum is concerned with ‘doing science’ (Hodson
1991). Polanyi (1966) considered some forms of expert procedural knowledge, ‘know-
how’, to be tacit and unable to be codified. From this perspective, expertise is more than the

sum of the component parts. But what, at least, are these component parts? What do we
need to teach students to help them to investigate?
The different perspectives on how the procedural component is conceived can be characterised by examining two contrasting approaches: a ‘skills’ perspective and an ‘understanding of ideas about evidence’ perspective. While we see these as two fundamentally different perspectives, we recognise that some literature contains elements of both. The perspective taken will, in turn, influence both how the procedural component is taught and how it is assessed. We will attempt to illustrate this in the following account.1

The Skills Perspective

The skills perspective is characterised by performance, often termed ‘process skills’. The main characteristic of such a perspective is that the procedural component is to be learned by repeated exposure to practical work. The procedural component is largely implicit in teaching, and any guidance given to students is through a simple exemplification of the process. An early version is typified by Science: A Process Approach, developed from work by Gagné, which identified isolated ‘process skills’ (American Association for the Advancement of Science 1967). It was followed by others including, in the UK, Warwick Process Science (Screen 1986), which emphasised ‘process skills’ such as
observing, classifying and interpreting. Paralleling the developments to teach process skills
were assessment schemes such as The Assessment of Practical Science (TAPS; Bryce et al.
1983) and the Graded Assessment in Science Project (GASP; Davis 1989) which were both
based on the assessment of ‘process’ in the context of performance of isolated practical
skills.
Research has shown that “children failed to develop meaningful understanding under
science-as-process instructional programs … but its legacy persists in both policy and
practice” (Duschl et al. 2006, Chap. 8, pp. 2–3). Elements of that legacy can still be seen in
curricula that either have procedural components specified as behavioural objectives, since
these may be translated into classroom practice and assessment as just ‘doing’, or in
curricula that emphasise using investigations as a pedagogical approach, a way of teaching,
for mainly substantive understanding. In such pedagogical approaches the ‘doing’ of
science is considered to be sufficient to meet the procedural component of the curriculum;
students ‘discover’ the procedural element with practice.

An Understanding of Ideas About Evidence Perspective

An understanding of ideas about evidence perspective is a different way of conceptualising the procedural component of the curriculum. The procedural component is seen to be
underpinned by a set of ideas about evidence. It requires the learner to construct meaning,
specifically about validity and reliability, from specific ideas about evidence. The focus is
on a set of ideas that are an integral part of science and that can then be learned, understood
and applied, rather than a set of skills that develop implicitly by practice.
These ideas can be applied and synthesised in open-ended investigations, together, of
course, with the traditional substantive ideas of science. (We also consider them to be
important in empowered forms of scientific literacy, to enable students to engage with

1 For more details see Gott and Roberts (2008).

scientists’ claims and scientific argumentation (Gott and Duggan 2007; Gott and Roberts
2008; Tytler et al. 2001a, b) but that is not the focus of this paper.) As Buffler et al. (2001) argue:
“Procedural knowledge (in the context of experimental work) will inform decisions, for
example, when planning experimental investigations, processing data and using data to
support conclusions” (p. 1137).
We have referred to these ideas about evidence as ‘the thinking behind the doing’ (not to be
confused with meta-cognitive notions of ‘thinking about one’s own thinking’) and have created
a tentative list numbering some 80 or so of them which we have called the concepts of evidence
(see http://www.dur.ac.uk/rosalyn.roberts/Evidence/cofev.htm). They serve, we argue, as a
domain specification of ideas necessary for procedural understanding. From this perspective,
the procedural component of a curriculum consists of ideas (in effect, a sub-set of substantive
ideas) that form a knowledge-base of evidence that can be explicitly taught and assessed, in a
similar way to the more traditional substantive elements in the curriculum.
The concepts of evidence include ideas about the uncertainty of data (as taught and
researched by Buffler et al. 2001). They also include ideas important to understanding
measurement and data processing, presentation and analysis which may be considered to
be part of the mathematics curriculum but which are essential for understanding
evidence.
The concepts of evidence are a toolkit of ideas integral to the planning and carrying out
of practical investigations with understanding (rather than as a routinised procedure,
Roberts and Gott 2003). We have argued that they are necessary but not sufficient for an
investigation: manual skills, the tacit knowledge described by Polanyi (1966) and, of
course, the more traditional substantive ideas of science are important as well.

This Research

In this research, we seek to explore the importance of students’ substantive understanding and their understanding of ideas about evidence for conducting open-ended investigations. The evidence with respect to the importance of these two types of understanding in conducting an investigation is not clear cut. Previous research (Erickson et al. 1992; Gott and Murphy 1987; Millar et al. 1994; Ryder and Leach 1999) points to both playing a role, but these findings do not give any insight into whether either or both are necessary or sufficient conditions.
Research from the psychological perspective has shown that prior substantive knowledge and beliefs about causal mechanisms affect hypothesis formation, the design of investigations and the evaluation of evidence (summarised in Duschl et al. 2006, Chapter 5).
Schauble (1996) found that, from a developmental point of view, children’s ability to
investigate in ‘knowledge-rich’, substantively demanding, contexts developed in line with
both their understanding about the procedures of science and their substantive knowledge.
These changes appear to bootstrap each other, so that appropriate knowledge supports the
selection of appropriate experimentation strategies, and systematic and valid experimentation
strategies support the development of more accurate and complete knowledge.
Gotwals and Songer (2006) used multi-dimensional modelling based on Rasch analysis that identified three dimensions in students’ ‘inquiry’: content (substantive) knowledge, creating
scientific explanations (which includes aspects such as identifying claims and evidence for
them and associated reasoning), and interpreting data (which includes reading a table and
graph and drawing conclusions from it).
It is also pertinent to note that this research, along with most other work in this area, is
based on post hoc surveys rather than lengthy interventions of the kind we describe here.
Within the English school curriculum, it is not straightforward to disentangle the role of

substantive understanding and understanding of ideas about evidence since there is little
time devoted to the pupils’ conducting their own experiments (House of Commons, Science
and Technology Committee 2002; Roberts and Gott 2004).
In this research, we propose (below) a simple typology of investigations that seem to
require different degrees of substantive understanding for their successful solution
(knowledge-lean through to knowledge-rich).
In our previous research we have taken the view that understanding about evidence is a
necessary ingredient in any open-ended investigation, but is it?
Our research questions, with respect to our sample of undergraduate students, are:
• What are the necessary and sufficient conditions for success in an open-ended investigation?
• Are the conditions the same in two investigations that involve different degrees of substantive knowledge?

A Simplified Typology of Tasks

To start with we introduce a simple typology of open-ended tasks within which to contextualise the research. Of course the boundaries between types are artificial to some
degree. Gotwals and Songer (2006) created tasks with 2 dimensions: the difficulty of the
substantive (content) knowledge and the difficulty of the procedural component. Our
typology categorises tasks along the substantive (content) knowledge dimension. The open-
ended tasks are similar to ‘hands-on’ tasks as described in Solano-Flores and Shavelson
(1997). They identified tasks with ‘high and low inquiry level’: our tasks posed a problem
with just one independent variable—a ‘low inquiry level’. Types 1 and 2 are like Solano-
Flores and Shavelson’s (1997) ‘comparative investigation’ with a categoric independent
variable.2 The tasks presented students with a question (another aspect of ‘low inquiry
level’) but provided no guidance (an aspect of ‘high inquiry level’) and, by providing students with all the available equipment in the lab from which they had to choose, demanded that they make all their own decisions about design and measurement (‘high inquiry level’).

Type 1—Low on Substantive Understanding (Knowledge-lean)

Here we are talking about tasks which, while being recognisably scientific, are not heavily
dependent on substantive theoretical structures. The example we use in this paper will serve
to illustrate our point. The task is to determine: “Does the material of a ‘helicopter’ affect
the time it takes to fall?”
A ‘helicopter’, made from paper or card, falls in much the same way as an ash or
sycamore seed. We see that, to carry out this task, the student will require very little in the
way of substantive understanding; on the face of it at least. They will rely on simple
measurements of time and distance. We must stress here that explanation of the results does
depend very much on theory but, we argue, a lack of such theory will not preclude a
sensible empirical investigation.

2 We have a Type 3 task, which has high substantive and procedural demands, but it is not used in this research.

Type 2—Substantive Understanding Plays a Helpful Part (Somewhat Knowledge-rich)

The task we reported in our earlier paper falls into this category: “How much do different
surfaces affect how easy it is for a shoe to slide?”
Whilst it is possible to carry out this task with little or no understanding of force and
friction, our experience over a number of years, as well as common sense, tells us that such
an understanding will lead to a better focussed and more efficient solution. More details can be found in Glaesser et al. (forthcoming).
In both tasks, however, there is reliance on procedural understanding to decide on the
values to be used for the various independent variables, the number of repeats required,
how to handle the repeat data and so on.
As part of our larger research programme, using samples of students on an initial teacher
education programme in the UK, we have found that substantive understanding conjoined
with procedural understanding was important in one particular investigation which had
some substantive content, corresponding to type 2 above (Glaesser et al. forthcoming). In
the study reported here, we employ an additional sample who conducted a different
investigation which had practically no substantive content, corresponding to type 1 above.
The groups who conducted the two investigations are similar with respect to age, sex, social
background and prior attainment, and they have been taught the same programme. The
main difference then is that they carried out investigation tasks with varying degrees of
substantive content. We expect only procedural understanding to be linked to the
performance on the investigation corresponding to type 1. In addition, using a slightly
different method, we expect to confirm the finding from the earlier paper where both
substantive and procedural understanding could be shown to be linked to performance on
the investigation which involved some substantive content, corresponding to type 2 above.
The method we have chosen to analyse this question and the respective roles of
substantive and procedural understanding is Charles Ragin’s Fuzzy Set Qualitative
Comparative Analysis (fs/QCA) (Ragin 2000). It will be explained in some detail below.
Here we shall just note that it is well suited to uncovering the structure in the relationship
among cases and variables, focussing on conjunctions of causes in doing so. Thus, it stands
in contrast to correlational approaches such as regression analysis which attempt to
determine the net effects of variables on some outcome, while quantifying average effects
and differences. In addition, fs/QCA is suitable for dealing with the problem that the two
groups we are comparing have performed rather differently, as we know from preliminary
analysis. This is because it is more concerned with the structure of the relationship than
with absolute levels of outcomes achieved.

Teaching Programme

During the first year of their undergraduate course, the students took a science module of
22 weeks duration which teaches ideas necessary for understanding fundamental ideas in
chemistry (substance and chemical change) and physics (force); approximately 11 weeks of
each. The course is specifically targeted at developing a deep understanding of a limited
number of basic ideas. This module spends far longer on force and motion (ideas arguably
relevant to our type 2 task) than is traditional in schools as our experience is that little real
understanding of the ideas remains by the time they reach university, even amongst those
with science A levels (the UK post-16 qualifications).

During the second year, they were taught a module on ideas about evidence. There was
minimal substantive science content involved and the context was kept as simple as
possible in order to be able to concentrate on the procedural ideas. The course covered
explicit teaching of some of the concepts of evidence, such things as ideas underpinning
validity and reliability, experimental design, measuring instruments, uncertainty and
variation in repeated readings, descriptive statistics and presenting data in tables and
graphs.
We should note here that the students were taught and assessed by one of us. The
research project had been granted approval by the university’s ethics committee, and the
students knew that participation in the research was entirely voluntary and that their data
were to be treated confidentially. Nevertheless, we are aware that this situation is not ideal
and we cannot rule out that some of them felt compelled to take part in the research because
of the overlap with the teaching.

Sample and Measures

Sample

We are using a sample of undergraduate university students in the UK who carried out an
open-ended investigation and wrote up their procedure and findings. We do not make any
claims as to the generalisability of our findings; rather, we present this as a case study
which may serve to give an indication of the respective roles of substantive understanding
and an understanding of the ideas about evidence in these particular investigations and also
to illustrate a possible approach to the problem of identifying necessary and sufficient
conditions associated with successfully conducting an investigation.
The sample consists of second year undergraduate primary education students. They are
fairly typical for this group of students in that most of them are female and there are some
mature students. Table 1 gives details of the sample. We have named the two groups
“helicopter group” and “shoe group” respectively to indicate which investigation task they
carried out.

Table 1 The sample

                      Helicopter group                      Shoe group
N                     51                                    73
Age                   Mean: 23.9 years; 33% are over 23     Mean: 22.9 years; 26% are over 23
Sex                   49 (96.1%) female, 2 (3.9%) male      62 (84.9%) female, 11 (15.1%) male
Social background     37 (72.5%): service class             50 (68.5%): service class
                      (i.e. professional/managers)          (i.e. professional/managers)
                      14 (27.5%): others                    23 (31.5%): others
GCSE in science       28 (54.9%) have at least B            40 (54.8%) have at least B
                      or equivalent                         or equivalent
Access to university  26 (51%): at least three A levels a   49 (67.1%): at least three A levels
                      25 (49%): other access route          20 (27.4%): other access route
                                                            4 (5.5%): no data

a A level is short for Advanced Level, the highest secondary school qualification on offer in the United Kingdom. Three A levels are usually the university entry requirement. For the course our students attend, the required A level grades are 3 Bs.

Instruments

It is worth noting that while we have attempted to measure each of the variables employed,
we cannot say that the score on any of these instruments is a measure of understanding
alone; the effect of confidence, stress of assessment etc. may also play a part.

Prior Attainment: GCSE

The General Certificate of Secondary Education (GCSE) is an examination taken at the age
of 16, at the end of compulsory schooling in England and Wales. Grades range from A* to G, and five or more A*–C grades are usually required in order to continue to A level, which offers the opportunity of going on to university.
We have used here the data on students’ GCSE (or equivalent) grades in science as a
measure of prior attainment in science. The GCSE grades of our two groups are given in
Table 1.

Science Module Exam

This was taken approximately 4 months prior to the start of the evidence module, at the end
of the science module which was taught during the first year of the course. The examination
attempted to measure understanding rather than depending on the complex recall associated
with GCSEs (House of Commons, Science and Technology Committee 2002). As might be
expected, correlations between university exam marks and GCSE scores are relatively low,
separated in time and conception by some distance as they are. The exam was marked
against a percentage scale. The results from the questions relating to physics (forces) were
highly correlated with those relating to chemistry, which is why we use the score of the
exam as a whole in the analysis. The mean for the helicopter group was 56.6, standard
deviation 13.7. For the shoe group, the mean was 60.9, with a standard deviation of 13.2.
We can see that the two groups are fairly similar on this measure.

Evidence Test

Students’ understanding of the procedural ideas was assessed by means of a written test.
The test targets ideas associated with measurement, experimental design and data analysis.
Written probes to assess aspects of procedural knowledge have been used by other
researchers (see, for instance, Buffler et al. 2001; Germann, Aram and Burke 1996;
Germann and Aram 1996; Gotwals and Songer 2006; Lubben and Millar 1996). We have
used this test elsewhere (Gott and Roberts 2004; Roberts and Gott 2003, 2006) and it has
been refined as a consequence. The pre-test was taken immediately prior to the teaching of
the evidence module. It took up to an hour for some students to complete and it comprised
some 17 items spanning the concepts of evidence. Although set within everyday biology,
chemistry or physics contexts, the questions required minimal understanding of the
substantive ideas. A handful of the sub-items were dropped from subsequent analysis as a
result of the item analysis. At the end of the module students were asked to complete a post-
test. This was a subset of the pre-test (time did not allow for a full repeat of the pre-test)
chosen on the basis of a combination of factors.
• Facility and discrimination: avoiding items with high facilities which would be likely to run out of headroom
• Richness: items which, on the pre-test, gave interesting responses
• Spread: across the various concepts of evidence
Pre- and post-tests were scored using the same coding system. A sample of 10 tests was
marked independently by the authors and inter-marker checks showed a consistent
application of the mark scheme. As this paper is concerned with the procedural
understanding the students have at the end of the taught module and not the improvement
as a result of the module, we only use the post-test results here. These were, for the
helicopter group, a mean of 67.2, standard deviation 11.2 and for the shoe group a mean of
68.1, standard deviation 14.2 on a percentage scale again. Here, too, the two groups are
very similar.

Investigation

The two groups of students completed investigations which differed in the amount of
substantive understanding required. The helicopter group conducted an investigation which
corresponds to type 1 described above, with only minimal substantive understanding
necessary. The task set was: “Does the material of a ‘helicopter’ affect the time it takes to
fall?”
The shoe group conducted an investigation which corresponds to type 2 described
above, i.e. some substantive understanding was required. Their task was to find out: “How
much do different surfaces affect how easy it is for a shoe to slide?”
Research by Solano-Flores et al. (1999) attempted to produce ‘skills’ templates for the
tasks in order to control better the task difficulty. That research pointed to the inherent
difficulty of such an approach due to the complex interaction of procedural and
substantive understanding and context etc. in the task. We will return to this problem in
the discussion section. For now it is worth comparing some key procedural demands of the
two tasks.

Similarities

• In both tasks the independent variable is categoric: material of the helicopter and surface on which the shoe slides.
• This points to particular ways of representing the data in tables and bar charts.
• The dependent variable could be measured easily and quickly, allowing scope for many repeated readings, and therefore ways of handling the data using standard statistical techniques.

Differences

• A slightly more ‘scientific’ context for the shoe task, which involves force meters rather than rulers and stopwatches.
Both groups gave written accounts of their procedure and their findings which served as
the basis for the marking. This can be considered a suitable surrogate for direct observation
(Baxter et al. 1992; Gott and Murphy 1987; Welford et al. 1985) which would have been
too time-consuming. While Haigh (1998) found that students’ written accounts did not
necessarily reflect the subtleties of students’ ideas, we encouraged students to write about
‘the thinking behind their doing’ rather than write a more formal ‘apparatus, method, results, conclusion’ account. Martin (1993) argues that a research report is an opportunity to
persuade the reader of the reliability of the scientific claim, and we explicitly encouraged

the students to justify their decisions and reasoning in a coherent account. This approach to
writing has been adopted successfully by Toh and Woolnough (1994). In the marking of the
written accounts of the investigations, credit was given to students’ accounts where the
concepts of evidence were explicitly and correctly applied. No credit was given to
substantive understanding as such, only where it informed the procedure, i.e. in the
selection and operationalisation of the variables etc. That is, we gave no credit to
explanations of mass and gravity or forces and friction which were not the focus of the task.
The results for the two groups are given in Table 2. As we can see, the performance of
the helicopter group was considerably better than that of the shoe group. This poses
questions concerning the comparability of the two tasks. We cannot rule out that they differ
not only in the amount of substantive science content involved, but also in difficulty. We
argue that in order to analyse the relationship between contributing factors and outcome, the
absolute level of the outcome the students achieved is not crucial. Our analysis is based on
necessary and sufficient conditions contributing to an outcome, and on the identification of
conjunctions of conditions. Thus, it is concerned with the structure of the relationship and
not just with the absolute level of the outcome each individual achieves. Employing
Qualitative Comparative Analysis with fuzzy sets (fs/QCA) enables us to do this. The
method is explained in the next section. In addition, we shall be using the results section to
explain further the use and the features of fs/QCA.

Method

Fuzzy Set QCA Explained

The fuzzy set approach will need to be explained in some detail as it is not likely to be
familiar to most readers.
Traditionally, quantitative social scientists use various forms of regression analysis in
their research. The rationale behind this approach is the identification of net effects of
independent variables on a given dependent variable. It attempts to mirror the nature of
some experiments. The underlying model is a linear and additive one, i.e. the effects of the
different independent variables are thought to be linearly related to the outcome and to act
independently of each other.
This approach has been criticised by various authors for a number of reasons (see for
example Abbott 1988; Lieberson 1985; Ragin 1987, 2000, 2006a). “Net effects thinking”,
as Ragin (2006a) calls it, does not adequately reflect social reality. In the social world, the
effects of different variables cannot easily be separated from each other. Instead, it is quite

Table 2 Investigation results

               Helicopter investigation   Shoe investigation
Mean           61.8                       53.2
Std. Dev.      17.3                       19.4
Minimum        23                         0
Maximum        92                         96
1st Quartile   51.0                       43.0
Median         61.0                       53.0
3rd Quartile   78.0                       65.5
N              51                         73

Table 3 Simple implication: sufficiency (based on Boudon 1974 as discussed in Cooper 2005, 2006)

‘If A, then O’ expressed in terms of inclusion, sufficient relationship

         A          Not A
O        Present    Possible
Not O    Excluded   Possible

common that the effect of one variable depends on the presence or absence of another, or it
might only be relevant for one group, but not for another (see also Ragin 1987).
A related issue is the existence of multiple causation. Alternative pathways of varying or equal importance leading to a particular outcome may exist, with none of them being sufficient or necessary on its own. This is difficult to pick up with a regression analysis.

The Analysis of Sufficient and Necessary Conditions

As noted above, the Qualitative Comparative Analysis (QCA) approach is an alternative to regression-based methods. It attempts to identify causal configurations of conditions which
are associated with the outcome. The focus is on the case considered holistically rather than
on the relations between variables. Conjunctions of causes and the existence of alternative
causal pathways are taken into account. In accordance with this alternative causal model,
Charles Ragin has developed a method3 which reflects configurational thinking about
causation (Ragin 1987, 2000).
Originally, QCA was developed for use with small n datasets. Ragin is a political scientist,
and one research interest in political science is the analysis of differences between countries.
Researchers in this field often have deep knowledge of the cases, i.e. countries, under study,
but only a limited number of cases is available. This does not lend itself to traditional
quantitative research methods, but QCA is well suited to it, since not all of the requirements of
regression analysis apply. Recently, the use of QCA with large n datasets has been explored
(e.g., Cooper 2005, 2006; Cooper and Glaesser 2007; Ragin 2003), but it remains a useful
tool for analysing small to medium-sized datasets such as ours. Examples of QCA being used
with such datasets can be found in Ragin’s book which introduces QCA (Ragin 1987).
Applying QCA involves the identification of necessary and sufficient conditions for a
given outcome. The underlying principle is a set theoretic approach which involves
determining subset relations. We start by introducing the crisp set approach. Crisp sets are
those where a case is simply in or out of a set (e.g., the set of males), unlike fuzzy sets,
where varying degrees of memberships are possible.
Consider Table 3 and the Venn diagram in the left hand panel of Fig. 1. In logical terms,
condition A is sufficient for outcome O, that is, whenever A occurs, O will occur, as can be
seen from the left hand column of Table 3. This does not mean that A is necessary for O to occur; there may well be other conditions4 sufficient for O, as indicated by the right hand
column of Table 3. In set relation terms, A constitutes a subset of O.
In the real world, relations are less than perfect. Therefore, it is necessary to consider
instances of weaker implication, that is, the relative frequencies of cases rather than simple
presence or absence (Cooper 2006). A diagrammatic representation is given in the right

3 Together with others, he has also developed the software fs/QCA (for “fuzzy set/Qualitative Comparative Analysis”) (Ragin et al. 2006) which performs the required analyses. This is the software we use.
4 We deliberately avoid the use of the terms “cause” or “causal condition” as the relationships described here are patterns of association. Causal statements can only be made based on theoretical considerations.

Fig. 1 Sufficiency (left panel: perfect sufficiency; right panel: near sufficiency)

hand panel of Fig. 1. This can be illustrated by adding some numbers to Table 3. They
represent a number of cases with the relevant conditions (Table 4).
Out of all the cases with condition A, 90% experience O. This high percentage indicates
that A is “nearly always sufficient” to obtain O. These 90% in our example can be referred
to as the degree of consistency with which O is obtained given A. This table can also be
used to introduce the concept of explanatory coverage which plays a role in set-theoretic
analysis analogous to that of variance explained in a regression analysis. In Table 4, for
example, 240 cases achieve the outcome. Of these, 90 have condition A. Coverage in this
case is the simple proportion of 90 divided by 240, 0.375. It can be seen that this index
records the proportion of the outcome set that is covered by the explanatory set. Clearly, in
this case, there must be other causal paths to the outcome. It is not necessary to have
condition A in order to achieve the outcome, a fact reflected in the low coverage figure.
We need to introduce necessity at this point. Consider a variation of Table 3 (Table 5). Here, A constitutes a necessary condition for O, that is, without A, O cannot occur.
In set theoretic terms, A is a superset of O. This is illustrated in the left hand panel of Fig. 2.5
Again, we have to consider the possibility of less than perfect necessity, as shown in the right
hand panel of Fig. 2.
Adding some numbers to illustrate the necessity relation, we get Table 6. Here, nearly all
of the cases with the outcome O have experienced A. This points to A being a (nearly
always) necessary condition for O. The proportion of cases with O who have previously
experienced A is 0.90. This 0.90 is a measure of the consistency of these data with a
relationship of necessity.6
Note that this does not make any claims about A’s sufficiency: from the left hand column, it is clear that even if A is present, O does not usually occur, which points to the possible need for further conditions which have to be present in order for O to usually appear.
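To make this arithmetic concrete, the following sketch (plain Python, our own illustration rather than part of the fs/QCA software) reproduces the consistency and coverage figures derived from Tables 4 and 6.

def sufficiency_consistency(n_a_and_o, n_a):
    """Proportion of cases with condition A that show outcome O."""
    return n_a_and_o / n_a

def sufficiency_coverage(n_a_and_o, n_o):
    """Proportion of cases with outcome O that have condition A."""
    return n_a_and_o / n_o

def necessity_consistency(n_a_and_o, n_o):
    """Consistency with necessity: proportion of O cases that have A."""
    return n_a_and_o / n_o

# Table 4: of the 100 cases with A, 90 show O; 240 cases show O overall.
print(sufficiency_consistency(90, 100))  # 0.90: A is "nearly always sufficient"
print(sufficiency_coverage(90, 240))     # 0.375: other paths to O must exist

# Table 6: of the 100 cases with O, 90 have previously experienced A.
print(necessity_consistency(90, 100))    # 0.90: A is "nearly always necessary"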

5 Note that, in conducting research, temporal order and substantive knowledge need to be used in determining the causal order, i.e. the difference between Fig. 1 and 2 lies in what is considered cause and effect. It is conceivable that this may vary or not be clear in a research situation. For our purposes, however, we have decided that A is the cause and O the outcome. The determination of sufficiency and necessity is based on this decision.
6 In this simple case, another way of thinking about consistency/sufficiency and coverage/necessity is in terms of inflow and outflow: in a crosstabulation such as Table 4, the proportion of condition A in O which we called consistency can be called outflow because it refers to the percentage of people with A who subsequently obtain O. The proportion of O with condition A as described in Table 6 (called coverage) can also be called inflow because it refers to the percentage of people with O who got there after having also experienced A.

Table 4 Weaker implication: sufficiency

Weaker implication, sufficient relationship: ‘if A, then (nearly always) O’

         A    Not A
O        90   150
Not O    10   50

So far, in this section we have only considered the case of two variables, one
independent and one dependent. However, in the social sciences we usually find more than
just one independent variable or causal condition, and we want to consider all the relevant
ones simultaneously. This is where causal configurations come into play. In our example, it
is possible to add the condition B. This results in Table 7.
In order to determine the consistencies of the possible configurations, it is useful to
represent the data in what is called a truth table (Table 8). 1 denotes presence of a condition
(or membership in the target set), 0 denotes absence (or non-membership in the target set).
Here, all the four configurations which can be obtained using the conditions A and B are
listed and the respective proportions of their members obtaining O are given. These
proportions represent the consistencies of the configurations with regard to their achieving
O. The rows of the truth table have been sorted into descending order of consistency. The
coverage of a particular configuration can be obtained by calculating the proportion of the
cases with a given configuration, say A and B (the first row in the truth table, Table 8) out
of all the cases with the outcome O, in this case 35.4% (85 out of 240 cases; cf. the top left
hand cell of Table 7).7 It is conceivable that there is more than one configuration leading to
the outcome. In this example, we might argue that the consistency values of 0.944 and 0.8
are both high enough for us to decide that these configurations can be considered to be
usually sufficient for the outcome. In other words, there is a reasonably high proportion of
cases with the configurations “A and B” as well as “not A and B” obtaining the outcome.
This leads to the issue of set theoretic notation and minimisation. Set intersection or
logical AND is indicated by *. Set union or logical OR is indicated by +. Membership in a
set is indicated by upper case notation, non-membership or logical negation is indicated
either by ∼ or by lower case notation which is what we use here. Looking at the example
given above, the solution obtained can be noted as follows: A*B + a*B. This solution can
be written in a simpler form, using logical minimisation, i.e. simply B.
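To show the mechanics, the sketch below (Python, illustrative only, not fs/QCA itself) reproduces the truth table consistencies of Table 8 from the counts in Table 7 and notes the minimisation just described.

counts = {
    # (A, B): (cases with O, cases without O), from Table 7
    (1, 1): (85, 5),
    (1, 0): (5, 5),
    (0, 1): (120, 30),
    (0, 0): (30, 20),
}

# Print rows in descending order of consistency, as in Table 8.
for (a, b), (o, not_o) in sorted(counts.items(),
                                 key=lambda kv: -kv[1][0] / sum(kv[1])):
    consistency = o / (o + not_o)
    print(f"A={a} B={b}: n={o + not_o}, consistency={consistency:.3f}")

# With a 0.75 consistency threshold, rows (1,1) and (0,1) pass: A*B + a*B.
# Both terms contain B while A varies, so A is redundant and the
# expression minimises to B.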
Finally, another point can be illustrated using this example. First of all, there is an
element of choice in that the researcher decides what level of consistency is still to be
considered acceptable when choosing a solution.8 In our example, if someone were to argue
that any proportion higher than 0.55 obtaining the outcome indicates (near) sufficiency9,
this would give the following configurations:
A*B + a*B + a*b    (1)

7 Note that it is not possible directly to obtain the number of cases with the outcome from a truth table such as Table 8. It can be calculated from the number of cases in a given row together with the consistency figure, which in effect is the proportion.
8 Going through various levels of consistency instead of choosing a single one brings out the relative importance of conditions, which can be very instructive. Cooper (2005, 2006) makes use of this approach.
9 This would be a rather generous threshold, however, and was chosen only in order to demonstrate a solution with several pathways. It is more common to choose a threshold of at least 0.70.

Table 5 Simple implication: necessity

‘O, only if A’, necessary relationship

         A          Not A
O        Present    Excluded
Not O    Possible   Possible

This solution can be further simplified, resulting in a + B. Using the software fs/QCA
(Ragin et al. 2006), it is possible to calculate consistencies for the individual configurations
in a solution and for the solution as a whole. In addition, coverage is also given both for the
individual configurations and the whole solution. Often, there is some overlap between the
configurations found: in our example with the solution a + B, there are cases belonging to
both configurations, i.e. all the ones with conditions a*B. In calculating the coverage for the
configurations contributing to a solution, there are two coverage figures given. Unique
coverage refers to the proportion of cases with the outcome accounted for by cases with the
configuration without the overlap, raw coverage refers to the proportion of all cases,
including those who are also covered by the other configuration given in the solution
(Ragin 2006b).
fs/QCA gives the following output for our example:
                 raw         unique
                 coverage    coverage    consistency
                 ----------  ----------  -----------
a +              0.625000    0.125000    0.750000
B                0.854167    0.354167    0.854167

solution coverage: 0.979167
solution consistency: 0.810345

The unique coverage figure given for “a” refers to the 30 cases obtaining the outcome
(out of 240) who do NOT have B, that is, which are not covered by the other part of the
solution. The raw coverage figure refers to all those cases without the condition A (i.e.,
“a”), regardless of whether B is present or not, i.e. 150 out of 240 who obtain the outcome.
In the same way, the unique coverage given for B refers to the 85 out of 240 cases who are not also in a (i.e. those with A). The overlap, i.e. a*B, can be obtained by subtracting the unique
coverage figures from the solution coverage. In our example, the resulting figure is 0.5.
This refers to the 120 cases with a*B who experience O (cf. Table 7).
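The coverage figures in this output can be reproduced directly from the Table 7 counts; the sketch below (plain Python, our own illustration) does so for the solution a + B.

cells = {(1, 1): (85, 5), (1, 0): (5, 5), (0, 1): (120, 30), (0, 0): (30, 20)}
n_outcome = sum(o for o, _ in cells.values())  # 240 cases with O

def in_term(a, b, term):
    # term "a" means A absent; term "B" means B present
    return {"a": a == 0, "B": b == 1}[term]

def raw_coverage(term):
    """Proportion of O cases covered by the term, overlap included."""
    return sum(o for (a, b), (o, _) in cells.items()
               if in_term(a, b, term)) / n_outcome

def unique_coverage(term, other):
    """Proportion of O cases covered by the term alone."""
    return sum(o for (a, b), (o, _) in cells.items()
               if in_term(a, b, term) and not in_term(a, b, other)) / n_outcome

print(raw_coverage("a"), unique_coverage("a", "B"))  # 0.625, 0.125
print(raw_coverage("B"), unique_coverage("B", "a"))  # 0.854..., 0.354...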

Fig. 2 Necessity (left panel: perfect necessity; right panel: near necessity)



Table 6 Weaker implication: necessity

Weaker implication, necessary relationship: ‘O, only if A’

         A     Not A
O        90    10
Not O    150   50

We can see that there is considerable overlap between the two configurations leading to
the outcome: the unique coverage is fairly low for both of them compared to the respective
figures for raw coverage, indicating that there are many cases in the configuration a*B. This
is in line with what can often be observed in the real world: causes or conditions tend to
occur together, which can make a regression analysis, with its assumption of independence of variables and its attempt to determine net effects, of questionable value.

Fuzzy Sets

So far, we have only been concerned with dichotomous data, that is, with the simple
presence or absence of conditions, or membership versus non-membership in a crisp set.
However, sometimes we want to use a finer measure of membership, thus allowing for the
fact that there may be degrees of membership. For this purpose, we can assign fuzzy
membership scores, which indicate full, partial or non-membership in a set. For example, it
is not always straightforward to decide whether someone is an adult or not. A person who is
10 years old is not in the set of adults, but a person who is 30 years old certainly is. The
case of a twenty-year-old, however, is more difficult to decide. Therefore, different fuzzy
scores are allocated, indicating the degree to which somebody is a member in the target set.
In the example, a score of 0.9 could be allocated to the twenty-year-olds, indicating that
they are almost, but not fully in the set of adults (Kosko 1994).
In analogy to the crisp set approach, set intersection, union and negation are employed in
the analysis of fuzzy sets. In Ragin’s fs/QCA, to obtain the intersection of fuzzy sets, the
minimum set membership score of all the sets in question is employed; to obtain union, the
maximum is employed, and to obtain negation, the fuzzy score is subtracted from 1.
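Written out, these operations are very simple. The following minimal sketch (Python; the condition names S and P are hypothetical stand-ins for sets such as ours, and fs/QCA performs these operations internally) illustrates them.

def f_and(*scores):   # set intersection (logical AND): minimum
    return min(scores)

def f_or(*scores):    # set union (logical OR): maximum
    return max(scores)

def f_not(score):     # set negation: subtract from 1
    return 1 - score

# A case with membership 0.7 in set S and 0.4 in set P:
s, p = 0.7, 0.4
print(f_and(s, p))  # 0.4: membership in S*P
print(f_or(s, p))   # 0.7: membership in S + P
print(f_not(s))     # 0.3: membership in s (not-S)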
The calculation of consistency and coverage (both raw and unique) is then also possible
with fuzzy sets, albeit in a slightly more complicated way than in the crisp case. Several
approaches exist. The simplest approach involves testing the proportion of cases where the
fuzzy score of the condition is less than or equal to the fuzzy score of the outcome. Another
method uses an aggregating index that considers a fuzzy analogue of the degree to which
condition and outcome sets overlap. Ragin’s truth table algorithm which is implemented in
the current version of the fs/QCA software uses such an aggregating index. As in the crisp
case, the subset relation of the two sets, the cause and the outcome, is taken into account.
For a full explanation of the calculation of fuzzy consistency see Ragin (2005, 2006b) and

Table 7 Two conditions

         A               Not A
         B     Not B     B     Not B    Total
O        85    5         120   30       240
Not O    5     5         30    20       60
Total    90    10        150   50       300

Table 8 Truth table

A   B   Number of cases   Proportion obtaining O
1   1   90                0.944
0   1   150               0.800
0   0   50                0.600
1   0   10                0.500

Cooper (2006) for a summary. An example is used in the results section of the present paper
to illustrate the basic principle of fuzzy consistency.
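As a rough illustration of the two approaches mentioned above, the sketch below (Python, with invented fuzzy scores) computes both the simple proportion measure and an aggregating index of the kind Ragin (2006b) describes: the sum of the case-wise minima of condition and outcome scores divided by the sum of the condition scores (for coverage, divided by the sum of the outcome scores instead).

def simple_consistency(xs, os):
    """Proportion of cases whose condition score is <= outcome score."""
    return sum(x <= o for x, o in zip(xs, os)) / len(xs)

def fuzzy_consistency(xs, os):
    """Aggregating index of the degree to which X is a subset of O."""
    return sum(min(x, o) for x, o in zip(xs, os)) / sum(xs)

def fuzzy_coverage(xs, os):
    """Proportion of the outcome set covered by the condition set."""
    return sum(min(x, o) for x, o in zip(xs, os)) / sum(os)

# Invented fuzzy scores for four cases:
xs = [0.2, 0.6, 0.8, 0.9]   # condition
os = [0.4, 0.7, 0.7, 1.0]   # outcome
print(simple_consistency(xs, os))  # 0.75 (one case exceeds its outcome score)
print(fuzzy_consistency(xs, os))   # 0.96
print(fuzzy_coverage(xs, os))      # approx. 0.857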
The consistency and coverage figures associated with a solution obtained through QCA
(crisp or fuzzy) give an indication as to how good the solution is.
To summarise: fs/QCA allows for the analysis of conjunctions of causes and alternative
causal pathways. For example, in the educational context, it might be that high
mathematical ability is sufficient to pass GCSE, but that very hard work, supplemented
by private tuition, could alternatively lead to a pass. Unlike regression analysis, fs/QCA
does not attempt to quantify net effects of variables, but treats cases holistically. This can be
done either in the crisp context or, as we have done in this paper, through the use of fuzzy
sets which allow a more differentiated treatment of the conditions under investigation. QCA
is particularly suited to sample sizes like ours, i.e. ca. 50–100 cases, which are not small
enough to conduct an in-depth qualitative analysis, but too small for many of the
requirements of regression based methods and inferential statistics.

Calibration

Finally, we need to address the issue of calibration, that is, the allocation of fuzzy scores.
In the crisp case, calibration simply requires the setting of a threshold above which we
regard the score as representing ‘success’ or membership of the target set, whereas fuzzy
scores range from 0 to 1, indicating degree of membership. As explained above, 1 stands
for “fully in the set” and 0 for “fully out of the set”. The point of maximum ambiguity is
0.5: this score indicates that it is impossible to decide whether a case is more in or more
out of the target set. In allocating fuzzy scores to the values from an ordinal, categorical,
interval or ratio scale, it is therefore possible to allow for the fact that variation is not
equally important at all points of the scale. Consider the example of the set of adults
above: there may be some debate about the age group between, say, 16 and 24, but
variation above and below the critical points does not matter with regard to the concept of
interest. Both 30- and 60-year-olds are in the set of adults, and the difference on the scale
of their ages becomes meaningless with respect to adulthood. Fuzzy scores reflect this,
which is why there is a need for careful calibration.
Various methods exist. One of them is “expert” judgement (see Cooper 2006), which
Verkuilen (2005) calls the method of direct assignment. It is most appropriate when a
researcher has detailed knowledge of the cases under study, or of the underlying concept or
scale. In the case of the GCSE science mark scheme we do have detailed knowledge of the
meaning of the results which is why we have applied the method of direct assignment here,
based on many years of experience with the UK’s public exam system (see Table 9).
Table 9 Fuzzy scores for GCSE science

Fuzzy score   Verbal label                                   GCSE grades
1             fully in the set of able according to GCSE     A* or A
0.67          more in than out of the set of able            B
0.33          more out of than in the set of able            C
0             fully out of the set of able                   D or lower

Note the verbal labels attached to the fuzzy values. They give an indication of the way the fuzzy scores are meaningful, as implied above.
Clearly, in the case of an interval scale like the marks in the science exam, the evidence test and investigation tasks, we cannot apply such a direct approach. Neither do we have detailed enough knowledge of every single case we are studying, so cannot assign fuzzy scores directly to individuals. Instead, we need to transform these scales. To do this, we use a logistic function
(Smithson and Verkuilen 2006).10 Ragin points out the merit of such an approach:
Working in the metric of log odds is useful because this metric is completely
symmetric around 0.0 (an odds of 50/50) and suffers neither floor nor ceiling effects.
Thus for example, if a calibration technique returns a value in the log of odds that is
either a large positive number or large negative number, its translation to degree of
membership stays within the 0.0 to 1.0 bounds which is a core requirement of fuzzy
membership scores (Ragin, forthcoming).
This method of calibration involves fixing three crucial values: a threshold below which an individual is out of the target set, a threshold above which they are in the target set, and a crossover point corresponding to the point of maximum ambiguity (0.5) explained above.
The original interval scale is thus transformed into fuzzy values. We do not have space here
to fully explain the approach (details can be found in Ragin, forthcoming).
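A minimal sketch may nevertheless convey the idea. It assumes the usual anchoring of full membership at a log odds of about +3 (a membership score of about 0.95) and full non-membership at about -3 (about 0.05); the function name calibrate is ours, and this is not the exact fs/QCA implementation:

    import math

    def calibrate(raw, lower, crossover, upper):
        """Transform a raw score into a fuzzy membership score via log odds.
        lower:     threshold for full non-membership (log odds about -3)
        crossover: point of maximum ambiguity (log odds 0, membership 0.5)
        upper:     threshold for full membership (log odds about +3)
        """
        if raw >= crossover:
            log_odds = 3.0 * (raw - crossover) / (upper - crossover)
        else:
            log_odds = -3.0 * (crossover - raw) / (crossover - lower)
        # The logistic transformation keeps membership strictly between
        # 0 and 1, however large the log odds become.
        return 1 / (1 + math.exp(-log_odds))

Raw scores at the upper threshold map to about 0.95, scores at the crossover point to exactly 0.5, and scores at the lower threshold to about 0.05; beyond the thresholds the curve flattens, which is the behaviour visible in Fig. 3.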
We have plotted the raw scores of the science exam results of the helicopter group
against the fuzzy scores by way of example (Fig. 3).
Note that variation outside the upper and lower thresholds respectively becomes increasingly less important once the scales have been transformed into fuzzy scores: the gradient at these points is low, indicating that a change in raw score makes little impact on the fuzzy score. On the other hand, the curve is much steeper in the middle, indicating the greater impact a small change on the raw scale can have within this range.
An example may serve to illustrate this point: yearly income might usefully be
recalibrated into a fuzzy measure of poverty. In that case we would define a cut-off point
above which an individual would not be categorised as belonging to the set of “poor
people”, i.e. all the cases above this point would be given a fuzzy score of zero with respect
to their membership in the set of “poor people”. Obviously, their income can still vary
considerably, but what they all have in common is that they do not suffer the negative
effects of being poor.
This form of calibration is, of course, analogous to the calibration of any scientific instrument. In most cases, such instruments are designed to be precisely linear and so need, in theory, calibration only at the end points of the scale (0 and 100 for a thermometer, for instance). However, some instruments are not linear, and calibration is then carried out in a way similar to that outlined here for the fuzzy scores: we determine the end points of the scale, and then fit a curve between them using intermediate calibration points to fix its position and curvature.

10 Other methods for transforming a continuous scale exist, for example, applying a linear filter (Cooper 2006; Smithson and Verkuilen 2006), or the “totally fuzzy and relative” (TFR) approach (Cheli and Lemmi 1995; Cooper 2006) where there is minimal judgement on the researcher’s part involved.

Fig. 3 Plot of fuzzy scores (vertical axis: fuzzy scores, 0.0 to 1.0) against raw scores in percent (horizontal axis, 20 to 100)

Obviously, the question of calibration is not straightforward and critical readers may
point out that there is a certain amount of arbitrariness involved in the allocation of scores.
This is true whenever categories are constructed as a basis for analysis.11 The merit of this approach is that it allows theoretical considerations to be taken into account, thus allocating cases more meaningfully to their categories. Finally, we should like to note that with any
form of measurement, measurement error may be present. The fuzzy scores will only be as
good as the scales they are based upon.
All three scales (the science exam, the evidence test and the investigation tasks) were originally percentage scales, that is, with scores ranging from 0 to 100. Incidentally, the university criterion for a pass is 40%, but this is not crucial for our purposes, since we need to decide what level of ability the percentage scores reflect, i.e., the extent to which they
indicate membership in the target sets. Details of the thresholds we have chosen and the
reasoning for them are as follows:

Investigation

The two investigations are coded in the same way for the two groups because marks were
allocated against the same criteria each time. We should like to draw attention to the fact
that, as noted above (cf. Table 2), the two groups had performed rather differently on their
respective tasks. If we were to use crisp sets, we would have difficulty in drawing a meaningful boundary between membership and non-membership of the set of those who “performed well”: if the boundary is set too high, there will not be enough cases in the “performed well” set in the shoe group; if it is set too low, there will be too many such cases in the helicopter group for a sensible analysis to be conducted. The use of fuzzy sets avoids this problem, since fuzzy scores are allocated so that the full range of raw scores can be taken into account.

11 Another instance of arbitrariness is the choice of threshold for conventional significance testing in inferential statistics.

Upper Threshold 80. Students at this level demonstrate clear understanding of the ideas
required to perform the investigation competently. They are able to select the relevant
concepts of evidence for each stage of the task, synthesise them to address the problem and
evaluate the quality of the evidence they collect.

Crossover Point 55. Students at this level are on the borderline: they show some of the understanding required to conduct the investigation, but not enough to be clearly placed either more in or more out of the set of those doing well on the investigation.

Lower Threshold 35. Students who have scored below this threshold show little
understanding of the task, certainly not enough to conduct the investigation with even a
rudimentary amount of sense.

Evidence Test

Some of the thresholds set are slightly higher than the ones for the investigation. The
evidence test had closed questions which made it easier for students to gain at least the
initial marks allocated within each question.

Upper Threshold 80. Students who gained marks above the upper threshold of 80
demonstrated a clear understanding of the ideas underpinning validity and reliability when
asked these short closed questions. They were consistently correct in applying their ideas to
the different contexts of the questions.

Crossover Point 60. Students who achieve this mark have a feel for the importance of
validity and reliability of empirical data, but their understanding of the concepts of evidence
is not consistently secure.

Lower Threshold 45. A student with a mark of less than 45 has very little understanding of
ideas underpinning the validity of the design of investigations or the reliability and handling
of varied data.

Science

Upper Threshold 75. Students who gain a mark of 75 or above have a sound understanding of nearly all of the topics examined. Such students have thoroughly grasped the main ideas
taught in the module.

Crossover Point 55. Students with this mark have demonstrated some deeper understanding on at least some of the topics examined, but this is not consistent across the course.
They will have demonstrated the potential to understand some science topics but will have
gaps in their understanding. It cannot therefore be decided whether they are more in or
more out of the set of those who are able in terms of their performance on the science exam.

Lower Threshold 40. The lower threshold of 40 is also the lowest pass mark for the
module. Any student below 40 shows no real understanding of the subject and is unlikely to
make substantial progress even if allowed to improve on their assessment score. Such a
student would have little confidence in science, gaining marks more for recall than higher
levels of understanding. Such students typically gain marks without being able to apply the
ideas consistently; they will often demonstrate confusion about the same idea in a later part
of the question.
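For illustration, the thresholds listed above would enter the hypothetical calibrate function sketched earlier as follows (the raw mark of 70 is invented):

    # Fuzzy membership scores from raw percentage marks, using the
    # anchors given in the text.
    investigation_f = calibrate(70, lower=35, crossover=55, upper=80)  # ~0.86
    evidence_f = calibrate(70, lower=45, crossover=60, upper=80)       # ~0.82
    science_f = calibrate(70, lower=40, crossover=55, upper=75)        # ~0.90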

Results

Helicopter Group

We shall use the presentation of the results to further explain the method, fs/QCA, and to
point out its specific qualities. The starting point of the analysis is a model such as HELI_F =
function (EVPOST_F, SCIENCE_F, GCSE_F), where the relationship of the outcome, the
helicopter investigation (HELI_F), with the conditions evidence post test (EVPOST_F),
science exam (SCIENCE_F) and GCSE (GCSE_F) is analysed in Boolean terms. We recall
that capital letters in the crisp set context stand for the presence of a condition. The suffix _F
indicates that the conditions are fuzzy. The first step is to produce a truth table. In the fuzzy
case, this is not quite as straightforward as in the crisp case explained above, because a case
can have non-zero membership in more than one configuration. We noted above that negation
is obtained by subtracting the fuzzy score from 1, so that in the example of adults, someone
with a membership score of 0.9 in the set of ADULTS has a membership score of 0.1 in the negated set, adults (the non-adults). The truth table therefore lists, for each configuration, the cases whose membership in that configuration is > 0.5.12 Table 10 is the truth table for our Boolean equation. As
usual, it is ordered by consistency—highest to lowest—for the different rows, where the rows
represent configurations of conditions. Three of the rows have only two or three cases each.
As Ragin (2003) points out, unless the dataset is very small, it is difficult to have detailed
enough case knowledge to be reasonably confident that there is no coding or measurement
error. Obviously, we cannot ever rule out measurement error, but if there are enough cases in
a row, and any error is random, then we can be more confident that it will not influence the
analysis unduly. Ragin (2005, p. 9) suggests that we “establish a frequency threshold for the
relevance or viability of causal combinations” which is why we have decided not to include
any rows with n < 4 in the analysis.13 The values in the outcome column, “HELI_F” in our
case, have to be entered by the researcher, to reflect the chosen threshold for consistency. The
example of a threshold set at 0.85 is shown in this table. There are two approaches to choosing such thresholds for consistency. First, there must be a high enough level of near-sufficiency; it is becoming conventional in this field to use 0.7 as a minimum level in fuzzy set analysis (e.g., Ragin 2005). We can see that using a 0.7 threshold would, given the
difficulty of this task, allow all the configurations into the solution (which would have a
coverage index of 1). This is equivalent to saying that, as far as this level of sufficiency is
concerned, none of the configurations fail to be sufficient for the outcome.

12 In most circumstances (and this is true given our calibrations), each case will have just one such membership > 0.5. This situation ceases to be the case where the dataset includes cases with exactly 0.5 values (for details see Ragin 2005).
13 Of course, the fact that some rows are low in n is informative in itself. It indicates so-called limited diversity, i.e. the finding that, empirically, some combinations of conditions are rare or non-existent.

Table 10 Truth table helicopter dataset

EVPOST_F   SCIENCE_F   GCSE_F   Number   HELI_F   Consistency
0          1           0        2                 0.879638
1          1           1        14       1        0.864196
0          1           1        2                 0.845854
0          0           0        5        0        0.776421
1          1           0        11       0        0.7689
1          0           1        9        0        0.759778
1          0           0        5        0        0.75753
0          0           1        3                 0.756087

Since we wish to explore the ways these different configurations contribute to sufficiency, we clearly need to
explore some other solutions employing a threshold higher than 0.7. It is also important to
take account of major breaks in the levels of consistency in the final column of the truth table,
which is what we have done here: we have decided not to include the first and third (and last)
rows due to the small n, so there is a distinct jump from the consistency of the configuration
EVPOST_F*SCIENCE_F*GCSE_F to the next one, evpost_f*science_f*gcse_f. After that,
the consistencies are fairly similar from one row to the next, so it would seem arbitrary to
include one but not the next in the analysis. We are then left with only the first row of the
truth table entering the Boolean solution, as indicated by the “1” entered in the column
“HELI_F”. A “0” indicates that the row does not enter the logical minimisation procedure,
and a blank indicates that the row has been left out of the analysis altogether because of small
n. The rows in the analysis are thus those with an entry, whether 1 or 0, in the HELI_F column (shaded in grey in the original table).
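To make concrete how cases are sorted into the rows of such a truth table, the following rough sketch may help (our own illustration with invented scores, not the fs/QCA implementation):

    from itertools import product

    def config_membership(case, config):
        """Membership of a case in one configuration: the minimum over
        the conditions, taking each score itself where the configuration
        requires presence (1) and its negation (1 - score) where it
        requires absence (0)."""
        return min(score if present else 1 - score
                   for score, present in zip(case, config))

    # Invented fuzzy scores (EVPOST_F, SCIENCE_F, GCSE_F) for three cases.
    cases = [(0.8, 0.7, 0.9), (0.3, 0.6, 0.2), (0.9, 0.2, 0.4)]

    # Each case belongs, with membership > 0.5, to exactly one of the
    # 2^3 = 8 configurations, which becomes its truth table row.
    for case in cases:
        row = max(product([1, 0], repeat=3),
                  key=lambda config: config_membership(case, config))
        print(case, '->', row, round(config_membership(case, row), 2))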
As only one row is entered into the minimisation procedure, the result is straightforward:
the solution obtained combines all three conditions, i.e. evidence post test, science and
GCSE. In addition to the solution, the software fs/QCA also gives the coverage and
consistency figures.
** TRUTH TABLE SOLUTION omitting rows with n < 4 **

raw unique
coverage coverage consistency
---------- ---------- -----------
EVPOST_F*SCIENCE_F*GCSE_F 0.459894 0.459894 0.864196
solution coverage: 0.459894
solution consistency: 0.864196

The output first lists the raw and unique coverage and consistency figures for this
combination of conditions. The raw and unique coverage figures are identical (0.459894)
because it is the only combination identified. For the same reason, the raw and unique
coverage figures are also identical with the “solution coverage” given in the next line, and
the consistency (0.864196) for this combination is identical with the “solution consistency”.
The solution then implies that the combination of all three conditions is sufficient at the
chosen consistency level for the outcome to be achieved. The consistency with sufficiency
is fairly high at 0.86. Coverage is not very high, however, indicating that the solution we
have identified is not necessary for the outcome, i.e. there are other pathways which also
may lead to it (as we saw when we discussed what would happen if we were to set the
threshold at 0.7).

Fig. 4 Consistency plot (vertical axis: Helicopter investigation, 0.0 to 1.0; horizontal axis: EVIDENCE*SCIENCE*GCSE, 0.0 to 1.0)

We can use this solution to illustrate fuzzy consistency. Remember that
fuzzy intersection involves using the minimum of all the contributing sets. The fuzzy values
of the configuration EVPOST_F*SCIENCE_F*GCSE_F are therefore given by using the
minimum on each of the three conditions for every case. We plot this against the values of
the outcome, the fuzzy investigation (Fig. 4).
The resulting plot looks rather different from the sort of scatterplot we would see in the case
of a high correlation coefficient. The diagonal line has simply been added for illustration purposes; it is not a regression line. The meaning of a sufficient relationship in the fuzzy context
becomes clear: a high score on the condition is usually associated with a high score on the
outcome (as in a correlation), but a low score on the condition may be associated with any value
on the outcome (unlike in a correlation). Typically, the fuzzy condition values have to be less
than or equal to the fuzzy outcome values.14 All the values above and on the diagonal line
conform to this rule. By contrast, the points in the triangle below the diagonal line violate a
sufficiency relationship. This reflects the fact that we have high, but not perfect, consistency
with sufficiency (0.86). We must stress again that this does not mean that a low value on the
condition is necessarily associated with a low value on the outcome; any value on the
outcome is possible in that case. It does mean, however, that a high value on the condition is
usually associated with a high value on the outcome. The relationship is not symmetric, as it
would be in the case of a high correlation coefficient.
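A plot of this kind can be reproduced along the following lines (a sketch with invented fuzzy scores, assuming matplotlib is available):

    import matplotlib.pyplot as plt

    # Invented membership scores in EVPOST_F*SCIENCE_F*GCSE_F (x, the
    # minimum of the three conditions per case) and in the outcome (y).
    x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.2, 0.6]
    y = [0.7, 0.3, 0.9, 0.6, 0.8, 0.7, 0.9, 0.9, 0.1, 0.5]

    plt.scatter(x, y)
    plt.plot([0, 1], [0, 1])  # the diagonal: points on or above it are
                              # consistent with sufficiency
    plt.xlabel('EVIDENCE*SCIENCE*GCSE')
    plt.ylabel('Helicopter investigation')
    plt.show()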

Shoe Group

We now turn to the group who conducted the shoe investigation. The basic procedure is the same as for the helicopter investigation, since we are interested in the question of whether the underlying sufficiency and necessity relations are the same in these two investigations, which required varying amounts of substantive knowledge.

14 There are several measures of fuzzy consistency implemented in the software fs/QCA. The one we are using here, the “truth table algorithm” which is implemented in the current version of the fs/QCA software (Ragin et al. 2006), does not simply take into account whether cases conform to the “less than or equal to” rule, but it also takes near misses into account. See Ragin (2005).

Table 11 Truth table shoe dataset

EVPOST_F   SCIENCE_F   GCSE_F   Number   SHOE_F   Consistency
0          1           1        2                 0.749084
1          1           1        27       1        0.715157
0          1           0        1                 0.701698
0          0           1        3                 0.666216
1          1           0        17       0        0.635111
1          0           0        7        0        0.594721
1          0           1        8        0        0.580656
0          0           0        8        0        0.489822

We already know that the
absolute levels of performance differ in the two groups on all the indicators, but fs/QCA
can help determine whether the structure of the relationship with respect to sufficiency and
necessity is still comparable between the two. We shall not explain the steps of the analysis
in the same amount of detail since they are basically the same as for the helicopter
investigation. Again, we analyse the model SHOE_F = function (EVPOST_F, SCIENCE_F,
GCSE_F) and obtain a truth table (Table 11).
The structure is similar to that of the helicopter investigation. Again, we find that three rows have very few cases, and they are the same rows as in the helicopter group, i.e.
the combinations evpost_f*SCIENCE_F*GCSE_F, evpost_f*SCIENCE_F*gcse_f and
evpost_f*science_f*GCSE_F. Ignoring the omitted rows, we again find the combination
EVPOST_F*SCIENCE_F*GCSE_F at the top of the truth table, with a considerable gap
between the consistency of this first row and the following one for this group, too. Overall,
the consistencies are considerably lower, however. This does imply lower levels of sufficiency
overall, and it also reflects the fact that the two groups have not done equally well on the
investigation, with the shoe group having achieved worse results.
To indicate which rows have been entered into the analysis, 1s and 0s have again been entered in the outcome column, with a 1 indicating that the row enters the minimisation procedure. Here, given the consistency values in Table 11, it
is both appropriate and fruitful to use a conventional threshold of 0.7. Also, the drop in
consistency in the row below this threshold is a large one.

** TRUTH TABLE SOLUTION omitting rows with n < 4 **

raw unique
coverage coverage consistency
---------- ---------- -----------
EVPOST_F*SCIENCE_F*GCSE_F 0.679322 0.679322 0.715157
solution coverage: 0.679322
solution consistency: 0.715157

Again, we find that the conjunction of the three conditions evidence test, science exam
and GCSE is (nearly) sufficient for obtaining the outcome, although the consistency with
sufficiency is lower than in the other group. On the other hand, coverage is higher, but still
not quite so high as to indicate (near) necessity for the solution we have obtained.

To summarise: Comparing the two groups, we find that the structures of the truth tables are
similar and the rows with too few cases to be considered are the same. Once the effect of the
difference in the difficulty level of the two tasks is taken into account (by using two different
consistency thresholds), we obtain two identical Boolean expressions in the solutions, though
the consistency and coverage figures differ between the two groups. The lower consistency
figure for the shoe task reflects the relative difficulty of this task. We argue that this does not
necessarily affect our conclusion that the two groups are very similar (ignoring the rows omitted
from the truth table) in terms of the structure of the relationship of substantive science and
procedural knowledge and outcome on the investigation. The consistency and coverage figures
can be compared to model fit in regression analysis. Thus, the differences between the two
groups reflect a difference of degree rather than kind.

Conclusion

We have compared students’ performance on two different open-ended investigation tasks. One of them, the helicopter investigation, did not require any substantive knowledge to
speak of (corresponding to type 1 in our typology of investigations). The other task, the
shoe investigation, did involve some substantive content, pertaining to friction and forces
(type 2). We have found that both substantive and procedural understanding, together with
prior attainment, are (near) sufficient conditions for success on an open-ended investigation,
as defined. However, we should note that, if we were to use the 0.7 threshold for both
analyses, this particular combination of conditions would not be the sole sufficient
configuration in the case of the helicopter group. If, in comparing the two groups, we concentrate on the configuration with the highest consistency with sufficiency, i.e., the joint presence of all three conditions, we can note that we cannot separate these conditions from one another: their joint presence is required for the highest level of consistency in both groups. This confirms our earlier findings (Glaesser et al.
forthcoming), and also puts them on firmer ground, since we have an additional sample
with a different investigation task here, and we have used a more sophisticated method,
involving fuzzy sets rather than just crisp ones. However, this finding contradicts our
expectation that substantive understanding should have no impact on the performance in the
helicopter investigation. We therefore have to clarify what the measure of substantive
understanding actually contains.
On the basis of the shoe investigation alone, one obvious interpretation of our findings
would have been that it is the content knowledge of the subject matter which is involved in
the performance on the investigation, even though we found that substantive knowledge
was not sufficient on its own. The substantive knowledge tested in the exam was not
required for the helicopter investigation though. Since the relationship of conditions and
outcome for the helicopter investigation—including the role of substantive understanding—
was basically the same as in the shoe investigation, we have to look for alternative
explanations. It could be that the science exam is not only a measure of substantive
knowledge, but also of the general inclination to tackle a scientific problem, the readiness
with which students engage with such a problem, a propensity to scientific thinking and
confidence in that area, the degree to which someone is scared of science or generally
interested and the like.
Another possibility of course is that we have misjudged the amount of substantive
knowledge required for the helicopter investigation and that it is actually the same type as
the shoe investigation. This is open to debate. However, we do not see how the substantive
knowledge as such which was tested through the science exam could have any bearing on
the helicopter investigation.
A third possibility is one we referred to earlier in this paper: a context effect. It may
simply be that the content of these two tasks interacts with the previous experience of the
students in ways which throw up this pattern in the data, that is, there may have been
something in their prior learning experience to change the way in which they dealt with
their respective tasks. We are aware that context effects (Solano-Flores et al. 1999) will play
a part in investigations, in that the students’ performance will depend on whether the
investigation they are asked to carry out is linked to the substantive subject matter they
have recently been taught or to a particular interest they may have. We are also aware that
students’ prior beliefs, and individual differences in the strength of those beliefs, about the
tasks may influence observations the students make, the claims students make and their
search for evidence (Chinn and Malhotra 2002; Kunda 1990). We cannot discount this with any certainty, but we have no reason to think that it is a major effect.
Further research is needed in order to clarify the meaning of any given result in a
substantive science exam, since there is the possibility that it comprises attitude as well as
substantive knowledge. It will also be necessary to extend the range of tasks if we are to
exclude the problem of context-specificity.
Unlike in the earlier paper, we have used fuzzy sets here. This has several advantages. For one thing, fuzzy sets are suited to a finer calibration of raw scores, and they do not require a firm cut-off which places individuals either inside the set with a specific outcome or outside it. Instead, they allow for fuzzy boundaries between the two extremes. This is particularly relevant for our study, since the two groups have performed so differently that we would not have been able to compare them if we had used crisp sets.
For another thing, the scaling is based on substantively meaningful anchor points.
Therefore, it is possible to compare the structural relationship within two samples, again
even in the face of very different absolute values: The two groups have performed
differently, but the mechanism is likely to be the same, since we find the same solution and
similar truth tables. We can only speculate as to why they have performed differently. It is
possible that because the students in the helicopter group were older overall, they felt more
confident in applying common sense to a task which was new to them. Another possibility
might be that even though the task was not harder as such, the shoe group nevertheless felt
that they lacked the necessary substantive knowledge and therefore they were overly
anxious in performing the task.
In addition to the substantive questions about students’ understanding of ideas about
evidence, this paper was intended to give an overview of the method fs/QCA. This method
is well suited to uncovering factors which operate jointly to produce some outcome, and to investigating necessary and sufficient conditions for some outcome, making use of set relations. Using fuzzy calibration, it also enables researchers to give substantive meaning to the measures they employ. Thus, we believe we have been able to show that this method has much to offer researchers, in science education and in the social sciences more generally, who are interested in such questions and who are using samples of a similar size to ours (i.e., around 50–100 cases).

References

Abbott, A. (1988). Transcending general linear reality. Sociological Theory, 6, 169–186. doi:10.2307/
202114.

Abd-El-Khalick, F., Boujaoude, S. R, Lederman, N. G., Mamlok-Naaman, R., Hofstein, A., Niaz, M.,
Treagust, D., & Tuan, H.-L. (2004). Inquiry in science education: international perspectives. Science
Education, 88(3), 397–419. doi:10.1002/sce.10118.
Adey, P. (1992). The CASE results: implications for science teaching. International Journal of Science
Education, 14(2), 137–146. doi:10.1080/0950069920140202.
American Association for the Advancement of Science. (1967). Science—a process approach. Washington
DC: Ginn & Co.
Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure-based scoring for
hands-on assessment. Journal of Educational Measurement, 29(1), 1–17. doi:10.1111/j.1745-3984.1992.
tb00364.x.
Bryce, T. G. K., McCall, J., Macgregor, J., Robertson, I. J., & Weston, R. A. J. (1983). Techniques for the
assessment of practical skills in foundation science. London: Heinemann.
Buffler, A., Allie, S., & Lubben, F. (2001). The development of first year physics students’ ideas about
measurement in terms of point and set paradigms. International Journal of Science Education, 23(11),
1137–1156. doi:10.1080/09500690110039567.
Cheli, B., & Lemmi, A. (1995). A ‘totally fuzzy and relative’ approach to the measurement of poverty.
Economic Notes, 94(1), 115–134.
Chen, Z., & Klahr, D. (1999). All other things being equal: Acquisition and transfer of the control of
variables strategy. Child Development, 70(5), 1098–1120. doi:10.1111/1467-8624.00081.
Chinn, C. A., & Malhotra, B. A. (2002). Children’s responses to anomalous scientific data: How is
conceptual change impeded? Journal of Educational Psychology, 94, 327–343. doi:10.1037/0022-
0663.94.2.327.
Cooper, B. (2005). Applying Ragin’s Crisp and Fuzzy Set QCA to large datasets: social class and educational
achievement in the National Child Development Study. Sociological Research Online, 10(2). URL:
http://www.socresonline.org.uk/10/12/cooper.html.
Cooper, B. (2006). Using Ragin’s Qualitative Comparative Analysis with longitudinal datasets to explore the
degree of meritocracy characterising educational achievement in Britain. Paper presented at the Annual
Meeting of the American Educational Research Association, San Francisco, 07—11/04/2006.
Cooper, B., & Glaesser, J. (2007). Exploring social class compositional effects on educational achievement
with fuzzy set methods: a British study. Paper presented at the Annual Meeting of the American
Educational Research Association, Chicago, 09—13/04/2007.
Curriculum Council, Western Australia (1998). Science learning area statement. Website: http://www.
curriculum.wa.edu.au/files/pdf/science.pdf.
Davis, B. C. (1989). GASP: Graded assessment in science project. London: Hutchinson.
Duschl, R.A., Schweingruber, H.A., & Shouse, A.W. (Eds.).(2006). Taking science to school: learning and
teaching science in grades K-8. Committee on Science Learning, Kindergarten Through Eighth Grade.
Board on Science Education, Center for Education, Division of Behavioral and Social Sciences and
Education. Washington, DC: The National Academies.
Erickson, G., Bartley, R. W., Blake, L., Carlisle, R., Meyer, K., & Stavey, R. (1992). British Columbia
assessment of Science 1991 technical report II: Student performance component. Victoria, B.C.:
Ministry of Education.
Germann, P. J., & Aram, R. (1996). Student performances on the science processes of recording data,
analyzing data, drawing conclusions, and providing evidence. Journal of Research in Science Teaching,
33(7), 773–798. doi:10.1002/(SICI)1098-2736(199609)33:7<773::AID-TEA5>3.0.CO;2-K.
Germann, P. J., Aram, R., & Burke, G. (1996). Identifying patterns and relationships among the responses of
seventh-grade students to the science process skill of designing experiments. Journal of Research in
Science Teaching, 33(1), 79–99. doi:10.1002/(SICI)1098-2736(199601)33:1<79::AID-TEA5>3.0.CO;2-M.
Glaesser, J., Gott, R., Roberts, R., & Cooper, B. (forthcoming). Underlying success in open-ended
investigations in science: using qualitative comparative analysis to identify necessary and sufficient
conditions. Research in Science & Technological Education.
Gott, R., & Duggan, S. (2007). A framework for practical work in science and scientific literacy through
argumentation. Research in Science & Technological Education, 25(3), 271–291. doi:10.1080/
02635140701535000.
Gott, R., & Murphy, P. (1987). Assessing investigations at ages 13 and 15. Science report for teachers: 9.
London: DES.
Gott, R., & Roberts, R. (2004). A written test for procedural understanding: a way forward for assessment in
the UK science curriculum? Research in Science & Technological Education, 22(1), 5–21. doi:10.1080/
0263514042000187511.
Gott, R., & Roberts, R. (2008). Concepts of evidence and their role in open-ended practical investigations and scientific literacy; background to published papers. The School of Education, Durham University, UK. URL: http://www.dur.ac.uk/education/research/current_research/maths/msm/understanding_scientific_evidence.
Gotwals, A., & Songer, N. (2006). Measuring students’ scientific content and inquiry reasoning. In S. A. Barab, K. E. Hay, N. B. Songer, & D. T. Hickey (Eds.), Making a difference: The proceedings of the Seventh International Conference of the Learning Sciences (ICLS). International Society of the Learning Sciences.
Haigh, M. (1998). Investigative practical work in year 12 biology programmes. Unpublished Doctor of
Philosophy Thesis. University of Waikato, Hamilton, NZ.
Hart, C., Mulhall, P., Berry, A., Loughran, J., & Gunstone, R. (2000). What is the purpose of this
experiment? Or can students learn something from doing experiments? Journal of Research in Science
Teaching, 37(7), 655–675. doi:10.1002/1098-2736(200009)37:7<655::AID-TEA3>3.0.CO;2-E.
Hodson, D. (1991). Practical work in science: time for a reappraisal. Studies in Science Education, 19, 175–
184. doi:10.1080/03057269108559998.
House of Commons, Science and Technology Committee (2002). Science education from 14 to 19 (Third
report of session 2001-2 ed.). London: The Stationery Office.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking. London: Routledge and Kegan Paul.
Jenkins, E. W. (1979). From Armstrong to Nuffield: studies in twentieth century science education in
England and Wales. London: John Murray.
Jones, M., & Gott, R. (1998). Cognitive acceleration through science education: alternative perspectives.
International Journal of Science Education, 20(7), 755–768. doi:10.1080/0950069980200701.
Klahr, D., & Nigam, M. (2004). The equivalence of learning paths in early science instruction: effects of direct instruction and discovery learning. Psychological Science, 15(10), 661–667. doi:10.1111/j.0956-7976.2004.00737.x.
Kosko, B. (1994). Fuzzy thinking: the new science of fuzzy logic. London: HarperCollins.
Kuhn, D., & Dean, D. (2005). Is developing scientific thinking all about learning to control variables?
Psychological Science, 16(11), 866–870.
Kuhn, D., Amsel, E., & O’Loughlin, M. (1988). The development of scientific thinking skills. San Diego:
Academic Press.
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480–498. doi:10.1037/
0033-2909.108.3.480.
Lieberson, S. (1985). Making it count. The improvement of social research and theory. Berkeley, Los
Angeles, London: University of California Press.
Lubben, F., & Millar, R. (1996). Children’s ideas about the reliability of experimental data. International
Journal of Science Education, 18(8), 955–968. doi:10.1080/0950069960180807.
Martin, J. R. (1993). Literacy in science: learning to handle text as technology. In M. A. K. Halliday, & J. R.
Martin (Eds.), Writing science: literacy and discursive power (pp. 166–220). Pittsburgh, P.A.: University
of Pittsburgh Press.
Millar, R., Lubben, F., Gott, R., & Duggan, S. (1994). Investigating in the school science laboratory:
conceptual and procedural knowledge and their influence on performance. Research Papers in
Education, 9(1), 207–248. doi:10.1080/0267152940090205.
Osborne, J. F., Collins, S., Ratcliffe, M., Millar, R., & Duschl, R. (2003). What “Ideas about science” should
be taught in school science? A Delphi study of the expert community. Journal of Research in Science
Teaching, 40(7), 692–720. doi:10.1002/tea.10105.
Polanyi, M. (1966). The tacit dimension. Gloucester, MA: Peter Smith.
Qualifications and Curriculum Authority. (2004). Reforming science education for the 21st century; a
commentary on the new GCSE criteria for awarding bodies. London: QCA.
Qualifications and Curriculum Authority (QCA) (undated). GCSE Criteria for Science. Website: http://www.
qca.org.uk/libraryAssets/media/5685_gcse_sc_criteria.pdf.
Ragin, C. C. (1987). The comparative method. Moving beyond qualitative and quantitative strategies.
Berkeley, Los Angeles, London: University of California Press.
Ragin, C. C. (2000). Fuzzy-set social science. Chicago and London: University of Chicago Press.
Ragin, C. C. (2003). Recent advances in fuzzy-set methods and their application to policy questions. URL:
http://www.compasss.org/Ragin2003.pdf.
Ragin, C. C. (2005). From fuzzy sets to crisp truth tables. URL: http://www.compasss.org/Raginfztt_April05.pdf.
Ragin, C. C. (2006a). How to lure analytic social science out of the doldrums. Some lessons from
comparative research. International Sociology, 21(5), 633–646. doi:10.1177/0268580906067834.
Ragin, C. C. (2006b). Set Relations in social research: evaluating their consistency and coverage. Political
Analysis, 14, 291–310. doi:10.1093/pan/mpj019.
Ragin, C. C. (forthcoming). Fuzzy sets: calibration versus measurement. In J. Box-Steffensmeier, H. Brady &
D. Collier (Eds.), The Oxford handbook of political methodology. Oxford: Oxford University Press.

Ragin, C. C., Drass, K. A., & Davey, S. (2006). Fuzzy-set/qualitative comparative analysis 2.0. Tucson, Arizona: Department of Sociology, University of Arizona. Website: http://www.u.arizona.edu/%7Ecragin/fsQCA/software.shtml.
Roberts, R., & Gott, R. (2000). Procedural understanding in biology: how is it characterised in texts? The
School Science Review, 82, 83–91.
Roberts, R., & Gott, R. (2003). Assessment of biology investigations. Journal of Biological Education, 37
(3), 114–121.
Roberts, R., & Gott, R. (2004). Assessment of Sci1: alternatives to coursework? The School Science Review,
85(131), 103–108.
Roberts, R., & Gott, R. (2006). Assessment of performance in practical science and pupil attributes.
Assessment in Education, 13(1), 45–67. doi:10.1080/09695940600563652.
Ryder, J., & Leach, J. (1999). University science students’ experiences of investigative project work and their
images of science. International Journal of Science Education, 21(9), 945–956. doi:10.1080/
095006999290246.
Sandoval, W. A. (2005). Understanding students’ practical epistemologies and their influence on learning
through inquiry. Science Education, 89, 634–656. doi:10.1002/sce.20065.
Schauble, L. (1996). The development of scientific reasoning in knowledge-rich contexts. Developmental
Psychology, 32(1), 102–119. doi:10.1037/0012-1649.32.1.102.
Screen, P. A. (1986). The Warwick Process Science project. The School Science Review, 72(260), 17–24.
Shayer, M., & Adey, P. S. (1992). Accelerating the development of formal thinking in middle and high
school students, II: post project effects on science achievement. Journal of Research in Science
Teaching, 29, 81–92. doi:10.1002/tea.3660290108.
Smithson, M., & Verkuilen, J. (2006). Fuzzy set theory. applications in the social sciences. Thousand Oaks:
Sage.
Solano-Flores, G., & Shavelson, R. J. (1997). Development of performance assessments in science:
conceptual, practical and logistical issues. Educational Measurement: Issues and Practice, 16(3), 16–25.
doi:10.1111/j.1745-3992.1997.tb00596.x.
Solano-Flores, G., Jovanovic, J., Shavelson, R. J., & Bachman, M. (1999). On the development and
evaluation of a shell for generating science performance assessments. International Journal of Science
Education, 21(3), 293–315. doi:10.1080/095006999290714.
Star, J. R. (2000). On the relationship between knowing and doing in procedural learning. In B. Fishman, &
S. O’Connor-Divelbiss (Eds.), Fourth international conference of the learning sciences (pp. 80–86).
Mahwah, NJ: Erlbaum.
Toh, K. A., & Woolnough, B. E. (1994). Science process skills: are they generalisable? Research in Science
& Technological Education, 12(1), 31–42. doi:10.1080/0263514940120105.
Toth, E. E., Klahr, D., & Chen, Z. (2000). Bridging research and practice: a cognitively based classroom
intervention for teaching experimentation skills to elementary school children. Cognition and
Instruction, 18(4), 423–459. doi:10.1207/S1532690XCI1804_1.
Tytler, R., Duggan, S., & Gott, R. (2001a). Dimensions of evidence, the public understanding of science and
science education. International Journal of Science Education, 23(8), 815–832. doi:10.1080/
09500690010016058.
Tytler, R., Duggan, S., & Gott, R. (2001b). Public participation in an environmental dispute: implications for
science education. Public Understanding of Science, 10(4), 343–364. doi:10.1088/0963-6625/10/4/301.
Verkuilen, J. (2005). Assigning membership in a fuzzy set analysis. Sociological Methods & Research, 33(3),
462–496. doi:10.1177/0049124105274498.
Welford, G., Harlen, W., & Schofield, B. (1985). Practical Testing at Ages 11, 13 and 15. London: DES.
