Keywords: Academic dishonesty; Cheat-resistant assessment; Learning environment; Student experience; Student assessment

Abstract: Multiple-choice examinations offer the ability to grade quickly as well as being able to assess concepts and understanding in a wide range of subjects. Consequently, many large classes use multiple-choice examinations. One problem, however, is that multiple-choice examinations are more prone to cheating than constructed-response style examinations. Multiple-choice examinations offer limited answer options, and these limited options can lead to sharing answers through collusion or gleaning answers from unwitting peers. To counter such cheating, this paper investigates a personalization approach to examinations whereby every student gets their own version of the examination, different from those of their peers. Such a personalization approach not only counters cheating, but also encourages students to focus on concepts rather than just answers. A software framework that facilitates generating personalized examination papers is developed, and the paper reports on the experience of using the approach in large classes. It discusses the administrative, technical, and pedagogical challenges posed by personalization and how these challenges might be overcome using the framework as well as accompanying processes. Surveys indicate that both students and staff are positive about using such a system.
1. Introduction
Multiple-choice examinations are widely used in large classes to make scoring manageable and to enable measurement of knowledge and competencies at the same time (Steven & Downing, 2006). To be an effective assessment tool, the questions of a multiple-choice examination need to be well-designed, taking into account the respective levels of Bloom's taxonomy (Steven & Downing, 2006; Young & Shawl, 2013).
One of the major drawbacks of multiple-choice examinations, even if well-designed, is that they are prone to cheating. While examiners put measures in place to make direct copying difficult, for example, by having multiple versions of the examination script in which the answers and/or questions are shuffled, strategies that circumvent these measures have been observed. These strategies include some form of coded communication between a group of participating students. For example, Fig. 1 illustrates an observed strategy of writing a code that uniquely identifies an answer in big letters so that students sitting close enough can collude by observing what each other has written down. The answer option (A–E) may differ between students, but the absolute answer remains the same across them. Consequently, the strategy works well even with multiple versions of the examination script.
Such collusion has been repeatedly observed in our university, and the research work reported herein is a result of trying to mitigate such collusion. We use personalization to render any blind copying in the examination ineffective. While personalization has been used in other contexts such as targeted instruction and adaptive learning (Weld et al., 2012; Zare, 2011) and even plagiarism mitigation (Manoharan, 2017), personalization in the context of examinations, in particular multiple-choice examinations, poses its own challenges, and it is those challenges that make this research interesting. These challenges fall under three broad categories: administrative, technical, and pedagogical.
https://doi.org/10.1016/j.compedu.2018.11.007
Received 3 March 2018; Received in revised form 17 November 2018; Accepted 19 November 2018
Available online 22 November 2018
0360-1315/ © 2018 Elsevier Ltd. All rights reserved.
S. Manoharan Computers & Education 130 (2019) 139–151
Personalized assignments have been widely studied and used in a number of other works (Kashy et al., 1993; Kumar, 2013;
Manoharan, 2017; Smaill, 2005). Therefore, both of these research questions have already been partially answered. The main
contribution of this paper is to apply similar concepts in the context of multiple-choice examinations and to discuss the challenges
specific to these examinations.
The rest of this paper is organized as follows. Section 2 discusses the broad issues of cheating in examinations and personalized
assessment, both in the light of related work. Section 3 discusses the challenges in developing personalized examination scripts.
Section 4 discusses the architecture of a software framework that is able to generate and score personalized examinations. Section 5
evaluates the framework in the light of its ability to reduce cheating. The evaluation includes results from student surveys as well as a
staff survey. Section 6 shares the experience of using the system, and its advantages and disadvantages. The final section summarizes
and concludes the paper.
Zobel reported on a long-standing and well-organized cheating racket in 2004 (Zobel, 2004). The so-called ‘my tutor’ was used
not only to contract-cheat in assignments but also to ghost-write in examinations. The case highlighted the prevalence of cheating as
well as the effort required to investigate cheating incidents and to bring those students and contractors to justice.
Nearly fifteen years on, and with almost all academic institutions having in place compulsory tuition on academic honesty, has
cheating reduced? This is not an easy question to answer, but our observation is that academic institutions still come across more
cases of cheating than they would like to. It is estimated that about 70% of students admit to some cheating (Broeckelman-Post,
2008).
Contract cheating has become a lot more prevalent (Clarke & Lancaster, 2006; Walker & Townley, 2012). While Zobel's ‘my tutor’
case advertised on local notice boards, modern contractors advertise online, including social media. There are a large number of
online sites offering a variety of contract services ranging from simple assignment solutions to writing complete doctoral theses.
These sites even guarantee that their work is “plagiarism free”. Prevalence of contract cheating has prompted national education
bodies to develop policies that aim to mitigate such cheating (The Quality Assurance Agency for Higher Education, 2017). One such
policy stipulates that students visiting “cheating” sites from within academic institutions be redirected to a site that promotes aca-
demic honesty (The Quality Assurance Agency for Higher Education, 2017).
Advances in technology have made cheating detection easier, but the same technology has also made cheating easier. Students use not only the time-tried cheat-sheets and invisible inks, but also miniature cameras and communication devices. To counter the potential use of these devices, some of the examination centres in China jam wireless signals, blocking any potential over-the-air cheating.
While contract ghost-writing and the use of cheating devices in examinations seem glamorous, the most commonly employed forms of cheating that we have observed are the following three:
Through an extensive student survey, Shon classified the common cheating strategies students employ (Shon, 2006). Their
classification includes collusion, gleaning answers from unwitting peers, use of technology, and taking advantage of the behavioural
and/or psychological profiles of invigilators.
Cheating in multiple-choice examinations is somewhat easier than in examinations where questions use free-format answers, or constructed responses. This is because the information that needs to be gleaned from other students, willing or unwitting, is much smaller in a multiple-choice examination. In the most trivial case, it is simply an answer option (such as one of A–E).
Consequently, most universities shuffle answer options and/or questions to make copying difficult. However, as illustrated in Fig. 1, shuffling answer options might not deter students determined to cheat. Randomized seating would mitigate premeditated collusion, but in many universities exam rooms have no seat numbers to facilitate randomized seating. While shuffling the answer options can be easily automated, shuffling questions generally needs to be a manual process, since the ordering of questions would typically follow a topic order. Therefore, many examinations opt to shuffle only the answer options.
In most seating arrangements, students are able to observe the answers chosen by surrounding peers because of the lack of space. In addition, where examinations are conducted in lecture halls, which typically slope towards the podium, it is easy to note the answers of students sitting in front.
Where there are multiple versions of the examination scripts, the scripts are distributed in such a way that no student sits next to
another student with the same version. To help this distribution, it is common practice to colour code the scripts. However, the colour
coding also enables a dishonest student to seek answers from a student with a matching script.
Would it be possible to quantitatively assess whether there was cheating in a multiple-choice examination? Marx and Longer argued that it would not be (Marx & Longer, 1986), but more recent work deals with statistical analysis of student answers to detect potential collusion (Wesolowsky, 2000; Ercole, Whittlestone, Melvin, & Rashbass, 2002; Richmond & Roehner, 2015; D'Souza & Siegfeldt, 2017). The legal aspects of how any such detected collusion could be treated are unclear unless there is other physical evidence to substantiate the statistical measures.
Dealing with suspected cases of cheating is a time-consuming exercise. Physical evidence and witnesses need to be obtained and retained through the formal process of trying the students involved. In addition, statistical evidence may also be generated to supplement the physical evidence.
The approach this paper takes is to remove the opportunity to cheat through collusion and discreet observation. To this end, the
examination scripts are personalized so that no two students get the same questions. The concept of personalization in assessment is
not new. It has been used in the context of targeted instruction and adaptive learning (Weld et al., 2012; Zare, 2011) as well as in the
context of mitigating plagiarism (Manoharan, 2017).
Personalized assessments typically use one of three approaches: selecting questions from a databank, parameterizing question templates, or generating questions through macros.
Problets, which generates short computer programming questions, takes the parameterization approach to generate fragments of
computer programs which the students are asked to analyze (Kumar, 2013). Abaligeti and Kehl take a similar parameterization
approach to individualize examination scripts (Abaligeti & Kehl, 2018). OASIS, a personalized engineering quiz, takes the databank
approach to select quizzes from a database, but it also substitutes parameters in the selected quizzes (Smaill, 2005). The personalized
assignments discussed by Manoharan use a macro approach and therefore achieve a level of complexity and customization that is not possible via parameterization alone (Manoharan, 2017). It is the macro approach this paper takes to personalize multiple-choice examinations, for this approach gives the instructor a lot of freedom and flexibility to formulate questions. The downside, however, is that the instructor needs basic programming skills to write the macros.
3. Challenges
While it appears that personalized multiple-choice examinations can overcome the main strategies of cheating, crafting perso-
nalized questions poses a number of challenges. These challenges can be broadly classified into administrative, technical, and
pedagogical challenges.
This section discusses these challenges and how they can possibly be addressed.
Multiple-choice questions are typically answered on an optical answer sheet, also known as a “bubble sheet” or “Scantron sheet” (named after Scantron Corporation, the well-known company that commercialized optical answer sheets).
The Scantron sheet our university uses has fields for student name, a numeric student ID, an alphanumeric course ID, and a
numeric version ID (which distinguishes multiple versions of the same examination), and fields to bubble in the chosen answer
options.
Multiple-choice questions belong to a generic class of questions commonly known as selected-response questions, where the student chooses one or more of the supplied answer options. This is in contrast to constructed-response questions, where students write free-format answers (such as short answers or essays).
Selected-response questions have different types (Haladyna, Downing, & Rodriguez, 2002).
1. Multiple-choice single-response. Most of the examinations use this type where the student would choose just one of the available
answer options. In this case, only one of the options is the correct answer and the others are distractors.
2. Multiple-choice multiple-response. This applies when there is more than one correct answer option, and the student is expected to select them all.
3. Complex multiple-choice or XYZ questions. These are multiple-choice single-response questions but with a secondary level. The
sample question in Fig. 1 belongs to this class. More than one or none of the X, Y, Z options could be correct.
4. Group. This is technically not a question, but puts together a number of related questions, often with a common context. A student
answering the group of questions will first need to understand the common context which may supply information that applies to
all the questions in the group.
5. True/false or dichotomy. In this case, there are two answer options available and one of them is correct. This is technically a subset of the multiple-choice single-response class. Three related dichotomy questions can be combined into an XYZ question so as to have more than two answer options. True/false questions can also be extended to have more than two answer options, while still ensuring that only one of the options is correct.
1. Consider the parameterized question “What is the sum of $a and $b?” where $a and $b are parameterized integer values. This
question could lead to generating “What is the sum of 23 and 48?” and “What is the sum of 5238 and 2639?”. While they both test
the same learning outcome of being able to add, the time it may take to sum four-digit numbers will be more than the time it takes
to sum two-digit numbers. Therefore, the two generated questions aren't fair.
2. Consider the parameterized question “What is the result of $a $op $b?” where $a and $b are parameterized integer values and $op
is a parameterized arithmetic operator. This question could lead to generating “What is the result of 42 + 74?” and “What is the
result of 32 × 53?“. The generated questions test two different learning outcomes – one being addition, and the other being
multiplication – and have different levels of difficulty.
Constraints need to be placed on parameters and macros so that they do not lend themselves to generating questions that have
different levels of difficulty or different learning outcomes.
All questions need to be peer-reviewed not only for correctness but also for fairness.
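These fairness constraints can be enforced at generation time. The sketch below is an illustration in Python, not part of the framework (the function name is hypothetical): it fixes the operator and restricts both operands to two digits, so every generated variant tests the same learning outcome at the same level of difficulty.

```python
import random

def gen_sum_question(rng: random.Random) -> tuple[str, int]:
    """One variant of the parameterized question
    'What is the sum of $a and $b?'.

    Both operands are constrained to two digits so that all variants
    take comparable effort, and the operator is fixed so that all
    variants test the same learning outcome (addition).
    """
    a = rng.randint(10, 99)  # two-digit only: uniform difficulty
    b = rng.randint(10, 99)
    return f"What is the sum of {a} and {b}?", a + b

# Seeding the generator with the script ID gives each student a
# different, but equally difficult, variant.
stem, answer = gen_sum_question(random.Random(2741))
```

A reviewer can then audit the constraint (the `randint` bounds) once, instead of auditing every generated script.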
3.3.2. Distractors
Auto-generation of plausible distractors is another major pedagogical challenge. Distractors are generally based on exploiting the
common misconceptions of a typical learner, or based on forming incorrect options that are similar to the correct option. The latter
can lend itself to automation through extracting features of the correct option and using a subset of the features to form incorrect
options (Lai et al., 2016).
4. Software framework
The framework1 requires the examination specification be written as an HTML template with macros. The macros are functions
that an instructor will have to define. The framework has a macro processor that takes as its input an HTML template and a library
consisting of the instructor-defined macros, and outputs HTML examination scripts where the macros have been substituted by the
result of executing the macros. See Fig. 3.
Use of HTML for the template allows the instructor to develop the bulk of the examination paper such as cover page, appendices,
and possibly parts of question stems and/or answer options in HTML and supplement them with the code in the macros. It also allows
a limited preview, without any macro substitution, of the template through any browser. HTML examination scripts will permit
digital delivery if applicable, or the scripts can be printed out for traditional paper-based examinations.
Fig. 4 illustrates a sample HTML template with a small set of sample questions. Macros are demarcated by a CSS (Meyer & Weyl,
2017) class cws_code_q. The macro processor will identify these macros, execute the macro code, and replace the macros with the
result of the execution. A macro beginning with $ has special significance within the macro processor. For example, $n is replaced with the nth digit of the script ID (expressed in base 5 using the letters A–E). The first few questions in the script instruct the students to choose the supplied answers which form the script ID. A macro that does not start with $ is instructor-defined. These
macros, when executed, return HTML fragments that replace the macro. The sample HTML template shows the use of three in-
structor-defined macros: GetElvishLanguages, GetThorinsCompany, and GetApplesAndOranges. The first two macros return answer
options to the respective questions while the third one returns both the question stem as well as answer options.
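The substitution step described above can be sketched as follows. This is not the framework's actual implementation; it assumes, purely for illustration, that each macro appears as a `<span class="cws_code_q">…</span>` element and that the script ID is four base-5 digits.

```python
import re

def script_id_letters(script_id: int, digits: int = 4) -> str:
    """Encode a numeric script ID in base 5 using the letters A-E."""
    letters = ""
    for _ in range(digits):
        letters = "ABCDE"[script_id % 5] + letters
        script_id //= 5
    return letters

def expand(template: str, script_id: int, macros: dict) -> str:
    """Replace each cws_code_q marker in the HTML template.

    A name of the form $n expands to the nth digit (1-based) of the
    script ID in base-5 letters; any other name is looked up in the
    instructor-defined macro library and called, and the returned
    HTML fragment replaces the marker.
    """
    letters = script_id_letters(script_id)

    def substitute(match: re.Match) -> str:
        name = match.group(1).strip()
        if name.startswith("$"):
            return letters[int(name[1:]) - 1]
        return macros[name](script_id)

    return re.sub(r'<span class="cws_code_q">([^<]+)</span>',
                  substitute, template)
```

Running such an expansion once per script ID would yield the per-student examination scripts.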
Fig. 5 illustrates a sample (partial) examination script generated by the macro processor.
The question on Elvish languages is a true/false question, where the question stem is defined in the template and the instructor-
defined macro returns answer options. In the macro, the instructor would supply a pool of true statements on the topic as well as a
pool of false statements. The framework has a built-in truth question type which would pick one true statement and four false
statements from the instructor-supplied pools. These statements are then shuffled to form the answer options. To pick a false
statement as the answer, the instructor would reverse the roles of true and false statements they supply.
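Under the stated design, the built-in truth question type might look like the following sketch (a hypothetical Python rendering, not the framework's code; the pools are assumed to be disjoint).

```python
import random

def truth_question(true_pool, false_pool, rng, n_options=5):
    """Sketch of the built-in truth question type: pick one statement
    from the true pool and four from the false pool, shuffle them,
    and register which option (A-E) holds the correct answer so that
    scoring can be automatic.  To make a *false* statement the
    answer, the caller simply swaps the two pools.
    """
    options = [rng.choice(true_pool)] + rng.sample(false_pool, n_options - 1)
    correct = options[0]
    rng.shuffle(options)
    answer_letter = "ABCDE"[options.index(correct)]
    return options, answer_letter
```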
The question on Thorin's company is an XYZ question where three true/false statements are involved. As in the truth question
type, the instructor would supply a pool of true statements as well as a pool of false statements on the topic. Recall that an XYZ
question can have eight possible answer options. Given that we only allow five answer options in the examination, some of the eight
possible answer options are combined. Table 1 illustrates two ways of combining them to make five answer options.
The framework has a built-in XYZ question type which first randomly chooses one of the possible option pools as illustrated in
Table 1. It then randomly chooses one of the answer options as the correct answer. Based on this choice, it would then pick an
appropriate mix of true and false statements from the instructor-supplied pools to assign to X, Y, and Z. This approach gives all answer
options equal probability.
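A sketch of the XYZ type follows. Table 1's actual option pools are not reproduced here, so the pool below is a hypothetical example; the key point is that the correct option is chosen first, which gives every option equal probability.

```python
import random

# One hypothetical pool of five combined options for an XYZ question;
# each option maps to the truth values required of statements X, Y, Z.
# (The paper's Table 1 defines the actual pools; this is illustrative.)
OPTION_POOL = {
    "X only is correct":          (True,  False, False),
    "Y only is correct":          (False, True,  False),
    "Z only is correct":          (False, False, True),
    "X and Y only are correct":   (True,  True,  False),
    "None of X, Y, Z is correct": (False, False, False),
}

def xyz_question(true_pool, false_pool, rng):
    """Sketch of the built-in XYZ type: choose the correct answer
    option first, then draw statements from the instructor-supplied
    pools so that X, Y, and Z have exactly the required truth values.
    Both pools need at least three statements.
    """
    answer = rng.choice(list(OPTION_POOL))
    truths = OPTION_POOL[answer]
    ts = rng.sample(true_pool, 3)   # draw without replacement
    fs = rng.sample(false_pool, 3)
    statements = [ts.pop() if t else fs.pop() for t in truths]
    return statements, list(OPTION_POOL), answer
```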
Since the correct answer option is internally chosen in the XYZ question as well as the truth question, both question types register
the correct answer automatically. Recall that scoring requires that the correct answer be registered.
The logic for the apples and oranges question will be completely defined by the instructor who created the macro. Unlike the
previous two macros, this macro emits the entire question as well as the answer options. This illustrates the power of macros, and
how such power can lead to producing highly complex questions. The number of apples and oranges and Castar amounts will differ
from script to script, thus requiring every student to work out their own answer. The macro would look at common mistakes in the
1 The framework is available for download from www.dividni.com.
solution and produce plausible distractors (e.g., off-by-one answers, answers that swap apples and oranges, etc.). To ensure the same
level of difficulty across the scripts, the number of apples and oranges should be kept within a reasonable range (e.g., 2–20). There is
only one correct answer to this question, and therefore its design can use the truth question type: the macro could simply add the
correct answer to the pool of true statements, and the distractors to the pool of false statements.
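The paper does not show the GetApplesAndOranges question text, so the sketch below invents a plausible stem; it only illustrates the macro pattern just described: generate values within a narrow range, compute the correct answer, and derive distractors from common mistakes.

```python
import random

def apples_and_oranges(rng):
    """Hypothetical reconstruction of the GetApplesAndOranges macro.
    The stem is invented for illustration.  Counts are kept in a
    narrow range (2-20) for uniform difficulty; distractors come from
    common mistakes (off-by-one, swapped prices).
    """
    apples, oranges = rng.randint(2, 20), rng.randint(2, 20)
    apple_price, orange_price = rng.randint(2, 9), rng.randint(2, 9)
    correct = apples * apple_price + oranges * orange_price

    stem = (f"Apples cost {apple_price} Castars each and oranges "
            f"{orange_price} Castars each. What do {apples} apples "
            f"and {oranges} oranges cost in total?")
    distractors = {
        correct + 1, correct - 1,                       # off-by-one
        apples * orange_price + oranges * apple_price,  # swapped prices
    }
    distractors.discard(correct)  # a distractor must not equal the answer
    return stem, correct, sorted(distractors)
```

The returned answer and distractors can then feed the truth question type, as the text suggests: the correct answer goes into the true pool and the distractors into the false pool.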
While the framework supports programmatically creating any complex question, it also allows instructors to create true/false and
XYZ questions without any programming. True/false and XYZ questions are generic and are applicable across many disciplines. Being
able to write them without requiring any programming knowledge, therefore, enables wider use of the framework.
To form true/false and XYZ questions, the instructor would simply supply the true and false statement pools in an XML format.
Fig. 6 shows the XML specification of the question on Thorin's company. The XML content is converted into the macro code by the
framework. The attribute type specifies whether the question is an XYZ question or a true/false question, while the attribute id
provides a question id that is to be used in the HTML template.
Note that the XML specification includes the question stem. The framework supports having the question stem either in the HTML
template or in the XML specification – the suggested option is the latter since it allows an instructor to review a question solely by
viewing the corresponding XML file (e.g., in a browser).
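Fig. 6 is not reproduced here, so the XML layout below is an assumption: only the type and id attributes are named in the text, and the stem, true, and false child elements (and the statement wording) are invented for illustration. A sketch of converting such a specification into the pools the macro code needs:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout of the XML question specification; only the
# 'type' and 'id' attributes are documented in the paper.
SPEC = """
<question type="xyz" id="ThorinsCompany">
  <stem>Regarding Thorin's company, which of X, Y, Z is correct?</stem>
  <true>Thorin's company had thirteen dwarves.</true>
  <true>Bilbo joined the company as a burglar.</true>
  <true>Gandalf accompanied the company at times.</true>
  <false>The company set out from Mordor.</false>
  <false>Thorin's company travelled by sea.</false>
  <false>The company numbered twenty dwarves.</false>
</question>
"""

def parse_spec(xml_text: str) -> dict:
    """Convert the XML specification into the pieces the generated
    macro code needs: question type, id, stem, and statement pools."""
    root = ET.fromstring(xml_text)
    return {
        "type": root.get("type"),
        "id": root.get("id"),
        "stem": root.findtext("stem"),
        "true_pool": [e.text for e in root.findall("true")],
        "false_pool": [e.text for e in root.findall("false")],
    }
```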
Table 1: Two possible answer option pools (Type 1 and Type 2) for XYZ questions.
5. Evaluation
The research questions relate to the feasibility of constructing a generic framework to support personalized examinations and the challenges posed by personalization. Loosely speaking, both of the questions are answered positively through the successful construction of the framework and its continued use in examinations and in-class tests. The research methodology we followed to evaluate and substantiate these claims is as follows.
We trialled the system in an assessed in-class test. We collected anonymous feedback from the students on their perception of how
resilient the test was to cheating, and if they thought the test was fair. We also had a question on their view of using personalized tests
in other courses, as well as a question on their overall liking of such tests. The questions used the standard 5-point Likert scale. In
addition, the students were able to provide open-ended feedback. The survey was open to all students in the course.
We also evaluated the system with a small group of staff who compared personalized tests with standard 4-version tests in the light of cheating. We also asked staff how much time they were prepared to spend developing personalized tests.
We compared the exam performance of students in two offerings of the same course: the first exam used standard 4-version multiple-choice questions, while the second used personalized questions.
The system was first trialled in a third-year computer science class of just over 400 students. This was a for-credit supervised test conducted in class.
Ten staff members reviewed the test script, and two of them reviewed the source macros. Reviewing the source macros is
important because they effectively generate the scripts. In particular, the true and false statement pools of the truth questions and
XYZ questions need peer review.
In addition, two teaching assistants checked a random sample of scripts to verify that the auto-generated answers were indeed
correct answers.
On completion of the test, an anonymous online survey was conducted of the whole class. The response rate was around 30%.2
The summary of the survey results is listed in Table 2. The results show that a large proportion of the students who responded to
the survey are in favour of personalized tests, and over 80% agree that personalization reduces the level of cheating. There was some
skepticism over fairness and this is reflected in the responses to our second question.
2 The response rate was low, but this is the norm in most of our undergraduate courses.
Table 2
Student evaluation results of the personalized mid-semester test (2017, semester 2). SD: strongly disagree; D: disagree; N: neutral; A: agree; SA: strongly agree.
We also evaluated the system with a group of staff (which also included staff from other departments such as Mathematics and
Medical Sciences).
The group attempted a short standard 4-version test under test conditions, but with a view to cheating. The group observed that standard test seating arrangements allow one to see the answer options chosen by those around them if the options are marked in the test script. Even if the options cannot be read, the chosen options can be inferred from some characteristic of the chosen option (e.g., the length of an option or any other prominent feature). The group noted that it was also possible to collude using the strategy illustrated earlier in Fig. 1, and that such collusion was much easier than trying to glean answers from unwitting neighbours.
The group then attempted to cheat in a short, personalized test and found that it was not possible unless they were allowed to
discuss the concepts and answer options with others.
One of the major downsides of personalized examinations is that an instructor needs to spend more time developing questions. The extra time is attributed to (1) developing pools of true/false statements, and (2) writing macros. We conducted a survey among a small group of staff to see how much extra time they might be willing to spend developing personalized examinations. All but one of the staff members were prepared to spend more time (2–3 times more), and all of them strongly agreed that personalization helps reduce the level of cheating.
Considering the experience and feedback from the initial trial, the system was rolled out for use in other tests and examinations.
Courses that currently use the system include digital security, computer networks, software development, and web applications. The
class sizes range from about 300 to just over 400 students. While the tests are run in-class supervised by instructors and teaching
assistants, the examinations are run centrally by the University. The administrative head of examinations had this to say on the
personalized examinations: “Only positive feedback. No student or supervisor issues. It seems to have run very smoothly, so happy to
continue with this.”
Free-format feedback from students, collected over a number of courses, was generally positive. Most of the comments related to
cheating, and commended the ability to combat cheating. Some of these comments are:
Personalized examinations not only help to mitigate cheating, but also encourage students to focus on concepts rather than just answers. Many students commented on how the system contributed to positive learning:
1. “When debriefing with my mates after the exam, we focus on the content not the raw answer.”
2. “Had to make sure to understand the concepts not just memorize answers”
3. “It encouraged knowledge of the content rather than knowledge of the question”
4. “A few times people posted their version of the question on Piazza, which meant everyone else got the benefit of having an extra
version of the question to practice”
5. “Makes you understand concepts better rather than rote learning stuff”
6. “After the test people were posting their questions on Piazza asking for help/explanations etc., this had the positive side effect that
everyone reading got a new version of the question to practice, which is more fun than just going over one version of the question
repeatedly.”
There were also genuine concerns about the fairness of the system:
Table 3
Grade distributions in two courses across two years (2017: standard 4-version examinations; 2018: personalized examinations).

Grade      Course 1         Course 2
           2017    2018     2017    2018
A+           20      16       30      15
A            21      22       19      12
A-           34      22       23      15
B+           39      25       21      20
B            72      42       29      27
B-           65      43       29      26
C+           57      39       43      36
C            22      29       33      41
C-            6      16       17      32
F            81      56       78      72
Total       417     310      322     296
1. “Some people may get hard versions which take up more time to do. However, if you knew the topic really well, it wouldn't affect
too much.”
2. “That there is an RNG factor – if I sit in one seat I could get better marks than if I sat in another seat because of the choices
available”
3. “Not a level playing field if some of the questions are randomly assigned to be harder than others.”
4. “Difficulty of questions can vary from version to version.”
While personalization has the potential to introduce unfairness, its reduction of cheating increases fairness. One student com-
mented “No one would be able to cheat, so in that sense [personalization] made it fair”.
Two of our courses used personalized examinations in 2018. Table 3 shows the grade distributions in the two courses across two
years: 2017 used standard 4-version multiple-choice examinations while 2018 used personalized multiple-choice examinations.
We performed Pearson's χ² tests on the two sets of ordinal data to determine whether there was a significant difference between the two years in the two courses. The proportional distributions of grades are not exactly the same in the two years, but the χ² tests show no evidence of statistically significant differences.
Pearson's χ² tests yield a p-value of 0.05659 for Course 1 and a p-value of 0.1307 for Course 2. Both p-values are above 0.05, and so not statistically significant, notwithstanding some weak evidence of an association between year and grade composition. Personalization is therefore not likely to have caused any statistically significant difference.
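The reported tests can be reproduced from Table 3. The sketch below computes Pearson's χ² statistic in plain Python for the two 10 × 2 contingency tables; the statistics come out at roughly 16.53 (Course 1) and 13.77 (Course 2), both below the standard χ² critical value of 16.919 for 9 degrees of freedom at α = 0.05, consistent with the reported p-values.

```python
def chi_square(table):
    """Pearson's chi-squared statistic for an r x c contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Grade counts from Table 3 (A+ down to F), 2017 vs 2018.
course1 = list(zip([20, 21, 34, 39, 72, 65, 57, 22, 6, 81],
                   [16, 22, 22, 25, 42, 43, 39, 29, 16, 56]))
course2 = list(zip([30, 19, 23, 21, 29, 29, 43, 33, 17, 78],
                   [15, 12, 15, 20, 27, 26, 36, 41, 32, 72]))

CRITICAL_9DF_005 = 16.919  # chi-squared critical value, df = 9, alpha = 0.05
for name, table in [("Course 1", course1), ("Course 2", course2)]:
    stat = chi_square(table)
    verdict = "significant" if stat > CRITICAL_9DF_005 else "not significant"
    print(name, round(stat, 2), verdict)
```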
Post-examination issues similar to those already observed in the trial continue to occur.
In spite of the extensive reviews, some of the tests and one examination had one or two questions with incorrectly marked answers. The macros needed to be patched post-examination to correct these issues. The framework provides convenient mechanisms to interrogate the answer options generated for each script and to take corrective measures in case of errors: after patching the macros, the expected answer options are re-created.
The framework picked up five pairs of students using the same version number. While two of the pairs turned out to involve a student who incorrectly filled in their version number, the other three pairs showed strong traits of cheating: they had far more answer similarities than was statistically expected. In each of these three pairs, one of the students used the other student's version number and most of their answers. One of these pairs has been investigated by the University disciplinary committee, which found that one student had indeed copied from the other, unwitting student: the former was unable to produce their examination script with their unique version number, while the latter was able to. The student who copied has subsequently been disciplined. The other two pairs are under investigation.
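The paper does not describe how "statistically expected" similarity was computed, so the following is only a crude sketch: it counts matching answers between two scripts that share a version number and, under the strong assumption that independent students answer uniformly at random, computes a binomial tail probability. A tiny probability would flag the pair for manual investigation; it is supporting evidence, never proof of collusion.

```python
from math import comb

def match_surprise(answers_a, answers_b, n_options=5):
    """Count matching answers between two scripts and return the
    binomial probability of seeing at least that many matches if the
    two students answered independently and uniformly at random.
    (A crude illustration, not the framework's actual method.)
    """
    n = len(answers_a)
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    p = 1 / n_options
    tail = sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(matches, n + 1))
    return matches, tail
```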
6. Discussion
The successful development of the framework for personalized examinations, and its use within several large classes, positively answered our first research question. These classes were in a number of different areas, such as software development, computer networks, and digital security; the framework is therefore generic enough to span multiple areas. In addition, the truth and XYZ question types allow the framework to be used in many domains and by instructors less comfortable with programming.
To answer our second research question, we discussed the challenges posed by personalization and how we addressed them. Some of these challenges were addressed by the framework, while others were addressed by the processes we followed (such as peer-review). Some challenges required both the help of the framework and adherence to our processes. Table 4 summarizes the challenges and how they were addressed.

S. Manoharan Computers & Education 130 (2019) 139–151

Table 4
Challenges and solutions – summary.

Challenge | Framework | Process
Examination Delivery | ✓ |
Student and Script Identification | ✓ |
Scoring and Script Identification | ✓ |
Supporting Multiple Types of Selected-response Questions | ✓ |
Duplicate Answer Options | ✓ |
Accounting for Errors | ✓ | ✓
Fairness | ✓ | ✓
Distractors | ✓ | ✓
Coverage of Learning Outcomes | | ✓
Quality Assurance | | ✓
Personalized examinations allow an instructor to re-use a number of questions, because personalization ensures that students will have different values to work with: students cannot simply memorize answers from a previous examination. In fact, in two of our classes we ran repeat tests – tests with the same questions as the first test but different data sets. This allowed students to study the areas where they had gaps and to score better the second time. One student said of the repeat test: “I was thankful for the repeat test as it allowed me to improve greatly and was able to understand the concepts better”.
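The repeat-test mechanism rests on seeding each question's data from the script's version number, so the same question template yields a fresh instance per student and per sitting. A minimal sketch under that assumption; the question, function name, and distractor scheme are made up for illustration:

```python
# Sketch: derive a question instance deterministically from a version number,
# so every script gets the same template with different data.
import random

def make_question(version_number):
    """Generate a small arithmetic question instance from a version number."""
    rng = random.Random(version_number)   # deterministic per script version
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    correct = a + b
    # Plausible distractors offset from the correct answer.
    options = sorted({correct + d for d in (-10, -1, 0, 1, 10)})
    return f"What is {a} + {b}?", options, correct

# The same version always reproduces the same instance, which is what
# makes post-hoc scoring and auditing possible.
assert make_question("V042") == make_question("V042")
# A different version (or a repeat sitting with new seeds) yields new data
# for the same question template.
q, opts, ans = make_question("V043")
```

Because the generator is deterministic, the framework never needs to store each script's data, only the version number that seeds it.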
In addition, personalized examinations require less space for conducting the examinations: there is no need to leave large gaps between students. One student wrote in the free-format feedback: “You can pack way more sardines into the can this way”. Note that the examinations are still supervised, and students are not allowed to talk to each other or share information.
Personalization has the potential to introduce unfairness in the assessment. It is therefore paramount that fairness is taken into
account when developing questions, and the questions are carefully reviewed for fairness before they are used.
Personalization helps to reduce cheating by collusion and copying from unwitting peers, but it does not address other forms of cheating, such as contract cheating or the use of modern communication devices inside the examination room. These need to be handled in the traditional way; for example, effective identity checks can mitigate ghost-writing in examinations.
Used in conjunction with typical test conditions (such as ID checks, ban on electronic devices, restrictions on personal items, etc.),
personalized examinations can be an effective mechanism to reduce cheating incidents.
Personalized examinations not only help to mitigate cheating, but also encourage students to focus on concepts rather than
answers. Student feedback reported in section 5 attests to this.
7. Conclusion
Cheating in examinations is an ongoing issue, especially when the stakes are high. This paper proposed an approach to reduce the level of cheating in the multiple-choice examinations that many large classes use. The approach is based on personalization, which creates as many versions of the examination script as there are students. Consequently, blindly copying answers from other students will not help a student score better than guessing. A software framework that supports personalized examinations was developed, and its design and use were discussed. We used the approach in large classes with sizes ranging from about 300 to 400 students. Our experience suggests that not only does personalization counter cheating, but it also encourages students to focus more on concepts than mere answers.
References
Abaligeti, G., & Kehl, D. (2018). Personalized exams in probability and statistics. Proceedings of challenges and innovations in statistics education multiplier conference (pp.
4). ISBN: 978-963-306-575-4.
Broeckelman-Post, M. (2008). Faculty and student classroom influences on academic dishonesty. IEEE Transactions on Education, 51(2), 206–211. https://doi.org/10.1109/TE.2007.910428.
Clarke, R., & Lancaster, T. (2006). Eliminating the successor to plagiarism? Identifying the usage of contract cheating sites. Proceedings of the 2nd international plagiarism
conference.
D'Souza, K. A., & Siegfeldt, D. V. (2017). A conceptual framework for detecting cheating in online and take-home exams. Decision Sciences Journal of Innovative
Education, 15(4), 370–391. https://doi.org/10.1111/dsji.12140.
Ercole, A., Whittlestone, K. D., Melvin, D. G., & Rashbass, J. (2002). Collusion detection in multiple choice examinations. Medical Education, 36(2), 166–172. https://doi.org/10.1046/j.1365-2923.2002.01068.x.
Gierl, M. J., Bulut, O., Guo, Q., & Zhang, X. (2017). Developing, analyzing, and using distractors for multiple-choice tests in education: A comprehensive review. Review
of Educational Research, 87(6), 1082–1116. https://doi.org/10.3102/0034654317726529.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309–333. https://doi.org/10.1207/S15324818AME1503_5.
International Organization for Standardization (2003). Information technology – Security techniques – Check character systems. Geneva, Switzerland.
Kashy, E., et al. (1993). CAPA – an integrated computer-assisted personalized assignment system. American Journal of Physics, 61(12), 1124–1130. https://doi.org/10.1119/1.17307.
Kumar, A. N. (2013). A study of the influence of code-tracing problems on code-writing skills. Proceedings of the 18th ACM conference on innovation and technology in
computer science education, ITiCSE ’13 (pp. 183–188). ACM. https://doi.org/10.1145/2462476.2462507.
Lai, H., Gierl, M. J., Touchie, C., Pugh, D., Boulais, A.-P., & Champlain, A. D. (2016). Using automatic item generation to improve the quality of MCQ distractors.
Teaching and Learning in Medicine, 28(2), 166–173. https://doi.org/10.1080/10401334.2016.1146608.
Manoharan, S. (2017). Personalized assessment as a means to mitigate plagiarism. IEEE Transactions on Education, 60(2), 112–119. https://doi.org/10.1109/TE.2016.2604210.
Marx, D. B., & Longer, D. E. (1986). Cheating on multiple choice exams is difficult to assess quantitatively. North American Colleges and Teachers of Agriculture Journal,
30(1), 23–26.
Meyer, E., & Weyl, E. (2017). CSS: The definitive guide (4th ed.). O'Reilly Media.
Richmond, P., & Roehner, B. M. (2015). The detection of cheating in multiple choice examinations. Physica A: Statistical Mechanics and Its Applications, 436(Supplement C), 418–429. https://doi.org/10.1016/j.physa.2015.05.040.
Shon, P. C. H. (2006). How college students cheat on in-class examinations: Creativity, strain, and techniques of innovation. Plagiary: Cross-Disciplinary Studies in Plagiarism, Fabrication, and Falsification, 1(1), 130–148.
Smaill, C. (2005). The implementation and evaluation of OASIS: A web-based learning and assessment tool for large classes. IEEE Transactions on Education, 48(4),
658–663. https://doi.org/10.1109/TE.2005.852590.
Steven, T. M. H., & Downing, M. (Eds.). (2006). Handbook of test development. Routledge. https://doi.org/10.4324/9780203874776.
The Quality Assurance Agency for Higher Education (2017). Contracting to cheat in higher education – how to address contract cheating, the use of third-party services and
essay mills. Gloucester, United Kingdom.
Walker, M., & Townley, C. (2012). Contract cheating: A new challenge for academic honesty? Journal of Academic Ethics, 10(1), 27–44. https://doi.org/10.1007/s10805-012-9150-y.
Weld, D. S., et al. (2012). Personalized online education – a crowdsourcing challenge. Proceedings of the 26th AAAI conference on artificial intelligence.
Wesolowsky, G. O. (2000). Detecting excessive similarity in answers on multiple choice exams. Journal of Applied Statistics, 27(7), 909–921. https://doi.org/10.1080/02664760050120588.
Young, A., & Shawl, S. J. (2013). Multiple choice testing for introductory astronomy: Design theory using Bloom's taxonomy. Astronomy Education Review, 12(1), 1–27.
Zare, S. (2011). Personalization in mobile learning for people with special needs. In C. Stephanidis (Ed.), Universal access in human-computer interaction. Applications and services, Vol. 6768 of Lecture Notes in Computer Science (pp. 662–669). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-21657-2_71.
Zobel, J. (2004). “Uni cheats racket”: A case study in plagiarism investigation. Proceedings of the 6th Australasian conference on computing education, Vol. 30 (pp. 357–365).