
Automated Scoring System for Essays
ABSTRACT:
Automatic grading systems for essays conventionally check linguistic
structure and grammatical aspects, along with the degree of relevance to the topic
in the case of descriptive essays. They primarily rely on Singular Value
Decomposition (SVD) in Latent Semantic Indexing (LSI), aided by NLP
methods, for consistent and uniform grading of essays. However, these methods
can be supplemented by a better choice of the surface and complex parameters in
Content Vector Analysis (CVA), based on established rubrics and paradigms, so
that the scores match human grading more accurately. This project extends the
automated grading system from descriptive essays to argumentative and issue
essays by also examining their logical structure. It focuses on constructing
document vectors and the document matrix using character n-grams as features
instead of the usual word features.

INTRODUCTION:
With the advent of online examinations such as the GRE, GMAT and CET4, there has
been an increasing call for automation of the scoring process. Scoring objective
questions is comparatively simple and has been automated for many years.
Essays, however, have in most cases been evaluated only manually, because it is
very difficult to program a system that performs as well as a human reader.
With the evolution of advanced text database practices and Natural Language
Processing (NLP) techniques, this has become possible of late.
Any automated essay rater should offer several salient features, most importantly
the following:
1. Speed: Scores are generated in a matter of seconds, as opposed to time-consuming
manual correction.
2. Ease/Less fatigue: Automation replaces a laborious manual task.
3. Equitability: There is no room for favoritism, partiality or preference; every
score is generated without bias.
4. Uniformity: Differences in the mindsets or attitudes of individual evaluators are
eliminated; all essays are graded by the same standard.

LITERATURE SURVEY:
[1] AUTOMATED ESSAY SCORING USING KNN ALGORITHM
Lin Bin, Lu Jun, Yao Jian-Min, Zhu Qiao-Ming
International Conference on Computer Science and Software Engineering
IEEE-2008
EXPLANATION:
Transformation of Essays into Vectors:
The training set of essays is converted into vectors of word frequencies.
These are then transformed into vectors of word weights.
The weight vectors occupy the training space.
To score a test essay:
It is converted into a weight vector.
A search is conducted to find the training vectors most similar to it.
Similarity is measured by the cosine between the test and training vectors.
The closest matches among the training set are used to assign a score to
the test essay (Burstein, 2003).
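A minimal Python sketch of the vector transformation and similarity search described above. It assumes lower-cased whitespace tokenization and, for brevity, uses raw word frequencies directly as the weights (a real system would first apply a weighting scheme). The function names and toy essays are hypothetical.

```python
import math
from collections import Counter

def freq_vector(text):
    """Word-frequency vector for one essay (lower-cased whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def closest_matches(test_text, training, n=3):
    """Return the n (similarity, human_score) pairs closest to the test essay.

    training: list of (human_score, essay_text) pairs (hypothetical format).
    """
    test_vec = freq_vector(test_text)
    ranked = sorted(
        ((cosine(test_vec, freq_vector(text)), score) for score, text in training),
        reverse=True,
    )
    return ranked[:n]

training = [
    (5, "the policy improves public health and reduces cost"),
    (2, "i like pizza"),
    (4, "public health policy reduces long term cost"),
]
print(closest_matches("policy reduces health cost", training, n=2))
```

The returned human scores of the closest matches are what the scoring step then combines into a score for the test essay.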

Feature Selection for KNN:
After stop words are eliminated, the features of the essays, viz. words, phrases and
arguments, are chosen. Each vector component is the term frequency-inverse
document frequency (TF-IDF) weight of its feature. The similarity of essays is
calculated as the cosine of their vectors in the KNN algorithm.
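As a rough illustration of stop-word removal followed by TF-IDF weighting, the sketch below uses single words as the only features and the common variant idf = log(N / df), where N is the number of essays and df the number of essays containing the term. The stop list and essays are hypothetical; the cited paper's exact weighting formula is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop list; real systems use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}

def tokens(text):
    """Lower-case whitespace tokens with stop words removed."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(essays):
    """Return one {term: tf * idf} dict per essay, with idf = log(N / df)."""
    docs = [Counter(tokens(e)) for e in essays]
    n = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts
    # the number of essays containing each term (document frequency).
    df = Counter(term for doc in docs for term in doc)
    return [{t: tf * math.log(n / df[t]) for t, tf in doc.items()} for doc in docs]

essays = [
    "the policy reduces the cost of health care",
    "the policy is unfair to the poor",
    "health care cost is rising",
]
for vec in tfidf_vectors(essays):
    print(vec)
```

Under this variant a term that occurs in every essay gets weight 0, which is the intended effect: it does not help distinguish one essay from another.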
Features are selected by predetermined thresholds on:
a. Term Frequency (TF)
b. Information Gain (IG)
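One possible sketch of threshold-based feature selection with TF and IG, assuming essays are already tokenized and labelled with a score class. IG is computed from term presence or absence as H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. The threshold values `min_tf` and `min_ig` and the toy data are hypothetical, not taken from the cited paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """IG(term) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], presence-based."""
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional

def select_features(docs, labels, min_tf=2, min_ig=0.1):
    """Keep terms whose corpus frequency and IG both reach the (hypothetical) thresholds."""
    tf = Counter(t for doc in docs for t in doc)  # total occurrences in the corpus
    return sorted(
        t for t, count in tf.items()
        if count >= min_tf and information_gain(t, docs, labels) >= min_ig
    )

docs = [
    ["evidence", "therefore", "policy"],
    ["evidence", "because"],
    ["pizza", "like"],
    ["like", "pizza", "policy"],
]
labels = ["high", "high", "low", "low"]
print(select_features(docs, labels))
```

Here "policy" passes the TF threshold but is dropped because it appears equally in high- and low-scored essays, so it carries no information about the score.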

K-Nearest Neighbor Algorithm for Text Categorization:
We determine the k training essays most similar to a given test essay as its
nearest neighbors, and weight each neighbor's score according to its distance
from the test essay, computed with a suitable measure such as Euclidean distance
or cosine similarity.
The final score is the weighted sum of the neighbors' scores.
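A small sketch of this scoring rule, choosing cosine similarity (rather than Euclidean distance) as the measure and normalizing the weights to sum to one so the weighted sum stays on the human score scale. The dense toy vectors and the `knn_score` name are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_score(test_vec, training, k=3):
    """Score a test essay from its k most similar training essays.

    training: list of (vector, human_score) pairs.
    Each neighbor's score is weighted by its cosine similarity to the test
    essay; weights are normalized so they sum to 1.
    """
    neighbors = sorted(
        ((cosine(test_vec, vec), score) for vec, score in training),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no overlap with any training essay: no reliable neighborhood
    return sum(sim * score for sim, score in neighbors) / total

training = [([1, 1, 0], 6), ([1, 0, 0], 4), ([0, 0, 1], 1)]
print(knn_score([1, 1, 0], training, k=2))
```

The `None` branch shows the sparseness problem discussed below in concrete form: when a test vector shares no non-zero components with the training vectors, every similarity is zero and no score can be inferred.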
ISSUES:
The major issues arise from the following limitations of the KNN algorithm:
1. Memory-based: The entire training dataset must be stored, which requires a
large amount of space.
2. Unreliable neighborhood (lack of overlap): The document matrix is usually
sparse, so essay vectors share few non-zero values, whereas similarity measures
are reliable only when the vectors overlap substantially.
3. Unsuitable for corporate datasets due to sparseness: By the above argument,
since corporate datasets are usually sparse, KNN is less suitable for them.
[2] AUTOMATED ESSAY SCORING SYSTEM FOR CET4
Yali Li, Yonghong Yan
Second International Conference on Education Technology and Computer Science
IEEE-2010

EXPLANATION:
Score-Determining Components:
1. Surface Features:
The number of characters in the document (Chars)
The number of words in the document (Words)
The number of different words (Diffwds)
The fourth root of the number of words in the document, as suggested by Page (Rootwds)
The number of sentences in the document (Sents)
Average word length (Wordlen = Chars/Words)
Average sentence length (Sentlen = Words/Sents)
Number of words longer than five characters (BW5)
2. Grammar Checking:
ALEK (Assessment of Lexical Knowledge), a tool for grammar checking.
Bigrams and trigrams of part-of-speech tag sequences are used.
3. Sentence Error Detection:
Part-of-speech tag analysis
4. Relation to the Topic (2 approaches):
Simple comparison of keywords
Content Vector Analysis
5. Final Score by Linear Regression:
The overall score is the linear weighted sum of several components.
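The surface features and the final linear combination can be sketched as follows. Tokenization is an assumption (words are runs of letters and apostrophes, sentences end at `.`, `!` or `?`), Chars is taken literally as the length of the whole document, and the weights are hypothetical placeholders; in the described system they would be fitted by linear regression against human scores.

```python
import re

def surface_features(text):
    """Surface features listed above, under simple tokenization assumptions."""
    words = re.findall(r"[A-Za-z']+", text)
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words, n_sents = len(words), len(sents)
    chars = len(text)
    return {
        "Chars": chars,
        "Words": n_words,
        "Diffwds": len({w.lower() for w in words}),
        "Rootwds": n_words ** 0.25,  # fourth root of the word count
        "Sents": n_sents,
        "Wordlen": chars / n_words if n_words else 0.0,
        "Sentlen": n_words / n_sents if n_sents else 0.0,
        "BW5": sum(1 for w in words if len(w) > 5),
    }

def linear_score(features, weights, intercept=0.0):
    """Final score as a linear weighted sum of component values."""
    return intercept + sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical weights for illustration only; in practice they are fitted by
# linear regression against human scores on a training set.
WEIGHTS = {"Rootwds": 1.0, "Diffwds": 0.05, "BW5": 0.1}

essay = "Online exams are growing. Automated scoring saves time!"
feats = surface_features(essay)
print(feats)
print(linear_score(feats, WEIGHTS, intercept=0.5))
```

Grammar, sentence-error and topic components would enter the same weighted sum as additional named features.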
ISSUES:
The major limitations arise from linear regression and are as follows:
Incomplete description of the relationship among variables:
1. Extremes are ignored.
2. Only the mean is considered.
Sensitivity to outliers
Precision attained: 70.125%

[3] AUTOMATED ESSAY SCORING USING GENERALIZED LATENT SEMANTIC ANALYSIS
Md. Monjurul Islam, A. S. M. Latiful Hoque
13th International Conference on Computer and Information Technology
IEEE-2010
EXPLANATION:
Information retrieval by Latent Semantic Analysis (LSA) using Singular Value
Decomposition (SVD).
Uses an n-gram-by-document matrix instead of a word-by-document matrix.
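A compact sketch of the idea, assuming word bigrams as the n-grams, raw counts as matrix entries and NumPy for the SVD. The exact n-gram sizes and weighting used by Islam and Hoque are not reproduced; the function names and documents are hypothetical.

```python
import numpy as np

def word_ngrams(text, n=2):
    """Contiguous word n-grams of an essay (n=2 is an illustrative choice)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_document_matrix(docs, n=2):
    """Rows are n-grams, columns are documents, entries are raw counts."""
    grams = sorted({g for d in docs for g in word_ngrams(d, n)})
    index = {g: i for i, g in enumerate(grams)}
    matrix = np.zeros((len(grams), len(docs)))
    for j, d in enumerate(docs):
        for g in word_ngrams(d, n):
            matrix[index[g], j] += 1
    return grams, matrix

def reduce_rank(matrix, k):
    """Truncated SVD: keep the k largest singular values (the reduced LSA space)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

docs = [
    "hard work leads to success",
    "success comes from hard work",
    "luck decides success",
]
grams, a = ngram_document_matrix(docs)
u_k, s_k, vt_k = reduce_rank(a, k=2)
# Each row of doc_vectors is one document in the reduced semantic space.
doc_vectors = (np.diag(s_k) @ vt_k).T
print(doc_vectors)
```

Bigrams keep some word order ("hard work" is one feature), which a word-by-document matrix loses. Even on this toy input, the full SVD call dominates the cost, which is the scaling concern noted below.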

ISSUES:
The performance issues caused by SVD are:
Order of complexity of the algorithm: O(n^2 k^3), which is very high.
Requires a normal distribution of terms: Words are required to be normally
distributed across the documents, but corporate datasets have a sparse distribution.
[4] AN EFFECTIVE AUTOMATED ESSAY SCORING SYSTEM USING SUPPORT VECTOR REGRESSION
Yali Li, Yonghong Yan
Fifth International Conference on Intelligent Computation Technology and Automation
IEEE-2012

PROPOSED SYSTEM:
Dataset construction using character n-grams instead of words
Content Vector Analysis (CVA) instead of Latent Semantic Analysis (LSA)
Uses Support Vector Machines (SVM):
o Model-based
o Popular in text classification problems, where very high-dimensional
spaces are the norm
Support Vector Regression
Evaluation of rhetorical arguments:
Each argument = mini-document
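A minimal sketch of dataset construction with character n-grams, treating each argument as a mini-document. Two assumptions are hypothetical and not from the source: trigrams (n = 3) with spaces included, and paragraph breaks (blank lines) as the argument boundaries. One commonly cited advantage of character n-grams is that inflected or misspelled forms ("online", "onlin") still share most of their n-grams.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Overlapping character n-grams of a text, spaces included."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def argument_vectors(essay, n=3):
    """Treat each paragraph as one argument (hypothetical segmentation rule)
    and build a character n-gram frequency vector for each mini-document."""
    arguments = [p for p in essay.split("\n\n") if p.strip()]
    return [char_ngrams(arg, n) for arg in arguments]

essay = "Exams should be online.\n\nOnline exams save paper."
vectors = argument_vectors(essay)
print(len(vectors))  # prints 2
```

Each resulting vector can then be weighted and compared in the same way as a whole-essay vector.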
I. Methodology:
Vector construction for each document:
Extraction of words
Morphological analysis
Frequency vector construction
Weight assignment based on salience (relative frequency and inverse relative
frequency)
II. Testing:
Cosine similarity between the test vector and each document/class vector
The class with the highest similarity is selected
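The methodology and testing steps can be sketched end to end as a Content Vector Analysis classifier. Several pieces are assumptions rather than the project's design: the crude suffix-stripping `stem` stands in for real morphological analysis, salience is read as relative frequency times log inverse class frequency (a TF-IDF-style weight), and each score class is represented by one merged vector of its training essays. All names and essays are hypothetical.

```python
import math
import re
from collections import Counter

def stem(word):
    """Crude suffix stripping standing in for morphological analysis
    (hypothetical; a real system would use a proper stemmer or lemmatizer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def freq_vector(text):
    """Extract words, normalize them morphologically, and count them."""
    return Counter(stem(w) for w in re.findall(r"[a-z]+", text.lower()))

def weighted(freqs, df, n_classes):
    """Salience weight: relative frequency times log inverse class frequency."""
    total = sum(freqs.values())
    return {t: (c / total) * math.log(n_classes / df[t]) for t, c in freqs.items() if df[t]}

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_cva(essays_by_score):
    """Merge the training essays of each score class into one weighted class vector."""
    class_freqs = {s: sum((freq_vector(e) for e in essays), Counter())
                   for s, essays in essays_by_score.items()}
    df = Counter(t for f in class_freqs.values() for t in f)  # classes containing t
    n = len(class_freqs)
    return {s: weighted(f, df, n) for s, f in class_freqs.items()}, df, n

def predict(essay, model):
    """Assign the score class whose vector has the highest cosine with the essay."""
    class_vectors, df, n = model
    test = weighted(freq_vector(essay), df, n)
    return max(class_vectors, key=lambda s: cosine(test, class_vectors[s]))

model = train_cva({
    1: ["i like games", "games are fun"],
    5: ["regulation reduces harm and protects consumers",
        "strict regulation protects public health"],
})
print(predict("regulation protects consumers", model))
```

Comparing against one vector per class, rather than against every training essay, is what separates this CVA step from the KNN approach surveyed earlier.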

MODULE SPLIT-UP AND GANTT CHART:

REFERENCES:
Dikli, S. (2006). An Overview of Automated Scoring of Essays. Journal of
Technology, Learning, and Assessment, 5(1). Retrieved [date] from
http://www.jtla.org.
Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., and
Harris, M. D. (1998). Automated Scoring Using a Hybrid Feature Identification
Technique. In Proceedings of the Annual Meeting of the Association for
Computational Linguistics.
Burstein, J. C., et al. System and Method for Computer-Based Automatic Essay Scoring.
Andreyev, Y., et al. Automatic Essay Scoring System.
Lee, K. J., Choi, Y.-S., and Kim, J. E. Building an Automated English Sentence
Evaluation System for Students Learning English as a Second Language.

CONCLUSION:
The grades assigned by this software in place of a human rater will be as
effective as those obtained when a second human rater is used. Any deviation
that arises can be adjusted with suitable mathematical models.
