
A Study of Fallback Procedures in a Keystroke Biometric System

Michael Friedman, Birendra Gurung, Derwin Lugo, Murat Ocak, Mark Ritzmann, Lars
Weinrich
Ivan G Seidenberg School of CSIS, Pace University
1 Martine Ave, White Plains, NY, 10606, USA
{mf22990n, bg77633w, dl99837n, mo73153p, lw50479n}@pace.edu
marksritzmann@yahoo.com

Abstract

The Keystroke Biometric System developed at Pace University over the 2005/2006 academic year is able to identify subjects based on long text samples. This system used a grammar-based extraction method in which incomplete or insufficient data is substituted with more generalized grammatical data. The resulting data, when fed into a classification program based on Euclidean distances, consistently identified subjects with accuracy exceeding 96%.

In an effort to further improve the results obtained from this “Linguistic” method of feature extraction, a new “Touch-Type fallback method” was developed. This fallback method is based on the geography of a standard computer keyboard rather than grammar. Further improvements to the overall system include a user interface that makes running the various types of tests much more efficient, and a “trace mechanism” which allows the user to analyze and track feature extraction events.

1. Introduction

We are exploring Keystroke Biometric Systems, which measure typing characteristics believed to be unique to an individual and difficult to duplicate. A Keystroke Biometric Identification System (one-of-n response) has been under development at Pace University since the 2004/2005 academic year. This project is a continuation of the Pace University CS616 2005-2006 Keystroke Biometric System created by Gary Giang Ngo, Justin Simone and Huguens St. Fort for Mary Villani, Dr. Tappert and Dr. Cha.

The system employs a Java applet to collect raw keystroke data over the Internet. Data from these long-text-input entries is then extracted, and a pattern classifier is used to identify the author of the text. This program can be used by a researcher attempting to identify keystroke patterns in long-text passages. Potential users include professors of online courses who need to validate the work submitted by students [1]. A paper by Villani et al. [2] explores the results when a subject trains on one style of keyboard (e.g., desktop keyboard, laptop keyboard) and/or one style of data entry (e.g., free text, copy task) and is tested against the same and/or a different style of keyboard and/or data entry.

2. Keystroke Biometric System

The keystroke biometric system consists of five components: capturing demographic data; identification of keyboard type and selection of data entry task; data entry through the preexisting Java applet; keystroke feature extraction; and classification.

2.1 Capturing Demographic Data

To begin capturing demographic data, the user accesses a Web site hosted by a server capable of serving HTML and PHP files and running a MySQL database [1]. To ensure all users have entered the demographic data, they are required to enter their first and last name (the researcher-approved composite primary key of the demographics table) and submit this information. The information is then looked up in the MySQL database. If the first and last name combination is not found, the demographic data is captured through a Web form, which verifies that all demographic questions have been answered and that the potential subject has agreed to the terms and conditions of this experiment. The demographic data is then written to the demographics table and the user is thanked for participating.
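
The registration check described above can be sketched roughly as follows. This is only an illustration in plain Java, not the PHP/MySQL code the system actually uses: the class name `RegistrationSketch`, the in-memory map standing in for the demographics table, the question identifiers, and the trimming of names are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the demographics table; the real system uses PHP and MySQL.
final class RegistrationSketch {

    enum Outcome { ALREADY_REGISTERED, INCOMPLETE_ANSWERS, TERMS_NOT_ACCEPTED, REGISTERED }

    private final List<String> requiredQuestions;   // hypothetical question identifiers
    private final Map<String, Map<String, String>> table = new HashMap<>();

    RegistrationSketch(List<String> requiredQuestions) {
        this.requiredQuestions = requiredQuestions;
    }

    // Composite key built from first and last name (trimming is an assumption).
    private static String key(String firstName, String lastName) {
        return firstName.trim() + "|" + lastName.trim();
    }

    Outcome register(String firstName, String lastName,
                     Map<String, String> answers, boolean agreedToTerms) {
        String k = key(firstName, lastName);
        if (table.containsKey(k)) {
            return Outcome.ALREADY_REGISTERED;       // name found: no form needed
        }
        for (String question : requiredQuestions) {
            String answer = answers.get(question);
            if (answer == null || answer.trim().isEmpty()) {
                return Outcome.INCOMPLETE_ANSWERS;   // every question must be answered
            }
        }
        if (!agreedToTerms) {
            return Outcome.TERMS_NOT_ACCEPTED;
        }
        table.put(k, new HashMap<>(answers));        // write the demographics row
        return Outcome.REGISTERED;
    }

    public static void main(String[] args) {
        RegistrationSketch site = new RegistrationSketch(Arrays.asList("q1", "q2"));
        Map<String, String> answers = new HashMap<>();
        answers.put("q1", "yes");
        answers.put("q2", "no");
        System.out.println(site.register("Pat", "Doe", answers, true));
        System.out.println(site.register("Pat", "Doe", answers, true));
    }
}
```

Run as written, the first call registers the subject and the second reports that the same name is already registered, which mirrors the lookup-then-form flow in the text.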

Figure 1: Registration Activity.

To allow users to leave and return to the Web site, four counters are also initialized, one for each experimental category, to track the entry number the user will begin with upon returning to the Web site. The four experimental categories are copy task on a desktop, copy task on a laptop, free-text entry on a desktop and free-text entry on a laptop [1].

Figure 2: Four Experimental Categories.

At the completion of registration, or upon returning to the site, the user is redirected to the activity selection PHP page. This page receives the user’s first name and last name from the referring page and queries the database to obtain the values of the counter fields [1] (see Figure 3).

Clicking go redirects the user to the appropriate Java applet based on his/her selections (see Figure 5). Six pieces of information are sent to, and required by, the Java applet: first name; last name; experiment style (e.g., free text, copy task); sequence number for the selected experiment style (the respective counter field value); keyboard style; and awareness [1]. Awareness refers to whether the user knows he/she is working with a keystroke biometric system. If the Java applet does not receive these six values, or if the user does not have a Java Runtime Environment (JRE) of version 1.4 or later, the applet will not launch. Lastly, the user must use Microsoft’s Internet Explorer in order for the applet to function properly [1].

Figure 3: Activity Selection Page [1].

Figure 4: Clicking go takes the user to the appropriate Java applet.

Figure 5: Java applet before any keystrokes have been entered [1].

After analyzing previous raw data files, it was found that typos or inconsistencies in a participant’s name cause problems in the feature extractor. By requiring the user to register once and use the same first and last name to access the system, the problem is eliminated. The same principle applies to the activity sequence number: should the user enter a number already used, the user would overwrite his/her existing raw data file. This is prevented through the use of counters in the database, managed through PHP scripts [1].

Depending on the sample being collected, the system checks for a minimum number of keystrokes [1]. In the study by Villani et al. [2], copy task entries must be at least 635 keystrokes and free-text samples at least 677 keystrokes; otherwise the user is prompted to continue typing (see Figure 6).
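
As a minimal sketch of how the per-category counters and the minimum-keystroke check might fit together, consider the following. It is written in plain Java for illustration only (the system itself keeps the counters in MySQL and updates them from PHP). The class and method names are hypothetical, and starting each counter at 1 is an assumption. The thresholds of 635 and 677 keystrokes are the ones quoted from Villani et al. [2].

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical in-memory model of one user's counters; not the system's PHP/MySQL code.
final class SampleIntakeSketch {

    enum Task { COPY, FREE_TEXT }
    enum Keyboard { DESKTOP, LAPTOP }

    // One counter per (task, keyboard) pair, i.e. the four experimental categories.
    private final Map<Task, Map<Keyboard, Integer>> counters = new EnumMap<>(Task.class);

    SampleIntakeSketch() {
        for (Task task : Task.values()) {
            Map<Keyboard, Integer> perKeyboard = new EnumMap<>(Keyboard.class);
            for (Keyboard keyboard : Keyboard.values()) {
                perKeyboard.put(keyboard, 1);        // assumed starting sequence number
            }
            counters.put(task, perKeyboard);
        }
    }

    // Minimum sample lengths reported for the study by Villani et al. [2].
    static int minimumKeystrokes(Task task) {
        return task == Task.COPY ? 635 : 677;
    }

    // Sequence number the applet would receive for the selected category.
    int nextSequenceNumber(Task task, Keyboard keyboard) {
        return counters.get(task).get(keyboard);
    }

    // Accepts a sample only if it is long enough; on success the counter advances
    // so that a later sample cannot overwrite this one.
    boolean submit(Task task, Keyboard keyboard, int keystrokeCount) {
        if (keystrokeCount < minimumKeystrokes(task)) {
            return false;                            // user is asked to keep typing
        }
        counters.get(task).merge(keyboard, 1, Integer::sum);
        return true;
    }

    public static void main(String[] args) {
        SampleIntakeSketch user = new SampleIntakeSketch();
        System.out.println(user.submit(Task.COPY, Keyboard.LAPTOP, 600));
        System.out.println(user.submit(Task.COPY, Keyboard.LAPTOP, 700));
        System.out.println(user.nextSequenceNumber(Task.COPY, Keyboard.LAPTOP));
    }
}
```

Keeping the sequence number on the server side, rather than letting the user type it, is what removes the overwrite problem described above.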
Figure 6: Warning if user clicks submit before meeting the minimum number of keystrokes [1].

When the user correctly completes the task and clicks submit, a PHP file is called, which writes the raw data to a text file and (transparently to the user) increments the user’s counter field by one in the database. The user sees the Java applet in a state nearly identical to that pictured in Figure 5, except that the sequence number has been incremented. The user can enter another sample or click the back button to return to the activity selection page [1].

For ease of locating the raw data files, each experimental style/keyboard combination is given its own directory on the server. Before progressing to the feature extraction process, the researcher must FTP the raw data files to a directory on his/her local disk [1].

2.2 Feature Extraction

The software developers used Borland’s JBuilder as the IDE of choice. The feature extraction program reads all of the raw data text files from a directory on the researcher’s local disk. One string of data is created from each file and stored in a vector. The vector is read in ascending order from index zero to index N - 1, where N is the number of raw data files. A second vector is instantiated to track the frequency of each feature detected in the raw data. At the lowest level these features are simply the keys pressed. The higher-level features are dependent on the fallback method used in the analysis. These features come into play when the frequency of the lower-level features is insignificant.

The “Linguistic” fallback method developed by Villani et al. [2] contains duration as well as transition features, which have been implemented in the feature extraction program. Fallback is used to minimize “bad” data caused by a less-than-optimal number of occurrences of a feature. Fallback is implemented by assigning each node on the tree a numeric pair that includes that feature’s unique numeric identifier. This allows the programmer to easily change the pairs and thereby change the structure of the tree [1].

2.3 Feature Classifier

After all features have been extracted into a data or “features” file, the data is ready to be classified in an attempt to identify an author. Identification is based on the sum of the Euclidean distances over all the collected features. This analysis is done in one of two ways. In the “train-on-one” or “leave-one-out” method, one features file is used, and classification occurs by pulling out each data entry and comparing it to all the other data in the features file. Classification is successful if the smallest Euclidean distance is to another data entry by the same author.

The second method of classification is to train the classifier on one features file and then attempt to match the data from a second features file to those in the training file. Again, a match is successful when the smallest Euclidean distance between the data being tested and the data in the training file is to an entry by the same author.

3. Methodology

We used the agile project development methodology, particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations. We held regular meetings with the client at which an updated system was delivered and critiqued, and a new deliverable was set for the following week.

3.1 Object-Oriented Approach

The object-oriented approach to programming was used in both the feature extraction and pattern classification programs [1].

4. Logic of Touch-Type Duration and Transition Model

In an effort to further improve results, a different fallback strategy was developed and tested. This method was based on the touch-typing method of keyboarding, first introduced by Frank Edgar
McGurrin in the late 1800s. This method is still taught today and is, more than likely, the method most readers of this article employ. It calls for the use of the four fingers of each hand to press the keys while both thumbs exclusively press the space bar.

The logic behind the Touch-Type fallback duration model is that fingers and hands will act in a similar manner regardless of which assigned letter is being pressed. Therefore, the keys assigned to a specific finger form a natural cluster suitable for substitution in the event of insufficient sample sizes.

The logic behind the transition dimension of the Touch-Type model is, again, that fingers and hands will perform in a consistent manner and, therefore, the finger and hand assignments associated with each letter will lead to natural groupings.

4.1 Duration Dimension of each Model

Upon inspection, the Touch-Type model differs significantly from the Linguistic model. For duration, both models have 4 levels, but that is where the similarities end. While the Linguistic model does have 4 levels, there are 6 cases where leaves appear on the 3rd level. The Touch-Type model has only 1 case that terminates on the 3rd level.

An examination of the 4th level of each model best illustrates how the models differ. Starting with the Linguistic model (see Figure 7):

Figure 7: Linguistic Fallback: Duration

• The 5 vowels all roll up to a node. These letters are distributed among 5 nodes in the TT model.
• The “frequent consonants” (t, n, s, r, h) roll up to a node. These letters are distributed among 3 nodes in the TT model.
• The “next most frequent consonants” (l, d, c, p, f) roll up to a node. These letters are distributed among 4 nodes in the TT model.
• The “least frequent consonants” (all others) roll up to a node. These letters are distributed among 3 nodes in the TT model.

Examining the 4th level again, this time starting with the Touch-Type model (see Figure 8):

Figure 8: Touch-Type Fallback: Duration

• There are 3 letters that roll up to “left little” (a, q, z). These letters are distributed among 2 nodes in the Linguistic model.
• There are 3 letters that roll up to “left ring” (s, w, x). These letters are distributed among 3 nodes in the Linguistic model.
• There are 3 letters that roll up to “left middle” (d, c, e). These letters are distributed among 2 nodes in the Linguistic model.
• There are 6 letters that roll up to “left index” (f, g, r, t, v, b). These letters are distributed among 4 nodes in the Linguistic model.
• There are 6 letters that roll up to “right index” (h, j, y, u, n, m). These letters are distributed among 4 nodes in the Linguistic model.
• There are 2 letters that roll up to “right middle” (k, i). These letters are distributed among 2 nodes in the Linguistic model.
• There are 2 letters that roll up to “right ring” (l, o). These letters are distributed among 2 nodes in the Linguistic model.
• There is 1 letter that rolls up to “right little” (p). It necessarily rolls up to one node in the Linguistic model.

4.2 Transition Dimension of each Model

In the transition dimension, again, each model has 4 levels. The fourth level (the leaf level) is identical for each model in that the 15 frequently occurring transitions were captured for this study. Accordingly, these 15 leaves are found in both models. However, the Linguistic model also features 11 leaves on the 3rd level, while the Touch-Type model has only 4 on that level. On the 2nd level of the Touch-Type model, there are 4 nodes that are found on the 3rd level of the Linguistic model.

Examining the 4th level of the models, starting with the Linguistic model (see Figure 9):

Figure 9: Linguistic Fallback: Transition

• There are 3 letter pairs that roll up to the “consonant/consonant” node (th, st, nd). These pairs are distributed among 3 nodes in the TT model.
• There are 8 letter pairs that roll up to the “vowel/consonant” node (an, in, er, es, on, en, at, or). These pairs are distributed among 5 nodes in the TT model.
• There are 3 letter pairs that roll up to the “consonant/vowel” node (he, re, ti). These pairs are distributed among 3 nodes in the TT model.
• There is 1 letter pair that rolls up to the “vowel/vowel” node (ea). This pair also rolls up to 1 node in the TT model.

Examining the 4th level of the models, starting with the Touch-Type model (note: “neighbor” keys are those which share an edge on the keyboard; “non-neighbors” are ones that do not) (see Figure 10):

Figure 10: Touch-Type Fallback: Transition

• There are 3 letter pairs that roll up to the “left/left neighbor” node (er, es, re). These pairs are distributed among 2 nodes in the Linguistic model.
• There are 3 letter pairs that roll up to the “left/left non-neighbor” node (st, at, ea). These pairs roll up to 2 nodes in the Linguistic model.
• There are 2 letter pairs that roll up to the “right/right non-neighbor” node (in, on). These also roll up to 1 node in the Linguistic model.
• There is one letter pair that rolls up to the “left/right index-index” node (th).
• There is one letter pair that rolls up to the “left/right index-other” node (ti).
• There are 2 letter pairs that roll up to the “left/right other-other” node (an, en). These also roll up to 1 node in the Linguistic model.
• There are 2 letter pairs that roll up to the “right/left index-other” node (nd, he). These pairs roll up to 2 nodes in the Linguistic model.
• There is one letter pair that rolls up to the “right/left other-index” node.

5. Trace Mechanism

While we are certain that fallback procedures do, indeed, improve overall performance and result in higher match percentages, we are somewhat in the dark (with the current version of the application) as to why. In the current version of the code, there is no mechanism that reports when and how fallback occurred. In some respects, we take its invocation on faith.

In order to produce a more granular explanation of results, a Trace Mechanism was developed. This functionality allows for the identification of insufficient data (i.e., which letters were not used with enough frequency to form a complete sampling) and of the path (percentages and weights) that was taken along the hierarchy of the Touch-Type model.

This information is extremely valuable in examining results, fine-tuning the model by adjusting parameters and weights, and improving results.

6. Results

Contrary to our expectations, a comparison of the results obtained while running the Keystroke Biometric System was not clear-cut. Our hypothesis was that a fallback method designed to reflect the geography of a keyboard (the Touch-Type method) would achieve greater rates of accuracy than a fallback method based on grammar (the Linguistic method).

What follows are some preliminary results from running the system in Train-On-One & Test-On-Another mode. Thirty-six subjects were used in this test. Each had performed all tasks collected by the Java Applet Data Collector (‘Copy on Laptop’, ‘Copy on Desktop’, ‘Free on Laptop’ & ‘Free on Desktop’) 4 to 8 times.

Train Data           Test Data            Linguistic       Touch-Type
                                          Success Rate     Success Rate
Desktop              Laptop               98.9%            97.3%
Laptop               Desktop              98.9%            96.8%
Combined Keyboards   Combined Keyboards   98.9%            98.1%
Desktop              Laptop               56.9%            61.7%
Laptop               Desktop              61.2%            68.3%

Table 1: Copy Task Identification Success Rates

Train Data           Test Data            Linguistic       Touch-Type
                                          Success Rate     Success Rate
Desktop              Laptop               98.3%            95.5%
Laptop               Desktop              99.5%            98.4%
Combined Keyboards   Combined Keyboards   98.6%            97.8%
Desktop              Laptop               58.5%            61.8%
Laptop               Desktop              55.1%            57.4%

Table 2: Free-Text Task Identification Success Rates

7. Conclusion and Recommendations

Upon completion of this project, two fallback models will be in place on the system: the Linguistic model and the Touch-Type model. The improvements made to the current system were the implementation of the “Touch-Type Model”, the development of a “User Interface” for the Feature Extractor as well as the Feature Classifier, and a “Trace Mechanism” to help the researcher in detecting and identifying insufficient data.

For future improvement, exploring more of these types of fallback models should help in minimizing the error rate and achieving a higher success rate. One such fallback model is the “Statistical Model”, which has already been developed and is based on statistical analysis of the data. Implementing this model should be the foremost priority for future project teams, since our client believes that the “Statistical Model” will be the most accurate and that its results can be used to explain the performance of the other two models (Linguistic and Touch-Type).

References:

[1] G. Ngo, J. Simone and H. St. Fort, “Developing a Java-Based Keystroke Biometric System for Long-Text Input,” New York, USA; May 2006

[2] M. Villani, C. Tappert, G. Ngo, J. Simone, H. St. Fort and S. Cha, “Keystroke Biometric Recognition Under Ideal and Application-Oriented Conditions,” Proc.- IBC, IBS, Montreal, Canada; July 2006
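
As a closing illustration of the Touch-Type duration fallback (Section 4.1) together with the kind of reporting the trace mechanism (Section 5) is meant to provide, the sketch below substitutes a finger cluster's mean duration when a letter has too few samples and records each such event. It is a simplified sketch under stated assumptions, not the project's extractor. The finger assignments are the ones listed in Section 4.1. The class name, the `minSamples` threshold, the trace message format, and the choice to fall back only one level (letter to finger) without weights are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-level fallback; the full model described in the paper has 4 levels.
final class TouchTypeFallbackSketch {

    // Finger clusters as listed for the Touch-Type duration model in Section 4.1.
    static final Map<Character, String> FINGER = new HashMap<>();
    static {
        assign("left little", "aqz");
        assign("left ring", "swx");
        assign("left middle", "dce");
        assign("left index", "fgrtvb");
        assign("right index", "hjyunm");
        assign("right middle", "ki");
        assign("right ring", "lo");
        assign("right little", "p");
    }

    private static void assign(String finger, String letters) {
        for (char c : letters.toCharArray()) {
            FINGER.put(c, finger);
        }
    }

    private final int minSamples;                  // assumed threshold, not from the paper
    final List<String> trace = new ArrayList<>();  // one entry per fallback event

    TouchTypeFallbackSketch(int minSamples) {
        this.minSamples = minSamples;
    }

    // Mean key-press duration for a letter, using the finger cluster when data is sparse.
    double meanDuration(char letter, Map<Character, List<Double>> samples) {
        List<Double> own = samples.getOrDefault(letter, new ArrayList<Double>());
        if (own.size() >= minSamples) {
            return mean(own);                      // enough data: no fallback needed
        }
        String finger = FINGER.get(letter);
        List<Double> cluster = new ArrayList<>();
        for (Map.Entry<Character, List<Double>> entry : samples.entrySet()) {
            if (finger != null && finger.equals(FINGER.get(entry.getKey()))) {
                cluster.addAll(entry.getValue());  // pool all letters typed by this finger
            }
        }
        trace.add("'" + letter + "': " + own.size() + " sample(s), fell back to \""
                + finger + "\" cluster (" + cluster.size() + " sample(s))");
        return mean(cluster);
    }

    private static double mean(List<Double> values) {
        if (values.isEmpty()) {
            return Double.NaN;                     // nothing to average at this level
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        Map<Character, List<Double>> samples = new HashMap<>();
        samples.put('e', Arrays.asList(100.0, 110.0, 120.0));
        samples.put('d', Arrays.asList(90.0));
        TouchTypeFallbackSketch extractor = new TouchTypeFallbackSketch(3);
        System.out.println(extractor.meanDuration('d', samples));
        System.out.println(extractor.trace);
    }
}
```

A fuller version would keep climbing the hierarchy when the finger cluster is itself too sparse, and would record the weights applied at each level, which is the path information the trace mechanism is described as reporting.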
