Time Table Scheduling in Data Mining

Chapter 1
INTRODUCTION
This chapter introduces timetabling. It describes the basic concepts and types of timetable,
and then sets out the objectives of this study and the outline of the thesis.
1.1 Introduction to Data mining
Data mining is the process of analyzing data from large databases. As its name suggests, it
means searching for valuable information in a large database. Data mining is also known as
knowledge discovery.
Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information:
information that can be used to increase revenue, cut costs, or both. Data mining software is one
of a number of analytical tools for analyzing data. It allows users to analyze data from many
different dimensions or angles, categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations or patterns among dozens of fields
in large relational databases. The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for further use.
Data mining is a process that uses a variety of data analysis tools to discover patterns and
relationships in data that may be used to make valid predictions. The first and simplest analytical
step in data mining is to describe the data: summarize its statistical attributes (such as means
and standard deviations), visually review it using charts and graphs, and look for potentially
meaningful links among variables (such as values that often occur together). But data description
alone cannot provide an action plan. You must build a predictive model based on patterns
determined from known results, and then test that model on results outside the original sample. A
good model should never be confused with reality (a road map isn't a perfect
representation of the actual road), but it can be a useful guide to understanding our business. The
final step is to empirically verify the model. For example, from a database of customers who
have already responded to a particular offer, we can build a model predicting which prospects
are likeliest to respond to the same offer.
1.1.1 Importance of Data Mining
We can simply define data mining as a process that involves searching, collecting, filtering and
analyzing data. This is not the standard or accepted definition, but it covers the whole
process. Large amounts of data can be retrieved from various websites and databases, in the form
of data relationships, correlations and patterns. With the advent of computers, the internet and
large databases, it is possible to collect large amounts of data. The data collected can then be
analyzed to help identify relationships and find solutions to existing problems. Governments,
private companies, large organizations and businesses of all kinds collect large volumes of data
for the purposes of business and research development. The data collected can be stored for future
use, so that the information is available whenever it is required. This matters because finding
and searching for information on websites, in databases and in other internet sources may take a
long time.
1.1.2 How does Data mining work?
While large-scale information technology has been evolving separate transaction and
analytical systems, data mining provides the link between the two. Data mining software
analyzes relationships and patterns in stored transaction data based on open-ended user queries.
Several types of analytical software are available: statistical, machine learning, and neural
networks. Data mining consists of five major elements:
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.
1.1.3 Data Mining (KDD) Process
1. Understand the application domain
2. Identify data sources and select target data
3. Pre-process: cleaning, attribute selection
4. Data mining to extract patterns or models
5. Post-process: identifying interesting or useful patterns
6. Incorporate patterns in real world tasks

Figure 1.1 Data mining process

1.1.4 Data mining Techniques
a) Classification
Classification consists of examining the features of a newly presented object and assigning to it
a predefined class. The classification task is characterized by the well-defined classes, and a
training set consisting of pre-classified examples. The task is to build a model that can be
applied to unclassified data in order to classify it. Examples of classification tasks include:
Classification of credit applicants as low, medium or high risk
Classification of mushrooms as edible or poisonous
Determination of which home telephone lines are used for internet access
b) Clustering
Clustering is the task of segmenting a diverse group into a number of similar subgroups or
clusters. What distinguishes clustering from classification is that clustering does not rely on
predefined classes. The records are grouped together on the basis of self-similarity. Clustering
is often done as a prelude to some other form of data mining or modeling. For example, clustering
might be the first step in a market segmentation effort: instead of trying to come up with a
one-size-fits-all rule for promotions, the customers are first divided into clusters, and one then
asks what kind of promotion works best for each cluster.
c) Association Rules
An association rule is a rule which implies certain association relationships among a set of
objects (such as "occur together" or "one implies the other") in a database. Given a set of
transactions, where each transaction is a set of literals (called items), an association rule is an
expression of the form X → Y, where X and Y are sets of items. The intuitive meaning of such a
rule is that transactions of the database which contain X tend to contain Y. An example of an
association rule is: "30% of farmers that grow wheat also grow pulses; 2% of all farmers grow
both of these items". Here 30% is called the confidence of the rule, and 2% the support of the
rule. The problem is to find all association rules that satisfy user-specified minimum support and
minimum confidence constraints.
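Support and confidence can be made concrete with a short sketch. The ten transactions below are made-up illustration data, not figures from any study, and the function names support and confidence are chosen here only for readability.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    return support(transactions, x | y) / support(transactions, x)


# Hypothetical data: the crops grown by ten farmers.
farms = [frozenset(crops) for crops in (
    {"wheat", "pulses"}, {"wheat"}, {"wheat", "rice"}, {"rice"},
    {"wheat", "pulses", "rice"}, {"maize"}, {"rice", "maize"},
    {"wheat"}, {"pulses"}, {"maize"},
)]

x, y = frozenset({"wheat"}), frozenset({"pulses"})
rule_support = support(farms, x | y)        # farmers growing both crops
rule_confidence = confidence(farms, x, y)   # wheat growers who also grow pulses
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule-mining algorithm would keep the rule wheat → pulses only if both values reach the user-specified minimum thresholds.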
d) Regression
Regression is a data mining (machine learning) technique that fits an equation to a dataset in
order to predict a number. Age, weight, distance, temperature,
income, or sales could all be predicted using regression techniques. The simplest form of
regression, linear regression, uses the formula of a straight line (y = mx + b) and determines the
appropriate values for m and b to predict the value of y based upon a given value of x. The
regression functions are used to determine the relationship between the dependent variable
(target field) and one or more independent variables. The dependent variable is the one whose
values you want to predict, whereas the independent variables are the variables that you base
your prediction on.
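As an illustration of how m and b might be determined, the sketch below uses ordinary least squares, which is one common choice and is an assumption here, since the text does not name a fitting method. The data points and the function name fit_line are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares estimates of m and b in y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Hypothetical sample lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]   # independent variable
ys = [3.0, 5.0, 7.0, 9.0]   # dependent variable (target field)
m, b = fit_line(xs, ys)
predicted = m * 5.0 + b      # prediction for a new value x = 5
print(m, b, predicted)
```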
1.1.5 Advantages of data mining
Provides new knowledge from existing data
o Public databases
o Government sources
o Company Databases
o Old data can be used to develop new knowledge
New knowledge can be used to improve services or products
Improvements lead to:
o Bigger profits
o More efficient service
1.1.6 Disadvantages of data mining
User privacy/security
Amount of data is overwhelming
Great cost at implementation stage
Possible misuse of information
Possible inaccuracy of data
1.2 Scheduling and Timetabling
1.2.1 Scheduling
Scheduling is one of the important tasks encountered in real life situations. Various scheduling
problems are present, like personnel scheduling, production scheduling, education time table
scheduling etc. Education time table scheduling is a difficult task because of the many
constraints that need to be satisfied in order to get a feasible solution. The education time table
scheduling problem is known to be NP-hard. NP stands for non-deterministic polynomial time, and
NP-hardness means that there is no known exact algorithm that can solve every instance of the
time table scheduling problem in polynomial time. Methodologies like Genetic Algorithms (GAs),
Evolutionary Algorithms (EAs) etc. have been used with mixed success.
Scheduling theory is concerned with the optimal allocation of scarce resources to activities
over time. The practice of this field dates back to the first time two humans contended for a
shared resource and developed a plan to share it without bloodshed. The theory of the design of
algorithms for scheduling is younger but still has a significant history. The earliest papers in the
field were published more than 40 years ago.
Scheduling problems arise in a variety of settings, as illustrated by the following examples:
Consider the central processing unit of a computer that must process a sequence of jobs
that arrive over time.
Consider a team of five astronauts preparing for the reentry of their space shuttle into
the atmosphere.
Consider a factory that produces different sorts of gadgets. Each gadget must first be
processed by machine 1, then machine 2, then machine 3, where different gadgets require
different amounts of processing time on different machines.
Consider an academic environment, which requires the scheduling of a given set of
courses and meetings between students and lecturers. Each course takes place in a
particular hall and each hall has its capacity. We must also make sure that students and
lecturers are not booked for more than one appointment at the same time.
1.2.2 Scheduling of timetabling
The general area of scheduling has been the subject of intense research for a number of decades.
Scheduling and timetabling are typically viewed as two separate activities, with the term
scheduling used as a generic term to cover specific types of problems in this area. Consequently,
timetable construction can be considered a special case of the generic scheduling activity.
In the most general terms, scheduling can be described as the constrained allocation of resources
to objects, being placed in space-time in such a way that the total cost of the resources
used is minimized. Examples of this problem set can be seen in transport scheduling and
delivery vehicle routing, where the business-driven objective is to minimize the total cost function.
Timetable construction is the allocation, subject to constraints, of given resources to objects
being placed in space-time in such a way as to satisfy or nearly satisfy a desirable set of possible
objectives. Class timetables and exam timetables are examples of these problems where all hard
constraints must be satisfied to generate a valid solution. [1]
Thus the term scheduling covers all aspects of the activity of allocating resources and, at the
same time, satisfying some predetermined objective. However, due to the enormity of the
problem, it becomes necessary to classify the scheduling problem into specialized activities such
as timetabling. Thus, in practical terms the timetabling problem can be described as scheduling a
sequence of lectures between teachers and students in a prefixed time period (typically a week),
satisfying a set of varying constraints.

1.3 What is Timetabling?

Timetabling problems are a specific type of scheduling problem and are mainly concerned with
the assignment of events to timeslots subject to constraints with the resultant solution
constituting a timetable. Wren (1996) defined timetabling in the following way:
"Timetabling is the allocation, subject to constraints,
of given resources to objects being placed in space time,
in such a way as to satisfy as nearly as possible a set of desirable objectives."
Based on the definition given by Wren (1996), we need to know whether there are sufficient
resources available for a given event to take place at its specified time, as well as which
resources are allocated. The goal is to optimize some objective function that depends on the
application domain at hand. For example, in examination timetabling, the function to optimise is
usually the gap between two examinations that a student has to sit, i.e. the aim is to spread
each student's examinations throughout the examination period. The basic terminology
used in timetabling problems is summarised in Table 1.1.
Table 1.1 Basic terminology used in timetabling
Terminology         Definition
Event               An activity to be scheduled. Examples include examinations and courses.
Timeslot (period)   An interval of time in which events can be scheduled.
Resource            Resources required by events. Examples include rooms and equipment (e.g. projectors).
Constraint          A restriction on the scheduling of events. Examples include room capacity and a specific timeslot.
Individual          A person who has to attend the events.
Conflict            Two events clash with each other if they have at least one individual in common and are scheduled in the same timeslot.
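
The definition of a conflict in Table 1.1 translates directly into a check over pairs of events. The following is a minimal sketch; the event names, attendee sets and timeslot labels are hypothetical, and find_conflicts is a name introduced here for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def find_conflicts(attendees: Dict[str, Set[str]],
                   timeslot_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of events that shares an individual and a timeslot."""
    conflicts = []
    for a, b in combinations(sorted(attendees), 2):
        same_slot = timeslot_of[a] == timeslot_of[b]
        shared_people = attendees[a] & attendees[b]
        if same_slot and shared_people:
            conflicts.append((a, b))
    return conflicts


# Hypothetical events: who attends each one and when it is scheduled.
attendees = {
    "Maths": {"ann", "bob"},
    "Physics": {"bob", "cy"},
    "Art": {"dee"},
}
timeslot_of = {"Maths": "Mon-09", "Physics": "Mon-09", "Art": "Mon-09"}

conflicts = find_conflicts(attendees, timeslot_of)
print(conflicts)
```

Only Maths and Physics clash here: all three events share a timeslot, but Art has no individual in common with the other two.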

The constraints in timetabling can be divided into two categories: hard and soft. Hard constraints
cannot be violated. Soft constraints are not essential but their satisfaction is highly desirable in
order to produce a good quality timetable.
1.3.1 Hard Constraints: Hard constraints [9] [15] are constraints that physically cannot
be violated; a timetable that violates any of them can never be acceptable. For example, a
lecturer cannot be in two places at once. The following are the hard constraints:
1. Classrooms must not be double booked.
2. Every class must be scheduled exactly once.
3. Lecturers must not be double booked.
4. A lecturer must not be booked when he/she is unavailable.
5. Some classes need to be held consecutively, for example labs.
6. Some classes require particular rooms; for example, experiments must be held in particular
laboratories.
7. Classrooms must be large enough to hold the class scheduled in it.
1.3.2 Soft constraints: Some constraints [9] [15] are less straightforward to define. Usually,
these constraints should be fulfilled as well as possible. A timetable that violates these
constraints is still usable, but it is inconvenient for students or teachers. The following are
the soft constraints:
1. Teachers may prefer specific time slots.
2. Teachers may prefer specific rooms.
3. Certain kinds of subjects should not be in contiguous time slots.
4. Some lecturers do not wish to have classes assigned consecutively in time.
5. There are preferred hours in which a lecturer's classes might be scheduled.
6. Most students and some lecturers do not wish to have empty periods in their timetables.
7. Classes should be distributed evenly over the week.
8. Classrooms that are much larger than the class should not be booked.
9. More than one member of staff might need to be assigned to a particular class.
It is desirable that timetables should satisfy all hard and soft constraints. However, it is usually
difficult to meet all of them. Hard constraints must not be violated in any case, so
some soft constraints may be sacrificed in order to find feasible timetables.
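The distinction between the two kinds of constraint can be sketched as two separate measures: a count of hard violations that must be zero for a timetable to be feasible, and a soft penalty that should merely be as small as possible. The sketch below covers only three of the hard constraints (room double booking, lecturer double booking, room capacity) and one soft constraint (preferred time slots). The Booking record, the sample data and the one-point-per-violation weighting are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Booking:
    course: str
    lecturer: str
    room: str
    slot: str
    size: int  # number of students attending


def hard_violations(bookings: List[Booking], capacity: Dict[str, int]) -> int:
    """Count double-booked rooms, double-booked lecturers and undersized rooms."""
    room_use = Counter((b.room, b.slot) for b in bookings)
    lecturer_use = Counter((b.lecturer, b.slot) for b in bookings)
    double_rooms = sum(n - 1 for n in room_use.values() if n > 1)
    double_lecturers = sum(n - 1 for n in lecturer_use.values() if n > 1)
    too_small = sum(1 for b in bookings if b.size > capacity[b.room])
    return double_rooms + double_lecturers + too_small


def soft_penalty(bookings: List[Booking], preferred: Dict[str, Set[str]]) -> int:
    """One penalty point per booking outside its lecturer's preferred slots."""
    return sum(1 for b in bookings
               if b.lecturer in preferred and b.slot not in preferred[b.lecturer])


# Hypothetical rooms, bookings and preferences.
capacity = {"R1": 40, "R2": 100}
bookings = [
    Booking("DM", "Rao", "R1", "Mon-09", 35),
    Booking("OS", "Rao", "R2", "Mon-10", 80),
    Booking("AI", "Iyer", "R1", "Mon-10", 30),
]
preferred = {"Rao": {"Mon-09"}, "Iyer": {"Mon-10"}}

feasible = hard_violations(bookings, capacity) == 0
penalty = soft_penalty(bookings, preferred)
print(feasible, penalty)
```

The sample timetable is feasible but carries a soft penalty, which matches the idea that a usable timetable may still be inconvenient for some lecturers.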
1.4 Classification of Educational Timetabling Problems
Schaerf (1999a) classified educational timetabling into three main classes i.e. school
timetabling, course timetabling and examination timetabling. They share the same basic
characteristics of the general timetabling problem but can still have significant differences
between them. Each one of them has its own constraints, requirements and rules. More details on
educational timetabling can be found in Burke et al. (2004e). In this section, a classification of
educational timetabling and its properties are discussed. We divide educational timetabling into
two categories i.e. school timetabling and university timetabling (which consists of examination
timetabling and course timetabling).
1.4.1 School Timetabling
The school timetabling problem is concerned with the weekly scheduling for all the lessons of a
school. The problem consists of a set of teachers, classes, subject/lessons and weekly periods.
These weekly periods are predefined. The problem is to assign lessons to periods, and a
teacher to a particular class at a given time, while satisfying a set of constraints in order to
produce a feasible timetable. Some examples of constraints in the school timetabling problem are
capacities, locations, teacher loads, rest time between two lessons and other personal preferences.
Examples of research on school timetabling can be found in Abramson (1991) who employed
simulated annealing, Carrasco and Pato (2001) who employed a multi-objective genetic
algorithm and Legierski (2003) who applied a constraint-based approach.

1.4.2 University Timetabling
The university timetabling problem can be grouped into two categories: (i) course (or lecture)
timetabling and (ii) examination timetabling. The course timetabling problem is the process of
assigning timeslots and rooms so that meetings between lecturers and students can take place.
The examination timetabling problem refers to the assignment of timeslots and rooms so that
students can take examinations. These two (examination and course) timetabling problems are
fairly similar in some superficial ways, but there are some distinct underlying differences
between them. In examination timetabling, several examinations can be assigned to one (large)
room at the same time. However, this is not possible for course timetabling, where only one
course can be assigned to a room at any one time.
a) The Examination Timetabling Problem
The examination timetabling problem represents a major administrative activity for academic
institutions. It is often a difficult and demanding process and it affects a significant number of
people. Romero (1982) reports that there are three broad categories of people that are affected by
its outcome: administrators, academic staff and students. Many universities are seeing an
increasing number of student enrolments into a wider variety of courses and an increasing
number of combined degree courses. This contributes to the growing challenge of developing
examination timetabling software that caters for the broad spectrum of constraints and demands
of educational institutions across the world. The quality of a timetable should therefore be
evaluated from several points of view.
Carter and Laporte (1996) defined the examination timetabling problem as:

"The assigning of examinations to a limited number of available
time periods in such a way that there are no conflicts or clashes."
The examination timetabling problem is very common in both schools and universities. It is
concerned with allocating a set of examinations, into a limited number of timeslots (periods),
subject to a set of constraints. Carter et al. (1994) stated that the basic challenge of examination
timetabling is to schedule examinations over a limited number of timeslots so as to avoid
conflicts and to satisfy a number of side constraints. Here, avoiding conflicts is a
hard constraint and the side constraints are soft constraints.
The generally accepted hard constraints for the examination timetabling problem are (i) there
must be enough seating capacity and (ii) no student should be required to sit two examinations at
the same time. Solutions that satisfy all the hard constraints are called feasible. On the other
hand, there might be some requirements that are not essential. These are referred to as soft
constraints. Common soft constraints are (i) students should not be scheduled to sit more than
one examination in a day and (ii) each student's examinations should be spread as evenly as
possible over the schedule.
In a real-world situation it is usually impossible to satisfy all the soft constraints, but
minimizing their violation increases the quality of the solution. Quality is measured by a penalty
function that captures the extent to which a timetable violates its soft constraints.
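A penalty function of this kind can be sketched for soft constraint (i), alongside a check of the hard no-clash constraint. The penalty used below (one point for every extra examination a student sits on the same day) is an illustrative choice rather than a standard from the literature, and the enrolment data, the integer slot numbering and the value of slots_per_day are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def has_clash(enrolment: Dict[str, List[str]], exam_slot: Dict[str, int]) -> bool:
    """Hard constraint: no student sits two examinations in the same timeslot."""
    for exams in enrolment.values():
        slots = [exam_slot[e] for e in exams]
        if len(slots) != len(set(slots)):
            return True
    return False


def same_day_penalty(enrolment: Dict[str, List[str]],
                     exam_slot: Dict[str, int],
                     slots_per_day: int = 3) -> int:
    """Soft constraint: one point per extra examination a student sits in a day."""
    penalty = 0
    for exams in enrolment.values():
        per_day = defaultdict(int)
        for exam in exams:
            # Slots are numbered 0, 1, 2, ... in time order; integer division gives the day.
            per_day[exam_slot[exam] // slots_per_day] += 1
        penalty += sum(n - 1 for n in per_day.values() if n > 1)
    return penalty


# Hypothetical enrolments and two candidate timetables.
enrolment = {"ann": ["DM", "OS", "AI"], "bob": ["DM", "AI"]}
crowded = {"DM": 0, "OS": 1, "AI": 3}   # DM and OS both fall on day 0
spread = {"DM": 0, "OS": 3, "AI": 6}    # one examination per day

crowded_penalty = same_day_penalty(enrolment, crowded)
spread_penalty = same_day_penalty(enrolment, spread)
print(has_clash(enrolment, crowded), crowded_penalty, spread_penalty)
```

Both timetables are feasible, but the second has the lower penalty and would therefore be preferred.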
b) The Course Timetabling Problem
Carter and Laporte (1998) defined course timetabling as:

"a multi-dimensional assignment problem in which students,
teachers (or faculty members) are assigned to courses, course
sections or classes; events (individual meetings between students
and teachers) are assigned to classrooms and times."
In course timetabling (which is also sometimes known as class/teacher timetabling), a set of
courses is scheduled into a given number of rooms and timeslots within a week and, at the same
time, students and teachers are assigned to courses so that the meetings can take place. Some
combinatorial models which draw upon graph colouring for simple class-teacher timetabling
problems can be found in de Werra (1996b, 1997b). As in examination timetabling, course
timetabling also involves hard and soft constraints. Examples of hard constraints for the course
timetabling problem are:
1. A student or a teacher cannot be in two places at the same time.
2. Only one course is allowed to be assigned to a timeslot in each classroom.
3. The classroom capacity should be equal to or greater than the number of students attending the
course at a particular timeslot.
Some related soft constraints for course timetabling reported by Socha et al. (2002) are:
1. Teachers may prefer specific time slots.
2. Teachers may prefer specific rooms.
3. Certain kinds of subjects should not be in contiguous time slots.
Some combinations of assignments lead to acceptable timetables, others do not. Such
restrictions follow from conditions imposed by rooms, students or teachers. As stated earlier, in
university course timetabling, a set of courses and associated events is assigned to a set of rooms
and time periods within a week and, at the same time, students and teachers are assigned to the
courses so that the appropriate lessons can take place, subject to a variety of hard and soft
constraints.
1.5 Need of Study
Organizations like universities and schools use timetables to schedule classes and lectures,
assigning times and places to them in a way that makes the best use of available resources.
Universities in particular increasingly have to deal with a large number of courses and flexible
degree structures. A timetable that is not well designed will be inconvenient and
expensive in terms of wasted time and money. Timetabling is a search for good solutions in a
space of possible timetables.
Traditionally, educational staff solved the problem manually. Making a timetable is a
slow, laborious task, performed by people working on the strength of their knowledge of the
resources and constraints of a specific institution. Generating a university's timetable is a
tedious job with many constraints to be satisfied, including the different requirements of
different departments or universities. Timetable generation is therefore considered a complex
problem, and the result is often unsatisfactory, i.e. it does not meet all the requirements. These
difficulties have motivated the scientific study of the problem and the development of
semi-automated solution techniques for it. Such programs build a set of timetables but still do not
solve the whole problem.
The construction of automated course timetables for academic institutions is a very
difficult problem, with many constraints that have to be respected and a huge search space to be
explored, even if the size of the problem input is not large, because the number of possible
timetables grows exponentially. In addition, the problem itself does not have a
widely agreed definition, since different departments face different variations of it. The
problem has therefore proven to be very complex. Timetables are considered feasible provided
the so-called hard constraints are respected. However, to obtain high-quality timetabling
solutions, soft constraints, which express a set of desirable conditions for classes
and teachers, should also be satisfied. By using a modified k-means clustering algorithm, this
work aims at a more accurate timetable, with high precision and high recall, that takes less
execution time to schedule.
1.6 Research Objectives
The main objective of this thesis work is to fully utilize the resources of the university in the
automated timetable generator. The goals of the thesis work are:
1. To analyse the problems that exist in timetabling.
2. To create a system that utilizes resources efficiently and effectively and removes
redundancy and ambiguity, so that the system is cost effective and user friendly.
3. To compare the existing system with the new one.

In this work we check the accuracy, precision and recall of the generated timetable and compare
the execution time for timetable generation. We use an improved k-means clustering algorithm and
show by analysis that its execution time is less than that of k-means clustering.
1.7 Scope of Work
A timetable is crucial for educational institutions and schools. We have found an
effective solution to the problem of adjusting timetables in a simple and economical way. Through
interactive means, the timetable can be generated quickly and smoothly. The timetable software
provides features that make the whole timetable easy to produce, so that the most effective time
schedules can be fixed according to the school's needs and suggestions. With the help of
effective tools and timetables, unnecessary delays and confusion can be avoided.
1.8 Structure of dissertation
The dissertation has been organized into six chapters. A brief description of the content of
these chapters is given in the following paragraphs:
Chapter 1 provides an overview and introduction to timetable scheduling. It also introduces
various timetabling problems and concentrates upon particular research issues concerned with
university timetabling problems. This chapter presents the need of study and the aims of the
research.
Chapter 2 introduces background of timetabling problems. It reviews and analyses the current
published research on the subject of university timetabling.
Chapter 3 introduces the various techniques for solving the timetabling problem.
Chapter 4 introduces the methodology used in this research work for solving the timetable
scheduling problem.
Chapter 5 shows the working of the new system and snapshots of results. It also compares the new
system with the existing system. This chapter also reports the accuracy, precision, recall and
execution time of the new system.
Chapter 6 describes the conclusion and future work.

Chapter 2

SURVEY OF LITERATURE

This chapter discusses the literature reviewed during the research work.
Various research papers and journals were studied during this period.
2.1 Background
Timetabling is known to be an NP-complete (non-deterministic polynomial-time complete) problem,
i.e. there is no known efficient way to locate a solution. In particular, no polynomial-time
algorithm is known that finds the best solution to an NP-complete problem. Hence, in order to find
a solution to a timetabling problem, a heuristic approach is chosen. This heuristic approach leads
to a set of good solutions (but not necessarily the best solution). In a general educational
timetabling problem, a set of
events (e.g. courses and exams, etc) are assigned into a certain number of timeslots (time
periods) subject to a set of constraints, which often makes the problem very difficult to solve in
real-world circumstances [2]. In fact, large-scale timetables such as university timetables may
need many hours of work by qualified people or teams in order to produce high-quality
timetables that satisfy the constraints [7] and optimize the timetable's objectives at
the same time. These constraints are of two types: hard and soft. Hard constraints
include those constraints that cannot be violated while a timetable is being computed. For
example, for a teacher to be scheduled for a timeslot, the teacher must be available for that time
slot. A solution is acceptable only when no hard constraint is violated. On the other hand soft
constraints are those that should be satisfied in the solution as far as possible. For
example, though importance is given to a teacher's scheduling preferences, the focus is on
producing a valid timetable, and this can lead to a teacher having a free time slot. Thus, while
addressing the timetabling problem, hard constraints have to be adhered to, while at the same time
an effort is made to satisfy as many soft constraints as possible. Due to the complexity of the
problem, most of the work done concentrates on heuristic algorithms which try to find good
approximate solutions [8].
Some of these include Genetic Algorithms (GA) [8], Tabu Search [10], Simulated Annealing
[11] and, more recently, Scatter Search methods. Heuristic optimization methods are explicitly
aimed at good feasible solutions that may not be optimal, where the complexity of the problem or
the limited time available does not allow an exact solution. Generally, two questions arise:
(i) how fast is the solution computed? and (ii) how close is the solution to the optimal one? A
trade-off is often required between time and quality, which is handled by running simpler
algorithms more than once, comparing the results with those obtained from more complicated ones,
and comparing the effectiveness of different heuristics. Heuristic methods are evaluated
empirically because of the analytical difficulty of establishing worst-case results for the
problem. In its simplest form the scheduling task consists of mapping class, teacher and room
combinations (which have already been pre-allocated) onto time slots.
2.2 Literature Review
Many approaches and models have been proposed for dealing with the variety of timetable
problems. Problems range from the construction of semester or annual timetables in schools,
colleges and universities to exam timetabling at the end of the period. Early timetable activities
were carried out manually and a typical timetable once constructed remained static with only a
few changes necessary, in order to fine tune it every semester or year. However, the nature of
education has changed substantially over the years and thus the requirements of timetables have
become much more complicated than they used to be. Consequently the need for automated
timetable generation is increasing and thus the development of a timetable generation system
that generates valid solutions is essential. As a result, during the last 30 years, many papers
related to automated timetabling have been published in conferences, proceedings and journals. In
addition, several applications have been developed and implemented with varying success.

The early techniques used in solving timetabling problems were based on a simulation of the
human approach in resolving the problem. These included techniques based on successive
augmentation that were called direct heuristics. These techniques were based on the idea of
creating a partial timetable by scheduling the most constrained lecture first and then extending
this partial solution lecture by lecture until all lectures were scheduled [3]. The next step was
for researchers to apply general techniques like integer and linear programming, graph coloring
and network flow to solve the timetable problem. The first two papers published on
timetable construction using these general techniques are generally attributed to Kuhn and
Haynes. Kuhn's paper adopts a mathematical approach to the fundamental timetable problem,
in contrast to Haynes' paper, which concentrates on the more practical aspects of
scheduling events for a conference. Interest in timetable solution generators increased
dramatically in the 1960s mainly due to the more common availability of computers to perform
the number crunching required by the algorithms developed. [4]
The first non-heuristic approach was developed by Gotlieb in 1963. It introduced the now
famous process of reducing the availability array and was presented at the Munich IFIP congress.
This was arguably the first paper on this partitioning approach and was further enhanced by
Berghuis, who introduced the concept of virtual classes or teachers to obtain the classical
bipartite problem. Typically these papers were based on a heuristic approach. Many of the papers
that followed this work discussed the problem but contained very little new work.
Around the late 1960s some attempts at limiting the general problem by considering case
examples began to be published. For instance, Lawrie in 1969 developed a model for
the school timetable problem by using an integer linear programming approach.
During the 1970s several authors adopted the heuristic approach in tackling the
timetable problem. For example, Junginger in 1972 reduced the timetable
problem to a three-dimensional transport problem. Schmidt and Strohlein in 1973
predicted that the generation of timetables by computer would be heavily influenced by the devices
at hand, with timetable programming moving from remote handling in huge computing centers to
microcomputer centres owned by schools and directly handled by teachers on their desktops.
The major general techniques that seem to have been prevalent in the 1970s and 1980s have
their roots in artificial intelligence and are based on algorithms supported by simulated
annealing, tabu search and genetic algorithm methods. Papers in the literature typically
described a substantial software implementation and this is supported by the presentation of
results of the application of the method in one or more cases. Furthermore, there were a number
of important surveys of timetabling literature that were published in the 1980s. [4]

De Werra in 1985 listed the various problems dealing with timetabling in a formal way and
provided different formulations in an attempt to solve them. He also described the approaches
considered the most important at that time. Carter in 1986 analyzed a survey, which discussed
actual applications of timetables at several universities. He also provided a tutorial
guide for practitioners on selecting and/or designing an algorithm for their own institutions. [5]

Junginger in 1986 described research work in Germany on the school timetable problem and the
underlying approaches that were based on direct heuristics. Corne et al. in 1994 provided a survey
of Genetic Algorithm application to timetables, discussed future perspectives of such approaches
and compared results obtained with respect to other approaches. Although there were papers
published in the 1990s solving timetable problems using the above artificial intelligence based
techniques, there was a new approach emerging, also rooted in Artificial Intelligence that has
gained prominence called Constraint Satisfaction Programming (CSP). [6]

Abramson in 1991 used Simulated Annealing as an optimization technique. The possibility of
adding cost components was discussed in an attempt to include the more complex scheduling
constraints that arise in schools. Also described is how the weighting of cost components
allowed one component to be made more important than others. He implemented this in a
parallel computer system and proved that the speed of the algorithm improved along with
results. Cooper and Kingston in 1993 described a computer program that solved a problem
within a large and highly constrained high school without any simplifications. A timetable
specification language was provided that helped to avoid many constraints in a uniform way.
Schaerf in 1999 provided a survey of the different techniques used in timetable generation.
Constraint satisfaction techniques were stressed as an important addition to the tools that are
used in solving the timetabling problem. [10]

Kennedy and Eberhart in 1995 developed the Particle Swarm Optimization (PSO) algorithm for
optimization. Shu-Chuan Chu and Yi-Tin Chen in 2006 developed a school timetable using
PSO. They observed that PSO has many successful applications in continuous optimization
problems. The main contribution of their work is to utilize PSO to solve the discrete problem of
timetable scheduling. [7]

Ahmed Hamdi Abu Absa and Dr. Sana Wafa Ai Sayegh [8] explained the details of the
implementation of a Genetic Algorithm (GA) used as a university timetable generator.
The paper presents a program, written in Java, that creates an efficient timetable without
constraint violations for a simple university timetable problem. The study tested the effects of
mutation rate and population size. The paper concludes that the Genetic Algorithm approach is
very effective and useful for lecture timetabling problems.

Alberto Colorni et al. [9] analyzed the results of an automated timetabling problem solved by a
genetic algorithm. They described the automated timetable problem as representative of the class
of multi-constrained, NP-hard, combinatorial optimization problems with real-world application.
The paper compares two versions of the genetic algorithm, with and without local search, both
to a handmade timetable and to two other approaches based on simulated
annealing and tabu search. The results show that the genetic algorithm with local search and tabu
search with relaxation perform better than simulated annealing and handmade timetables. When
they tested the algorithm, both the didactical requirements and the teachers'
preferences were better satisfied. The total cost of this newly built system was much lower than
that of the handmade timetable.
E.K. Burke et al. [10] discussed automatic timetable generation using traditional
methods such as graph coloring and advanced methods such as genetic algorithms. The
paper addresses examination timetabling and argues that genetic algorithms are very
useful general-purpose optimization tools that may be applied to a wide range of very difficult
problems.
Khaled Mahar [2] proposed a genetic algorithm with a simple representation that handles all the
university timetables at once and is easily modified to create an accurate timetable that
satisfies the constraints that must not be broken. The algorithm was applied to
create timetables for a college of the Arab Academy for Science and Technology in Egypt; the
results were very satisfactory and no hard constraint violations were encountered. The program
was tested with different population sizes, crossover rates and mutation rates. The paper also
provides an overview of different techniques for automatic generation of university timetables,
such as tabu search, simulated annealing, genetic algorithms, graph coloring heuristics,
constraint programming and network flow models. The paper reports that the cost changes faster as
the population size increases, but that a large population takes too much running
time and memory.
Dipti Srinivasan et al. [11] presented an evolutionary algorithm based approach to solving a large
constrained university timetabling problem. Heuristics and context-based reasoning were also
used to obtain feasible timetables in a reasonable time. The complete
course timetabling system presented in the paper was shown to be accurate and was tested and
discussed using data from a university. The results showed that implementing the intelligent
adaptive mutation operator led to a more than tenfold improvement in the performance of the
evolutionary algorithm.
Prof. Swapna Borde et al. [12] presented a hybrid algorithm for the university timetabling problem
that combines two techniques: an If-Else algorithm and a Graph Coloring
algorithm. The first is based on simple if-else statements, which can easily be written in any
programming language, and the second is based on connected graphs. The hybrid
algorithm removes the disadvantages of the individual methods and provides a more efficient
timetable generation algorithm.
Hana Rudova et al. [13] focused on hard constraints with preference propagation for soft
constraints. They extended the constraint logic programming technique to allow partial
satisfaction of the soft constraints, and applied this method to the timetabling problem of Purdue
University. The model and search methods applied to the large lecture room
component are presented, and the computational results are analyzed. Their solution was able to
satisfy the course requests of 98% of students.
Ashish Jain, Dr. Suresh Jain and Dr. P.K. Chande [14] described the various genetic operators
such as selection, mutation and crossover. Selection picks the best chromosomes from a group of
chromosomes on the basis of the fitness function, and crossover exchanges
timetable information as required. The paper claims that the evolutionary
genetic algorithm approach is an effective and powerful method for solving the course
timetabling problem.
Yao-Te Wang et al. [15] proposed a practical automatic timetable scheduling system based on
students' needs, in which the process is divided into two stages. In the first stage, students'
needs in course selection are determined and associations among the courses selected by students
are extracted using association mining; in the second stage, a genetic algorithm is used
to arrange the course timetable. The study takes students' willingness in course selection as its
basis, analyzes student learning performance and teachers' preferred schedules, determines the cost
function value of each class period, and then applies the genetic algorithm to exchange class
periods and produce an optimal course timetable. According to the authors, the proposed system
can efficiently replace conventional manual timetabling, produces course timetables that truly
fulfill users' needs, and increases student and teacher satisfaction. It is also said to
improve the interaction among students, teachers and the school,
creating a good relationship among the three parties.
Kuldeep Kumar et al. [6] noted that there are a very large number of feasible
solutions to the university timetabling problem. Some method is required to measure the overall
quality of different solutions, so that they can be compared and the
best one selected. They regard the Genetic Algorithm as
generally an appropriate technique for university timetabling problems, since it yields a number
of alternative solutions that satisfy most of the hard constraints.
Nikita Desai [16] noted that a large number of tools solve the timetabling problem
based on the resources provided by the user, but that these tools ignore some resources.
They focus mainly on the specifications of classrooms, teachers and subjects, and are not able to
accommodate human factors such as the likes, dislikes and weaknesses of teachers
and students. She presented a survey of teachers' preferences and concluded
with rules extracted using a classification method. According to her analysis, these rules can
help utilize resources properly for accurate timetable generation.
Carpente [17] presented an application that solves the complex school timetabling process by
taking the available resources and adjusting them so that they are fully utilized in the
automatically generated solution. The application interacts with the Academic Administration
Official Systems (AAOS), which simplifies the hard phase of entering the data, and complete
solutions are provided efficiently by different heuristic techniques. The application can be easily
updated through its user interface.
Ho Sheau Fen, Irene et al. [18] described the PSO technique for solving the university
timetabling problem. They applied constraint-based reasoning to PSO. The proposed
algorithm was tested using real data from Teknologi University, Malaysia. The results were compared
against standard PSO and hybrid PSO-local search; the proposed algorithm produced
better results than the others, but its computational time per solution was
slightly longer than that of hybrid PSO-local search and standard PSO.
Ruey-Maw Chen et al. [19] explained that the course timetabling problem is NP-complete.
They used PSO for this problem because of its fast convergence, small number of parameters
and ability to fit dynamic environmental characteristics. They evaluated the performance of PSO and
SPSO with and without local search. They indicate that once local search is added, the outcomes
are significantly better than those obtained by using PSO or SPSO alone. Moreover, the
performance of SPSO-local search is better than that of PSO-local search.
Elizabeth Montero et al. [20] analyzed PSO for a dynamic problem in which
new courses and exams can appear during the semester. They used the forward checking algorithm,
and their approach can efficiently handle the creation of new courses or exams after the initial,
static start-up planning.
Danial Qarouni - Faral et al. [21] used the swarm intelligence that is based on social
psychological principles as well as contributing to engineering applications. This paper applies
the PSO to the classic timetabling problem. The result shows that the number of errors is
decreased in comparison with previous approaches.
LI Lin et al. [22] addressed the course-scheduling problem with the PSO algorithm. They
analyzed the application of PSO in a course scheduling system, adding a greedy strategy to the
algorithm for selecting the initial particle swarm. With the improved greedy strategy, the initial
population has higher performance and is much closer to the optimal solution, so the
convergence speed of the algorithm is enhanced. Meanwhile, various kinds of hard and soft
constraints are taken into account, and the difficulty of course scheduling is reduced.
ZHU Jihrong et al. [23] explained a new adaptive particle swarm optimization algorithm. Every
particle chooses its inertial factor according to the fitness of itself and the optimal particle in the
presented algorithm. With better fitness, the particle chooses a smaller inertial factor. The
simulation results show that the proposed algorithm is effective and robust. Simulation results
show that the new algorithm has advantage of global convergence property and can effectively
alleviate the problem of premature convergence. At the same time, the experimental results also
show that the suggested algorithm is greatly superior to PSO and APSO in terms of robustness.
R.C. Eberhart et al. [24] compared the two methods of particle swarm optimization. They
compare performance of particle swarm optimization using an inertia weight and using a
constriction factor. Five benchmark functions are used for the comparison. It is concluded that
the best approach is to use the constriction factor while limiting the maximum velocity Vmax
to the dynamic range of the variable Xmax on each dimension. The results here also indicate that
improved performance can be obtained by carefully selecting the inertia weight w, c1, and c2.
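The two velocity updates compared in [24] can be sketched as follows. This is a minimal one-dimensional illustration, not the code used in that paper: the function names are hypothetical, and the setting c1 = c2 = 2.05 used in the checks is a commonly quoted choice assumed here for illustration. The constriction coefficient follows Clerc's formula, which is only defined when c1 + c2 > 4.

```python
import math


def constriction_factor(c1: float, c2: float) -> float:
    """Clerc's constriction coefficient; defined only for c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("constriction factor requires c1 + c2 > 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def clamp(value: float, limit: float) -> float:
    """Restrict a velocity to the interval [-limit, limit] (the Vmax rule)."""
    return max(-limit, min(limit, value))


def velocity_inertia(v, x, pbest, gbest, w, c1, c2, r1, r2, vmax):
    """Inertia-weight update: the old velocity is scaled by w."""
    pull = c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return clamp(w * v + pull, vmax)


def velocity_constriction(v, x, pbest, gbest, c1, c2, r1, r2, vmax):
    """Constriction update: the whole sum is scaled by the coefficient."""
    pull = c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return clamp(constriction_factor(c1, c2) * (v + pull), vmax)
```

In both variants r1 and r2 are random numbers drawn from [0, 1] at each step; they are passed in explicitly here so that the functions stay deterministic.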
Almost all of the papers in the literature describe a substantial software implementation. In
addition, this is supported by the presentation of results of the application of the method in one or
more test cases. The results obtained are measured against manual results but unfortunately, the
absence of a common definition of the various problems and of widely accepted benchmarks
prevents the comparison of the algorithms among each other. The computational complexity of
the proposed systems is determined only through computing time. However comparisons are
difficult as hardware varies from case to case. Furthermore there seems to be a substantial gap
between the theoretical discussion and implementation of the software to test cases in contrast to
obtaining effective and realistic timetables that can be used in every day operations. Therefore in
order to generate a timetable that is practical and effectual it needs to be flexible enough so that it
can facilitate and overcome the problems.

Chapter 3
TECHNIQUES
3.1 Techniques Applied to the Timetabling Problem
A timetabling problem can be defined as the scheduling of a certain number of lectures, which
are to be attended by specific group of students and given by a teacher, over a definite period of
time. Each lecture requires certain resources in limited number and must fulfill certain specific
requirements. In particular, automatic building of timetable is extremely difficult because of
diversity of constraints that must be taken into account.
The most usual methods to solve this problem are inherited from operations research such as
graph coloring and mathematical programming, or from Genetic Algorithms [9]. These well-known
and widely used methods have given good results. However, OR-inherited methods generally
lack flexibility (i.e. modifying the data may make it necessary to reconsider the initial
model); moreover, it is difficult to find a model which includes all the constraints. For local
search methods (where most of the constraints are put in the objective function) or for Genetic
Algorithms (where the constraints are active in the fitness function), the user frequently obtains
solutions by tuning rather than by defining his own search strategy dedicated to the problem.
This section divides the related techniques applied to university timetabling problems into five
categories, i.e. constraint-based methods, cluster-based methods, graph-based approaches,
heuristic-based approaches and genetic approaches. The details of these categories are discussed
in the following sub-sections. There are also many other approaches to timetabling, such as
population-based approaches, meta-heuristic methods, multi-criteria approaches,
hyper-heuristic/self-adaptive approaches, case-based reasoning, knowledge-based and fuzzy-based
approaches.
3.1.1 Constraint-based Methods
Reasoning approaches are considered new methodologies in problem solving. Two types of
reasoning approaches are applied to the UCT problem: Case-Based Reasoning (CBR) [28]
approaches and Constraint-Based Reasoning approaches. CBR approaches solve new timetabling
problems by reusing previous timetables and previous construction methodology, selected by means
of similarity measures. The big challenge for these approaches is defining
similarity measures between timetables [5]. Constraint-based reasoning approaches model the UCT
problem as a Constraint Satisfaction Problem (CSP), that is, as a set of
variables with finite domains, restricted by a set of constraints. Each constraint relates to a
subset of the variables and specifies the combinations of values allowed for them. A consistent,
conflict-free solution is an assignment that does not violate any of the constraints of the
problem. Satisfying all CSP constraints is sometimes impossible, so some
precedence mechanism must be employed to prefer one constraint over another. The
main advantage of this method is that it is very fast on small instances.
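The CSP view described above can be illustrated with a small backtracking search. This is a minimal sketch under stated assumptions, not the method of any cited paper: each lecture is a variable, its domain is a list of time slots, and the only constraint type is that two conflicting lectures may not share a slot. The name solve_csp and the example data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def solve_csp(
    variables: List[str],
    domains: Dict[str, List[int]],
    conflicts: List[Tuple[str, str]],
) -> Optional[Dict[str, int]]:
    """Return a conflict-free slot for every variable, or None if none exists."""
    # Build a symmetric neighbour map from the list of conflicting pairs.
    neighbours = {v: set() for v in variables}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    assignment: Dict[str, int] = {}

    def backtrack() -> bool:
        if len(assignment) == len(variables):
            return True
        # Pick the first variable that has no value yet.
        var = next(v for v in variables if v not in assignment)
        for slot in domains[var]:
            # A slot is consistent if no already-assigned neighbour uses it.
            if all(assignment.get(n) != slot for n in neighbours[var]):
                assignment[var] = slot
                if backtrack():
                    return True
                del assignment[var]  # undo and try the next slot
        return False

    return dict(assignment) if backtrack() else None


# Hypothetical instance: L1/L2 share a teacher, L2/L3 share a class.
lectures = ["L1", "L2", "L3"]
slots = {name: [0, 1] for name in lectures}
solution = solve_csp(lectures, slots, [("L1", "L2"), ("L2", "L3")])
```

Plain backtracking of this kind is fast on small instances but its worst-case running time grows exponentially with the number of lectures, which is consistent with the remark above that the method suits small instances.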
Constraint Logic Programming is based upon the integration of Constraint Solving and Logic
Programming. This combination helps make Constraint Logic Programming programs both
expressive and flexible, and in some cases, more efficient than other kinds of programs.
Constraint Logic Programming over Finite Domains (CLP) is based upon the integration of CSP
(Constraint Satisfaction Problems) approach in a Logic Programming scheme (Prolog). It
benefits from the results obtained in the AI community in CSP (domains, consistency techniques,
filtering algorithms, search strategies, . . . ) embedded in a Logic Programming scheme. Thus,
the user is provided with a uniform framework in order to both model his problem (constraints)
and develop his own search methods (labeling). It has already been proved that CLP is successful
in tackling many combinatorial optimization problems.
3.1.2 Cluster-based Methods
Clustering is another approach used for timetabling; clustering is the process of
finding classes of objects that share common characteristics [29]. It is mainly based on
splitting the events into clusters or groups, where each cluster is scheduled in the same timeslot
without conflicts. Clustering methods satisfy the soft constraints using additional
optimizing rules to obtain a good solution. In this approach the courses are
grouped into fixed clusters at early stages of the algorithm; the clusters are formed
manually based on some predefined rules, which leads to poor quality timetables. An FP-tree is a
compressed representation of the input data (transactions). It is constructed by reading the data
set one transaction at a time and mapping each transaction onto a path in the FP-tree [29]. The
FP-tree technique has been used in many applications such as clustering and document organization.
In this paper we customize an FP-tree algorithm to dynamically generate clusters from the
students' transactions (courses to be registered for the coming semester).
Cluster methods were classified as one of the four major approaches by Carter and Laporte
(1996). The idea of the cluster method was first introduced by Desroches et al. (1978). White and
Chan (1979) and White and Haddad (1983) describe cluster methods that can be thought of
as a three-phase approach. In the first phase, the examinations are grouped into
timeslots to construct a feasible timetable. The second phase attempts to reduce second order
conflicts by considering permutations of timeslots. Then the third stage is employed with the aim
of improving the solution quality further. This is done by moving a particular examination
between timeslots such as by employing a hill climbing local search.
3.1.3 Graph-based Approaches
Graph coloring is concerned with coloring the vertices of a given graph using a given number of
colors. Let us consider the examination timetabling problem. We need to schedule all the
examinations within a limited number of timeslots in such a way that any clashing examinations
(i.e. examinations that have at least a common student) are scheduled in different timeslots, so
this problem can be viewed as a graph coloring model where the vertices represent the
examinations, the colors represent the slots and the edges represent the conflicts between
examinations. Each vertex of a graph should be colored using p colors so that no two vertices
connected by an edge are both assigned the same color and normally there are a limited number
of colors available.
A definition of the concepts and terms that relate to a graph is given before progressing with the
explanation of this model. An undirected graph $G = (V, E)$ is a representation that consists of a
set of vertices, $V = \{v_1, \ldots, v_n\}$, and a set of edges, $E$. If $(v_i, v_j)$ is an edge in
a graph $G = (V, E)$, then vertex $v_i$ is adjacent to vertex $v_j$ (Burke et al., 2004a).
Figure 3.1 shows the representation of an undirected graph on the vertex set
$\{v_1, v_2, v_3, v_4, v_5\}$.

Figure 3.1 an undirected graph G = (V, E)

Other related definitions are:
The degree of a vertex is the number of edges connected to it. For example, from Figure 3.1,
vertex $v_1$ has a degree of 3.
The chromatic number of a graph is the minimum number of colors necessary to color the
vertices, so that no two vertices connected by an edge are both assigned the same color.
For a better understanding about the relationship between the graph coloring problem and the
timetabling problem, an example of course timetabling is presented in Figure 3.2.
From Figure 3.2, we can see that there are five different courses coded as A, B, C, D and E. One
possible goal is to find the minimum number of timeslots that are needed to schedule the five
courses. A set of edges represents clashes between courses. If there is an edge between vertices,
it means that these courses cannot be scheduled in the same timeslot. In our example, course A
cannot be scheduled at the same time as course B and C. Course B cannot be scheduled at the
same time as course A and D and so on. Clearly 3 colors (timeslots) are needed to schedule this
problem. Course A and E could be colored red, course D could be colored yellow, course B could
be colored blue and course C could be colored yellow or blue. The colors correspond to
timeslots. The graph coloring problem is concerned with finding the chromatic number of a
graph (which is the minimum number of colors required to color the graph). From the graph in
Figure 3.2, it is easy to see that the chromatic number is 3.

Figure 3.2 A graph model for a simple course timetabling problem

A variety of graph coloring based heuristics for constructing a clash-free timetable is available in
the literature.
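One such heuristic, greedy coloring in largest-degree-first order, can be sketched as follows. The edge set below is an assumption: Figure 3.2 is not reproduced here, so a five-course cycle A-B-D-E-C-A is used because it is consistent with the clashes and colors described in the text. The function name is hypothetical. Note that a greedy heuristic gives a valid coloring but does not in general guarantee the chromatic number.

```python
from typing import Dict, Iterable, List, Tuple


def greedy_coloring(
    vertices: List[str], edges: Iterable[Tuple[str, str]]
) -> Dict[str, int]:
    """Assign each vertex the smallest color (timeslot) unused by its neighbours."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Largest degree first: the most constrained courses are colored early.
    order = sorted(vertices, key=lambda v: -len(neighbours[v]))

    color: Dict[str, int] = {}
    for v in order:
        used = {color[n] for n in neighbours[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Assumed clash graph for the five courses (a 5-cycle, not taken from the figure).
courses = ["A", "B", "C", "D", "E"]
clashes = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E"), ("C", "E")]
timeslots = greedy_coloring(courses, clashes)
```

On this assumed graph the heuristic uses three timeslots, which matches the chromatic number stated above, because an odd cycle cannot be colored with two colors.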
3.1.4 Heuristic Approach
Heuristic-based approaches use heuristic concepts and heuristic search to construct
solutions for many problems and give good results. Over recent decades, a large
body of literature has appeared on heuristic approaches to timetabling problems. Some heuristic
approaches employ heuristic ordering, where a heuristic is used to measure the difficulty of
scheduling a particular course and to resolve conflicts with other courses [2]. These approaches
order courses using heuristics and then assign the courses sequentially to suitable time slots, so
that the courses in each period are conflict-free with each other.
Heuristic optimization methods are explicitly aimed at good feasible solutions that may not be
optimal, for cases where the complexity of the problem or the limited time available does not
allow an exact solution. Generally, two questions arise: (i) how fast is the solution computed?
and (ii) how close is the solution to the optimal one? A trade-off is often required between time
and quality, which is handled by running simpler algorithms more than once, comparing the results
with those of more complicated ones, and comparing the effectiveness of different heuristics.
Heuristic methods are usually evaluated empirically because of the analytical difficulty involved
in deriving worst-case results for the problem.
In its simplest form the scheduling task consists of mapping class, teacher and room
combinations (which have already been pre-allocated) onto time slots.
One possible approach is as follows: We define a tuple as a particular combination of identifiers
such as class, teacher and room, which is supplied as an input to the problem.[2] The problem
now becomes one of mapping of tuples onto period slots such that tuples which occupy the same
period slot are disjoint (have no identifiers in common). If tuples are assigned arbitrarily to
periods, then in anything but the most trivial cases, a number of clashes will exist. We can use
the number of clashes in a timetable as an objective measure of the quality of the schedule. Thus,
we adopt the number of clashes as the cost of any given schedule. It is simple to measure the cost
of a schedule. For each period of the week, we make a count of the number of occurrences of
each class, teacher and room identifier. The cost of the entire timetable is the sum of each of the
individual costs. This procedure is discussed in more detail in Abramson [21]. The proposed
algorithm aids solving the timetabling problem while giving importance to teacher availability.
This algorithm uses a heuristic approach to give a general solution to school timetabling
problem. It takes the user input of a number of subjects, number of teachers, subjects every
teacher takes, number of days in a week for which the timetable needs to be set, number of time
slots in a day and the maximum lectures a teacher can conduct in a week. It initially uses
randomly generated subject sequence to make a temporary time table. While generating this
sequence, care is taken to avoid repetition of subjects over a day. After this, the teacher
availability for each of the subjects allocated for the respective slot is checked. Every time a
teacher is available for the subject at the allocated slot, the subject and the teacher are entered
into the output data structure and marked as final. Before the allocation of this subject to the
output data structure, a check is also conducted on the number of maximum lectures a teacher
can conduct. If the teacher has been allocated more than the allowed maximum lectures the
subject is moved into a Clash data structure. To avoid cycling and to improve the search, this
variable selection criterion can be randomized. There are several methods [22] which can be
applied, e.g. a random walk technique (with a given probability p a random variable is selected),
selecting not the worst variable but a random one that is bad enough (e.g., from the top N worst
variables), or selecting a variable according to a probability based on the above-mentioned
criteria (e.g., roulette wheel selection).
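The clash-count cost described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the exact procedure of the cited work: the timetable is assumed to be a dictionary from period to a list of (class, teacher, room) tuples, and each identifier that occurs n > 1 times in a period is assumed to contribute n - 1 to the cost. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Lesson = Tuple[str, str, str]  # (class, teacher, room) identifiers


def clash_cost(timetable: Dict[int, List[Lesson]]) -> int:
    """Sum, over all periods, the surplus occurrences of each identifier."""
    cost = 0
    for lessons in timetable.values():
        # Position 0 = class, 1 = teacher, 2 = room; each is counted separately.
        for position in range(3):
            counts = Counter(lesson[position] for lesson in lessons)
            # Assumption: n occurrences of one identifier cost n - 1 clashes.
            cost += sum(n - 1 for n in counts.values() if n > 1)
    return cost


# Hypothetical timetable: teacher T1 is double-booked in period 0.
example = {
    0: [("C1", "T1", "R1"), ("C2", "T1", "R2")],
    1: [("C1", "T2", "R1")],
}
example_cost = clash_cost(example)
```

A cost of zero means that the tuples in every period are disjoint, which is the feasibility condition stated in the text.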
The main advantage of these orderings is that they are easy to implement. Once the courses are
ordered, a variety of approaches can be used to choose the best time slot for each course.
The disadvantages of the Simulated Annealing (SA) method are that it needs a long time to reach
good solutions and that its parameters must be chosen with care [16]. Another meta-heuristic used
to solve the timetabling problem is the Tabu Search (TS) method, which remembers the features of
prior solutions to avoid visiting them again. This reduces the search space and gets results
relatively quickly.
3.1.5 Genetic approaches
Genetic Searching (GS) algorithms are another meta-heuristic approach, employed to
obtain high quality timetables. Many papers in the literature apply
genetic algorithms to timetabling problems, such as [28].
In general, a genetic searching method starts by producing randomized timetables, which form
the parent population for the timetabling problem. Each generated timetable is then converted
to a consistent timetable by eliminating the courses that conflict with other courses. Some
initial timetables may be empty, with no courses scheduled. Next, a selection criterion is
applied to choose the timetables that are used to produce a new parent population through genetic
operators [6]. This operation is repeated until the produced solution contains all scheduled
courses and the soft constraints are satisfied to the maximum degree. The general genetic
algorithm is as follows:
Create a Random initial state
An initial population is created from a random selection of solutions (which are analogous to
chromosomes).
Evaluate Fitness
A fitness value is assigned to each solution (chromosome) depending on how close it actually
is to solving the problem. (These solutions are
not to be confused with answers to the problem; think of them as possible characteristics that the
system would employ in order to reach the answer.)
Reproduce (and Children Mutate)
Chromosomes with a higher fitness value are more likely to produce offspring (which
can mutate after reproduction). An offspring is a product of the father and mother, and its
composition is a combination of genes from both (this process is known as crossing
over).
Next Generation
If the new generation contains a solution that produce an output that is close enough or equal to
the desired answer then the problem has been solved. If this is not the case, then the new
generation will go through the same process as their parents did. This will continue until a
solution is reached.
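The four steps above can be sketched in Java on a toy problem. This is only an illustration under stated assumptions, not the method of the cited papers: each gene holds the time slot of one course, the clash pairs in CLASHES are hypothetical, fitness is the number of violated clashes (lower is better), and the class and method names are invented for this sketch.

```java
import java.util.Random;

/** Toy genetic search: gene i is the time slot assigned to course i. */
class GeneticTimetableSketch {
    // Hypothetical hard constraint for four courses: each pair {a, b} must not share a slot.
    static final int[][] CLASHES = {{0, 1}, {1, 2}, {2, 3}, {0, 3}};

    /** Number of violated clash pairs; 0 means a conflict-free timetable. */
    static int conflicts(int[] slots) {
        int count = 0;
        for (int[] pair : CLASHES) {
            if (slots[pair[0]] == slots[pair[1]]) count++;
        }
        return count;
    }

    /** Tournament selection: the better of two randomly drawn individuals. */
    static int[] select(int[][] pop, Random rnd) {
        int[] a = pop[rnd.nextInt(pop.length)];
        int[] b = pop[rnd.nextInt(pop.length)];
        return conflicts(a) <= conflicts(b) ? a : b;
    }

    /** Runs the search; courses must be at least 4 to match CLASHES. */
    static int[] evolve(int courses, int slotCount, int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        // Create a random initial population.
        int[][] pop = new int[popSize][courses];
        for (int[] individual : pop) {
            for (int g = 0; g < courses; g++) individual[g] = rnd.nextInt(slotCount);
        }
        int[] best = pop[0].clone();
        for (int gen = 0; gen <= generations; gen++) {
            // Evaluate fitness and remember the best individual seen so far.
            for (int[] individual : pop) {
                if (conflicts(individual) < conflicts(best)) best = individual.clone();
            }
            if (conflicts(best) == 0 || gen == generations) break;
            // Reproduce: one-point crossover, then occasional mutation of one gene.
            int[][] next = new int[popSize][];
            for (int k = 0; k < popSize; k++) {
                int[] mother = select(pop, rnd);
                int[] father = select(pop, rnd);
                int cut = 1 + rnd.nextInt(courses - 1);
                int[] child = new int[courses];
                for (int g = 0; g < courses; g++) child[g] = g < cut ? mother[g] : father[g];
                if (rnd.nextDouble() < 0.1) child[rnd.nextInt(courses)] = rnd.nextInt(slotCount);
                next[k] = child;
            }
            pop = next; // next generation
        }
        return best;
    }
}
```

A call such as GeneticTimetableSketch.evolve(4, 3, 30, 200, 42L) returns the best slot assignment found; conflicts(...) on the result tells whether all clashes were resolved.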

Chapter 4
EXPERIMENTAL PROCEDURES
This chapter describes the experimental procedures followed and processing parameters
selected in the present study.
4.1 Methodology
Constructing a class schedule is an NP-complete problem. It can be solved with a heuristic
search algorithm or a genetic algorithm, but these work well only for simple cases. For more
complex inputs and requirements, finding a reasonably good solution can take a long time, or may
be impossible. In this dissertation work we use the improved k-means clustering algorithm and
decision tree techniques to solve the timetabling problem.
4.2 Research Design
The thesis work was carried out in a number of stages, from problem selection to a literature
review of the state of the art in automated timetable generation on the Java platform. Most of
the time was spent identifying and selecting the problem and reviewing the literature. Selecting
the optimization algorithms and understanding how they work also took considerable time. We
divided the overall research into four stages, as shown in Figure 4.1 below:

Figure 4.1 Research Methodology (stages: Problem Identification & Selection; Literature Review;
Select Appropriate Algorithm Toolbox; Optimization Results)
In this research work we used the improved k-means clustering algorithm to cluster the data
set. Clustering means finding groups of objects such that the objects in one group are similar
to one another and different from the objects in other groups. The traditional K-means
algorithm is a widely used clustering algorithm with a wide range of applications. The improved
K-means clustering algorithm starts from an analysis of the advantages and disadvantages of the
traditional K-means algorithm and improves it in two respects: the choice of the initial cluster
centers and the determination of the K value. Simulation experiments show that the improved
clustering algorithm is not only more stable during clustering but also reduces or even avoids
the impact of noise data in the data set, so that the final clustering result is more accurate
and effective.
We also used the decision tree technique of data mining to classify the clustered data set. A
decision tree is a flow-chart-like tree structure in which internal nodes are drawn as
rectangles and leaf nodes as ovals. Every internal node has two or more child nodes and contains
a split, which tests the value of an expression of the attributes. Arcs from an internal node to
its children are labeled with the distinct outcomes of the test. Each leaf node has a class
label associated with it.
This dissertation work is implemented on the Java platform. Java is a computer programming
language that is concurrent, class-based, object-oriented, and specifically designed to have as
few implementation dependencies as possible. It is intended to let application developers "write
once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be
recompiled to run on another. Java applications are typically compiled to bytecode (class file)
that can run on any Java virtual machine (JVM) regardless of computer architecture. Java is, as
of 2014, one of the most popular programming languages in use. The main steps of this research
work are as follows:
1. Dynamically/manually create the database.
2. Connect the database with the Java NetBeans IDE.
3. Preprocess the data set with the clustering algorithm.
4. Classify the data set clusters using a decision tree.
5. Knowledge discovery: the timetable is scheduled and generated.
4.2.1 Dynamically/manually create the database
In this dissertation work, we first create a database that stores teacher registrations, student
registrations, the subjects of the seven semesters of B.Tech computer science, teacher
attendance, the room numbers of the college, and the time slots. The database is named timetable
scheduling and has 13 tables in total. The teacher table holds all the information about the
teachers; a record is entered when a teacher is registered, and a photograph of the teacher is
also taken at registration time. The student table contains the registration information of
students; a record is entered when a new student takes admission in the college or registers.
The database also has the tables semesterfirst, semestersecond, semesterthird, semesterforth,
semesterfifth, semestersixth and semesterseventh, which hold the subjects of the respective
semesters. An attendance table records the presence or absence of teachers, and a timetable
table holds the time slots of the college.
4.2.2 Connect the database with java net-beans IDE
After creating the database we connect it to Java NetBeans, so that data can be entered into the
database dynamically through the user interface and fetched through the same interface when
required. Through this connection, teacher registration information, student registration
information, teacher attendance and so on are entered into the database dynamically. We also use
the connection to fetch data from the database and schedule the timetable for the seven
semesters of courses.
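As a rough illustration of entering data through such a connection, the following sketch stores one attendance record with standard JDBC calls. The source does not state the database product or the column layout, so the column names teacher_id, teacher_name and status, the stored values "present" and "absent", and the class name are all assumptions made for this example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class AttendanceDaoSketch {
    // Hypothetical column names; the source only names the attendance table itself.
    static final String INSERT_SQL =
        "INSERT INTO attendance (teacher_id, teacher_name, status) VALUES (?, ?, ?)";

    /** Maps the present/absent flag to the text stored in the status column. */
    static String status(boolean present) {
        return present ? "present" : "absent";
    }

    /** Stores one attendance record and returns the number of rows inserted. */
    static int markAttendance(Connection con, int teacherId, String name, boolean present)
            throws SQLException {
        // A prepared statement keeps user input out of the SQL text itself.
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, teacherId);
            ps.setString(2, name);
            ps.setString(3, status(present));
            return ps.executeUpdate();
        }
    }
}
```

A Connection would typically be obtained with DriverManager.getConnection(url, user, password), where the JDBC URL and driver depend on the database actually used.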
4.2.3 Pre process the data set with clustering algorithm
After entering the data into the database, we preprocess the data set with a clustering
algorithm. In this dissertation work we use the improved k-means clustering algorithm to create
clusters of teachers, subjects, rooms and time slots. We use the improved algorithm instead of
the standard k-means algorithm because it has a number of advantages over k-means and overcomes
several of its disadvantages. The k-means and improved k-means algorithms are described in the
following sections.
4.2.3.1 K-mean algorithm
The K-means clustering algorithm was proposed by J. B. MacQueen in 1967 to deal with the problem
of data clustering. The algorithm is relatively simple and has therefore had a wide influence in
scientific research and industrial applications [30]. It is a partitioning method: taking K as a
parameter, it divides n objects into K clusters so that the similarity between clusters is
relatively low, and it minimizes the total distance between the values in each cluster and the
cluster center. The center of each cluster is the mean value of the cluster, and similarity is
calculated with respect to this mean. The similarity measure used by the algorithm is the
reciprocal of the Euclidean distance.
a) Procedure of K-means Algorithm
1. Distribute all objects to K different clusters at random.
2. Calculate the mean value of each cluster, and use this mean value to represent the cluster.
3. Redistribute the objects to the closest cluster according to their distance to the cluster center.
4. Update the mean value of each cluster, that is, recalculate the mean of the objects in each cluster.
5. Calculate the criterion function E; repeat steps 3 to 5 until the criterion function converges.
Usually, the K-means algorithm adopts the squared-error criterion as its criterion function,
defined as:

\[ E = \sum_{j=1}^{K} \; \sum_{i=1,\, x_i \in C_j}^{n} \lVert x_i - m_j \rVert^{2} \qquad (1) \]

Here $E$ is the total squared error of all the objects in the data set, $x_i$ belongs to the data
object set, and $m_j$ is the mean value of cluster $C_j$ ($x$ and $m$ are both multi-dimensional).
The purpose of this criterion is to make the generated clusters as compact and as independent as
possible.
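The procedure and the criterion function E can be sketched in Java as follows. This is a minimal sketch of plain K-means, not the thesis implementation. Two simplifications are assumed: the initial distribution is round-robin (object i goes to cluster i mod k) instead of random, so that the result is repeatable, and an empty cluster simply keeps an all-zero mean. The class and method names are invented for this example.

```java
/** Plain K-means on points stored as double[n][dim]. */
class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
        return sum;
    }

    /** Mean of each cluster; an empty cluster keeps an all-zero mean (sketch simplification). */
    static double[][] means(double[][] points, int[] assignment, int k) {
        int dim = points[0].length;
        double[][] m = new double[k][dim];
        int[] size = new int[k];
        for (int i = 0; i < points.length; i++) {
            size[assignment[i]]++;
            for (int d = 0; d < dim; d++) m[assignment[i]][d] += points[i][d];
        }
        for (int j = 0; j < k; j++) {
            for (int d = 0; d < dim; d++) {
                if (size[j] > 0) m[j][d] /= size[j];
            }
        }
        return m;
    }

    /** Squared-error criterion E: sum of squared distances to the own cluster mean. */
    static double criterion(double[][] points, int[] assignment, double[][] m) {
        double e = 0.0;
        for (int i = 0; i < points.length; i++) {
            e += squaredDistance(points[i], m[assignment[i]]);
        }
        return e;
    }

    static int[] cluster(double[][] points, int k, int maxIterations) {
        int[] assignment = new int[points.length];
        // Assumption: round-robin start instead of a random one, for repeatability.
        for (int i = 0; i < points.length; i++) assignment[i] = i % k;
        double previousE = Double.POSITIVE_INFINITY;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] m = means(points, assignment, k);
            // Redistribute every object to the cluster with the closest mean.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (squaredDistance(points[i], m[j]) < squaredDistance(points[i], m[best])) {
                        best = j;
                    }
                }
                assignment[i] = best;
            }
            // Update the means and evaluate E; stop once E no longer decreases.
            double e = criterion(points, assignment, means(points, assignment, k));
            if (e >= previousE) break;
            previousE = e;
        }
        return assignment;
    }
}
```

For the six points (1,1), (1,2), (2,1), (8,8), (9,8), (8,9) and k = 2, the sketch separates the three points near the origin from the three points near (8,8).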
b) Analysis of the Performance of K-means Algorithm
Advantages:
1. The K-means algorithm is a classic algorithm for solving clustering problems; it is
relatively simple and fast.
2. For large data collections the algorithm is relatively flexible and highly efficient, because
its complexity is O(nkt), where n is the number of objects, k is the number of clusters and t is
the number of iterations. Usually k ≪ n and t ≪ n. The algorithm usually ends in a local optimum.
3. Because it relies on the Euclidean distance, it processes only numerical values, which have
good geometrical and statistical meaning.
Disadvantages:
The inherent properties of the K-means clustering algorithm determine its limitations, which
are as follows:
1. The K value is critical for the K-means clustering algorithm, yet there is no established
basis for choosing the value of K (the number of clusters to generate). The algorithm is also
sensitive to initial values: different initial values may produce different clusters.
2. The K-means clustering algorithm depends strongly on the initial cluster centers. If an
initial cluster center lies far from the actual cluster centers of the data, the number of
iterations can become very large, and the final clustering result more easily falls into a local
optimum, resulting in incorrect clusters.
3. The K-means clustering algorithm is highly sensitive to noise data. If the data set contains
a certain amount of noise, the final clustering result is affected and may be wrong.
4. The K-means clustering algorithm has great difficulty discovering clusters of arbitrary shape.
5. The K-means clustering algorithm is limited in the amount of data it can handle. In each
iteration it must adjust the cluster to which each data object belongs and recompute the cluster
centers, so for large amounts of data the algorithm becomes costly to apply.
4.2.3.2 The Research Point of K-means Clustering Algorithm
Research on the K-means clustering algorithm has focused mainly on the following two aspects.
First, the determination of the K value. As the analysis above shows, the K value and the initial
cluster centers have a far-reaching impact on the whole clustering process and on the final
clustering result, while in practical applications the K value is very difficult to determine
directly or in a single attempt [30]. It becomes especially difficult when the amount of data to
be processed is very large. At present, two approaches to determining the K value are relatively
effective: one based on a distance cost function, and a propagation clustering algorithm based on
nearest neighbors. The former finds the minimum of the cost function and takes the corresponding
K value. The latter uses a nearest-neighbor clustering algorithm to calculate an appropriate
number of cluster centers, which serves as the maximum K value from which the K-means algorithm
obtains the optimal K.
Second, the choice of initial cluster centers. The K-means clustering algorithm solves the
problem iteratively: except for the first step, each step improves the clustering result to some
extent, and otherwise the iteration terminates. The traditional K-means algorithm uses the
squared error of the clusters and whether the criterion function value still changes as the
termination condition. Because the search always moves in the direction of decreasing criterion
function value, the clustering result obtained with this criterion easily falls into a local
minimum [31]. Accordingly, improvements to the K-means algorithm concentrate on the following
two aspects:
1. Optimize the initial cluster centers: find a set of data points that reflects the
characteristics of the data distribution and use it as the initial cluster centers, so as to
support the division of the data to the greatest extent.
2. Optimize the calculation of the cluster centers and of the distance from data points to the
cluster center, so that it better matches the goal of clustering.
4.2.3.3 Improved K-means Clustering Algorithm
a) Related Concept
Definition 1 The distance between data points and the cluster center. The distance between data
point $x_i$ and cluster center $k_j$ is defined as follows [5]:

(2)
where $w$ represents the number of attributes of the data point $x_i$.
Definition 2 The density parameter. The density parameter of a point is the number of data
points contained in a given scope. The scope is a circle whose center is a data point $x_i$ not
yet marked as counted and whose radius is a given distance. The greater the density around $x_i$,
the greater the value of its density parameter.
Definition 3 The core data points. If the y-neighborhood of a data point contains at least
PTS_min data points, the data point is called a core data point.
Definition 4 The cluster center. Unlike the traditional cluster-center adjustment, the improved
clustering algorithm adds the weight of each data point to the cluster center. Data points near
the cluster center receive a larger weight, whereas data points far from the cluster center
receive a smaller weight. The cluster center is defined as follows:

(3)
where $j$ denotes the $j$th cluster, $h$ is the number of data points in the cluster, and $d_{jh}$
is the distance between the $h$th data point belonging to cluster $c_j$ and the cluster center,
with the restriction on $d_{j1}, d_{j2}, \ldots, d_{jh}$.

Definition 5 The Euclidean distance between data points and the cluster center. The distance
between a data point and a cluster center determines the cluster to which the data point
belongs. The Euclidean distance is defined as follows:

(4)
where $j$ denotes the $j$th cluster $c_j$, $i$ denotes the $i$th data point $x_i$, and $d_{ji}$ is
the Euclidean distance between data point $x_i$ and cluster center $c_j$. The remaining two
quantities are the squared error of cluster $c_j$ and the sum of the squared errors of the K
clusters.
b) Improved K-means Algorithm Description
Algorithm 1: Improved K-means Algorithm
Input: a data set X containing n data points; the number of clusters k.
Output: k clusters that satisfy convergence of the criterion function.
Program process:
Step 1 Initialize the cluster centers.
Step 1.1 Select a data point $x_i$ from data set X, mark it as counted, and compute the distance
between $x_i$ and every other data point in X. For each data point within the distance threshold,
mark that point as counted and add 1 to the density value of $x_i$.
Step 1.2 Select a data point that is not yet marked as counted, mark it as counted, and compute
its density value. Repeat Step 1.2 until all data points in X have been marked as counted.
Step 1.3 Select the data points whose density value is greater than the threshold and add them
to the corresponding high-density area set D.
Step 1.4 From the high-density area set D, select the data point with relatively high density
and add it to the initial cluster center set. Then find k-1 further data points such that the
distances among the k initial cluster centers are as large as possible.
Step 2 Assign the n data points of data set X to the closest cluster.
Step 3 Adjust each cluster center by formula (3).
Step 4 Calculate the distance of each data object from each cluster center by formula (4), and
redistribute the n data points to the corresponding clusters.
Step 5 Adjust each cluster center by formula (3).
Step 6 Calculate the criterion function E using formula (1) and check for convergence. If E has
converged, stop; otherwise, go back to Step 4.
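Step 1 of the algorithm (choosing initial centers from high-density points) can be sketched in Java as shown below. Several points are assumptions of this sketch rather than statements of the source: plain Euclidean distance stands in for formula (2), the density of a point is the number of other points within a given radius, the first center is the densest point of D, and the remaining k-1 centers are picked greedily as the points of D farthest from the centers already chosen. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DensityInitSketch {
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
        return Math.sqrt(sum);
    }

    /** Density of a point: number of other points within the given radius. */
    static int[] densities(double[][] points, double radius) {
        int[] density = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            for (int j = 0; j < points.length; j++) {
                if (i != j && distance(points[i], points[j]) <= radius) density[i]++;
            }
        }
        return density;
    }

    /** Returns the indices of k initial centers chosen from the high-density set D. */
    static List<Integer> initialCenters(double[][] points, int k, double radius,
                                        int densityThreshold) {
        int[] density = densities(points, radius);
        // High-density set D: points whose density exceeds the threshold.
        List<Integer> highDensity = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            if (density[i] > densityThreshold) highDensity.add(i);
        }
        if (highDensity.size() < k) {
            throw new IllegalArgumentException("fewer than k high-density points");
        }
        // First center: the densest point in D (the lowest index wins ties).
        int first = highDensity.get(0);
        for (int idx : highDensity) {
            if (density[idx] > density[first]) first = idx;
        }
        List<Integer> centers = new ArrayList<>();
        centers.add(first);
        // Remaining k-1 centers: greedily take the point of D farthest from all chosen centers.
        while (centers.size() < k) {
            int bestIdx = -1;
            double bestGap = -1.0;
            for (int idx : highDensity) {
                double gap = Double.MAX_VALUE;
                for (int c : centers) gap = Math.min(gap, distance(points[idx], points[c]));
                if (gap > bestGap) {
                    bestGap = gap;
                    bestIdx = idx;
                }
            }
            centers.add(bestIdx);
        }
        return centers;
    }
}
```

Because isolated points have low density, they never enter D and so cannot become initial centers, which is how this kind of initialization limits the influence of noise data.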
4.2.4 Classify the data set clusters using Decision Tree
After forming the clusters of the data set, we classify them so that teachers can be assigned to
courses, subjects to teachers, and classrooms to classes without any clash. In this dissertation
work we classify the data set clusters using the decision tree technique. With the decision tree
we also try to satisfy the soft constraints on the timetable, such as assigning subjects to
teachers according to their choice, giving preference to more experienced teachers, and
assigning classrooms and time slots according to the teachers' choice.
4.2.4.1 Classification
Classification consists of examining the features of a newly presented object and assigning it
to a predefined class. The classification task is characterized by well-defined classes and a
training set consisting of pre-classified examples. The task is to build a model that can be
applied to unclassified data in order to classify it. Examples of classification tasks include:
Classification of credit applicants as low, medium or high risk
Classification of mushrooms as edible or poisonous
Determination of which home telephone lines are used for internet access
Predictive modeling can sometimes, though not necessarily desirably, be seen as a black box that
makes predictions about the future based on information from the past and present. Some models
are better than others in terms of accuracy, and some are better in terms of understandability;
for example, models range from easy to understand to incomprehensible in the following order:
decision trees, rule induction, regression models, and neural networks. Classification is one
kind of predictive modeling. More specifically, classification is the process of assigning new
objects to predefined categories or classes: given a set of labeled records, build a model such
as a decision tree, and predict labels for future unlabeled records.
4.2.4.2 Decision tree
A decision tree is a classification scheme that generates a tree and a set of rules,
representing the model of the different classes, from a given data set. A decision tree is a
flow-chart-like tree structure in which each internal node denotes a test on an attribute, each
branch represents an outcome of the test, and leaf nodes represent classes or class
distributions. The topmost node in a tree is the root node. The rules corresponding to the tree
are easily derived by traversing the path from the root node to each leaf. Many different leaves
of the tree may refer to the same class label, but each leaf corresponds to a different rule.
Decision trees are attractive in data mining because they represent rules that can readily be
expressed in natural language. The major strengths of decision tree methods are the following:
1. Decision trees are able to generate understandable rules.
2. They are able to handle both numerical and categorical attributes.
3. They provide a clear indication of which fields are most important for prediction or
classification.
When a decision tree is used for classification tasks, it is more appropriately referred to as a
classification tree. Classification trees are used to classify an object or an instance into a
predefined set of classes based on its attribute values. They are frequently used in applied
fields such as finance, marketing, engineering and medicine, and are also useful as an
exploratory technique.
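The structure described above (attribute tests at internal nodes, class labels at leaves, one rule per leaf) can be sketched in Java as follows. The tree here is built by hand rather than learned from data, and the attribute names firstChoiceFree and secondChoiceFree as well as the class labels are purely illustrative assumptions, not taken from the thesis.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DecisionTreeSketch {
    /** A node is either a leaf (label != null) or an attribute test with one child per outcome. */
    static class Node {
        final String attribute;
        final String label;
        final Map<String, Node> children = new LinkedHashMap<>();

        private Node(String attribute, String label) {
            this.attribute = attribute;
            this.label = label;
        }

        static Node leaf(String label) { return new Node(null, label); }

        static Node test(String attribute) { return new Node(attribute, null); }

        Node branch(String outcome, Node child) {
            children.put(outcome, child);
            return this;
        }
    }

    /** Follows the branch matching the record's attribute value until a leaf is reached. */
    static String classify(Node node, Map<String, String> record) {
        while (node.label == null) {
            Node next = node.children.get(record.get(node.attribute));
            if (next == null) {
                throw new IllegalArgumentException("no branch for " + node.attribute);
            }
            node = next;
        }
        return node.label;
    }

    /** Collects one IF-THEN rule per leaf by walking from the root to that leaf. */
    static List<String> rules(Node node, String prefix, List<String> out) {
        if (node.label != null) {
            out.add("IF " + prefix + " THEN " + node.label);
            return out;
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String condition = node.attribute + "=" + e.getKey();
            rules(e.getValue(), prefix.isEmpty() ? condition : prefix + " AND " + condition, out);
        }
        return out;
    }

    /** Hypothetical tree: attribute names and class labels are illustrative only. */
    static Node exampleTree() {
        return Node.test("firstChoiceFree")
            .branch("yes", Node.leaf("assign first choice"))
            .branch("no", Node.test("secondChoiceFree")
                .branch("yes", Node.leaf("assign second choice"))
                .branch("no", Node.leaf("assign third choice")));
    }
}
```

Calling rules(exampleTree(), "", new ArrayList<>()) yields three rules, one per leaf, which illustrates why such trees are easy to express in natural language.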
4.2.4.3 The Mathematical Programming Model
In order to study the computational effort involved in solving the problem of interest, the
following mathematical programming model is proposed. [3]
We define the following sets and parameters to be used in the model:
$I$: set of all teachers
$J$: set of all courses
$K$: set of all subjects
$L$: set of all days available
$M$: set of all time periods available
$C$: number of classrooms available per time period
Finally, the following decision variables are required to define the problem:
$X_{ijklm} = 1$ if teacher $i$ teaches course $j$, subject $k$, on day $l$ at time period $m$; $0$
otherwise ($i \in I$, $j \in J$, $k \in K_j$, $l \in L$, $m \in M$)
$X_{mi} = 1$ if teacher $i$ teaches at time slot $m$; $0$ otherwise. Summed over all teachers, it
gives the number of classrooms allocated at slot $m$ ($m \in M$, $i \in I$)
$P_{ik}$: lies between 1 and 3, since each teacher teaches at least one subject and at most three
subjects ($i \in I$, $k \in K$)
$L_i$: load of teacher $i$ per week ($i \in I$)

For our problem, the objective function reflects a preference function that needs to be
maximized. It refers to the total preferences of assigning courses to the teachers. The objective
function is described by the expression in equation (1):

Maximize

(1)

The following depicts some of the main constraints encountered in our timetabling
problem

\[ \sum_{i \in I} \sum_{k \in K_j} X_{ijklm} \le 1 \qquad (j \in J,\ l \in L,\ m \in M) \qquad (2) \]

Equation (2) ensures that, for a particular course, at most one subject is conducted in each
time period.

\[ 1 \le \cdots \le 3 \qquad (i \in I) \qquad (3) \]

Equation (3) represents the minimum and maximum number of subjects taught by each teacher. It
is assumed that each teacher has to teach at least one and at most three subjects.

\[ \sum_{j \in J} \sum_{k \in K_j} X_{ijklm} \le 1 \qquad (i \in I,\ l \in L,\ m \in M) \qquad (4) \]

Equation (4) ensures that each teacher teaches at most one course section in a particular
time period.
$(i \in I,\ m \in M)$ (5)

Equation (5) represents the constraint that, in each time period, the number of course sections
taught by teachers cannot exceed the number of classrooms available.

$(i \in I,\ l \in L)$ (6)

Equation (6) calculates the load of each teacher per week.
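The verbal descriptions of constraints (2) to (6) can be turned into a small feasibility check. The Java sketch below is an illustration under assumptions, since the exact algebraic form of some constraints is not reproduced here: each assignment is an array {i, j, k, l, m} standing for X_ijklm = 1, subject ids are treated as unique across courses, the classroom limit is checked per (day, period) pair, and the weekly load is taken as the number of periods a teacher teaches. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TimetableConstraintSketch {
    // An assignment {i, j, k, l, m} stands for X_ijklm = 1:
    // teacher i teaches subject k of course j on day l in period m.

    /** Counts assignments that share the same values at the given index positions. */
    static Map<String, Integer> countBy(List<int[]> assignments, int... positions) {
        Map<String, Integer> counts = new HashMap<>();
        for (int[] a : assignments) {
            StringBuilder key = new StringBuilder();
            for (int p : positions) key.append(a[p]).append('/');
            counts.merge(key.toString(), 1, Integer::sum);
        }
        return counts;
    }

    /** True when no counted group exceeds the limit. */
    static boolean allAtMost(Map<String, Integer> counts, int limit) {
        for (int c : counts.values()) {
            if (c > limit) return false;
        }
        return true;
    }

    /** Upper bound of (3): no teacher has more than three distinct subjects. */
    static boolean subjectLimitHolds(List<int[]> assignments) {
        Map<Integer, Set<Integer>> subjects = new HashMap<>();
        for (int[] a : assignments) {
            subjects.computeIfAbsent(a[0], t -> new HashSet<>()).add(a[2]);
        }
        for (Set<Integer> s : subjects.values()) {
            if (s.size() > 3) return false;
        }
        return true;
    }

    static boolean feasible(List<int[]> assignments, int classrooms) {
        return allAtMost(countBy(assignments, 1, 3, 4), 1)         // (2) one subject per course and slot
            && subjectLimitHolds(assignments)                       // (3) at most three subjects per teacher
            && allAtMost(countBy(assignments, 0, 3, 4), 1)         // (4) one section per teacher and slot
            && allAtMost(countBy(assignments, 3, 4), classrooms);  // (5) sections per slot within classrooms
    }

    /** Weekly load of each teacher as in (6): the number of periods assigned. */
    static Map<Integer, Integer> weeklyLoad(List<int[]> assignments) {
        Map<Integer, Integer> load = new HashMap<>();
        for (int[] a : assignments) load.merge(a[0], 1, Integer::sum);
        return load;
    }
}
```

A check of this kind only tests feasibility of a given timetable; it does not maximize the preference objective of the model.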
4.2.5 Knowledge discovery of time table scheduled and made
After classification we schedule the timetable only for those teachers who are present in the
current semester; no subjects are assigned to teachers who are absent. The system then creates a
PDF file containing the timetable for the seven semesters of the B.Tech computer engineering
branch. This PDF file is stored automatically on the D drive under the file name test.pdf, and
when the timetable is scheduled the system also displays the path where the file is stored.

Chapter 5
RESULTS AND DISCUSSION
5.1 Implementation
The dissertation work on timetable scheduling based on modified clustering and decision tree
techniques is implemented on the Java platform using NetBeans. Java is a computer programming
language that is concurrent, class-based, object-oriented, and specifically designed to have as
few implementation dependencies as possible. It is intended to let application developers "write
once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be
recompiled to run on another. Java applications are typically compiled to bytecode (class file)
that can run on any Java virtual machine (JVM) regardless of computer architecture. Java is, as
of 2014, one of the most popular programming languages in use.
5.2 Working of System
When the system is run, it first displays the page shown in Figure 5.1. This page has four
buttons: admin login, teacher registration, student registration and exit. Registering a new
teacher or student does not require an admin login; we simply click the corresponding button and
fill in the details of the new teacher or student. To leave the system we press the exit button,
and the project stops running. A snapshot of the first page is shown in Figure 5.1.

Figure 5.1 first interface of system

5.2.1 Admin Login
When we click the admin login button, the screen in Figure 5.2 is displayed. Here we enter the
username and password to log in as admin and then click the login button. If the username and
password are correct, we are logged in to the system as admin; if either is wrong, a dialog box
displays an error message stating that the username or password is wrong. The admin login page
has four buttons: login, reset, back and exit. The login button logs us in to the system as
admin. The reset button clears the fields when we notice that we have entered a wrong username
or password, so that we can enter them again. The back button returns to the first screen of the
system, shown in Figure 5.1. The exit button exits the system, and the project stops running. A
snapshot of the admin login screen is shown in Figure 5.2.

Figure 5.2 Admin Login Screen

5.2.2 Student Registration
From the first screen, shown in Figure 5.1, we can also register a new teacher or a new student.
When we click the student registration button, the form shown in Figure 5.3 is displayed, in
which we enter the details of the new student. The form has a number of textboxes, each labeled
with the detail it expects. The screen has four buttons: submit, reset, back and exit. After
entering all the details of the new student we click the submit button. If the details are valid
for the corresponding textboxes, the registration succeeds and all the details are stored in the
student table of the timetable scheduling database; a dialog box then shows the message that the
details were successfully uploaded, and all the textboxes of the form are reset. The reset
button clears the textboxes if we do not want to enter the details of the student. The back
button returns to the first screen of the project. The exit button exits the system, and the
project stops running.

Figure 5.3 Student Registration

5.2.3 Teacher Registration
Like a student, a new teacher can also be registered. When we click the teacher registration
button on the first screen of the project, the teacher registration form shown in Figure 5.4 is
displayed. This screen takes all the information about the new teacher. In the form, the teacher
can state subject choices, so that subjects can be assigned according to the teacher's
preference. The form also records the teacher's experience, which is used to give preference to
teachers with more experience. The teacher can fill in three choices for subjects of interest:
first choice, second choice and third choice. When scheduling the timetable, the system first
considers the teacher's first choice; if that subject is already assigned to another teacher, it
considers the second and then the third choice. At registration time a picture of the new
teacher must also be uploaded. The form contains a browse image button that is used to upload
the image of the new teacher into the database; when the image is uploaded successfully, a
dialog box displays the message that the image was successfully uploaded. When we click the
submitted button, all the information of the newly registered teacher is stored in the teacher
table of the timetable scheduling database. The form has five buttons: submitted stores the
information in the database, browse image uploads the image of the new registrant, reset clears
the textbox data, back returns to the page shown in Figure 5.1, and exit exits the project. A
snapshot is shown in Figure 5.4.
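The choice-based assignment described above can be sketched as a simple greedy procedure in Java. This is a sketch under assumptions, not the system's actual code: teachers are processed in order of decreasing experience, each receives the first of their three choices that is still free, and a teacher whose choices are all taken is left unassigned. The class, field and subject names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PreferenceAssignmentSketch {
    static class Teacher {
        final String name;
        final int experienceYears;
        final List<String> choices; // first, second and third choice, in that order

        Teacher(String name, int experienceYears, List<String> choices) {
            this.name = name;
            this.experienceYears = experienceYears;
            this.choices = choices;
        }
    }

    /** Returns teacher name -> assigned subject; teachers whose choices are all taken are omitted. */
    static Map<String, String> assign(List<Teacher> teachers) {
        List<Teacher> ordered = new ArrayList<>(teachers);
        // More experienced teachers choose first; the sort is stable for equal experience.
        ordered.sort(Comparator.comparingInt((Teacher t) -> t.experienceYears).reversed());
        Set<String> taken = new HashSet<>();
        Map<String, String> result = new LinkedHashMap<>();
        for (Teacher t : ordered) {
            for (String subject : t.choices) {
                // Set.add returns false when the subject is already assigned to someone else.
                if (taken.add(subject)) {
                    result.put(t.name, subject);
                    break;
                }
            }
        }
        return result;
    }
}
```

With three teachers who all list the same three subjects in the same order, the most experienced one receives the first choice, the next the second, and the least experienced the third.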

Figure 5.4 Teacher Registration
5.2.4 Admin Section
When we enter the correct username and password on the admin login screen shown in Figure 5.2
and click the login button, the screen shown in Figure 5.5 is displayed. This screen shows all
the permissions that the admin has, so it is named the admin section. It has five buttons. The
detail of teacher button fetches the details of a particular teacher. The student teacher record
button shows all the records of students and teachers in the college. The attendance of teacher
button is used to take the attendance of teachers. The pdf file generate button schedules the
timetable of B.Tech for seven semesters and creates the PDF file of that timetable. A snapshot
of the admin section is shown in Figure 5.5.

Figure 5.5 Admin Section

5.2.5 Detail of teacher
When we click the detail of teacher button in the admin section, the screen in Figure 5.6 is
shown. Here we enter the Id of a teacher to see that teacher's details. This screen has three
buttons. The detail button shows the details of the teacher whose Id is given in the textbox.
The back button returns to the admin section screen shown in Figure 5.5, and the exit button
exits the project, which then stops running. A snapshot of the teacher detail screen is shown in
Figure 5.6.

Figure 5.6 Teacher Detail

When we enter a correct teacher Id and click the detail button, the screen in Figure 5.7 is
displayed. This screen contains all the information about the teacher with that Id and also
displays the teacher's image. If we enter a wrong Id, no information is displayed; a dialog box
shows the message that the Id is not valid. This screen has only two buttons, back and exit. The
back button returns to the screen shown in Figure 5.6, and the exit button exits the project. A
snapshot of the teacher information screen is shown in Figure 5.7.

Figure 5.7 Teacher information

5.2.6 Student Teacher Record
The admin has permission to see all the records of students as well as teachers. When we click
the student teacher record button in the admin section shown in Figure 5.5, all the records of
students and teachers are displayed in a new screen, student teacher record, shown in Figure
5.8. In this screen the left frame shows the details of all teachers and the right frame shows
the details of all students. The details are fetched from the student and teacher tables of the
timetable scheduling database. This screen has only two buttons. The back button returns to the
admin section screen in Figure 5.5, and the exit button exits the project. A snapshot of the
student teacher record screen is shown in Figure 5.8.

Figure 5.8 Student Teacher Record

5.2.7 Attendance of Teacher
The admin also has permission to mark the attendance of teachers. When we click the attendance
of teacher button in the admin section screen shown in Figure 5.5, the screen shown in Figure
5.9 is displayed. This screen takes the Id and name of a teacher, and we then mark the teacher
as present or absent. The timetable is scheduled only for teachers who are present in the
college for the particular session; teachers who are not present are marked as absent and are
not included in the scheduled timetable. This screen has four buttons. The submit button stores
the attendance details in the attendance table of the timetable scheduling database and shows a
dialog box with the message that they were successfully uploaded. The reset button clears the
textboxes. The back button returns to the admin section screen shown in Figure 5.5, and the exit
button exits the project. A snapshot of the attendance of teacher screen is shown in Figure 5.9.

Figure 5.10 Attendance of Teacher

5.2.8 Pdf File Generate
The admin has permission to generate the timetable schedule. Clicking the pdf file generate
button in the admin section shown in figure 5.6 displays the screen shown in figure 5.11. This
screen shows the attendance details of teachers: it lists the teachers who are present at the time
of timetable scheduling, so that the timetable is scheduled only for those teachers. This page has
three buttons. The timeschedule button creates the PDF file of the scheduled timetable. The back
button returns to the admin section screen shown in figure 5.6, and the exit button exits the
project. A snapshot of the pdf file generate screen is shown in figure 5.11.

Figure 5.11 PDF file generate

Clicking the timeschedule button shows, in a sequence of dialog boxes, the time taken by the
improved k-mean clustering algorithm: first its start time and end time, and then the difference
between them. A further dialog box shows the path where the PDF file is stored on the computer;
the file is saved under the name test.pdf in the D drive. This PDF file contains the timetable
schedule of seven semesters of the computer science branch. Figure 5.12 shows the dialog box with
the path of the generated PDF file. When the ok button is clicked, the project stops running and
the PDF file is saved in the D drive. The timetable schedule of the first semester is shown in
figure 5.13; like the first-semester timetable, the timetables of all seven semesters are in the
PDF file.

Figure 5.12 PDF file path

Figure 5.13 timetable of First semester
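
The start time, end time and difference reported when the timeschedule button is clicked can be
illustrated with a minimal Python sketch. The project's own code is not shown in this
dissertation, so the names timed and build_timetable are hypothetical, and build_timetable is only
a stand-in for the clustering and scheduling step.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, start, end, elapsed_ms) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    end = time.perf_counter()
    return result, start, end, (end - start) * 1000.0


def build_timetable(n_slots):
    # Hypothetical stand-in for the clustering and scheduling step.
    return [slot % 7 for slot in range(n_slots)]


result, start, end, elapsed_ms = timed(build_timetable, 1000)
print(f"start={start:.6f} s  end={end:.6f} s  elapsed={elapsed_ms:.3f} ms")
```

A monotonic clock such as time.perf_counter is used here because wall-clock time can jump while
the program runs, which matters when the measured interval is only a few milliseconds.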

5.3 Results
Accuracy: Timetabling is known to be an NP-complete problem, i.e. no efficient
(polynomial-time) way to find a solution is known. The most striking characteristic of
NP-complete problems is that no efficient method of finding their best solution is known. Hence, a
heuristic approach is usually chosen to solve a timetabling problem. A heuristic approach leads
to a set of good solutions (but not necessarily the best solution). In a general educational
timetabling problem, a set of events (e.g. courses and exams) is assigned to a certain number of
timeslots (time periods) subject to a set of constraints, which often makes the problem very
difficult to solve in real-world circumstances [2]. In fact, large-scale timetables such as
university timetables may need many hours of work by qualified people or teams in order to
produce high-quality timetables that satisfy the constraints [7] and optimize the timetable
objectives at the same time. Heuristic optimization methods are explicitly aimed at good feasible
solutions, which may not be optimal, where the complexity of the problem or the limited time
available does not allow an exact solution.
The intention of timetable scheduling based on the modified k-mean clustering and decision tree
techniques is satisfied. The approach combines an improved k-mean clustering algorithm with
decision tree techniques to improve the efficiency of the search operation. The improved clustering
algorithm is not only more stable during the clustering process; it also reduces or even avoids
the impact of noisy data in the dataset, so that the final clustering result is more accurate and
effective. The approach also addresses the important hard constraints, i.e. clashes between the
availability of teachers, time-slots, booked rooms, etc. The non-rigid soft constraints, i.e. the
optimization objectives for the search operation, are also handled. This approach therefore
generates a highly accurate timetable without any clashes between teachers' availability,
timeslots, room bookings, etc. The accuracy achieved by this approach is 99%, as no input data is
left unscheduled.
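
The hard constraint described above, that no teacher and no room is booked twice in the same
time-slot, can be checked mechanically. The following Python sketch is illustrative only: the
Session record, its field names and the sample data are assumptions made for this example and are
not taken from the project's database.

```python
from collections import Counter
from typing import NamedTuple


class Session(NamedTuple):
    teacher: str
    room: str
    slot: str      # e.g. "Mon-09:00"
    subject: str


def find_clashes(sessions):
    """Return (kind, resource, slot) for every teacher or room used more than once in a slot."""
    clashes = []
    for kind in ("teacher", "room"):
        counts = Counter((getattr(s, kind), s.slot) for s in sessions)
        clashes.extend((kind, res, slot) for (res, slot), n in counts.items() if n > 1)
    return clashes


# Hypothetical sample data: teacher T1 is double-booked on Monday at 09:00.
timetable = [
    Session("T1", "R101", "Mon-09:00", "Data Mining"),
    Session("T2", "R102", "Mon-09:00", "Networks"),
    Session("T1", "R103", "Mon-09:00", "Databases"),
]
print(find_clashes(timetable))  # [('teacher', 'T1', 'Mon-09:00')]
```

A timetable for which find_clashes returns an empty list satisfies this particular hard
constraint; soft constraints such as teacher preferences would need a separate scoring step.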
Execution-Time: In this dissertation work we used the modified k-mean clustering algorithm.
Its execution time is less than that of the standard k-mean algorithm because it optimizes the
choice of initial cluster centers: it selects as initial centers a set of data points that reflects
the characteristics of the data distribution, which supports the division of the data to the
greatest extent. It also optimizes the calculation of the cluster centers and of the distance from
each data point to its cluster center, so that they better match the goal of clustering. As a
result, timetable scheduling with this algorithm takes less execution time. We compared the
execution time of the modified k-mean clustering algorithm with that of the simple k-mean
clustering algorithm, and the modified algorithm took less time. Our system also computes the
execution time taken to schedule the timetable, which shows that scheduling and generating the
timetable PDF file take only a very small amount of time, on the order of milliseconds.
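
The effect of choosing initial cluster centers that reflect the data distribution can be
illustrated with a small Python sketch. It does not reproduce the improved algorithm used in this
work; as a stand-in it uses farthest-point seeding (each new center is the point farthest from the
centers already chosen) and compares the number of iterations against simply taking the first k
points. The function names and the toy data are assumptions made for this example.

```python
import math


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain k-means iterations; returns (centers, labels, iterations_used)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            return centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
_, labels_naive, iters_naive = kmeans(points, points[:2])
_, labels_seeded, iters_seeded = kmeans(points, farthest_point_init(points, 2))
print("first-k seeding:", iters_naive, "iterations")
print("farthest-point seeding:", iters_seeded, "iterations")
```

On this toy data both seedings reach the same clusters, but the seeded run starts closer to the
final centers and so needs no more iterations than the naive one. Any comparable gain on real
timetable data would have to be measured rather than assumed.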
Precision and Recall: Precision (also called positive predictive value) is the fraction of
retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of
relevant instances that are retrieved. Both precision and recall are therefore based on an
understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from
a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications
are correct, but 3 are actually cats, the program's precision is 4/7 while its recall is 4/9. When a
search engine returns 30 pages, only 20 of which are relevant, while failing to return 40
additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. Precision is
calculated as the fraction of correct objects among those that the algorithm believes belong to
the relevant class. It can be loosely equated to accuracy, and it roughly answers the question:
"How many of the points in this cluster belong there?" Recall roughly answers the question:
"Did all of the documents that belong in this cluster make it in?" In other words, recall is the
fraction of actual objects that were identified.
In simple terms, high precision means that an algorithm returned substantially more relevant
results than irrelevant ones, while high recall means that an algorithm returned most of the
relevant results. It has been observed that a heuristic approach can lead to clashes in the
timetable, which suggests that the decision-making algorithm used in scheduling has defects. In
this work an algorithmic approach is followed in addition to heuristics, so that it provides high
precision and high recall values for timetable scheduling. This new approach retrieves most of
the relevant results for timetable scheduling because there is no clash between teachers,
classrooms, subjects, courses, etc.; it therefore provides high precision and high recall.
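
The two worked examples in the precision and recall discussion (the dog detector and the search
engine) can be reproduced with a few lines of Python. This is only an arithmetic check of those
examples; the function name precision_recall and its parameter names are chosen here for
illustration.

```python
from fractions import Fraction


def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) as exact fractions."""
    retrieved = true_positives + false_positives   # everything the algorithm returned
    relevant = true_positives + false_negatives    # everything it should have returned
    return Fraction(true_positives, retrieved), Fraction(true_positives, relevant)


# Dog detector: 7 identifications, 4 correct, 3 are cats; 9 dogs in the scene, so 5 are missed.
print(precision_recall(4, 3, 5))     # precision 4/7, recall 4/9

# Search engine: 30 pages returned, 20 relevant; 40 relevant pages not returned.
print(precision_recall(20, 10, 40))  # precision 2/3, recall 1/3
```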

Chapter 6
CONCLUSION AND FUTURE WORK
6.1 Conclusion
In this dissertation work, we have presented timetable scheduling based on modified clustering
and decision tree techniques, which are capable of solving timetabling and general constraint
satisfaction problems. In most colleges and universities, static timetables are still made for
examinations and for the regular classes taken by teachers. One of the problems faced nowadays is
that a manually prepared timetable is prone to clashes. However, a dynamic approach that generates
a timetable automatically can also lead to clashes in most cases when the database size
increases. The approach needs to be broader and more efficient in terms of some parameters in
order to avoid clashes and improve the overall accuracy of dynamic timetable scheduling. It has
been observed that a heuristic approach can lead to clashes in the timetable, which suggests that
the decision-making algorithm used in scheduling has defects. In this work an algorithmic approach
is followed in addition to heuristics.
Our proposal focused on the automatic preparation of course timetables using computers. Automatic
timetable construction is considered one of the scheduling problems. This research contributes a
new automatic course scheduling and timetabling system using the improved k-mean clustering
algorithm and decision tree techniques. The intention of the algorithm, to generate a timetable
schedule automatically, is satisfied. The algorithm incorporates a number of techniques aimed at
improving the efficiency of the search operation. It also addresses the important hard constraint
of clashes between the availability of teachers, time-slots, booked rooms, classes, etc. The
non-rigid soft constraints, i.e. the optimization objectives for the search operation, are also
effectively handled. The system also takes input from teachers on their preferred subjects and
preferred time-slots, and the timetable is scheduled according to these preferences. The system
generates the timetable only for those teachers whose attendance is marked as present in the
college at the time of timetable scheduling. It also gives preference to more experienced teachers
over less experienced teachers. The research aimed at solving the problems encountered in every
semester by providing an automatic system for course timetable schedules and achieving a high
degree of satisfaction for teachers and students. The system also reduces the execution time by
using the improved k-mean clustering algorithm and provides high precision and high recall
values. It provides a highly accurate timetable that satisfies all the hard constraints and tries
to satisfy as many soft constraints as possible. The developed system will reduce the effort and
time required of the department staff who are involved in making these schedules.
6.2 Future Work
We have only tested our system on Computer Science department courses; it will be interesting to
see the performance of our system on the courses of other departments. In our system we used the
preferences of teachers. We may achieve better performance by using students' preferences along
with teachers' preferences; this is left as future work.
Given the generality of the algorithm's operation, it can further be adapted to more specific
scenarios, e.g. university and examination scheduling, and can be further enhanced to create
railway timetables. Thus, by automating the timetabling problem, many man-hours spent on creating
an effective timetable can be saved.

References
[1] Introduction of timetable: http://www.timetabler.com/
[2] Anirudha Nanda, Manisha P. Pai, and Abhijeet Gole, An Algorithm to Automatically
Generate Schedule for School Lectures Using a Heuristic Approach, International Journal of
Machine Learning and Computing, Vol. 2, No. 4, August 2012.
[3] A. Schaerf, A Survey of Automated Timetabling, Artificial Intelligence, 1999.
[4] K. Sandu, Automating Class Scheduling Generation in the Context of a University
Timetabling Information System, The University Timetable Problem-2001.
[5] D. de Werra, An Introduction to Timetabling, European Journal of Operational
Research 19(1985)b, 151-162.
[6] Kuldeep Kumar, Sikander, Ramesh Sharma, Kaushal Mehta, Genetic Algorithm
Approach to Automate University Timetable, International Journal of Technology Research
(IJTR), Vol. 1, Issue 1, Mar-Apr 2012.
[7] Shu-Chuan Chu, Yi-Tin Chen, Jiun-Huei Ho, Timetable Scheduling Using Particle Swarm
Optimization, IEEE (2006).
[8] Ahamed Hamdi Abu Absa, Dr. Sana'a Wafa Ai-Sayegh, E-learning Timetable Generator
Using Genetic Algorithms.
[9] Alberto Colorni, Marco Dorigo, Vittorio Maniezzo, A Genetic Algorithm to Solve
the Timetable Problem, Centre for Emergent Computing, Napier University, Edinburgh
EH10 5DT, UK, 2000.
[10] E.K.Burke, D.G.Elliman, R.F.Weare, The Automation of the Timetabling Process in
Higher Education.
[11] Dipti Srinivasan, Tian Hou Seow, Jian Xin Xu, Automated Timetable Generation Using
Multiple Context Reasoning for University Modules, IEEE 0-7803-7282-4/02, 2002.
[12] Prof. Swapna Borde, Ms. Ekta Shah, Ms. Priti Rawat, Ms. Vinaya Patil, A New
Approach to Generate Time Table, International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622.
[13] Hana Rudova and Keith Murray, University Course Timetabling with Soft Constraints.
[14] Ashish Jain, Dr. Suresh Jain and Dr. P.K. Chande, Formulation of Genetic Algorithm to
Generate Good Quality Course Timetable, International Journal of Innovation, Management
and Technology, Vol. 1, No. 3, August 2010, ISSN: 2010-0248.
[15] Yao-Te Wang, Yu-Hsin Cheng, Ting-Cheng Chang, S.M. Jen, On the Application of
Data Mining Technique and Genetic Algorithm to an Automatic Course Scheduling System,
IEEE 2008, 978-1-4244-1674-5/08.
[16] Mrs. Nikita Desai, Preferences of Teachers and Students for Auto Generation of
Sensitive Timetable: A Case Study, Indian Journal of Computer Science and Engineering
(IJCSE), Vol. 2, No. 3, Jun-Jul 2011, ISSN: 0976-5166.
[17] Luisa Carpente, Ana Cerdeira-Pena, Guillermo de Bernardo, Diego Seco, An Integrated
System for School Timetable.
[18] Ho Sheau Fen Irene, Safaai Deris, Siti Zaiton Mohd Hashim, Incorporating of
Constraint-Based Reasoning into Particle Swarm Optimization for University Timetabling Problem,
ISSR 1(1) 2009, 1-21.
[19] Ruey-Maw Chen, Hsiao-Fang Shih, Solving University Course Timetabling Problem
using Constriction Particle Swarm Optimization with Local Search, Algorithms (2013), 6,
227-244.
[20] Elizabeth Montero, Maria-Cristina Riff, Leopoldo Altamirano, A PSO Algorithm to
Solve a Real Course + Exam Timetabling Problem, ICSI (2011), 241-248.
[21] Danial Qarouni Fard, Amir Najafi Ardabili, M-Hossein Moeinzadeh, Finding
Feasible Timetable using Particle Swarm Optimization, IEEE (2008), 387-391.
[22] Liu Su-hua, Li Lin, Study of Course Scheduling Based on Particle Swarm Optimization,
IEEE (2011), 1692-1695.
[23] Zhu Jinrong, Zhao Jianbao, Li Xiaoning, A New Adaptive Particle Swarm
Optimization Algorithm, IEEE (2008), 456-458.
[24] R. C. Eberhart, Y. Shi, Comparing Inertia Weights and Constriction Factor in Particle
Swarm Optimization, IEEE (2000), 84-88.
[25] Safwan M. Shatnawi, Fawzi Albalooshi, Khaleel Rababa'h, Generating Timetable and
Students Schedule Based on Data Mining Techniques, International Journal of Engineering
Research and Applications (IJERA), Vol. 2, Issue 4, pp. 1638-1644, July-August 2012, ISSN:
2248-9622.
[26] Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques, Morgan
Kaufmann Publishers, Second Edition, 2006.
[27] Aldy Gunawan, Kien Ming Ng, Kim Leng Poh, An Improvement Heuristic for the
Timetabling Problem, IJCS(2007), Vol. 1, No. 2, 162-178.
[28] Ibrahim Aljarah, Ayad Salhieh, Hossam Faris, An Automatic Course Scheduling
Approach using Instructors preferences, iJET- Volume 7, Issue 1, March 2012.
[29] Safwan M. Shatnawi, Fawzi Albalooshi, Khaleel Rababah, Generating Timetable and
Student Schedule Based on Data Mining Techniques, International Journal of Engineering
Research and Applications, Vol. 2, Issue 4, July-August 2012, pp. 1638-1644.
[30] Chunfei Zhang, Zhiyi Fang, An Improved K-mean Clustering Algorithm, Journal of
Information & Computational Science 10:1 (2013), 193-199.
[31] Shailendra Singh Raghuwanshi, PremNarayan Arya, Comparison of K-means and
Modified K-mean Algorithms for Large Data-set, International Journal of Computing,
Communications and Networking, Volume 1, No. 3, November-December 2012.
