1. Introduction
1.1 Data Mining:
Data mining is the process of automatically discovering
useful information in large data repositories. Data mining
techniques are deployed to scour large databases in order to
find novel and useful patterns that might otherwise remain
unknown.
They also provide the capability to predict the outcome of future
observations. Several types of analytical software are available:
statistical, machine learning, and neural networks. Among the patterns
commonly sought are associations and affinities.
-Associations: Data can be mined to identify products that tend to be
purchased together, based on consumers' buying habits.
-Artificial neural networks: Non-linear predictive models that learn
through training and resemble biological neural networks in structure.
-Decision trees: Techniques such as CHAID are decision tree techniques
used for the classification of a data set.
-Data visualization: The visual interpretation of complex relationships
in multidimensional data.
-Noisy, incomplete data: Imprecise data is a common characteristic of
large real-world data sets.
-Complex data structure: Conventional statistical analysis is often not
possible when the data has a complex structure.
[...] those optimal control parameters are used to [...]
1.7.1 Local [...]
[...] were traditionally considered desktop [...] of this [...] with
microphones and webcams might [...]
Explaining the patterns observed in these networks, and the analysis
of them, is an early concern of the structural theories in sociology.
These were approaches, dating to the 1930s, to study interpersonal
relations mathematically.
[...] the type of information that is intentionally or unintentionally
[...] from the ambiguity caused by synonyms or homonyms.
In data mining, anomaly detection is the identification of items,
events or observations which do not conform to an expected pattern in a
data set. [...] aggregated appropriately. Instead, a cluster analysis
algorithm may be able to detect the micro-clusters formed by these
patterns. [...] of the data set. Supervised anomaly detection techniques
require a data set that has been labeled "normal" and "abnormal" (the
key difference to many other statistical classification problems being
the inherently unbalanced nature of outlier detection). Semi-supervised
anomaly detection techniques construct a model representing normal
behavior from a given normal training data set.
2. Literature Survey
2.1 Topic Detection and Tracking Pilot Study
AUTHORS: J. Allan et al.
Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to
investigate the state of the art in finding and following new events in a stream
of broadcast news stories. The TDT problem consists of three major tasks: (1)
segmenting a stream of data, especially recognized speech, into distinct
stories; (2) identifying those news stories that are the first to discuss a new
event occurring in the news; and (3) given a small number of sample news
stories about an event, finding all following stories in the stream. The TDT
Pilot Study ran from September 1996 through October 1997. The primary
participants were DARPA, Carnegie Mellon University, Dragon Systems, and
the University of Massachusetts at Amherst. This report summarizes the
findings of the pilot study. The TDT work continues in a new project
involving larger training and test corpora, more active participants, and a
more broadly defined notion of "topic" than was used in the pilot study.
2.2 Bursty and Hierarchical Structure in Streams:
AUTHORS: J. Kleinberg
A fundamental problem in text data mining is to extract meaningful structure
from document streams that arrive continuously over time. E-mail and news
articles are two natural examples of such streams, each characterized by
topics that appear, grow in intensity for a period of time, and then fade away.
The published literature in a particular research field can be seen to exhibit
similar phenomena over a much longer time scale. Underlying much of the
text mining work in this area is the following intuitive premise --- that the
appearance of a topic in a document stream is signaled by a "burst of
activity," with certain features rising sharply in frequency as the topic
emerges. The goal of the present work is to develop a formal approach for
modeling such "bursts," in such a way that they can be robustly and
efficiently identified, and can provide an organizational framework for
analyzing the underlying content. The approach is based on modeling the
stream using an infinite-state automaton, in which bursts appear naturally as
state transitions; in some ways, it can be viewed as drawing an analogy with
models from queuing theory for bursty network traffic. The resulting
algorithms are highly efficient, and yield a nested representation of the set of
bursts that imposes a hierarchical structure on the overall stream. Experiments
with e-mail and research paper archives suggest that the resulting structures
have a natural meaning in terms of the content that gave rise to them.
2.3 Real-Time Change-Point Detection Using Sequentially Discounting
Normalized Maximum Likelihood Coding
AUTHORS: Y. Urabe, K. Yamanishi, R. Tomioka, and H. Iwai
We are concerned with the issue of real-time change-point detection in time
series. This technology has recently received vast attention in the area of data
mining since it can be applied to a wide variety of important risk
management [...]
called stochastic complexity, and approximated by BIC. This holds when the
regressor (design) matrix is non-random or determined by the observed data
as in AR models. The advantages of the criterion include the fact that it can be
evaluated efficiently and exactly, without asymptotic approximations, and
importantly, there are no adjustable hyper-parameters, which makes it
applicable to both small and large amounts of data.
3. System Study
3.1 Feasibility Study:
A feasibility study, also known as a feasibility analysis, is
an analysis of the viability of an idea. It describes a
preliminary study undertaken to determine and document a
project's viability. The results of this analysis are used in
deciding whether or not to proceed with the project. This
analytical tool, used during the project planning phase, shows
how a business would operate under a set of assumptions, such
as the technology used, the facilities and equipment, the
capital needs, and other financial aspects. The study is the
first point in a project development process at which it is
shown whether the project can result in a technically and
economically feasible concept. As the study requires a strong
financial and technical background, outside consultants [...]
investment decision as the proper investment [...] being placed on the
client. The developed system must have modest requirements, as only
minimal or no changes are required for implementing this system.
3.1.3 Social Feasibility:
Social impact analysis / social feasibility:
Social Impact Assessment (SIA) is a process that provides a
framework for gathering, analyzing, prioritizing, and
incorporating social information [...] on social macro-systems
(e.g., national [...]). The SIA should be an integral part of
[...]
typically 6 months to 2 years long. Each phase starts with a design goal and
ends with a client reviewing the progress thus far. Analysis and engineering
efforts are applied at each phase of the project, with an eye toward the end
goal of the project.
The steps of the Spiral Model can be generalized as
follows:
The new system requirements are defined in as much detail
as possible. This usually involves interviewing a number of
users representing all the external and internal users and other
aspects of the existing system.
A preliminary design is created for the new system.
A first prototype of the new system is constructed from the
preliminary design. This is usually a scaled-down system, and
represents an approximation of the characteristics of the final
product.
A second prototype is evolved by a fourfold procedure:
1) Evaluating the first prototype in terms of its strengths and
weaknesses.
2) Defining the requirements of the second prototype.
3) Planning and designing the second prototype.
4) Constructing and testing the second prototype.
At the customer's option, the entire project can be aborted if
the risk is deemed too great. Risk factors might involve
development cost overruns, operating-cost miscalculation, or
any other factor that could, in the customer's judgment,
result in a less-than-satisfactory final product.
Moreover, it cannot be applied when the contents of the messages are mostly
nontextual information. On the other hand, the words formed by mentions
are unique, require little preprocessing to obtain (the information is often
separated from the contents), and are available regardless of the nature of the
contents.
4.3 Proposed System:
In this paper, we have proposed a new approach to detect the emergence
of topics in a social network stream.
The basic idea of our approach is to focus on the social aspect of the
posts reflected in the mentioning behavior of users instead of the
textual contents.
We have proposed a probability model that captures both the number of
mentions per post and the frequency of the mentioned users (mentionees).
4.3.1 Advantages of Proposed System:
Since the proposed method does not rely on the textual contents of social
network posts, it is robust to rephrasing, and it can be applied to cases
where topics are concerned with information other than text, such as
images, video, audio, and so on.
The proposed link-anomaly-based methods performed even better than
the keyword-based methods on the NASA and BBC data sets.
4.4 Developers Responsibilities Overview:
The developer is responsible for:
Developing the system so that it meets the SRS and
satisfies all the requirements of the system.
4.5 Functional Requirements:
Functional specifications, or functional requirements, describe, among
other things, how the system meets applicable regulatory requirements.
The functional specification is designed to be read by a
general audience. Readers should understand the system, but
no particular technical knowledge
should be required to understand the document.
4.5.1 Examples of Functional Requirements:
Functional requirements should include functions performed by specific
screens, outlines of work-flows performed by the system and other business
or compliance requirements the system must meet.
4.5.2 Interface Requirements:
Field accepts numeric data entry.
Field only accepts dates before the current date.
Screen can print on-screen data to the printer.
[...] spreadsheet can secure data with electronic signatures.
4.5.5 Security Requirements:
Members of the Data Entry group can enter requests but not approve
or delete requests.
Members of the Managers group can enter or approve a request, but
not delete requests.
Members of the Administrators group cannot enter or approve
requests, but can delete requests.
The functional specification describes what the system
must do; how the system does it is described in the Design
Specification. If a User Requirement Specification was
written, all requirements outlined in the user requirement
specification should be addressed in the functional
requirements.
4.7 Hardware Requirements:
Hard Disk : 40 GB.
Floppy Drive : 1.44 MB.
Monitor : 15 VGA Colour.
Mouse : Logitech.
RAM : 512 MB.
4.8 Software Requirements:
Operating system : Windows XP/7.
Coding Language : JAVA/J2EE
IDE : Netbeans 7.4
Database : MYSQL
5. System Design
Systems design is the process of defining the architecture,
components, modules, and interfaces of a system to satisfy
specified requirements. There is some overlap with the
disciplines of systems analysis, systems architecture and
systems engineering.
(Figure: Level 1 Data Flow Diagram - Twitter Trends, Aggregation,
Change Point Analysis)
Level 2:
(Figure 5.5 Data Flow Diagram of Change Point - Key Based Detection,
Change Point Analysis, Burst Analysis, Link Based Detection)
Level 3:
(Figure: Level 3 Data Flow Diagram - Key Based Detection, Perform
Training, Burst Analysis)
5.9 Activity Diagram :
viewing facilities.
3. When the data is entered, it is checked for validity. Data can be entered
with the help of screens. Appropriate messages are provided as and when
needed, so that the user is not left in a maze at any instant. Thus the objective
of input design is to create an input layout that is easy to follow.
5.12 Output Design :
A quality output is one which meets the requirements of the end user
and presents the information clearly. In any system, the results of processing
are communicated to the users and to other systems through outputs. In
output design it is determined how the information is to be displayed for
immediate need, and also as hard-copy output. It is the most important and
direct source of information to the user. Efficient and intelligent output
design improves the system's relationship with the user and helps in
decision-making.
1. Designing computer output should proceed in an organized, well thought
out manner; the right output must be developed while ensuring that each
output element is designed so that people will find the system easy and
effective to use. When analysts design computer output, they should identify
the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create document, report, or other formats that contain information
produced by the system.
The output form of an information system should accomplish one or more of
the following objectives.
Convey information about past activities, current status or projections
of the future.
Signal important events, opportunities, problems, or warnings.
Trigger an action.
Confirm an action.
5.13 Related Work:
Detection and tracking of topics have been studied
extensively in the area of topic detection and tracking
(TDT). In this context, the main task is to either classify a
new document into one of the known topics (tracking) or to
detect that it belongs to none of the known categories.
Subsequently, the temporal structure of topics has been
modeled and analyzed through dynamic model selection,
temporal text mining, and factorial hidden Markov models.
Another line of research is concerned with formalizing the
notion of bursts in a stream of documents. In his seminal
paper, Kleinberg modeled bursts using a time-varying
Poisson process with a hidden discrete process that
controls the firing rate. Recently, He and Parker developed
a physics-inspired model of bursts based on the change in
the momentum of topics. All the above-mentioned studies
make use of the textual content of the documents, but not
the social content of the documents. The social content
(links) has been utilized in the study of citation networks.
However, citation networks are often analyzed in a
stationary setting. The novelty of the current paper lies in
focusing on the social content of the documents (posts)
and in combining this with a change-point analysis.
5.14 Proposed Method:
The overall flow of the proposed method is shown in Figure
5.1. We assume that the data arrives from a social network
service in a sequential manner through some API. For each
new post, we use samples within the past T time interval for
the corresponding user to train the mention model we
propose below. We assign an anomaly score to each post based
on the learned probability distribution. The score is then
aggregated over users and further fed into a change-point
analysis.
A. Probability Model
We characterize a post in a social network stream by the
number of mentions k it contains, and the set V of names
(IDs) of the users mentioned in the post. Formally, we
consider the following joint probability distribution:

P(k, V) = P(k) * prod_{v in V} P(v)                              (1)

Here the joint distribution consists of two parts: the
probability of the number of mentions, and the probability of
each mentionee. The number of mentions is defined as a
geometric distribution as follows:

P(k | pi) = (1 - pi)^k * pi                                      (2)

On the other hand, the probability of mentioning the users in V is
defined as an independent, identical multinomial distribution
with parameters theta = (theta_v):

P(V | theta) = prod_{v in V} theta_v                             (3)

First we compute the predictive distribution with respect to
the number of mentions, P(k | T). This can be obtained by
assuming a beta distribution as a prior and integrating out the
parameter pi:

P(k | T) = (alpha + n) * prod_{i=0}^{k-1} (beta + S + i)
           / prod_{i=0}^{k} (alpha + beta + n + S + i)           (4)

where alpha and beta are the parameters of the beta prior, n is
the number of training posts, and S = k_1 + ... + k_n is the
total number of mentions in the training posts of the user in
the interval [t - T, t] (we use T = 30 days in this paper). The
predictive distribution of the mentionee is

P(v | T) = m_v / (m + gamma)                                     (5)

for a user v already mentioned in the training period, and

P(v is new | T) = gamma / (m + gamma)                            (6)

for a previously unmentioned user, where m_v is the number of
times v was mentioned in the training posts, m is the total
number of mentions, and gamma is a smoothing parameter.
Accordingly, the link-anomaly score is defined as follows:

s(t, u, k, V) = -log P(k | T) - sum_{v in V} log P(v | T)        (7)
The two terms in the above equation can be computed via the
predictive distribution of the number of mentions (4), and the
predictive distribution of the mentionee (5)(6), respectively
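To make the computation concrete, the scoring in (4)-(7) can be sketched in Java as below. This is a minimal illustration, not the authors' implementation: the class and method names and the bookkeeping of counts are our own assumptions; the formulas follow the beta-geometric predictive for the number of mentions and the smoothed predictive for mentionees described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of the link-anomaly score for one user (names/parameters assumed). */
public class LinkAnomaly {
    private final double alpha, beta, gamma;   // beta prior and smoothing parameters
    private long nPosts, sumMentions;          // n and S = k_1 + ... + k_n in the window
    private long totalMentions;                // m = sum over v of m_v
    private final Map<String, Long> mentionCount = new HashMap<>(); // m_v

    public LinkAnomaly(double alpha, double beta, double gamma) {
        this.alpha = alpha; this.beta = beta; this.gamma = gamma;
    }

    /** Add a training post (its list of mentionee IDs) to the user's history. */
    public void train(List<String> mentionees) {
        nPosts++;
        sumMentions += mentionees.size();
        for (String v : mentionees) {
            mentionCount.merge(v, 1L, Long::sum);
            totalMentions++;
        }
    }

    /** Predictive probability of observing k mentions, eq. (4). */
    double probMentions(int k) {
        double a = alpha + nPosts, b = beta + sumMentions;
        double p = a;                              // numerator starts with (alpha + n)
        for (int i = 0; i < k; i++) p *= (b + i);  // prod (beta + S + i)
        for (int i = 0; i <= k; i++) p /= (a + b + i);
        return p;
    }

    /** Predictive probability of a mentionee, eqs. (5)-(6). */
    double probMentionee(String v) {
        long mv = mentionCount.getOrDefault(v, 0L);
        return mv > 0 ? mv / (totalMentions + gamma)
                      : gamma / (totalMentions + gamma);
    }

    /** Link-anomaly score of a new post, eq. (7). */
    public double score(List<String> mentionees) {
        double s = -Math.log(probMentions(mentionees.size()));
        for (String v : mentionees) s -= Math.log(probMentionee(v));
        return s;
    }
}
```

The score grows when a post contains unusually many mentions or mentions users rarely seen in the training window.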
C. Combining Anomaly Scores from Different Users
The anomaly score in (7) is computed for each user,
depending on the current post of user u and his/her past
behavior, where x_i denotes the i-th post by the user. [...]
The change-point score is based on the sequentially discounting
normalized maximum likelihood (NML) code length, which can
be computed sequentially [...]
3. Second-layer learning: Let {y_j} (j = 1, 2, ...) be the
collection of smoothed change-point scores obtained as above.
Sequentially learn the [...] a threshold parameter; the
procedure is summarized in Algorithm 1:
Algorithm 1 Dynamic Threshold Optimization (DTO)
Given: {y_j}: scores; M: total number of cells; rho: estimation
parameter; lambda_H: [...] size.
Initialization: Let q be a uniform distribution.
For j = 1, ..., M - 1 do [...]
End for.
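Since much of Algorithm 1 is lost in this copy, the following is only a rough sketch of how a dynamically optimized threshold could work: a histogram q over M score cells is updated with a discounting rate, and the threshold is read off as the boundary of the upper rho-tail. All names (rho, rH, cellWidth) and the exact update rule are our assumptions, not the paper's verbatim algorithm.

```java
/** Sketch of a dynamically optimized alarm threshold (simplified, names assumed). */
public class DynamicThreshold {
    private final double[] q;      // histogram over M score cells
    private final double rho;      // target tail (alarm) probability
    private final double rH;       // discounting (learning) rate
    private final double cellWidth;

    public DynamicThreshold(int M, double rho, double rH, double cellWidth) {
        this.q = new double[M];
        java.util.Arrays.fill(q, 1.0 / M);   // initialization: uniform distribution
        this.rho = rho; this.rH = rH; this.cellWidth = cellWidth;
    }

    private int cell(double score) {
        int j = (int) (score / cellWidth);
        return Math.min(Math.max(j, 0), q.length - 1);
    }

    /** Threshold = left edge of the first cell (from the top) whose tail mass reaches rho. */
    public double threshold() {
        double tail = 0.0;
        for (int j = q.length - 1; j >= 0; j--) {
            tail += q[j];
            if (tail >= rho) return (j + 1) * cellWidth;
        }
        return 0.0;
    }

    /** Report whether the score raises an alarm, then update the histogram with discounting. */
    public boolean update(double score) {
        boolean alarm = score >= threshold();
        for (int j = 0; j < q.length; j++) {
            double hit = (j == cell(score)) ? 1.0 : 0.0;
            q[j] = (1 - rH) * q[j] + rH * hit;   // sequential discounting update
        }
        return alarm;
    }
}
```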
6. Implementation
6.1 Modules:
Modules Description:
6.1.1 Training
6.1.2 Identify Individual Anomaly Score
6.1.3 Aggregate
6.1.3 Aggregate:
In this subsection, we describe how to combine the anomaly scores
from different users. The anomaly score is computed for each user depending
on the current post of user u and his/her past behavior. To measure the
general trend of user behavior, we propose to aggregate the anomaly scores
obtained for posts x_1, ..., x_n using a discretization of window size tau > 0.
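A minimal sketch of this aggregation step, under the assumption that each window's aggregate is the sum of the scores falling in it divided by the window size tau (class and method names are ours):

```java
/** Sketch: aggregate per-post anomaly scores into windows of size tau (names assumed). */
public class ScoreAggregator {
    /**
     * times[i] is the (sorted, non-empty) arrival time of post i,
     * scores[i] its link-anomaly score.
     */
    public static double[] aggregate(double[] times, double[] scores, double tau) {
        double start = times[0];
        double end = times[times.length - 1];
        int windows = (int) Math.floor((end - start) / tau) + 1;
        double[] agg = new double[windows];
        for (int i = 0; i < times.length; i++) {
            int j = (int) Math.floor((times[i] - start) / tau);
            agg[j] += scores[i] / tau;   // window aggregate: (1/tau) * sum of scores
        }
        return agg;
    }
}
```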
Java VM, the same program written in the Java programming language can
run on Windows 2000, a Solaris workstation, or on an iMac.
Object serialization: Allows lightweight persistence and [...]
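The object serialization facility mentioned above can be illustrated with a small round trip through a byte array; the Post class here is purely illustrative.

```java
import java.io.*;

/** Sketch: Java object serialization for lightweight persistence. */
public class SerializationDemo {
    // A serializable value class; Serializable is a marker interface.
    public static class Post implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String user;
        public final int mentions;
        public Post(String user, int mentions) { this.user = user; this.mentions = mentions; }
    }

    /** Serialize an object to a byte array (could equally be a file or socket stream). */
    public static byte[] toBytes(Object o) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) { out.writeObject(o); }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reconstruct the object from its serialized form. */
    public static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```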
database, whereas the Accounts Payable data source could refer to an Access
database. The physical database referred to by a data source can reside
anywhere on the LAN.
The ODBC system files are not installed on your system by Windows
95. Rather, they are installed when you setup a separate database application,
such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is
installed in Control Panel, it uses a file called ODBCINST.DLL. It is also
possible to administer your ODBC data sources through a stand-alone
program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of
this program and each maintains a separate list of ODBC data sources.
[...] isn't as efficient as talking directly to the native database interface. ODBC has
had many detractors make the charge that it is too slow. Microsoft has always
claimed that the critical factor in performance is the quality of the driver
software that is used. In our humble opinion, this is true. The availability of
good ODBC drivers has improved a great deal recently. And anyway, the
criticism about performance is somewhat analogous to those who said that
compilers would never match the speed of pure assembly language. Maybe
not, but the compiler (or ODBC) gives you the opportunity to write cleaner
programs, which means you finish sooner. Meanwhile, computers get faster
every year.
6.8 JDBC:
In an effort to set an independent database standard API for Java; Sun
Microsystems developed Java Database Connectivity, or JDBC. JDBC offers
a generic SQL database access mechanism that provides a consistent interface
to a variety of RDBMSs. This consistent interface is achieved through the use
of plug-in database connectivity modules, or drivers. If a database vendor
wishes to have JDBC support, he or she must provide the driver for each
platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC's framework on
ODBC. As you discovered earlier in this chapter, ODBC has widespread
support on a variety of platforms. Basing JDBC on ODBC will allow vendors
to bring JDBC drivers to market much faster than developing a completely
new connectivity solution.
JDBC was announced in March of 1996. It was released for a 90-day
public review that ended June 8, 1996. Because of user input, the final JDBC
v1.0 specification was released soon after.
The remainder of this section will cover enough information about JDBC for
you to know what it is about and how to use it effectively. This is by no
means a complete overview of JDBC. That would fill an entire book.
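As a brief illustration of the JDBC pattern discussed here, the sketch below builds a MySQL-style JDBC URL and shows the usual try-with-resources query flow. The table and column names are hypothetical, and running printUsers of course requires a reachable database and its driver.

```java
import java.sql.*;

/** Sketch of typical JDBC usage (the URL format and table are illustrative). */
public class JdbcSketch {
    /** Build a MySQL-style JDBC URL for a given host, port and database. */
    public static String jdbcUrl(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db;
    }

    /** Query pattern: connect, execute a SELECT, iterate the ResultSet. */
    public static void printUsers(String url, String user, String password) throws SQLException {
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT name FROM users")) {
            while (rs.next()) {
                System.out.println(rs.getString("name")); // one row per user
            }
        }
    }
}
```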
[...] are simple SELECTs, INSERTs, DELETEs and UPDATEs.
6.9 Java:
The Java language is:
Simple
Architecture-neutral
Object-oriented
Portable
Distributed
High-performance
Interpreted
Multithreaded
Robust
Dynamic
Secure
Java is also unusual in that each Java program is both compiled and
interpreted. With a compiler, you translate a Java program into an
intermediate language called Java byte codes; these platform-independent
code instructions are then passed to and run on the computer.
Compilation happens just once; interpretation occurs each time the
program is executed. The figure illustrates how this works.
(Figure: My Program, through the Java compiler, then the interpreter)
You can think of Java byte codes as the machine code instructions for
the Java Virtual Machine (Java VM). Every Java interpreter, whether it's a
Java development tool or a Web browser that can run Java applets, is an
implementation of the Java VM. The Java VM can also be implemented in
hardware.
Java byte codes help make "write once, run anywhere" possible. You
can compile your Java program into byte codes on any platform that has a
Java compiler. The byte codes can then be run on any implementation of the
Java VM. For example, the same Java program can run on Windows NT,
Solaris, and Macintosh.
6.10 Networking:
6.10.1 TCP/IP stack:
6.10.2 IP datagrams:
The IP layer provides a connectionless and unreliable delivery
system. It considers each datagram independently of the others. Any
association between datagrams must be supplied by the higher layers. The IP
layer supplies a checksum that includes its own header. The header includes
the source and destination addresses. The IP layer handles routing through
an Internet. It is also responsible for breaking up large datagrams into
smaller ones for transmission and reassembling them at the other end.
6.10.3 UDP:
UDP is also connectionless and unreliable. What it adds to IP is a
checksum for the contents of the datagram and port numbers. These are
used to give a client/server model - see later.
6.10.4 TCP:
TCP supplies logic to give a reliable connection-oriented protocol above
IP. It provides a virtual circuit that two processes can use to communicate.
6.10. 5 Internet addresses:
In order to use a service, you must be able to find it. The Internet uses
an address scheme for machines so that they can be located. The address is
a 32-bit integer which gives the IP address. This encodes a network ID and
further addressing. The network ID falls into various classes according to
the size of the network address.
6.10.6 Network address:
Class A uses 8 bits for the network address with 24 bits left over for
other addressing. Class B uses 16 bit network addressing. Class C uses 24
bit network addressing and class D uses all 32.
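The classful scheme above can be sketched as a small helper that inspects the first octet of a dotted-quad address (a simplification that ignores CIDR and class E):

```java
/** Sketch: classify an IPv4 address by its leading octet (classful addressing). */
public class AddressClass {
    public static char classify(String dottedQuad) {
        int first = Integer.parseInt(dottedQuad.split("\\.")[0]);
        if (first < 128) return 'A';   // leading bit 0:    8-bit network, 24-bit host
        if (first < 192) return 'B';   // leading bits 10:  16-bit network
        if (first < 224) return 'C';   // leading bits 110: 24-bit network
        return 'D';                    // leading bits 1110: multicast
    }
}
```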
6.10.7 Subnet address:
Internally, the UNIX network is divided into sub networks. Building 11
is currently on one sub network and uses 10-bit addressing, allowing 1024
different hosts.
6.10.8 Host address:
8 bits are finally used for host addresses within our subnet. This places a
limit of 256 machines that can be on the subnet.
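The host counts quoted above follow directly from the number of address bits:

```java
/** Sketch: address capacity implied by a given number of address bits. */
public class SubnetCapacity {
    public static int hosts(int bits) { return 1 << bits; }  // 2^bits addresses
}
```

With 10-bit subnet addressing this gives 1024 hosts, and with 8-bit host addressing 256 machines, as stated above.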
6.10.11 Sockets:
A socket is a data structure maintained by the system to handle network
connections. A socket is created using the call socket. It returns an
integer that is like a file descriptor. In fact, under Windows, this handle can
[...]
1. Map Visualizations:
Charts showing values that relate to geographical areas. Some
examples include: (a) population density in each state of the United
States, (b) income per capita for each country in Europe, (c) life
expectancy in each country of the world. The tasks in this project include:
Sourcing freely redistributable vector outlines for the countries of
the world, states/provinces in particular countries (USA in particular, but
also other areas);
Creating an appropriate dataset interface (plus default
implementation) [...]
[...] for developing, deploying, and managing multi-tier, server-centric
server-centric
applications. Java EE builds upon the Java SE platform and provides a set of
APIs (application programming interfaces) for developing and running
portable, robust, scalable, reliable and secure server-side applications.
Some of the fundamental components of Java EE include:
Enterprise JavaBeans (EJB): a managed, server-side component
architecture used to encapsulate the business logic of an application.
EJB technology enables rapid and simplified development of
distributed, transactional, secure and portable applications based on
Java technology.
Java Persistence API (JPA): a framework that allows developers to
manage data using object-relational mapping (ORM) in applications
built on the Java Platform.
MIME Type or Content Type: If you see the above sample HTTP response
header, it contains the tag Content-Type. It's also called the MIME type,
and the server sends it to the client to let it know the kind of data it's
sending. It helps the client in rendering the data for the user. Some of the
most used MIME types are text/html, text/xml, application/xml etc.
http://localhost:8080/FirstServletProject/jsps/hello.jsp
http:// This is the first part of the URL; it gives the communication
protocol to be used in server-client communication.
localhost The unique address of the server; most of the time it's the
hostname of the server, which maps to a unique IP address. Sometimes
multiple hostnames point to the same IP address, and the web server's
virtual host takes care of sending the request to the particular server
instance.
8080 This is the port on which the server is listening; it's optional, and if
we don't provide it in the URL then the request goes to the default port of
the protocol. Port numbers 0 to 1023 are reserved for well-known services,
for example 80 for HTTP, 443 for HTTPS, 21 for FTP, etc.
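The decomposition of the example URL can be checked with the standard java.net.URI class:

```java
import java.net.URI;

/** Sketch: decomposing the example URL into protocol, host, port and path. */
public class UrlParts {
    public static void main(String[] args) {
        URI u = URI.create("http://localhost:8080/FirstServletProject/jsps/hello.jsp");
        System.out.println(u.getScheme()); // communication protocol
        System.out.println(u.getHost());   // server address
        System.out.println(u.getPort());   // listening port (-1 when omitted in the URL)
        System.out.println(u.getPath());   // resource path on the server
    }
}
```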
[...] to the client.
Some of the important work done by a web container is:
Communication Support: The container provides an easy way of
communication between the web server and the servlets and JSPs. Because
of the container, we don't need to build a server socket to listen for
requests from the web server, parse the request and generate a response.
All these important and complex tasks are done by the container, and all
we need to focus on is the business logic of our applications.
Lifecycle and Resource Management: The container takes care of
managing the life cycle of a servlet: loading the servlet into memory,
initializing it, invoking its methods and destroying it. The container also
provides utilities such as JNDI for resource pooling and management.
Multithreading Support: The container creates a new thread for every
request to the servlet, and the thread dies when the request is processed.
So servlets are not re-initialized for each request, which saves time and
memory.
JSP Support: JSPs don't look like normal Java classes, and the web
container provides support for JSP. Every JSP in the application is
compiled by the container and converted to a servlet, and then the
container manages it like other servlets.
Miscellaneous Tasks: The web container manages the resource pool, does
memory optimization, runs the garbage collector, and provides security
configurations, support for multiple applications, hot deployment and
several other tasks.
6.21 MySQL:
MySQL, the most popular Open Source SQL database management
system, is developed, distributed, and supported by Oracle Corporation.
[...] the GPL (GNU General Public License); see the Licensing Overview
for more information
(http://www.mysql.com/company/legal/licensing/).
The MySQL Database Server is very fast, reliable, scalable, and easy
to use.
If that is what you are looking for, you should give it a try. MySQL
Server can run comfortably on a desktop or laptop, alongside your
other applications, web servers, and so on, requiring little or no
attention. If you dedicate an entire machine to MySQL, you can adjust
the settings to take advantage of all the memory, CPU power, and I/O
capacity available. MySQL can also scale up to clusters of machines,
networked together.
You can find a performance comparison of MySQL Server with other
database managers on our benchmark page.
import java.awt.BorderLayout;
import java.awt.Dimension;
import java.awt.Toolkit;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.Calendar;
import java.util.Vector;
import javax.swing.JButton;
import javax.swing.JComboBox;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JRadioButton;
import javax.swing.JScrollPane;
import javax.swing.JTabbedPane;
import javax.swing.JTable;
import javax.swing.UIManager;
import javax.swing.table.DefaultTableModel;
import org.jfree.ui.RefineryUtilities;
import org.jvnet.substance.SubstanceLookAndFeel;
import com.mycompany.logic.TwitterRestCall;
public class MainForm implements ActionListener {
private JButton jbuttonTrends, jbuttonTraining,
jbuttonAggregation,
jButtonChangePoint, jbuttonBurstAnalysis;
private TwitterRestCall twitterRestCall;
public static DefaultTableModel defaultTableModel,
        defaultTableModelUser, defaultTableTime;
private JTable jTable, jTableUser, jTableTime;
private JScrollPane jScrollPane, jScrollPaneUser,
jScroPaneTime;
public static JComboBox<String> jComboBox;
public static JRadioButton jRadioButtonkey;
public static JRadioButton jRadioButtonLink;
static {
    try {
        SubstanceLookAndFeel.setCurrentWatermark(
                "org.jvnet.substance.watermark.SubstanceBinaryWatermark");
        SubstanceLookAndFeel.setCurrentTheme(
                "org.jvnet.substance.theme.SubstanceInvertedTheme");
        SubstanceLookAndFeel.setCurrentGradientPainter(
                "org.jvnet.substance.painter.SpecularGradientPainter");
        SubstanceLookAndFeel.setCurrentButtonShaper(
                "org.jvnet.substance.button.ClassicButtonShaper");
        UIManager.setLookAndFeel(new SubstanceLookAndFeel());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
public void design() {
    twitterRestCall = new TwitterRestCall();
    JFrame frame = new JFrame("Discovering Emerging Topics");
    // frame.setLayout(null);
    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    JTabbedPane tab = new JTabbedPane();
    frame.add(tab, BorderLayout.CENTER);
    frame.add(createPanelLayout());
    Dimension screenSize = Toolkit.getDefaultToolkit().getScreenSize();
    Dimension size = frame.getSize();
    screenSize.height = screenSize.height / 2;
    screenSize.width = screenSize.width / 2;
    size.height = size.height / 2;
    size.width = size.width / 2;
    int y = screenSize.height - size.height;
    int x = screenSize.width - size.width;
    // frame.setLocation(x, y);
    frame.setSize(1000, 800);
    frame.setVisible(true);
}
public JPanel createPanelLayout() {
    JPanel jPanel = new JPanel();
    jPanel.setLayout(null);
    jComboBox = new JComboBox<String>();
    jComboBox.setBounds(10, 30, 150, 30);
    jPanel.add(jComboBox);
    // ... (creation of the other buttons is missing in the source)
    jbuttonBurstAnalysis = new JButton("Burst Analysis");
    jbuttonBurstAnalysis.setBounds(270, 230, 160, 30);
    jbuttonBurstAnalysis.addActionListener(this);
    jPanel.add(jbuttonBurstAnalysis);
    jPanel.add(jbuttonTrends);
    jPanel.add(jbuttonTraining);
    jPanel.add(jbuttonAggregation);
    jPanel.add(jButtonChangePoint);
    defaultTableModel = new DefaultTableModel();
    defaultTableModel.addColumn("Trends Name");
    defaultTableModel.addColumn("URI");
    defaultTableModel.addColumn("In Time");
    defaultTableModel.addColumn("Out Time");
    defaultTableModel.addColumn("Retweet Count");
    jTable = new JTable(defaultTableModel);
    jTable.getColumnModel().getColumn(1).setPreferredWidth(250);
    jScrollPane = new JScrollPane(jTable);
    jScrollPane.setBounds(10, 470, 850, 200);
jPanel.add(jRadioButtonkey);
jPanel.add(jRadioButtonLink);
jPanel.add(jScrollPane);
jPanel.add(jScrollPaneUser);
jPanel.add(jScroPaneTime);
return jPanel;
}
public static void main(String[] args) {
MainForm mainForm = new MainForm();
mainForm.design();
}
@Override
public void actionPerformed(ActionEvent e) {
// TODO Auto-generated method stub
if (e.getSource() == jbuttonTrends) {
twitterRestCall.getTrends();
} else if (e.getSource() == jRadioButtonLink) {
jComboBox.removeAllItems();
defaultTableModel.setRowCount(0);
defaultTableModelUser.setRowCount(0);
jRadioButtonLink.setSelected(true);
jRadioButtonkey.setSelected(false);
String inTime =
        defaultTableModel.getValueAt(i, 2).toString();
String outTime =
        defaultTableModel.getValueAt(i, 3).toString();
long inMilli = convertStringtoDate(inTime);
long outMilli = convertStringtoDate(outTime);
long diff = (outMilli - inMilli);
Vector<String> rowData = new Vector<String>();
rowData.add(String.valueOf(inMilli));
rowData.add(String.valueOf(outMilli));
rowData.add(String.valueOf(diff));
if (jRadioButtonLink.isSelected()) {
    String retweet_count =
            defaultTableModel.getValueAt(i, 4).toString();
    rowData.add(retweet_count);
} else {
rowData.add(String.valueOf(100));
}
defaultTableTime.addRow(rowData);
}
}
} else if (e.getSource() == jbuttonBurstAnalysis) {
try {
    BarChart barChart = new BarChart("Burst Detection");
    barChart.pack();
    RefineryUtilities.centerFrameOnScreen(barChart);
    barChart.setVisible(true);
} catch (Exception ee) {
    ee.printStackTrace();
}
}
}
} else if (getInMonth.contains("JUN")) {
month = Calendar.JUNE;
} else if (getInMonth.contains("JUL")) {
month = Calendar.JULY;
} else if (getInMonth.contains("AUG")) {
month = Calendar.AUGUST;
} else if (getInMonth.contains("SEP")) {
month = Calendar.SEPTEMBER;
} else if (getInMonth.contains("OCT")) {
month = Calendar.OCTOBER;
} else if (getInMonth.contains("NOV")) {
month = Calendar.NOVEMBER;
} else if (getInMonth.contains("DEC")) {
month = Calendar.DECEMBER;
}
String timeStamp = inSplitter[3];
calendar.set(2014, month, date);
int hour = Integer.parseInt(timeStamp.split(":")[0]);
int minutes = Integer.parseInt(timeStamp.split(":")[1]);
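The month-mapping chain above is the tail of the convertStringtoDate helper; its head (the JAN-MAY branches and the input splitting) falls on a missing page. A hedged reconstruction of the whole helper, assuming a Twitter created-at style input such as "Wed OCT 15 13:45:12" (day-of-week, month, day, time; the exact format is an assumption, as is the seconds field):

```java
import java.util.Calendar;

public class DateUtil {
    // Hedged reconstruction: the original listing only shows the JUN..DEC
    // branches, the hour/minute parsing, and the hard-coded year 2014.
    public static long convertStringtoDate(String in) {
        String[] inSplitter = in.split(" ");
        String getInMonth = inSplitter[1].toUpperCase();
        int date = Integer.parseInt(inSplitter[2]);
        // Calendar month constants are 0-based (JANUARY == 0 .. DECEMBER == 11),
        // so the array index maps directly onto the month value.
        String[] names = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"};
        int month = Calendar.JANUARY;
        for (int i = 0; i < names.length; i++) {
            if (getInMonth.contains(names[i])) { month = i; break; }
        }
        String timeStamp = inSplitter[3];
        int hour = Integer.parseInt(timeStamp.split(":")[0]);
        int minutes = Integer.parseInt(timeStamp.split(":")[1]);
        int seconds = Integer.parseInt(timeStamp.split(":")[2]);
        Calendar calendar = Calendar.getInstance();
        calendar.clear(); // zero out milliseconds so differences are exact
        calendar.set(2014, month, date, hour, minutes, seconds); // year fixed at 2014, as in the listing
        return calendar.getTimeInMillis();
    }
}
```

With this helper, the `outMilli - inMilli` difference computed in actionPerformed is a plain millisecond gap between the in and out timestamps of a trend.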
7. System Testing
The purpose of testing is to discover errors. Testing is
the process of trying to discover every conceivable fault or
weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies,
and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system
meets its requirements and user expectations and does not
fail in an unacceptable manner. There are various types of
test, and each test type addresses a specific testing requirement.
Integration tests demonstrate that, although the components were
individually satisfactory (as shown by successful unit testing), the
combination of components is correct and consistent. Integration testing
is specifically aimed at exposing the problems that arise from the
combination of components.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Organization and preparation of functional tests is focused on
requirements, key functions, or special test cases. In addition,
systematic coverage pertaining to business process flows, data
fields, predefined processes, and successive processes must be
considered for testing. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and
integration points.
Integration Testing:
Software integration testing is the incremental integration testing
of two or more integrated software components on a single platform
to produce failures caused by interface defects.
8.1.2 Perform training for key-based detection, which retrieves user tweets: (screenshot)
8.1.4 Change-point analysis to check the time taken to reach 100 tweets: (screenshot)
8.1.7 Result analysis between link-based detection and key-based detection: (screenshot)
8.2 Conclusion:
In this paper, we have proposed a new approach to detect the emergence of
topics in a social network stream. The basic idea of our approach is to focus
on the social aspect of the posts reflected in the mentioning behavior of users
instead of the textual contents. We have proposed a probability model that
captures both the number of mentions per post and the frequency of
mentionees. We have combined the proposed mention model with the SDNML
change-point detection algorithm [3] and Kleinberg's burst-detection model
[2] to pinpoint the emergence of a topic. Since the proposed method does not
rely on the textual contents of social network posts, it is robust to rephrasing
and it can be applied to the case where topics are concerned with information
other than texts, such as images, video, audio, and so on. We have applied the
proposed approach to four real data sets that we have collected from Twitter.
The four data sets included a wide-spread discussion about a controversial
topic (Job hunting data set), a quick propagation of news about a video
leaked on YouTube (YouTube data set), a rumor about the upcoming press
conference by NASA (NASA data set), and an angry response to a foreign
TV show (BBC data set). In all the data sets, our proposed approach
showed promising performance. In three out of four data sets, the detection by
the proposed link-anomaly-based methods was earlier than that by the
text-anomaly-based counterparts. Furthermore, for the NASA and BBC data sets,
in which the keyword that defines the topic is more ambiguous than in the first two data
sets, the proposed link-anomaly-based approaches have detected the
emergence of the topics even earlier than the keyword-based approaches that
use hand-chosen keywords. All the analysis presented in this paper was
conducted offline, but the framework itself can be applied online. We are
planning to scale up the proposed approach to handle social streams in real
time.
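The conclusion combines the mention model with SDNML change-point detection [3] and Kleinberg's burst-detection model [2]. As a rough illustration of the latter, the following is a minimal two-state Kleinberg-style burst detector over inter-arrival gaps, solved by dynamic programming. The rate multiplier s and transition weight gamma are free parameters; this is a sketch of the general technique, not the report's actual implementation:

```java
import java.util.Arrays;

public class KleinbergBurst {
    // Two-state Kleinberg burst model over inter-arrival gaps:
    // state 0 emits at the baseline rate a0 = n / total, state 1 at a1 = s * a0.
    // Moving up into the burst state costs gamma * ln(n); moving down is free.
    // Returns the minimum-cost (Viterbi) state sequence, one state per gap.
    public static int[] detect(double[] gaps, double s, double gamma) {
        int n = gaps.length;
        double total = 0;
        for (double g : gaps) total += g;
        double a0 = n / total;                 // baseline event rate
        double a1 = s * a0;                    // elevated (burst) rate
        double trans = gamma * Math.log(n);    // cost of entering the burst state
        double[] rate = {a0, a1};

        double[][] cost = new double[n][2];
        int[][] back = new int[n][2];
        for (int j = 0; j < 2; j++)            // -log exponential density of gap 0
            cost[0][j] = (j == 1 ? trans : 0) + rate[j] * gaps[0] - Math.log(rate[j]);
        for (int t = 1; t < n; t++) {
            for (int j = 0; j < 2; j++) {
                double emit = rate[j] * gaps[t] - Math.log(rate[j]);
                int bestPrev = 0;
                double best = Double.POSITIVE_INFINITY;
                for (int k = 0; k < 2; k++) {
                    double c = cost[t - 1][k] + (j > k ? trans : 0);
                    if (c < best) { best = c; bestPrev = k; }
                }
                cost[t][j] = best + emit;
                back[t][j] = bestPrev;
            }
        }
        int[] states = new int[n];
        states[n - 1] = cost[n - 1][0] <= cost[n - 1][1] ? 0 : 1;
        for (int t = n - 1; t > 0; t--) states[t - 1] = back[t][states[t]];
        return states;
    }

    public static void main(String[] args) {
        // Four long gaps followed by six short ones: the short-gap run
        // should be labeled as a burst (state 1).
        double[] gaps = {10, 9, 11, 10, 1, 1, 1, 1, 1, 1};
        System.out.println(Arrays.toString(detect(gaps, 4.0, 1.0)));
    }
}
```

In the report's setting the "events" would be mentioning posts, so a run of unusually short gaps between them flips the automaton into the burst state, flagging the emergence of a topic.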
REFERENCES
[1] J. Allan et al., "Topic Detection and Tracking Pilot Study: Final Report," Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998.
[2] J. Kleinberg, "Bursty and Hierarchical Structure in Streams," Data Mining and Knowledge Discovery, vol. 7, no. 4, pp. 373-397, 2003.
[3] Y. Urabe, K. Yamanishi, R. Tomioka, and H. Iwai, "Real-Time Change-Point Detection Using Sequentially Discounting Normalized Maximum Likelihood Coding," Proc. 15th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD '11), 2011.
[4] S. Morinaga and K. Yamanishi, "Tracking Dynamics of Topic Trends Using a Finite Mixture Model," Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 811-816, 2004.
[5] Q. Mei and C. Zhai, "Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining, pp. 198-207, 2005.
[6] A. Krause, J. Leskovec, and C. Guestrin, "Data Association for Topic Intensity Tracking," Proc. 23rd Int'l Conf. Machine Learning (ICML '06), pp. 497-504, 2006.
[7] D. He and D.S. Parker, "Topic Dynamics: An Alternative Model of Bursts in Streams of Topics," Proc. 16th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 443-452, 2010.
Index
Advantages, 5, 26
algorithms, 3, 14
Artificial, 3
Associations, 2
Business Requirements, 28
categorize, 2
Clusters, 2
Data, 1, 2, 3, 4, 5, 6, 7, 25, 28, 31, 32, 33, 41, 56, 105, 106, 107
data mining, 1, 2, 4, 5, 6, 10, 11, 13, 14
Decision, 3
Disadvantages, 25
Economical Feasibility, 18
Existing System, 25
Finance, 6
Functional Requirements, 27, 28
Genetic, 3
induction, 4
Interface Requirements, 28
Law enforcement, 7
Local, 7
Manufacturing, 6
method, 4, 14, 26, 30, 34, 43, 48, 50, 51, 57, 62, 71, 73, 85, 103
networks, 2, 3, 7, 9, 10, 42, 65, 96, 97, 98
Optimization, 3, 48
patterns, 1, 2, 6, 7, 10, 12, 16, 35
Proposed System, 25
quantities, 4
Regulatory/Compliance Requirements, 28
SDLC METHODOLOGIES, 22
Social Feasibility, 18, 19
Social impact analysis, 19
SPIRAL MODEL, 22
Technical Feasibility, 18, 19
visualization, 4
Wide Area Network, 7, 9