

Statistical Features Based Real-time Detection of Drifted Twitter Spam

Chao Chen, Yu Wang, Jun Zhang, Yang Xiang, Wanlei Zhou, and Geyong Min

Abstract: Twitter spam has become a critical problem nowadays. Recent works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. In our labelled tweets dataset, however, we observe that the statistical properties of spam tweets vary over time, and thus the performance of existing machine learning based classifiers decreases. This issue is referred to as "Twitter Spam Drift". In order to tackle this problem, we first carry out a deep analysis of the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover "changed" spam tweets from unlabelled tweets and incorporate them into the classifier's training process. A number of experiments are performed to evaluate the proposed scheme. The results show that our proposed Lfun scheme can significantly improve the spam detection accuracy in real-world scenarios.

Index Terms: Online Social Networks Security, Twitter Spam, Machine Learning.
I. INTRODUCTION

Twitter has become one of the most popular social networks in the last decade. It is rated as the most popular social network among teenagers according to a recent report [14]. However, the exponential growth of Twitter also contributes to the increase of spamming activities. Twitter spam, which refers to unsolicited tweets containing malicious links that direct victims to external sites hosting malware downloads, phishing, drug sales, scams, etc. [1], not only interferes with user experience, but also damages the whole Internet. In September 2014, the Internet in New Zealand melted down due to the spread of malware downloading spam. This kind of spam lured users to click links which claimed to contain Hollywood star photos, but in fact directed users to download malware that performed DDoS attacks [24].

Consequently, security companies, as well as Twitter itself, are combating spammers to make Twitter a spam-free platform. For example, Trend Micro uses a blacklisting service called the Web Reputation Technology system to filter spam URLs for users who have its products installed [22]. Twitter also implements blacklist filtering as a component in its detection system called BotMaker [17]. However, blacklists fail to protect victims from new spam due to their time lag [15]. Research shows that more than 90% of victims may visit a new spam link before it is blocked by blacklists [32]. In order to address the limitation of blacklists, researchers have proposed machine learning based schemes which make use of spammers' or spam tweets' statistical features to detect spam without checking the URLs [12], [35].

Machine Learning (ML) based detection schemes involve several steps. First, statistical features which can differentiate spam from non-spam are extracted from tweets or Twitter users (such as account age, number of followers or friends, and number of characters in a tweet). Then a small set of samples is labelled with a class, i.e. spam or non-spam, as training data. After that, machine learning based classifiers are trained on the labelled samples, and finally the trained classifiers can be used to detect spam. A number of ML based detection schemes have been proposed by researchers [1], [30], [34], [38].

However, the observation in our collected data set shows that the characteristics of spam tweets vary over time. We refer to this issue as "Twitter Spam Drift". As previous ML based classifiers are not updated with the changed spam tweets, the performance of such classifiers is dramatically affected by Spam Drift when detecting new incoming spam tweets. Why do spam tweets drift over time? It is because spammers are struggling with security companies and researchers. While researchers are working to detect spam, spammers are also trying to avoid being detected. This leads spammers to evade current detection features by posting more tweets or creating spam with similar semantic meaning but different text [29], [34].

In this work, we first illustrate the Twitter spam drift problem by analysing the statistical properties of Twitter spam in our collected dataset, and then its impact on the detection performance of several classifiers. By observing that there are "changed" spam samples in the incoming tweets, we propose a novel Lfun (Learning from unlabelled tweets) approach, which updates classifiers with the spam samples found in the unlabelled incoming tweets. In summary, our contributions are listed below:

- We collect and label a real-world dataset, which contains 10 consecutive days' tweets with 100k spam tweets and 100k non-spam tweets in each day (2 million tweets in total). This dataset is available for researchers to study Twitter spam.
- We investigate the Twitter Spam Drift problem from both data analysis and experimental evaluation aspects. To the best of our knowledge, we are the first to study this problem in Twitter spam detection.
- We propose a novel Lfun approach which learns from unlabelled tweets to deal with Twitter Spam Drift.
- Through our evaluations, we show that our proposed Lfun can effectively detect Twitter spam by reducing the impact of the Spam Drift issue.

C. Chen is with the University of Electronic Science and Technology of China and the School of Information Technology, Deakin University, Australia (e-mail: chao.chen@deakin.edu.au).
Y. Wang, J. Zhang, Y. Xiang and W. Zhou are with the School of Information Technology, Deakin University, Australia (e-mail: {y.wang, jun.zhang, yang.xiang, wanlei.zhou}@deakin.edu.au).
G. Min is with the University of Electronic Science and Technology of China and the University of Exeter, UK (e-mail: g.min@exeter.ac.uk).

The rest of this paper is organized as follows. Section II presents a review of machine learning based methods for Twitter spam detection. In Section III, the collection and labelling of the data used in our work is introduced; meanwhile, the Spam Drift problem is illustrated and justified. We then introduce our Lfun approach in Section IV, and analyse the performance benefit of our approach. Section V evaluates our Lfun approach and compares it with four traditional machine learning algorithms. Finally, Section VII concludes this work and introduces our future work.

II. RELATED WORK

Due to the increasing popularity of Twitter, spammers have transferred from other platforms, such as email and blogs, to Twitter. To make Twitter a clean social platform, security companies and researchers are working hard to eliminate spam. Security companies, such as Trend Micro [22], mainly rely on blacklists to filter spam links. However, blacklists fail to protect users in time due to their time lag. To avoid the limitation of blacklists, some early works proposed by researchers use heuristic rules to filter Twitter spam. [36] used a simple algorithm to detect spam in #robotpickupline (a hashtag created by the authors themselves) through three rules: suspicious URL searching, username pattern matching and keyword detection. [19] simply removed all the tweets which contained more than three hashtags to filter spam in their dataset, in order to eliminate the impact of spam on their research.

Later on, some works applied machine learning algorithms for Twitter spam detection. [1], [20], [30], [33] made use of account and content based features, such as account age, the number of followers/followings, the length of a tweet, etc., to distinguish spammers from non-spammers. Wang et al. proposed a Bayesian classifier based approach to detect spammers on Twitter [33], while Benevenuto et al. detected both spammers and spam by using Support Vector Machines [1]. In [30], Stringhini et al. trained a Random Forest classifier, and used the classifier to detect spam from three social networks: Twitter, Facebook and MySpace. Lee et al. deployed honeypots to obtain spammers' profiles, and extracted statistical features for spam detection with several ML algorithms, such as Decorate, RandomSubSpace and J48 [20].

Features used in previous works [1], [20], [30], [33] can be fabricated easily by purchasing more followers, posting more tweets, or mixing spam with normal tweets [34]. Thus, some researchers [29], [34] proposed robust features which rely on the social graph to avoid feature fabrication. Song et al. extracted the distance and connectivity between a tweet's sender and its receiver to determine whether a tweet was spam or not [29]. After importing their features into the previous feature set, the performance of several classifiers was improved to nearly 99% True Positives and less than 1% False Positives. In [34], Yang et al. proposed more robust features, such as Local Clustering Coefficient, Betweenness Centrality and Bidirectional Links Ratio. By comparison with four existing works [1], [20], [30], [33], their feature set outperformed all the previous works.

In contrast, [31] and [21] relied solely on the embedded URLs in tweets to detect spam. A number of URL based features were used by [31], such as the domain tokens, path tokens and query parameters of the URL, along with some features from the landing page, DNS information, and domain information. In [21], the authors studied the characteristics of Correlated URL Redirect Chains, and further collected relevant features, like URL redirect chain length, relative number of different initial URLs, etc. These features also showed their discriminative power when used for classifying spam.

However, all the above mentioned works do not consider the Spam Drift problem. Their detection accuracy will decrease as time goes on, since spammers are changing strategies to avoid being detected. Egele et al. proposed a historical-model based spam detection scheme whose detection accuracy would not be affected by Spam Drift [10]. They built several models, like a Language model and a Posting Time model, for each user. Once the model behaved abnormally, there might be a compromise of this account, and this account was likely being used to spread spam by attackers. This method can only detect whether an account was compromised or not, but cannot identify the spamming accounts which were created by spammers fraudulently.

Different to related works, we thoroughly study the Spam Drift problem. In addition, we propose an innovative scheme called Lfun which learns from the unlabelled tweets and can tackle this issue in identifying Twitter spam. Thus, our work can make great contributions to the research area of Twitter spam detection.

III. PROBLEM OF TWITTER SPAM DRIFT

A. 10-day ground truth

A labelled dataset is important for classification tasks, such as Twitter spam detection. In this work, we used Twitter's Streaming API to collect tweets with URLs over a period of 10 consecutive days. While it is possible to send spam without embedding URLs on Twitter, the majority of spam contains URLs [10]. We have inspected hundreds of spam tweets by hand and only found a few tweets without URLs which could be considered as spam. In addition, spammers mainly use embedded URLs to make it more convenient to direct victims to external sites to achieve their goals, such as phishing, scams, and malware downloading [38]. Therefore, we only focus on spam tweets with URLs.

Currently, researchers use two ways to build ground truth: manual inspection and blacklist filtering. While manual inspection can label a small amount of training data, it is very time- and resource-consuming; a large group of people is needed to check tens of thousands of tweets. Although HIT (human intelligence task) websites can help label the tweets, it is also costly and sometimes the results are doubtful [3]. Others apply existing blacklisting services, such as Google SafeBrowsing and URIBL [13], to label spam tweets. Nevertheless, these services' API limits make it impossible to label a large amount of tweets.

TABLE I: Extracted Features

Feature No.  Feature Name       Description
f1           account_age        The age (days) of an account, from its creation until the time of sending the most recent tweet
f2           no_follower        The number of followers of this Twitter user
f3           no_following       The number of followings/friends of this Twitter user
f4           no_userfavourites  The number of favourites this Twitter user received
f5           no_lists           The number of lists this Twitter user has been added to
f6           no_tweets          The number of tweets this Twitter user sent
f7           no_retweets        The number of retweets of this tweet
f8           no_hashtag         The number of hashtags included in this tweet
f9           no_usermention     The number of user mentions included in this tweet
f10          no_urls            The number of URLs included in this tweet
f11          no_char            The number of characters in this tweet
f12          no_digits          The number of digits in this tweet
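To make the extraction concrete, the following is a minimal sketch (not the authors' code) of how the 12 features above could be computed from a single tweet object returned by Twitter's Streaming API. The JSON field names (user.created_at, followers_count, entities, and so on) are the standard tweet schema, and the date format string is the one Twitter's API uses; the helper name extract_features is ours.

from datetime import datetime

TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"   # e.g. "Tue Feb 10 08:00:00 +0000 2015"

def extract_features(tweet):
    """Compute f1-f12 of TABLE I for one tweet JSON object."""
    user = tweet["user"]
    created = datetime.strptime(user["created_at"], TWITTER_TIME)
    sent = datetime.strptime(tweet["created_at"], TWITTER_TIME)
    text = tweet["text"]
    return [
        (sent - created).days,                    # f1  account_age (days)
        user["followers_count"],                  # f2  no_follower
        user["friends_count"],                    # f3  no_following
        user["favourites_count"],                 # f4  no_userfavourites
        user["listed_count"],                     # f5  no_lists
        user["statuses_count"],                   # f6  no_tweets
        tweet.get("retweet_count", 0),            # f7  no_retweets
        len(tweet["entities"]["hashtags"]),       # f8  no_hashtag
        len(tweet["entities"]["user_mentions"]),  # f9  no_usermention
        len(tweet["entities"]["urls"]),           # f10 no_urls
        len(text),                                # f11 no_char
        sum(c.isdigit() for c in text),           # f12 no_digits
    ]

Every field here is read directly from the tweet object, which is what makes these features light-weight enough for real-time use.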

We apply Trend Micro's Web Reputation Technology (WRT) to identify which tweets are deemed spam [4]. Trend Micro's WRT system maintains a large dataset of URL reputation records, which are derived from their customers' opt-in URL filtering records. The WRT system is dedicated to collecting the latest and most popular URLs, analysing them, and then providing Trend Micro customers with real-time protection while they are surfing the web. Hence, by checking URLs against the WRT system, we are able to identify whether a URL is malicious or not. We define tweets which contain malicious URLs as Twitter spam. In our collected data, we labelled one million spam tweets and one million non-spam tweets over 10 days, with 100k spam tweets and 100k non-spam tweets for each day.

Feature extraction is a key component in machine learning based classification tasks [37]. Some studies [1], [30], [33] have applied features which make use of the historical information of a user, such as the tweets that the user sent in a period of time. While these features may be more discriminative, it is not possible to collect them due to the restrictions of Twitter's API. Other researchers [29], [34] applied social graph based features, which are hard to evade. Nevertheless, it is significantly expensive to collect those features, as they cannot be calculated until the social graph is formed. Thus, those expensive features are not suitable for real-time detection, despite having more discriminative power in separating spammers and legitimate users. The longer a spam tweet exists, the more chance it has of being exposed to victims, so it is very important to detect spam tweets as early as possible. To reduce the loss caused by spam, real-time detection is in demand. Consequently, we only focus on extracting light-weight features which can be used for timely detection. These features can be straightforwardly extracted from the collected tweets' JSON data structure [23] with little computation. We have extracted 12 features in total from our dataset, as listed in TABLE I.

B. Problem Statement

In the real world, the statistical features of spam tweets change in unpredicted ways over time. As a result, machine learning based detection systems become inaccurate. This issue is referred to as the "Spam Drift" problem in our previous paper [5]. Here, we present an investigation of the Spam Drift problem from the aspect of the change of the mean value of each feature from day to day.

Fig. 1 shows the changing trend of the average value of each feature for the two classes over 10 days. In general, the variation of the average feature value for spam tweets is greater than that for non-spam tweets. Fig. 1a shows that the average value of Account Age for spam tweets ranges from 530 to 730, and the variation is dramatic. However, it deviates only from 710 to 740 for non-spam tweets, which is relatively stable. This is due to the fact that spammers create a large number of new accounts to send spam once their old accounts are blocked. For instance, if we have 3 spammers with account ages of 2 days, 6 days and 10 days on the first day, the average value of Account Age is (2+6+10)/3 = 6 days. On the second day, if the spammer whose account age is 2 days is detected and removed, the average value of Account Age is (6+10)/2 = 8 days, which is an increase. In addition, spammers may also generate new accounts with 0-day Account Age to spread spam after some of their accounts are blocked, which can lead to a decrease of the average value of Account Age. That is why the average value of Account Age fluctuates. Naturally, spammers tend to keep following new friends as they want to be exposed to the public more frequently, whereas for non-spammers the number of followings does not change much once they have built their friend circle, as we can see from Fig. 1c. As expected, most of the other features show the same trend: the average value of a feature varies for spam tweets, while it is stable for non-spam tweets.

To sum up, the characteristics of spam tweets vary from day to day, while those of non-spam tweets do not change much, as we see from Fig. 1. Spam Drift is a crucial issue in Twitter spam detection, and it is in great need of being solved.

C. Problem Justification

In the previous section, we simply compared some representative statistics, such as the mean values of features, to show the Spam Drift problem. To further illustrate the changing of the statistical features in a dataset, a natural approach is to model the distribution of the data [8]. There are two kinds of approaches: parametric and non-parametric. Parametric approaches are very powerful when the specific distribution of the dataset, like a Normal Distribution, is already known. However, the distribution of the Twitter spam data is unknown, thus it is not possible to apply parametric approaches.

[Figure] Fig. 1: Changes of Average Values of Features. (a) Account Age; (b) No. of Followings; (c) No. of User Mentions per Tweet. Each panel plots the average feature value per day (Day 1 to Day 10) for spam and non-spam tweets.

TABLE II: KL Divergence of Spam and Non-spam Tweets of Two Consecutive Days (each cell: spam / non-spam)

      D1-D2      D2-D3      D3-D4      D4-D5      D5-D6      D6-D7      D7-D8      D8-D9      D9-D10
f1    0.36/0.04  0.34/0.03  0.44/0.04  0.24/0.03  0.26/0.03  0.27/0.03  0.29/0.05  0.26/0.03  0.34/0.04
f2    0.24/0.1   0.22/0.1   0.26/0.1   0.19/0.1   0.21/0.1   0.21/0.1   0.17/0.1   0.38/0.1   0.35/0.1
f3    0.28/0.07  0.22/0.07  0.32/0.07  0.15/0.07  0.22/0.07  0.2/0.07   0.2/0.08   0.26/0.08  0.23/0.08
f4    0.16/0.07  0.13/0.07  0.14/0.08  0.14/0.07  0.17/0.07  0.19/0.07  0.13/0.07  0.27/0.08  0.19/0.08
f5    0.02/0.01  0.02/0.01  0.03/0.01  0.02/0.01  0.01/0.01  0.02/0.01  0.01/0.01  0.05/0.01  0.05/0.01
f6    0.98/0.35  0.52/0.35  0.63/0.35  0.36/0.35  0.45/0.34  0.4/0.34   0.45/0.35  0.5/0.35   0.52/0.36
f7    0.1/0.04   0.08/0.03  0.04/0.04  0.04/0.04  0.05/0.03  0.07/0.04  0.06/0.04  0.1/0.04   0.08/0.04
f8    0.19/0     0/0        0.04/0     0.03/0     0.02/0     0.03/0     0.01/0     0.04/0     0.02/0
f9    0.09/0     0.03/0     0.01/0     0.02/0     0.01/0     0.01/0     0/0        0.04/0     0.01/0
f10   0/0        0.03/0     0.03/0     0.01/0     0.1/0      0/0        0.01/0     0.32/0     0.27/0
f11   0.26/0.01  0.06/0.01  0.06/0.01  0.11/0.01  0.1/0      0.09/0     0.26/0.01  0.28/0.01  0.2/0.02
f12   0.04/0     0/0        0.02/0     0.03/0.01  0.03/0     0.04/0     0.04/0     0.46/0     0.46/0

Consequently, non-parametric methods, such as statistical tests, which make no assumptions about the dataset's distribution, are used by researchers [11].

Statistical tests compute the distance between two distributions to determine the change. One of the most common measures of the distance between distributions is the Kullback-Leibler (KL) Divergence [8], [27]. The suitability of KL Divergence for measuring distributions can be found in [8]. KL Divergence, which is also known as relative entropy, is defined as

$$D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}.$$

It is used to compare two probability distributions. We need to map data points into distributions to apply the formula. According to [7], let $s = \{x_1, x_2, \ldots, x_n\}$ be a multi-set drawn from a finite set $F$ containing numerical feature values, and denote by $N(x|s)$ the number of appearances of $x \in s$; the relative proportion of each $x$ is then denoted by

$$P_s(x) = \frac{N(x|s)}{n}.$$

However, the ratio $P(i)/Q(i)$ is undefined if $Q(i) = 0$. As suggested by [18], the estimate $P_s$ is replaced by

$$P_s(x) = \frac{N(x|s) + 0.5}{n + |F|/2},$$

where $|F|$ is the number of elements in the finite set $F$. The distance between two days' tweets, D1 and D2, is

$$D(D1 \| D2) = \sum_{x \in F} P_{D1}(x) \log \frac{P_{D1}(x)}{P_{D2}(x)}.$$

[Figure] Fig. 2: Illustration of Spam Drift. The original spam region lies on one side of the decision boundary and the non-spam region on the other; as the spam changes, the changed spam falls on the non-spam side of the unchanged decision boundary.

We compute the KL Divergence of each feature of spam and non-spam tweets in two adjacent days, which is listed in TABLE II.
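As an illustration (a sketch of ours, not the authors' implementation), the smoothed estimate and the day-to-day KL Divergence above can be computed as follows for one feature's values from two days. The function names are ours, and the code assumes feature values are discrete (or have been pre-binned into the finite set F).

import math
from collections import Counter

def smoothed_distribution(values, support):
    # P_s(x) = (N(x|s) + 0.5) / (n + |F|/2), so no element of F gets zero mass
    counts = Counter(values)
    n, f = len(values), len(support)
    return {x: (counts[x] + 0.5) / (n + f / 2) for x in support}

def kl_divergence(day1_values, day2_values):
    support = set(day1_values) | set(day2_values)    # the finite set F
    p = smoothed_distribution(day1_values, support)  # P_D1
    q = smoothed_distribution(day2_values, support)  # P_D2
    return sum(p[x] * math.log(p[x] / q[x]) for x in support)

Calling kl_divergence on, say, the account_age values of Day 1 spam and Day 2 spam yields the kind of per-feature values reported in TABLE II.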

In TABLE II, the first value in each cell is the KL Divergence of the spam tweets' feature and the second is that of the non-spam tweets' feature. KL Divergence indicates the dissimilarity of two distributions: the larger the value is, the more different the two distributions are. As shown in TABLE II, the KL Divergence of spam tweets in two adjacent days is much larger than that of the non-spam tweets for more than half of the features. Taking f1 (account_age) for example, the KL Divergence of spam between Day 1 and Day 2 is 0.36, while it is only 0.04 for non-spam, which indicates that the distribution of f1 for spam in Day 1 is much more different from that in Day 2, compared with the non-spam distribution. From these KL Divergence values, we can see that the distribution of spam tweets' features changes unpredictably from day to day. Nevertheless, the distribution of the training data is unchanged. As the knowledge structure learned from the unchanged training data is not updated while being used to classify new incoming tweets, the performance of the classifiers becomes inaccurate. As illustrated in Fig. 2, while the spam changes, the decision boundary is not updated. Consequently, more spam tweets are misclassified as non-spam.

IV. PROPOSED SCHEME: Lfun

Existing machine learning based spam detection methods suffer from the problem of Spam Drift due to the change of the statistical features of spam tweets as time goes on. When spam drifts, the old classification model is not updated with changed spam samples; as a result, the classification results gradually become inaccurate. To solve this problem, obtaining the changed samples to update the classification model is very important. By observing that there are such samples in the unlabelled incoming tweets, which are very easy to collect, we propose a scheme called Lfun to address the Spam Drift problem.

This section presents our Lfun scheme to deal with the drift problem in Twitter spam detection. Fig. 3 illustrates the framework of our proposed scheme. There are two main components in this framework: LDT, which learns from detected spam tweets, and LHL, which learns from human labelling. In the drifted spam detection scenario, we have already got a small amount of labelled spam and non-spam tweets. However, there are not enough samples of changed spam, and it is extraordinarily expensive to have humans label a large amount of changed tweets. Consequently, we make use of the above mentioned two components to automatically extract changed spam tweets from a set of unlabelled tweets, which are very easy to collect from Twitter. Once enough labelled "changed" spam tweets are obtained, we implement the scheme, which employs a sufficiently powerful algorithm, Random Forest, to perform the classification. Our Lfun scheme is summarised in Algorithm 1.

[Figure] Fig. 3: Lfun Framework. Labelled tweets train an RF classifier which separates unlabelled tweets into non-spam tweets and spam tweets (the LDT component); the re-trained RF classifier scores unlabelled tweets, and those with Pr(t) in R are actively labelled (the LHL component); the final RF classifier is applied to the testing tweets to produce the classification results.

A. Learning from Detected Spam Tweets

LDT is used to deal with a classification scenario where there is a sufficiently robust algorithm, but a lack of data [25]. By learning from a large amount of unlabelled data, LDT can obtain sufficient new information, which can be used to update the classification model.

In an LDT learning scenario, we are given a labelled data set $T_l = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ containing $m$ labelled tweets, where $x_i \in R^k$ $(i = 1, 2, \ldots, m)$ is the feature vector of a tweet and $y_i \in \{spam, non\text{-}spam\}$ is the category label of a tweet. We are also given a large data set $T_u = \{x_{m+1}, x_{m+2}, \ldots, x_{m+n}\}$ containing $n$ unlabelled tweets ($n \gg m$). A classifier $C$ is then trained on $T_l$; $C$ can be used to divide $T_u$ into spam $T_{spam}$ and non-spam $T_{nonspam}$. The spam tweets labelled from $T_u$ will be added into the labelled data set $T_l$ to form a new training data set.

The basic idea of LDT is to find a function $f: R^k \to \{spam, non\text{-}spam\}$ to predict the label $y \in \{spam, non\text{-}spam\}$ of new tweets when trained on $T_{l+spam}$, which is the combination of the labelled data set $T_l$ and the spam tweets $T_{spam}$ identified from $T_u$. In particular, the unlabelled data set $T_u$ used in LDT does not have to share the same distribution with the labelled data set $T_l$ [16]. In addition, only detected spam tweets will be added into the training data. The reason is that we have already gained sufficient information about non-spam tweets: the statistical properties of non-spam tweets are not changing, so it is not necessary for us to gain more information about them.

However, the spam tweets detected by the classifier that is trained using $T_l$ also have the same or a similar distribution to the old spam. We need samples from the changed spam to calibrate the classifier. We then use LHL (in Section IV-B) to obtain changed spam samples.

B. Learning from Human Labelling

In a supervised spam detection system, a learning algorithm, such as Random Forest, must be trained with sufficient labelled data to obtain more accurate detection results. However, labelled instances are very expensive and time-consuming to obtain. Fortunately, we have a huge number of unlabelled tweets which can be easily collected. The LHL in our Lfun is best suited where there are numerous unlabelled data instances, and a human annotator is expected to label some of them to train an accurate system [28]. LHL aims to minimize the labelling cost by using learning criteria to select the most informative samples from the unlabelled data to be labelled by a human annotator [39]. We thereby import active learning into our Lfun scheme.

Algorithm 1: Lfun Algorithm
Require: labelled training set {τ1, ..., τN}, unlabelled tweets T_unlabelled, a binary classification algorithm Φ
Ensure: manually labelled selected tweets Tm
 1: T_labelled ← ∪_{i=1}^{N} τi
    // Use Φ to create a classifier Cls from T_labelled:
 2: Cls ← Φ : T_labelled
    // T_unlabelled is classified as T_spam and T_nonspam:
 3: T_spam + T_nonspam ← T_unlabelled
    // Merge the spam tweets T_spam classified by Cls into T_labelled:
 4: T_ex ← T_labelled + T_spam
    // Use T_ex to re-train the classifier Cls:
 5: Cls ← Φ : T_ex
    // Determine the incoming tweets' suitability for selection:
 6: U ← ∅
 7: for i = 1 to k do
 8:    if U_i meets the selection criteria S then
 9:       U ← (U ∪ U_i)
10:    end if
11: end for
    // Manually label each u_i in U:
12: Tm ← ∅
13: for i = 1 to k do
14:    manually label each u_i
15:    Tm ← (Tm ∪ u_i)
16: end for
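A compact Python sketch of one round of Algorithm 1 is given below, assuming scikit-learn's RandomForestClassifier as the binary algorithm Φ, labels encoded as 1 (spam) and 0 (non-spam), and an ask_human callback standing in for the annotator. This is illustrative, not the authors' implementation; the [0.4, 0.7] selection band is explained in Section IV-B.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def lfun_round(X_lab, y_lab, X_unlab, ask_human, n_manual=100):
    # LDT: train on the labelled set, then harvest tweets classified as spam
    clf = RandomForestClassifier(n_estimators=100).fit(X_lab, y_lab)
    is_spam = clf.predict(X_unlab) == 1
    X_ex = np.vstack([X_lab, X_unlab[is_spam]])               # T_ex
    y_ex = np.concatenate([y_lab, np.ones(is_spam.sum())])
    clf = RandomForestClassifier(n_estimators=100).fit(X_ex, y_ex)

    # LHL: hand the borderline tweets (Pr in [0.4, 0.7]) to the annotator
    pr_spam = clf.predict_proba(X_unlab)[:, 1]                # assumes classes {0, 1}
    cand = np.where((pr_spam >= 0.4) & (pr_spam <= 0.7))[0]
    chosen = np.random.choice(cand, min(n_manual, len(cand)), replace=False)
    y_manual = ask_human(X_unlab[chosen])                     # human labels, 0/1

    # Final classifier: T_ex plus the manually labelled tweets (Fig. 3)
    X_final = np.vstack([X_ex, X_unlab[chosen]])
    y_final = np.concatenate([y_ex, y_manual])
    return RandomForestClassifier(n_estimators=100).fit(X_final, y_final)

Only predicted spam is merged back into the training set, mirroring the asymmetry of the LDT component, while the human is asked about just a small fixed budget of uncertain tweets.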

Now let us define our learning component in a formal way. In supervised Twitter spam detection, we are given a labelled training data set $T_{training} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ containing $m$ labelled tweets, where $x_i \in R^k$ $(i = 1, 2, \ldots, m)$ is the feature vector of a tweet and $y_i \in \{spam, non\text{-}spam\}$ is the category label of a tweet. The label $y_i$ of a tweet $x_i$ is denoted as $y = f(x)$. The task is then to learn a function $\hat{f}$ which can correctly classify a tweet as spam or non-spam. We use the generalisation error to measure the accuracy of the learned function:

$$Error(\hat{f}) = \sum_{x \in T_{training}} L\big(\hat{f}(x), f(x)\big)\, P(x).$$

In practice, $f(x)$ is not available for testing data instances. Therefore, it is usual to estimate the generalisation error by the test error:

$$Error(\hat{f}) = \sum_{x \in T_{testing}} L\big(\hat{f}(x), f(x)\big)\, P(x),$$

where $T_{testing}$ refers to the testing tweets, and the prediction error can be measured by a loss function $L$, such as the mean squared error (MSE) [26]:

$$L_{MSE}\big(\hat{f}(x), f(x)\big) = \big(\hat{f}(x) - f(x)\big)^2.$$

The learning criterion is set to select the most useful instances $X_{selected}$ and add them to the training set $T_{training}$ to achieve certain objectives. Let us consider this objective to be the minimization of the generalisation error of a learned function $\hat{f}$ trained on $T_{training}$. The learning criterion can then be denoted as

$$Error(T_{training} \cup \{X_{selected}\}).$$

The goal of this kind of learning is to select the instances $X_{selected}$ which can minimize the generalisation error $Error(X_{selected})$:

$$\arg\min Error(X_{selected}).$$

As a result, good selection criteria must be estimated to minimize the error. In the Lfun scheme, we apply a selection criterion, called the Probability Threshold Filter Model, to select the most informative tweets to tackle Spam Drift. In order to achieve this, Random Forest (RF) is used to determine the probability that a tweet belongs to spam or not. Random Forest [2] generates many classification trees after being trained with $T_{ex}$ from the LDT component (called Asymmetric Self-Learning in our previous work [5]). When classifying a new incoming tweet, each tree in the forest gives a class prediction, and the forest chooses the classification result which has the most votes. In our case, we set the number of trees to $m$; if $n$ trees vote for the class spam, the probability of the tweet being classified as spam is $Pr = n/m$. Through our empirical study, the mis-classifications mostly occurred when $Pr \in [0.4, 0.7]$, so we set the threshold to $Pr \in [0.4, 0.7]$. After we pre-filter the candidate tweets to be labelled using the Probability Threshold Filter Model, the number of tweets is still too large. We therefore randomly select a smaller number of tweets from the candidates (we set it to 100 in our experiments) to be manually labelled. As shown in Fig. 3, the manually labelled tweets, along with $T_{ex}$, will be used to train a new classifier, which can tackle the Spam Drift problem.
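The vote-based probability $Pr = n/m$ can be made explicit with scikit-learn, whose estimators_ attribute exposes the $m$ individual trees of a trained forest. The sketch below is ours (labels are assumed to be 0/1 so each tree's prediction can be compared to the spam class directly); in practice predict_proba gives an equivalent averaged score.

import numpy as np

def spam_vote_probability(forest, x):
    # Pr = n/m: the fraction of the forest's m trees that vote for spam
    x = np.asarray(x, dtype=float).reshape(1, -1)
    votes = np.array([tree.predict(x)[0] for tree in forest.estimators_])
    return float(np.mean(votes == 1))      # n spam votes over m trees

def needs_manual_label(forest, x, low=0.4, high=0.7):
    # Tweets in the band where mis-classifications concentrate are the
    # informative ones to pass on for human labelling
    return low <= spam_vote_probability(forest, x) <= high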

C. Performance Benefit Justification

We study the performance benefit of the proposed Lfun scheme by providing a theoretical analysis in this section. Fig. 4 illustrates the performance benefit by simulation. We use three normal distributions (listed below): $w_0$ represents the distribution of non-spam, while $w_1$ and $w_2$ represent the distribution of spam before and after using our Lfun approach, respectively:

$$w_0 \sim N(\mu_0, \sigma_0^2), \quad w_1 \sim N(\mu_1, \sigma_1^2), \quad w_2 \sim N(\mu_2, \sigma_1^2).$$

The PDFs (probability density functions) [9] of these three distributions, $w_0$, $w_1$ and $w_2$, are illustrated as $p_0$, $p_1$ and $p_2$ in Fig. 4. We assume that only the mean $\mu_1$ of $w_1$ changes to $\mu_2$, while the variance $\sigma_1^2$ does not change.

[Figure] Fig. 4: Performance Benefit Illustration. The non-spam density $p_0$ is centred at $\mu_0$; the spam density $p_1$ (centred at $\mu_1$) is translated to $p_2$ (centred at $\mu_2$), with the points $c_2 < c_1 < m < \mu_0$ marked on the x-axis.

As $p_1$ is translated to $p_2$, we can always find $m$ which makes

$$m - c_2 = \mu_1 - \mu_2, \tag{1}$$

and

$$p_1(m) = p_2(c_2). \tag{2}$$

As $c_2 < c_1$, we have

$$p_0(c_2) < p_0(c_1). \tag{3}$$

We also have

$$p_0(c_1) = p_1(c_1), \quad p_0(c_2) = p_2(c_2). \tag{4}$$

From Equations 3 and 4, we get

$$p_1(c_1) > p_2(c_2). \tag{5}$$

From Equations 2 and 5, we have

$$p_1(c_1) > p_1(m). \tag{6}$$

As a result,

$$m > c_1. \tag{7}$$

Taking into account Equations 7 and 1, we have $c_1 - c_2 < \mu_1 - \mu_2$. So,

$$c_2 - \mu_2 > c_1 - \mu_1. \tag{8}$$

The error rate of classification before Lfun is

$$P_1(error) = P(x > c_1) + P(x < c_1) = \int_{c_1}^{\infty} p_1(t)\,dt + \int_{-\infty}^{c_1} p_0(t)\,dt = 1 - \Phi\Big(\frac{c_1 - \mu_1}{\sigma_1^2}\Big) + \Phi\Big(\frac{c_1 - \mu_0}{\sigma_0}\Big).$$

Similarly, we have the error rate after using Lfun:

$$P_2(error) = 1 - \Phi\Big(\frac{c_2 - \mu_2}{\sigma_1^2}\Big) + \Phi\Big(\frac{c_2 - \mu_0}{\sigma_0}\Big).$$

The difference between $P_1(error)$ and $P_2(error)$ is

$$P_1(error) - P_2(error) = \Big[\Phi\Big(\frac{c_2 - \mu_2}{\sigma_1^2}\Big) - \Phi\Big(\frac{c_1 - \mu_1}{\sigma_1^2}\Big)\Big] + \Big[\Phi\Big(\frac{c_1 - \mu_0}{\sigma_0}\Big) - \Phi\Big(\frac{c_2 - \mu_0}{\sigma_0}\Big)\Big], \tag{9}$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt. \tag{10}$$

The derivative of Equation 10 is $\Phi'(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} > 0$, so we have $\Phi(a) > \Phi(b)$ when $a > b$. From Equation 8, we know $\frac{c_2 - \mu_2}{\sigma_1^2} > \frac{c_1 - \mu_1}{\sigma_1^2}$. Consequently,

$$\Phi\Big(\frac{c_2 - \mu_2}{\sigma_1^2}\Big) > \Phi\Big(\frac{c_1 - \mu_1}{\sigma_1^2}\Big). \tag{11}$$

As $c_1 > c_2$, we have $\frac{c_1 - \mu_0}{\sigma_0} > \frac{c_2 - \mu_0}{\sigma_0}$. Then we know

$$\Phi\Big(\frac{c_1 - \mu_0}{\sigma_0}\Big) > \Phi\Big(\frac{c_2 - \mu_0}{\sigma_0}\Big). \tag{12}$$

Substituting Equations 11 and 12 into Equation 9, we have

$$P_1(error) - P_2(error) > 0. \tag{13}$$

From Equation 13, our proposed approach can effectively reduce the probability of error.
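The inequality chain can also be sanity-checked numerically. The sketch below uses illustrative parameters of our own choosing (not the paper's), places each decision boundary at the crossing point of the class densities, and evaluates the two error rates; the standard normal CDF is used in place of the Φ of Equation 10, which shifts both error values by the same constant and leaves the comparison unaffected.

from scipy.stats import norm
from scipy.optimize import brentq

mu0, sigma0 = 45.0, 3.0                 # non-spam p0
mu1, mu2, sigma1 = 35.0, 25.0, 3.0      # spam p1 (before) and p2 (after Lfun)

def boundary(mu_spam):
    # decision boundary: the point between the means where the densities cross
    return brentq(lambda x: norm.pdf(x, mu_spam, sigma1) - norm.pdf(x, mu0, sigma0),
                  mu_spam, mu0)

c1, c2 = boundary(mu1), boundary(mu2)   # c2 < c1, since p2 sits further from p0
p1_err = (1 - norm.cdf(c1, mu1, sigma1)) + norm.cdf(c1, mu0, sigma0)
p2_err = (1 - norm.cdf(c2, mu2, sigma1)) + norm.cdf(c2, mu0, sigma0)
print(f"c1={c1:.2f}, c2={c2:.2f}, P1(error)={p1_err:.4f}, P2(error)={p2_err:.4f}")
# P1(error) > P2(error) for any such translation, matching Equation 13.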

V. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed Lfun scheme in detecting drifted Twitter spam. All the experiments are carried out on our real-world 10 consecutive days' tweets, with each day containing 100k spam tweets and 100k non-spam tweets.

As in existing works [34], we use F-measure and Detection Rate to measure the performance. Although both metrics can be used to evaluate the performance on all classes, we only focus on the F-measure and Detection Rate of the spam class. F-measure is an evaluation metric which combines precision and recall to measure the per-class performance of classification or detection algorithms. It can be calculated by

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}.$$

Detection Rate is defined as the ratio of the tweets correctly classified as belonging to the class spam to the total number of tweets in the class spam; it can be calculated by

$$Detection\ Rate = \frac{TP}{TP + FN}.$$

In the evaluation, we have designed three sets of experiments: first, to show the impact of spam drift (Section V-A); second, the benefit of our proposed Lfun (Section V-B); and third, the comparisons with other traditional machine learning algorithms (Section V-C). We repeat the experiments 100 times with different random training samples and report the average values over all 100 runs.

A. Impact of Spam Drift

In order to evaluate the impact of the Spam Drift problem, we perform a number of experiments in this section. The aim is to show that the performance of a traditional classifier, for example a C4.5 Decision Tree, varies over time when Spam Drift exists.

During these experiments, Day 1 data is divided into two parts: one half serves as a training pool from which training data can be extracted, and the other half is used for testing. We create a classifier by using a supervised classification algorithm, and train it with 10k spam and 10k non-spam tweets which are randomly sampled from the training pool of Day 1. Then the classifier is used to classify the testing data of Day 1, as well as the testing samples of Day 2 to Day 10.
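This protocol maps directly onto a few lines of scikit-learn. In the sketch below (ours), DecisionTreeClassifier stands in for C4.5, days is assumed to hold each day's feature matrix and 0/1 labels, and both metrics defined above are reported per day; for Day 1 the held-out half of the pool would be used in place of the full data.

import numpy as np
from sklearn.tree import DecisionTreeClassifier      # stand-in for C4.5
from sklearn.metrics import f1_score, recall_score

def drift_experiment(days, n_per_class=10_000, seed=0):
    rng = np.random.default_rng(seed)
    X1, y1 = days[1]                                  # Day 1 training pool
    # balanced sample: 10k spam (1) and 10k non-spam (0) tweets
    idx = np.concatenate([rng.choice(np.where(y1 == c)[0], n_per_class,
                                     replace=False) for c in (0, 1)])
    clf = DecisionTreeClassifier().fit(X1[idx], y1[idx])
    results = {}
    for day in sorted(days):
        X, y = days[day]
        pred = clf.predict(X)
        results[day] = {"detection_rate": recall_score(y, pred),  # TP/(TP+FN)
                        "f_measure": f1_score(y, pred)}           # spam class
    return results

Averaging drift_experiment over 100 different seeds reproduces the reported experimental setting.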

[Figure] Fig. 5: Trend of Detection Rate. (a) Random Forest; (b) C4.5 Decision Tree; (c) Bayes Network. Each panel plots the spam and non-spam Detection Rate against the testing day (Day 1 to Day 10).

Fig. 5 shows the Detection Rate of both spam and non-spam tweets for three classifiers: Random Forest, C4.5 Decision Tree and Bayes Network. We can see that the Detection Rate of non-spam is very stable: it stays above 90% for Random Forest and C4.5 Decision Tree, and near 90% for Bayes Network, despite the change of testing data. However, when it comes to spam tweets, the Detection Rate fluctuates dramatically, and the overall trend is decreasing. The Detection Rates for Random Forest and C4.5 Decision Tree are 90% on the first day, but they can decrease to less than 40% on the 9th day. This phenomenon also applies to Bayes Network: its Detection Rate decreases from 70% on the 1st day to less than 50% for most of the other testing days.

B. Performance of Lfun

We evaluate the performance of Lfun here, using F-measure and Detection Rate. The number of labelled training samples from the old day (i.e. Day 1 and Day 2 in this case) is 5000. The number of manually labelled samples during Lfun is set to 100.

[Figure] Fig. 6: Detection Rate of Lfun. (a) Day 1 training, Day 2 to 9 testing; (b) Day 2 training, Day 3 to 10 testing. Each panel compares RF-Lfun with the original RF.

Fig. 6 shows the Detection Rate of Lfun when Day 1 data (Fig. 6a) or Day 2 data (Fig. 6b) is used for training and the remaining days are used for testing. We can see from Fig. 6a that the Detection Rates of the original Random Forest are relatively low. For example, the Detection Rate when testing on Day 9 is only around 40%, whereas our RF-Lfun can reach over 90% Detection Rate on the same day. While Random Forest only achieves a Detection Rate ranging from 45% to 80%, our RF-Lfun can rise as high as 90%. This also happens when the training data is from Day 2 and the testing data is from Day 3 to Day 10, as illustrated in Fig. 6b: the highest Detection Rate of Random Forest is around 85%, but that of RF-Lfun is over 95%. Generally, our Lfun can detect most of the spam tweets even with Spam Drift. The reason is that our Lfun brings more samples of changed spam tweets into the training process.

[Figure] Fig. 7: F-measure of Lfun. (a) Day 1 training, Day 2 to 9 testing; (b) Day 2 training, Day 3 to 10 testing. Each panel compares RF-Lfun with the original RF.

Fig. 7 shows the F-measure of Random Forest with the Lfun approach compared to Random Forest without it. We can see that the F-measure of the original Random Forest keeps decreasing from 80% to 55% as the testing data changes from Day 2 to Day 9 in Fig. 7a. However, once our Lfun approach is applied, the F-measure becomes stable, staying greater than 80% except on Day 8. Similarly, when the training data is from Day 2, the F-measure of Random Forest decreases as well, while the F-measure of our RF-Lfun does not fluctuate, as shown in Fig. 7b. Overall, the proposed Lfun can effectively improve the F-measure, and the improvement is up to 25% in the best case.

C. Comparisons with other Algorithms

In this section, we compare our Lfun approach with four traditional machine learning algorithms (Random Forest, C4.5 Decision Tree, Bayes Network and SVM) for detecting spam tweets in the drift scenario. Two sets of experiments are carried out.

One set evaluates the performance when the training data is from Day 1 and the testing data varies from Day 2 to Day 9. The other set evaluates the performance when the training and testing data are from two specified days, but the number of labelled training samples changes from 1000 to 10000.

1) Comparisons with Changing Days: Fig. 8 demonstrates the experimental results in terms of overall accuracy, F-measure and detection rate of Lfun compared to the other algorithms, when the testing days vary. We can see from Fig. 8a that the overall accuracy of Lfun outperforms all the other algorithms, followed by Random Forest, C4.5 Decision Tree, Bayes Network and SVM. In terms of F-measure (see Fig. 8b), our Lfun is also the best among all the algorithms. For example, it is over 30% higher than C4.5 Decision Tree when the testing data is from Day 9. Furthermore, the performance of Lfun is much better in terms of detection rate. Fig. 8c shows that the detection rate of Lfun is above 90% for most of the days, while the detection rate of all the others is below 80%. In particular, Bayes Network has the lowest detection rate, which is below 50%. In general, our Lfun is the best among all the algorithms under all three metrics.

2) Comparisons with Changing Labelled Training Samples: Fig. 9 and Fig. 10 report the evaluation results when the number of labelled training samples changes. The training and testing data are from Day 1 and Day 5 in Fig. 9, while the training and testing data are from Day 4 and Day 8 in Fig. 10. We can see that the overall accuracy of Lfun increases from 70% to 80% with the increase of labelled training samples. It is better than the four algorithms in comparison, as the best of them (C4.5 Decision Tree) can only achieve less than 74% overall accuracy. When it comes to F-measure, the performance of Lfun is still the best; it is 10% higher than that of C4.5 Decision Tree and nearly 30% higher than that of SVM. In terms of detection rate, our Lfun is about 30% higher than the second best algorithm. Similarly, in Fig. 10, Lfun outperforms all the other algorithms.

VI. DISCUSSIONS

In the research community, there are also some machine learning approaches related to our proposed method, for example, online learning and incremental learning. They are both common machine learning approaches that continuously update the prediction model with new training data for better future classification. They can generate a prediction model and put it into operation without much training data at first, but they require new training data to update the model. When it comes to online Twitter spam classification, it is very difficult to label enough training samples to update the model. The reasons are two-fold. Firstly, it is significantly time-consuming to label a large amount of tweets by hand. Secondly, it is difficult to gain enough spam tweets even if we have got a large number of human-labelled tweets, as the spam rate of Twitter is about 5% [6]. If there are not enough spam samples to retrain the model (Lfun does not need non-spam samples, as non-spam tweets are not drifting), it is not able to solve the spam drift issue.

Our Lfun approach has the same advantage as online learning and incremental learning, i.e., it can be deployed without much training data at the beginning and updated when new training data comes. Different to online and incremental learning, we incorporate both automated labelling and human labelling. The LDT component learns from the detected tweets; this component is automatically updated with detected spam tweets with no human effort. To better adjust the prediction model, we also import the LHL component, which learns from human labelling. To minimize human effort, LHL only samples a very small number of tweets for labelling, for example, 100 tweets in our experiments. In addition, it does not randomly pick tweets to label, but follows a selection criterion called the Probability Threshold Filter Model which can choose the most useful tweets. Benefiting from these two components, our Lfun approach can successfully deal with spam drift with the least human effort.

VII. CONCLUSION AND FUTURE WORK

In this paper, we first identify the Spam Drift problem in statistical features based Twitter spam detection. In order to solve this problem, we propose the Lfun approach. In our Lfun scheme, classifiers are re-trained with added "changed" spam tweets which are learnt from unlabelled samples, and thus it can reduce the impact of Spam Drift significantly. We evaluate the performance of the Lfun approach in terms of Detection Rate and F-measure. Experimental results show that both Detection Rate and F-measure are improved significantly when our Lfun approach is applied. We also compare Lfun to four traditional machine learning algorithms, and find that our Lfun outperforms all four algorithms in terms of overall accuracy, F-measure and Detection Rate.

There is also a limitation in our Lfun scheme. The benefit of old labelled spam is to eliminate the impact of spam drift so as to classify spam tweets more accurately in future days. The effectiveness of old spam has been proved by our experiments over a short period. However, this effectiveness will decrease as the correlation of very old spam with new spam becomes weaker in the long run. In the future, we will incorporate incremental adjustment of the training data, such as dropping samples that are too old after a certain time. This can not only eliminate unuseful information in the training data, but also make it faster to train the model as the number of training samples decreases.

ACKNOWLEDGMENT

The authors would like to thank Trend Micro for providing us with the service to label spam tweets. This work is supported by ARC Linkage Project LP120200266.

REFERENCES

[1] F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida. Detecting spammers on Twitter. In Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, July 2010.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[3] C. Castillo, M. Mendoza, and B. Poblete. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 675-684, New York, NY, USA, 2011. ACM.

[Figure] Fig. 8: Comparisons with other Algorithms (changing testing days). (a) Overall Accuracy; (b) F-measure; (c) Detection Rate, each plotted against the testing day (Day 2 to 9) for RF, C4.5, BayesNet, SVM and Lfun.

[Figure] Fig. 9: Comparisons with other Algorithms (training on Day 1 and testing on Day 5). (a) Overall Accuracy; (b) F-measure; (c) Detection Rate, each plotted against the number of labelled training samples (1000 to 10000).

[Figure] Fig. 10: Comparisons with other Algorithms (training on Day 4 and testing on Day 8). (a) Overall Accuracy; (b) F-measure; (c) Detection Rate, each plotted against the number of labelled training samples (1000 to 10000).

[4] C. Chen, J. Zhang, X. Chen, Y. Xiang, and W. Zhou. 6 million spam tweets: A large ground truth for timely Twitter spam detection. In IEEE ICC 2015 - Communication and Information Systems Security Symposium (ICC'15 (11) CISS), pages 8689-8694, London, United Kingdom, June 2015.
[5] C. Chen, J. Zhang, Y. Xiang, and W. Zhou. Asymmetric self-learning for tackling Twitter spam drift. In The Third International Workshop on Security and Privacy in Big Data (BigSecurity 2015), pages 237-242, Hong Kong, Apr. 2015.
[6] C. Chen, J. Zhang, Y. Xiang, W. Zhou, and J. Oliver. Spammers are becoming smarter on Twitter. IT Professional, 18(2):14-18, Mar.-Apr. 2016.
[7] I. Csiszar and J. Korner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
[8] T. Dasu, S. Krishnan, S. Venkatasubramanian, and K. Yi. An information-theoretic approach to detecting changes in multi-dimensional data streams. In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications, 2006.
[9] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, 2000.
[10] M. Egele, G. Stringhini, C. Kruegel, and G. Vigna. COMPA: Detecting compromised accounts on social networks. In Annual Network and Distributed System Security Symposium, 2013.
[11] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation. ACM Comput. Surv., 46(4):44:1-44:37, Mar. 2014.
[12] H. Gao, Y. Chen, K. Lee, D. Palsetia, and A. Choudhary. Towards online spam filtering in social networks. In NDSS, 2012.
[13] H. Gao, J. Hu, C. Wilson, Z. Li, Y. Chen, and B. Y. Zhao. Detecting and characterizing social spam campaigns. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC '10, pages 35-47, New York, NY, USA, 2010. ACM.

[14] A. Greig. Twitter overtakes Facebook as the most popular social network for teens, according to study. DailyMail, October 2013.
[15] C. Grier, K. Thomas, V. Paxson, and M. Zhang. @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS '10, pages 27-37, New York, NY, USA, 2010. ACM.
[16] K. Huang, Z. Xu, I. King, M. Lyu, and C. Campbell. Supervised self-taught learning: Actively transferring knowledge from unlabeled data. In Neural Networks, 2009. IJCNN 2009. International Joint Conference on, pages 1272-1277, June 2009.
[17] R. Jeyaraman. Fighting spam with BotMaker. Twitter Engineering Blog, August 2014.
[18] R. Krichevsky and V. Trofimov. The performance of universal encoding. Information Theory, IEEE Transactions on, 27(2):199-207, Mar 1981.
[19] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
[20] K. Lee, J. Caverlee, and S. Webb. Uncovering social spammers: social honeypots + machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '10, pages 435-442, New York, NY, USA, 2010. ACM.
[21] S. Lee and J. Kim. WarningBird: A near real-time detection system for suspicious URLs in Twitter stream. IEEE Transactions on Dependable and Secure Computing, 10(3):183-195, 2013.
[22] J. Oliver, P. Pajares, C. Ke, C. Chen, and Y. Xiang. An in-depth analysis of abuse on Twitter. Technical report, Trend Micro, 225 E. John Carpenter Freeway, Suite 1500, Irving, Texas 75062, U.S.A., September 2014.
[23] I. Ounis, C. Macdonald, J. Lin, and I. Soboroff. Overview of the TREC-2011 microblog track. In Proceedings of the 20th Text REtrieval Conference (TREC 2011), 2011.
[24] C. Pash. The lure of naked Hollywood star photos sent the internet into meltdown in New Zealand. Business Insider, September 2014.
[25] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pages 759-766. ACM, 2007.
[26] N. Rubens, D. Kaplan, and M. Sugiyama. Active learning in recommender systems. In Recommender Systems Handbook, pages 735-767. Springer, 2011.
[27] R. Sebastiao and J. Gama. Change detection in learning histograms from data streams. In Proceedings of the 13th Portuguese Conference on Progress in Artificial Intelligence, EPIA'07, pages 112-123, Berlin, Heidelberg, 2007. Springer-Verlag.
[28] B. Settles. Active learning literature survey. University of Wisconsin, Madison, 52(55-66):11, 2010.
[29] J. Song, S. Lee, and J. Kim. Spam filtering in Twitter using sender-receiver relationship. In Proceedings of the 14th International Conference on Recent Advances in Intrusion Detection, RAID'11, pages 301-317, Berlin, Heidelberg, 2011. Springer-Verlag.
[30] G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, pages 1-9, New York, NY, USA, 2010. ACM.
[31] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song. Design and evaluation of a real-time URL spam filtering service. In Proceedings of the 2011 IEEE Symposium on Security and Privacy, SP '11, pages 447-462, Washington, DC, USA, 2011. IEEE Computer Society.
[32] K. Thomas, C. Grier, D. Song, and V. Paxson. Suspended accounts in retrospect: an analysis of Twitter spam. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement, IMC '11, pages 243-258, New York, NY, USA, 2011. ACM.
[33] A. H. Wang. Don't follow me: Spam detection in Twitter. In Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on, pages 1-10, 2010.
[34] C. Yang, R. Harkreader, and G. Gu. Empirical evaluation and new design for fighting evolving Twitter spammers. Information Forensics and Security, IEEE Transactions on, 8(8):1280-1293, 2013.
[35] C. Yang, R. Harkreader, J. Zhang, S. Shin, and G. Gu. Analyzing spammers' social networks for fun and profit: a case study of cyber criminal ecosystem on Twitter. In Proceedings of the 21st International Conference on World Wide Web, WWW '12, pages 71-80, New York, NY, USA, 2012. ACM.
[36] S. Yardi, D. Romero, G. Schoenebeck, and D. Boyd. Detecting spam in a Twitter network. First Monday, 15(1-4), January 2010.
[37] J. Zhang, C. Chen, Y. Xiang, W. Zhou, and Y. Xiang. Internet traffic classification by aggregating correlated naive Bayes predictions. Information Forensics and Security, IEEE Transactions on, 8(1):5-15, Jan 2013.
[38] X. Zhang, S. Zhu, and W. Liang. Detecting spam and promoting campaigns in the Twitter social network. In Data Mining (ICDM), 2012 IEEE 12th International Conference on, pages 1194-1199, 2012.
[39] I. Zliobaite, A. Bifet, B. Pfahringer, and G. Holmes. Active learning with drifting streaming data. IEEE Transactions on Neural Networks and Learning Systems, 25(1):27-39, 2014.