Privacy Aware Collaborative Spam Detection

ALPACAS:A Large-scale Privacy-
Aware Collaborative Anti-spam

System
Guided by Presented by
M.Karthiga B.E Devasenapathi..A
Radhesh.M
ABSTRACT
The first is a feature-preserving message
transformation technique that is highly
resilient against the latest kinds of spam
attacks. The second is a privacy-preserving
protocol that provides enhanced privacy
guarantees to the participating entities.
INTRODUCTION
 To protect email privacy, digest approach has been
proposed in the collaborative anti-spam systems to
both provide encryption for the email messages and
obtain useful information from spam email.
 The digest calculation has to be a one-way function

such that it should be computationally hard to
generate the corresponding email message
System Requirements
Front End : Java.
Back End : My-SQL
IDE : Eclipse or Net beans 6.9.

Existing System
 The DCC system attempts to address the privacy
issue by using hash functions.
 Here, the participating servers do not share the

actual emails they have received and classified.
 They share the emails’ digests, which are computed

through hashing functions such as MD5 over the
email body.
Drawbacks in DCC
1. Hashing schemes like MD5 generate completely

different hash values even if the message is altered by
a single byte.
2. The DCC scheme does not completely address the

privacy issue.
 In designing the ALPACAS framework, this paper makes
two unique contributions:
1) Feature-preserving transformation
2) Protection via privacy-preserving protocol

THE ALPACAS ANTI-SPAM FRAMEWORK
ALPACAS framework addresses to design the
challenges of the collaborative anti-spam system.
1. To protect email privacy, it is obvious that the

messages have to be encrypted.
2. to minimize the information revealed during the

collaboration process.
 The ALPACAS framework essentially consists of a set of
collaborative anti-spam agents.
 An email agent can either be an entity that participates in

the ALPACAS framework on behalf of an individual end-
user, or it may represent an email server having multiple
end-users.
 Each email agent of the ALPACAS framework maintains

a spam knowledge base and a ham knowledge base ,
containing information about the known spam and ham
emails.
Feature-Preserving Fingerprint
 The fingerprint of an email is a set of digests that
characterize the message content.
 The set of digests is referred to as the transformed

feature set (TFSet) of the email.
 The individual digests are called the feature elements.
 The transformed feature set of a message Ma is

represented as TFSet(Ma).
Shingle-based Message Transformation
 Feature preserving fingerprint technique is based upon
the concept of Shingles
 Shingles are essentially a set of numbers that act as a

fingerprint of a document.
 Shingles have the unique property that if two documents

vary by a small amount their shingle sets also differ by a
small amount.
 The similarity between two messages Ma and
Mb can be calculated as
Term-level Privacy Preservation
 The possibility of inferring a word or a group of words is to

shuffle the tokens of the original email and compute TFset
on the shuffled email.
 To shuffle the email content in an acceptable manner, our

feature-preserving fingerprint scheme adopts a controlled
shuffling strategy wherein the tokens are shuffled in a
predetermined format.
 The position of a token after shuffling is always within a

fixed range of its original position.
Privacy-preserving Collaboration
Protocol
If the score is greater than a configurable threshold λ,
Ma is classified as spam. Otherwise it is classified as
ham.
Robustness Against Attacks
The robustness of the ALPACAS approach against

two common kinds of camouflage attacks.
1.one is good-word attack

2.character replacement attack.
Literature Review
 Understanding the Network Level Behavior of
Spammers
• spam is being sent from a few regions.
• IP address space, and that spammers appear to be
using transient
• Few pieces of email over very short periods
• Finally, a small, yet non-negligible, amount of spam
is received from IP addresses that correspond to
short-lived BGP
• routes, typically for hijacked prefixes.
Reference 2
SMTP Path Analysis
This paper presents a new

learning algorithm for learning the reputation
of email domains and IP addresses based on
analyzing the paths used to transmit known
spam and known good mail.
SMTP Path Analysis
This algorithm achieves many of the benefits
offered by domain-authentication systems,
black-list services, and white-list services
provide without any infrastructure costs or
rollout requirements.
Reference 3
On Attacking Statistical Spam Filters
Spammershavetriedmanythingsfromusing
HTMLlayout tricks, letter substitution, to
adding random data. While at times their
attacks are clever, they have yet to work
strongly against the statistical nature that
drives many altering systems.
Reference 3
 Here, examine the general attack methods
spammers use, along with challenges faced
by developers and spammers. It also
demonstrate an attack that, while easy to
implement, attempts to more strongly work
against the statistical nature behind alters.
Conclusion
We plan to establish this idea in voice ip spam
detection for privacy and securing purpose.

Privacy Aware Collaborative Spam Detection

Diunggah oleh

Informasi Dokumen

Judul Asli

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

Privacy Aware Collaborative Spam Detection

Diunggah oleh

Hak Cipta:

Format Tersedia

ALPACAS:A Large-scale Privacy-

Aware Collaborative Anti-spam

 The digest calculation has to be a one-way function

Front End : Java.

Back End : My-SQL

IDE : Eclipse or Net beans 6.9.

 Here, the participating servers do not share the

 They share the emails’ digests, which are computed

1. Hashing schemes like MD5 generate completely

2. The DCC scheme does not completely address the

2) Protection via privacy-preserving protocol

1. To protect email privacy, it is obvious that the

2. to minimize the information revealed during the

 An email agent can either be an entity that participates in

 Each email agent of the ALPACAS framework maintains

 The set of digests is referred to as the transformed

 The individual digests are called the feature elements.

 The transformed feature set of a message Ma is

 Shingles are essentially a set of numbers that act as a

 Shingles have the unique property that if two documents

 The possibility of inferring a word or a group of words is to

 To shuffle the email content in an acceptable manner, our

 The position of a token after shuffling is always within a

The robustness of the ALPACAS approach against

1.one is good-word attack

This paper presents a new

Anda mungkin juga menyukai