
Anti-spam techniques

Lorenzo Peraldo

February 10, 2008


Chapter 1

Introduction

The abuse of electronic messaging to send unauthorized and inappropriate bulk messages is commonly
called spamming. Spam now reaches many media: instant messaging, web search engines, blogs and
forums, even mobile phone messaging. The most widely recognized and common form, however, is
e-mail spam.
E-mail spam is also known as unsolicited bulk e-mail (UBE) or unsolicited commercial e-mail (UCE)
and consists of sending e-mail messages, usually with commercial content, in large quantities to an
indiscriminate set of recipients. E-mail spamming dates back to the early days of the internet; it
grew exponentially over the following years, and spam now represents 80-85% of all
e-mail messages in the world. One reason the volume of spam keeps growing
every year is that sending it costs spammers virtually nothing. They can therefore manage
huge mailing lists with negligible operating costs and keep adding recipients to advertise to.
Advertising messages are the most common, but lately other kinds of spam
have started to circulate as well, such as messages with political or religious aims.
Although spamming costs spammers virtually nothing, its effects are devastating in terms of the consumption
of computer and network resources and of human attention and time. Moreover, it imposes a high direct cost
on companies and internet service providers who want to fight spam, as well as indirect costs borne
by the victims of spam, such as financial theft, identity theft, data and intellectual property theft,
fraud, and the viruses and other malware infections that usually accompany spam messages.
Even though sending junk e-mail has been prohibited since the beginning of the internet by
the Terms of Service/Acceptable Use Policy (ToS/AUP) of internet service providers, many
states have adopted permissive laws rather than tough anti-spam laws. This is especially true of the US
(because of the CAN-SPAM Act of 2003), while other countries, such as Australia and the member countries
of the European Union, have passed strict anti-spam laws. As a result, statistics show that
most spam e-mail now originates in the USA, while Australia, for example, has dropped down this
negative ranking since its tough anti-spam laws came into force.

Chapter 2

Spam

To find a solution to the problem of spam, it is important to define what is actually considered
spam and how spammers exploit the weaknesses of networks to send it.

2.1 Definition of spam


To be considered spam, an e-mail message must first of all be sent in bulk, meaning that it is sent not to
a single recipient but to a larger mailing list. More importantly, it must be unwanted or
unsolicited, meaning that the recipient never actually subscribed, and confirmed the subscription,
to that mailing list. For this reason another name for e-mail spam is UBE (unsolicited bulk e-mail).
Another term often used for spam is UCE (unsolicited commercial e-mail), which refers only to
those spam messages with commercial content. We’ll see later that these definitions
matter because many anti-spam techniques rely on them for spam filtering and
blacklisting.

2.2 How do spammers operate?


First of all, spammers need a list of recipients to whom they will send spam messages. Both spammers
themselves and list merchants scan the net to find as many e-mail addresses as possible to
add to their lists. This search for e-mail addresses is called address harvesting and is done
without the consent of the recipients. Harvesting can be carried out in different ways.
The simplest is gathering e-mail addresses from websites, Usenet posts or discussion mailing lists.
As spam messages often contain viruses, these may include functions that scan the victim’s computer
for e-mail addresses, even ones that have never been exposed on the web. In some cases these viruses may
also scan the victim’s network interfaces, letting the spammer gather e-mail addresses from traffic
on the victim’s network as well. Not all harvested addresses are
valid and deliverable, so spammers use several methods to find out whether an address is valid:
for example, a recipient replies to a spam e-mail, or clicks on a web link for unsubscribing
from a mailing list (which usually just reveals the e-mail address to more spammers).
Spammers hardly ever send spam from their own computers, and in any case they usually
obfuscate their address through address spoofing. Spammers usually hold many different accounts on
free webmail services in order to send volumes of e-mail they could not send from a single account. Even
though most webmail services now use a system called CAPTCHA to stop automated bots from creating
accounts, spammers have found ways of circumventing this measure. Spammers have also found
ways to protect themselves by hiding their tracks while getting other people’s systems to deliver
messages for them. To do so they started creating so-called botnets, made up of many compromised
machines, and exploiting weaknesses of the network such as open relays and open proxies.
An open relay passes along messages sent to it from any location to any recipient, so that a spammer
can leave the work of delivering all the messages to the relay. An open proxy instead creates connections
from any client to any server without authentication, so that a spammer can simply connect to a mail
server and send spam through it. Both open proxies and open relays were designed when spamming
wasn’t a problem yet, but as spam from these insecure resources grew, DNSBL operators started
listing their IP addresses in order to block spam coming from them.
Partly for this reason, since 2003 spammers have stopped searching the global network for exploitable
services and begun creating their own by commissioning computer viruses designed to deploy
proxies and other spam-sending tools on thousands of end-user computers. Virus-infected computers
serve spammers not only as spamming tools that send spam messages, thus acting as proxies, but
also as a means of perpetrating distributed denial-of-service attacks. To fight spam, many anti-spam techniques
have been implemented, with varying degrees of success, but as the years pass spammers
keep finding new methods to defeat these techniques.

Chapter 3

Anti-spam techniques

To prevent e-mail spam, various anti-spam techniques are used by both end users and e-mail system
administrators. Depending on who carries them out, these techniques can be divided into four
categories: end-user techniques, which require action by individual users; automated techniques for
e-mail administrators, which can be automated and implemented directly on proxies or MTAs;
automated techniques for e-mail senders, which are implemented on end users’ computers, possibly embedded
in products or software; and techniques for researchers and law enforcement officials. None of these
techniques is a complete and definitive solution to the problem of spam: each involves a trade-off
between letting some spam through and rejecting legitimate messages, and each has associated costs in time and
effort.

3.1 End-user techniques


These techniques can be applied by individual users to reduce their attractiveness to spammers and
restrict the availability of their e-mail addresses on the net. There are many small precautions
anyone can take; some of them are simply rules users should remember
and observe when they send e-mail or receive spam. For example, it is very important
never to reply to spam, first of all because many spammers take a reply as proof that the
address is valid. Moreover, as sender addresses in spam e-mails are often forged or
invalid, a reply would be useless and might even reach innocent users. It is equally
important not to trust links contained in spam messages: although they promise
removal from the spammer’s mailing list, they just lead to more spam. Another measure users
can adopt is so-called address munging, which consists of altering one’s e-mail address so that
a human reader can still recognize it as a valid address but machines cannot, which prevents address
harvesters from collecting it. Posting anonymously and using disposable e-mail addresses are
also good ways to avoid spam. Finally, disabling the display of HTML, URLs and images in
e-mails can prevent offensive images from being shown and spyware from being installed on the user’s machine.
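
As an illustration of address munging, the following minimal sketch rewrites an address in a form that a person can still read. The "[at]" and "[dot]" markers and both function names are assumptions chosen for this example; any human-readable substitution would serve the same purpose.

```python
def munge_address(address: str) -> str:
    """Return a human-readable form of an address that naive harvesters miss."""
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError(f"not an e-mail address: {address!r}")
    # Only the "@" and the dots of the domain are replaced (illustrative choice).
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


def unmunge_address(munged: str) -> str:
    """Undo munge_address, as a human reader would do mentally."""
    return munged.replace(" [at] ", "@").replace(" [dot] ", ".")


print(munge_address("alice@example.org"))  # alice [at] example [dot] org
```

A harvester that searches pages for the pattern "name@domain" would not match the munged form, while a determined one could of course learn to undo this particular substitution.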

3.2 Automated techniques for e-mail administrators


E-mail administrators can use many software systems and services to reduce the load of spam
on their systems and mailboxes. The two best-known approaches are blocking and filtering. The
former relies on rejecting messages from internet sites likely to send spam; the latter relies on
automatically analysing the content of e-mails and blocking those that look like spam. Many
filtering systems use machine learning techniques, which improve their accuracy over manual
methods, but some people find filtering intrusive to privacy, so
blocking is preferred by many e-mail administrators.
Some systems do not try to detect whether a message is spam at all; they simply accept messages
only from trusted sites. This technique is known as authentication and reputation, and it uses the DNS
just as DNSBLs do, but to list authorized sites rather than spammers’ sites. Another
method requires unknown senders to pass various tests, or rather challenges, before their messages
are delivered. Some e-mail servers may decide to reject all messages coming from certain countries
they never expect to communicate with; they use country-based filtering, which relies
on the country of origin of the e-mail as determined from the sender’s IP address. Very widely used are DNSBLs,
or DNS-based Blackhole Lists. These lists, published via the DNS, name sites known to emit spam, open
mail relays or proxies, and ISPs known to support spam, so that mail servers can easily reject mail from
those sources. Other DNS-based anti-spam systems may instead use whitelisting and mark
IPs, domains or URLs as good (white). Mail administrators can also reduce spam by setting restrictions
on the MTA, for example enforcing the technical requirements of SMTP and blocking mail coming
from systems that do not comply with the RFC standards. Even a simple HELO/EHLO check
can reduce spam significantly.
The PTR records in the reverse DNS can be used in several ways. For example, most
e-mail MTAs perform FCrDNS verification and, if there is a valid domain name, put it into the Received:
trace header field. Some MTAs perform FCrDNS verification on the domain name given in the
SMTP HELO and EHLO commands, but in this case e-mail is not rejected by default. PTR
records may also be used to check whether the domain names in the rDNS are likely to belong to dial-up
users, dynamically assigned addresses, or home-based broadband customers. Finally, Forward
Confirmed reverse DNS (FCrDNS) verification provides a weak form of authentication: it shows that there is a valid
relationship between the owner of a domain name and the owner of the network to which the IP address
has been assigned. Although this authentication is weak, it is strong enough to be used for whitelisting
purposes, because spammers and phishers usually cannot pass this verification when they use zombie
computers to forge the domains.
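
The forward-confirmed check can be sketched as follows: look up the PTR names of the connecting IP address, resolve each name forward again, and keep only the names that lead back to the same address. This is a simplified sketch, not the behaviour of any particular MTA. The function names are assumptions, the default resolvers handle IPv4 only, and the lookup tables in the usage example are invented and use documentation addresses.

```python
import socket
from typing import Callable, List


def reverse_lookup(ip: str) -> List[str]:
    """Names found in the PTR record of ip (empty list if there is none)."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name, *aliases]


def forward_lookup(name: str) -> List[str]:
    """IPv4 addresses that name resolves to (empty list if none)."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


def fcrdns_names(
    ip: str,
    reverse: Callable[[str], List[str]] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> List[str]:
    """Return the PTR names of ip whose forward lookup contains ip again."""
    return [name for name in reverse(ip) if ip in forward(name)]


# Usage with invented in-memory tables instead of real DNS queries.
PTR_TABLE = {"192.0.2.10": ["mail.example.org"], "192.0.2.66": ["mail.example.org"]}
A_TABLE = {"mail.example.org": ["192.0.2.10"]}


def fake_reverse(ip: str) -> List[str]:
    return PTR_TABLE.get(ip, [])


def fake_forward(name: str) -> List[str]:
    return A_TABLE.get(name, [])


print(fcrdns_names("192.0.2.10", fake_reverse, fake_forward))  # confirmed name
print(fcrdns_names("192.0.2.66", fake_reverse, fake_forward))  # forged PTR: nothing confirmed
```

The second address claims the same host name in its PTR record, but the forward lookup does not point back to it, so the claim is not confirmed.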

3.2.1 Filtering techniques


Filtering techniques can rely on many different characteristics of e-mail messages. Content filtering
relies on lists of words or regular expressions that are disallowed in mail messages,
so that the mail server rejects any message containing them. Header filtering instead
inspects the headers of the e-mail, which contain information about the message. These fields
are often spoofed by spammers to hide their identities or to make the e-mail look more
legitimate than it is, but many of these spoofing techniques can be detected.
Spammers always try to disguise their messages in order to sidestep filtering. For
example, they misspell words that are frequently used in spam, and therefore included in filtering lists,
in various ways to make them harder to recognize, or they insert
HTML comments, invisible to the user, in the middle of those words. These tricks are nonetheless fairly
easy to detect, as is the technique of sending spam that consists entirely of images so that anti-spam
software cannot analyse the words. Content filtering can also be used to analyse the URLs
advertised in an e-mail message (the so-called spamvertised sites).
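
A minimal sketch of list-based content filtering is given below. It removes HTML comments before matching, so that a word split by an invisible comment is rejoined. The patterns and function names are hypothetical examples, not a real filtering list.

```python
import re

# Hypothetical disallowed phrases; a real list is maintained by the administrator.
BLOCKED_PATTERNS = [
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"\bcheap\s+loans?\b", re.IGNORECASE),
]

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def normalise(body: str) -> str:
    """Remove HTML comments so that words hidden behind them become visible."""
    return HTML_COMMENT.sub("", body)


def is_blocked(body: str) -> bool:
    """True if any disallowed pattern occurs in the normalised body."""
    text = normalise(body)
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


print(is_blocked("Buy vi<!-- hidden -->agra now"))  # True
print(is_blocked("Meeting moved to noon"))          # False
```

Deliberate misspellings are not handled here; catching them would need extra normalisation rules or fuzzier patterns.
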
Statistical content filtering is a kind of document classification system which uses naive Bayes
classifiers to predict whether a message is spam or not, based on collections of spam and nonspam
(ham) e-mails submitted by users. This system requires no maintenance, but users must mark messages
as spam or ham so that the filtering software can learn from these judgements. Thus a statistical filter
can respond quickly to a change in spam content, without administrative intervention. Spammers
try to fight this technique by inserting many random but valid noise words or sentences into their
messages while attempting to hide them from view, making it more likely that the filter will classify
them as neutral. However these noise countermeasures are largely ineffective.
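
The idea behind statistical filtering can be shown with a very small naive Bayes classifier. This is a sketch under simplifying assumptions (word counts only, add-one smoothing, a plain regular expression as tokenizer); the class and method names are illustrative, and production filters are considerably more elaborate.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes spam filter with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record a user's judgement; label is 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = len(set().union(*self.word_counts.values())) or 1
        total_messages = sum(self.message_counts.values())
        # Smoothed prior, then smoothed per-word likelihoods, all in log space.
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for token in tokens:
            score += math.log((counts[token] + 1) / (total_words + vocabulary))
        return score

    def is_spam(self, text):
        tokens = self.tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")
print(spam_filter.is_spam("buy cheap pills"))      # True
print(spam_filter.is_spam("monday team meeting"))  # False
```

Each call to train corresponds to a user marking a message as spam or ham, which is why the filter adapts without administrative intervention.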

3.3 Automated techniques for e-mail senders


E-mail administrators are not the only ones who can control the amount of spam delivered. E-mail senders can also use
various techniques to make sure they do not send spam, so that they are not blocked or put
on DNSBLs.
A recent method known as CAPTCHA is often used by ISPs and webmail providers on new
accounts to verify that the applicant is a legitimate user and not a spammer trying to create new accounts
with automated tools. E-mail providers should also verify that credit cards used for subscription are
not stolen and check the Spamhaus Project ROKSO list before accepting new customers. One weakness
spammers always try to exploit is the difficulty of implementing opt-in mailing lists properly. To avoid this,
it is very important that mailing lists use confirmed opt-in instead, so that an address is never added to
a mailing list until the owner of that address confirms the opt-in. This point matters because
it underlies anti-spam techniques and blacklists such as those implemented by Spamhaus.
Firewalls and routers can also help combat spam; they can, for example, be programmed to block
SMTP traffic (port 25) from machines that are not supposed to send e-mail. As
home users may also be blocked when an ISP does this, e-mail can still be sent from those
computers through port 587. All port 25 traffic can also be intercepted by a NAT (Network Address
Translator) and redirected to a mail server for checks such as rate limiting.
Contributions from e-mail users are always welcome in the fight against spam. SpamCop, for
example, gathers spam reports from users; by monitoring these reports, ISPs can learn of problems
before their mail servers are blacklisted.

3.4 Ongoing research


Many other new approaches have been proposed to help e-mail systems fight spam.
Some of these techniques are based on a kind of certification attached to the e-mail message, such
as a so-called ham password, which proves that the message is ham (not spam), or some kind
of electronic stamp, which would imply a system of electronic micropayments with electronic money.
Others are based on real money; these so-called cost-based systems rely on the
observation that one reason spam has grown so much is that sending e-mail is free,
so if senders had to pay to send each message, sending spam in bulk would probably become too expensive.
Another technique that has been proposed is the proof-of-work system, which implies a payment
not in money but in computational load. The sender has to perform a calculation that
takes some time, and the receiver later verifies this calculation in much less time. The
computational load for a spammer who wants to send millions of messages would thus be too high,
while a legitimate user who wants to send an e-mail would just have to wait a few seconds longer.
Microsoft Corp. chairman Bill Gates has also been active in the fight against spam. He proposed similar methods
as well as a new one in which money is charged only in some cases: the recipient of the e-mail message is free to decide
whether a message is spam or not. If it is spam, the sender (that is, the spammer) would be
charged a fixed sum, while someone sending a wanted and legitimate e-mail would not be charged
anything by the recipient. Bill Gates was so confident in this method that he announced
in 2004 that spam would be solved within two years, but as we can all see we are still far
from a solution.

3.5 Techniques for researchers and law enforcement


Increasingly, anti-spam efforts have required co-ordination between law enforcement, researchers,
major consumer financial services companies and internet service providers, who need evidence of e-mail spam, identity
theft and phishing in order to track and monitor risks and activities. Honeypots are
often used for this purpose. As we’ll see later in detail, a honeypot is simply an imitation MTA that looks like an open
relay or proxy and thus attracts spammers. Such a system collects a large amount of spam e-mail and
can then submit addresses to DNSBLs, store the messages for further analysis or just discard them.

Chapter 4

Honeypots

A honeypot is a trap set to detect, deflect or in some manner counteract attempts at unauthorized
use of information systems. It is always disguised as something containing valuable information or
resources to attract attackers, in our case spammers. Honeypots are assigned unused IP addresses
and have no production value, so all the traffic they see can be assumed to be malicious or unauthorized.
For this reason we can treat all the traffic passing through honeypots designed to thwart spam
as illicit. Honeypots’ IP addresses are usually hidden so that no ordinary user can find them, but they can be
collected by address harvesting techniques and so end up on spammers’ mailing lists.
Honeypots can be classified depending on two factors. Based on the deployment, we can recognize:

• production honeypots, which are easy to use, are mainly used to improve the security of an organization,
and capture limited information about attacks and attackers;

• research honeypots, which are usually run by non-profit organizations to capture extensive information
about attacks and attackers and to learn how to protect against them better.

The second classification is based on the level of involvement of the honeypot. We can distinguish the
following categories:

• low-interaction honeypots, such as honeyd, a GPL-licensed daemon that works by emulating
computers on the unused IP addresses of a network and provides simple functionality;

• mwcollect and nepenthes, used to collect autonomously spreading malware and obtain the
malware binaries without being infected (as everything is done in a virtualized environment);

• honeytraps, which create port listeners based on TCP connection attempts to monitor traffic
and handle some unknown attacks;

• high-interaction honeypots, called honeynets, which are networks of real systems containing
several honeypots.

Having seen these classifications and types of honeypots, let’s concentrate on what interests us
most: spam honeypots. These honeypots are built to masquerade as abusable
resources such as open mail relays and open proxies, which are very attractive to attackers, in order to
reveal the activities of spammers. Honeypots provide very important functions. Not only do they
block spam, but they also make it possible to determine the source of the attack and to capture
the spam in bulk, which can then be analysed to determine the URLs and response mechanisms
used by spammers. For example, an open relay honeypot can easily deceive spammers: it determines
the e-mail addresses (dropboxes) that spammers use as targets for their test messages and delivers
any illicit relay e-mail addressed to such a dropbox, in order to signal to the
spammer that the honeypot is a genuine, abusable open relay. Since the introduction of honeypots as
anti-spam tools, spammers have started using chains of abused systems to send spam, to make
detection of the actual source more difficult. One merit of honeypots is therefore that they have made abuse
less easy and less safe for spammers.

Many non-profit organizations started using honeypots and spamtraps in order not only to block
a large amount of spam passing through or directly addressed to their honeypots, but also to analyse
spam messages and their senders. Doing so they were able to create large Block Lists (DNSBLs),
published on the web for free, that any ISP or mail server can query to control the traffic over
the respective networks. These organizations include The Spamhaus Project (www.spamhaus.org),
SORBS (www.au.sorbs.net) and SpamCop.net (www.spamcop.net).

Chapter 5

The Spamhaus Project

The Spamhaus Project is a volunteer effort founded by Steve Linford in 1998 that aims to track e-mail
spammers and spam-related activity. Spamhaus is responsible for three widely used DNS blocklists
that many internet service providers use to reduce the amount of spam they take on.
In generating these three blocklists, Spamhaus follows a strict policy, which requires a precise definition of
spam. As we said before, e-mail messages are considered spam if they are both bulk and
unsolicited (UBE); spam is an issue not of content (it does not matter what is written in the message)
but of consent. For this reason it is very important to understand the meaning of Opt-in, Opt-out and
Confirmed Opt-in. To opt in means to have one’s e-mail address added to a mailing list. Spammers
exploit the fact that once an address is opted in, the recipient rarely opts out formally to have
the address deleted from that mailing list, so the spammer goes on sending spam to that address. From the legal point
of view that is still unsolicited e-mail and therefore spam. For e-mail to count as solicited, the recipient must
have verifiably confirmed permission for the address to be included on the specific mailing list, by
confirming (responding to) the list subscription request verification.
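
The difference between plain opt-in and confirmed opt-in can be sketched in a few lines. The class below is a hypothetical in-memory mailing list, not a description of any real list manager: an address becomes a subscriber only after the random token sent to that address has been presented back.

```python
import secrets


class MailingList:
    """In-memory sketch of a confirmed opt-in subscription flow."""

    def __init__(self):
        self.pending = {}         # token -> address awaiting confirmation
        self.subscribers = set()  # addresses whose owners have confirmed

    def request_subscription(self, address: str) -> str:
        """Register a request; the token would be e-mailed to the address."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        return token

    def confirm(self, token: str) -> bool:
        """Add the address only if the owner presents the token it received."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.subscribers.add(address)
        return True


mailing_list = MailingList()
token = mailing_list.request_subscription("alice@example.org")
print("alice@example.org" in mailing_list.subscribers)  # False: not confirmed yet
mailing_list.confirm(token)
print("alice@example.org" in mailing_list.subscribers)  # True
```

Because only the owner of the mailbox can read the token, a third party who types in someone else's address cannot complete the subscription.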

5.1 Spamhaus DNSBLs


Spamhaus DNSBLs are a free public service offered to mail server operators on the internet. ISPs
and other large sites making large numbers of queries can also sign up for an rsync-based feed of these
DNSBLs, which Spamhaus calls its Data Feed, provided they are not on Spamhaus’s list of the ten worst
spam service ISPs and they pass a background check to make sure they do not knowingly
or intentionally provide services to spammers. The three main DNSBLs of the Spamhaus Project are
the Spamhaus Block List (SBL), the Exploits Block List (XBL) and the Policy Block List (PBL).

5.1.1 DNSBL filtering


A DNSBL is a database that is queried in realtime by internet mail servers for the purpose of obtaining
an opinion on the origin of incoming email. The role of a DNSBL is to provide an opinion, to anyone
who asks, on whether a particular IP address meets Spamhaus’ own policy for acceptance of inbound
email. Every internet network that chooses to implement spam filtering is, by doing so, making a
policy decision governing acceptance and handling of inbound email. The receiver unilaterally makes
the choices on whether to use DNSBLs, which DNSBLs to use, and what to do with an incoming
email if the email message’s originating IP address is “listed” on the DNSBL. The DNSBL itself, like
all spam filters, can only answer whether a condition has been met or not.
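
A DNSBL query is an ordinary DNS lookup: by the usual DNSBL convention, the octets of the IPv4 address are reversed and the name of the list's zone is appended. The sketch below shows how such a query name could be built and looked up. The function names are assumptions for this example, and the lookup helper needs network access and a resolver that is permitted to query the list.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_lookup(ip: str, zone: str) -> Optional[str]:
    """Return the 127.0.0.x answer if ip is listed, or None if it is not."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return None  # the name does not resolve: the address is not listed


print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
# 99.2.0.192.zen.spamhaus.org
```

What to do with the answer (reject, tag or merely log the message) remains the receiver's own policy decision, as described above.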

5.1.2 Spamhaus Block List - SBL


The Spamhaus Block List targets verified spam sources such as spammers, spam gangs and spam
support services. It is a database of IP addresses which do not meet Spamhaus’ policy for acceptance
of inbound e-mail. SBL listings are based on the definition of spam as UBE, so there is
no check on the content or legality of the message, only a check on whether it matches that definition
of spam. The listing criteria for the SBL are the following: sources of unsolicited bulk e-mail
sent to Spamhaus spamtraps or submitted to Spamhaus by trusted third party intelligence are listed
in the SBL; spam services, including mail, web, DNS and other servers identified as being an integral
part of a spam operation or being under the direct control of spammers are listed in the SBL; the SBL
also lists known spam operations and gangs listed in the ROKSO list (described later), and services
supporting these known spam operations.
IP addresses are removed immediately from the SBL database upon receipt by the SBL Team of
notification from the IP owner (the Internet Service Provider responsible for assigning or routing the
IP address) that the reason for listing has been corrected or terminated. If this does not happen, SBL
records are automatically removed when they time out. The time-out can differ for each entry
of the SBL, depending on the spam source (it is always set by the editor of the entry). For
unidentified spammers it can be 2 to 14 days, persistent spammers may have time-outs of 6 months,
while known spam gangs can be listed for 1 year or more.

5.1.3 Exploits Block List - XBL


The Exploits Block List is a realtime database of IP addresses of hijacked PCs infected by illegal third
party exploits, including open proxies (HTTP, socks, AnalogX, wingate, etc), worms/viruses with
built-in spam engines, and other types of trojan-horse exploits. The XBL includes listings gathered
by Spamhaus as well as by other contributing DNSBL operations, the Composite Blocking List (CBL)
and the Not Just Another Bogus List (NJABL), two highly-trusted DNSBL sources, with tweaks by
Spamhaus to maximise data efficiency and lower false positives. The XBL can be used by setting
the mail server’s anti-spam DNSBL feature to query xbl.spamhaus.org. This query returns a code
denoting the source of the data in the XBL zone. For example, a return code of 127.0.0.4 means
the data source is the CBL list, a return code of 127.0.0.5 means the data source is the NJABL list,
and so on.

5.1.4 Policy Block List - PBL


The Spamhaus PBL is a DNSBL database of end-user IP address ranges which should not be delivering
unauthenticated SMTP email to any internet mail server except those provided for specifically by an
ISP for that customer’s use, like dynamic and DHCP type IP address space designated as not allowed
to make direct SMTP connections, or static assignments that shouldn’t be sending email without prior
arrangement. Examples of such are an ISP’s core routers, corporate users required by policy to send
via their internal mail server, and unassigned IP addresses. Much of the data is provided to Spamhaus
by the owners (ISPs) of the IP address space. PBL IP address ranges are added and maintained by
each network participating in the PBL project, and by the Spamhaus PBL team, particularly for
networks that do not participate in the project themselves and where the spam received from their IP ranges is
consistent with address space containing high concentrations of botnet zombies, a major source of spam.
The PBL can be queried directly as pbl.spamhaus.org. In this case too the response is
a return code, which is either 127.0.0.10 if the IP was entered by a participating ISP or 127.0.0.11
if it was entered by Spamhaus. A DNS lookup of a (reversed) address that is not listed in the PBL
returns NXDOMAIN.

5.1.5 Combined DNSBLs


Spamhaus also provides two combined DNSBLs. One is the SBL+XBL, which allows users to query
sbl-xbl.spamhaus.org once and get return codes from both lists. A newer combination is called ZEN,
which allows users to query zen.spamhaus.org once and get return codes from the SBL+XBL and the
newer PBL.
ZEN is the combination of all Spamhaus DNSBLs into one single blocklist to make querying faster
and simpler. ZEN can be queried at zen.spamhaus.org and, like the other Spamhaus DNSBLs, it
returns a code. This code will be:

• 127.0.0.2, if the data source is the SBL, which will contain direct UBE sources, spam services
and ROKSO spammers;

• 127.0.0.4-8, if the data source is the XBL, which will contain illegal third party exploits (proxies,
worms, trojans);

• 127.0.0.10-11, if the data source is the PBL, which will contain non-MTA IP address ranges
set by outbound mail policy.
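
The mapping from ZEN return codes to the underlying list can be written down directly. The sketch below uses only the code ranges given in the list above; the function name is an assumption for this example.

```python
def zen_source(return_code: str) -> str:
    """Map a ZEN return code to the list it comes from (ranges as listed above)."""
    prefix, _, last = return_code.rpartition(".")
    if prefix != "127.0.0" or not last.isdigit():
        return "unknown"
    value = int(last)
    if value == 2:
        return "SBL"
    if 4 <= value <= 8:
        return "XBL"
    if value in (10, 11):
        return "PBL"
    return "unknown"


print(zen_source("127.0.0.4"))   # XBL
print(zen_source("127.0.0.11"))  # PBL
```

A mail server could use this distinction to apply different policies, for example rejecting SBL and XBL hits outright while treating PBL hits as a reason to require authenticated submission.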

5.2 ROKSO


The Spamhaus Register of Known Spam Operations (ROKSO) is a database of “hard-core spam
gangs”: spammers and spam operations who have been terminated from three or more ISPs due to
spamming. The ROKSO list is not a DNSBL; it is, rather, a directory of publicly-sourced information
about these persons and their business and at times criminal activities.
To be placed on the ROKSO list a spammer must first be terminated by a minimum of 3 ISPs for
AUP violations. Once a spammer is listed in ROKSO, IP addresses under that spammer’s control
are automatically and preemptively listed in the Spamhaus Block List. For qualified Law Enforcement
Agencies Spamhaus provides a special version of the ROKSO database which gives access to records
with evidence, logs and information on illegal activities of many of these gangs that is too sensitive to publish
openly.
Each spam operation, or “spam gang”, consists on average of between 1 and 5 spammers. The
majority of the spammers on the ROKSO List operate illegally and move from network to network
and country to country seeking out Internet Service Providers with poor security or known for not
enforcing anti-spam policies. Many of these spam operations pretend to operate “offshore”. Those
who don’t hide behind anonymity pretend to be small ISPs themselves, claiming to their providers
that the spam is being sent not by them but by non-existent customers. When caught, almost all use
the age-old tactic of lying to each ISP long enough to buy a few more days or weeks of spamming and,
when terminated, simply move on to the next ISP already set up and waiting.

5.3 DROP


The Spamhaus Don’t Route Or Peer (DROP) List is an advisory “drop all traffic” list, consisting
of stolen zombie netblocks and netblocks controlled entirely by professional spammers. DROP is a
tiny sub-set of the SBL designed for use by firewalls and routing equipment. DROP is simply a
text list of these IP address spaces, with the numbers of the underlying SBL listings as comments.
When implemented at a network or ISP’s core routers, DROP can protect all the network’s users from
spamming, scanning, harvesting and DDoS attacks originating on rogue netblocks.
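
As an illustration of how a firewall script might consume such a text list, the sketch below parses netblocks with a trailing comment and tests whether an address falls inside one of them. The exact line layout (a CIDR block, a semicolon, then the SBL reference) is an assumption made for this example, and the sample data is invented and uses documentation address ranges.

```python
import ipaddress

# Invented sample; assumed layout is "<netblock> ; <SBL reference>".
SAMPLE_DROP = """\
; sample list for illustration only
192.0.2.0/24 ; SBL00001
198.51.100.0/25 ; SBL00002
"""


def parse_drop(text):
    """Return a list of (network, comment) pairs from a DROP-style text list."""
    entries = []
    for line in text.splitlines():
        network, _, comment = line.partition(";")
        network = network.strip()
        if not network:
            continue  # blank line or comment-only line
        entries.append((ipaddress.ip_network(network), comment.strip()))
    return entries


def is_dropped(ip, entries):
    """True if ip lies inside any listed netblock."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network, _ in entries)


entries = parse_drop(SAMPLE_DROP)
print(is_dropped("192.0.2.77", entries))      # True
print(is_dropped("198.51.100.200", entries))  # False
```

In practice the parsed netblocks would be turned into router or firewall rules rather than checked one packet at a time in Python.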

Chapter 6

SORBS

SORBS stands for Spam and Open Relay Blocking System. It is an open proxy and open mail relay
DNSBL, later improved with complementary lists that include various other classes of hosts. The
SORBS DNSBL was created in 2002, first as a private list, then launched to the public in 2003. In
the beginning it was conceived as an anti-spam project based on a daemon checking “on-the-fly” whether
the e-mail it received had passed through proxies and open relay servers. The DNSBL created in this
way listed thousands of compromised hosts and proxy servers. More recently SORBS has also expanded its
lists to include hacked and hijacked servers, formmail scripts and trojan infestations, and it now also
pre-emptively lists all dynamically allocated IP address space.
SORBS provides many different zones identified as *.sorbs.net. Some examples are dnsbl.sorbs.net
(including all the other DNS zones except spam.dnsbl.sorbs.net), rhsbl.sorbs.net (containint all RHS
zones), and obviously all their sub-zones. SORBS also provides other aggregated zones such as
safe.dnsbl.sorbs.net, problems.dnsbl.sorbs.net, relays.dnsbl.sorbs.net, proxies.dnsbl.sorbs.net. This
zones are those which servers query and address for new entries requests. In addition to providing the
SORBS zones, SORBS also makes the ASPEWS and SPEWS data available by DNSBL lookup, but
as the policy of SORBS was the publishing of data that is fully under SORBS control, the ASPEWS
and SPEWS zones are not included in the SORBS aggregate zone.

6.1 DUHL
SORBS adds IP ranges that belong to dialup modem pools, dynamically allocated wireless and DSL
connections, as well as DHCP LAN ranges, by using reverse DNS PTR records, WHOIS records, and
sometimes submissions from the ISPs themselves. These IPs form the so-called DUHL (Dynamic
User and Host List). It is similar to other DUL lists, but while those list dial-up ranges only, the DUHL
also lists any IP space where addresses are assigned dynamically: the increasing use of cable modem
and DSL connections has made dial-up quite rare, so simple DUL lists are no longer very effective.
The SORBS DUHL originally started life as a straight import of the Dynablock list maintained by
Easynet NL. SORBS accepts requests to add or remove entries from the ISPs responsible for a given
IP address space. It also lists dynamically allocated addresses that it comes across itself, typically
after receiving spam from them and checking their reverse DNS names. To decide from rDNS whether a
network allocates static or dynamic addresses, SORBS follows the recommendations of the IETF
draft “draft-msullivan-dnsop-generic-naming-schemes-00.txt” on static and dynamic assignment,
and therefore relies on ISPs respecting its naming schemes. In this draft, Matthew Sullivan of SORBS
proposed that generic reverse DNS names include purposing tokens such as “static” or “dynamic”.
The draft has since expired, and it is generally considered more appropriate for ISPs simply to
block outgoing traffic to port 25 if they wish to prevent users from sending e-mail directly, rather
than to signal this in the reverse DNS record for the IP. Another important point is that SORBS
expects hosts to have long TTLs, as short TTL values (especially under 1 hour) usually indicate that the
record is about to change. Removal requests, for example, need the Time To Live of the PTR
record to be 43200 seconds (12 hours) or more.
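
The two checks described above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the way tokens are split out of the PTR name, the function names and the example host names are hypothetical, and the actual matching rules used by SORBS are not reproduced here.

```python
import re

MIN_PTR_TTL = 43200  # seconds, the minimum TTL quoted for removal requests

def assignment_hint(ptr_name):
    """Guess the assignment type from a purposing token in a PTR name.

    Assumption: tokens are the pieces obtained by splitting the name on
    dots and hyphens.
    """
    tokens = set(re.split(r"[.-]", ptr_name.lower().rstrip(".")))
    if "dynamic" in tokens:
        return "dynamic"
    if "static" in tokens:
        return "static"
    return "unknown"

def ttl_meets_removal_minimum(ptr_ttl):
    """True if the PTR record's TTL is long enough for a removal request."""
    return ptr_ttl >= MIN_PTR_TTL

# Hypothetical PTR names, for illustration only.
print(assignment_hint("host-1-2-3-4.dynamic.example.net"))
print(assignment_hint("static-203-0-113-9.example.net."))
print(ttl_meets_removal_minimum(3600))
```

A name without either token yields "unknown", which reflects the weakness of this approach: it only works when ISPs follow the naming recommendations.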

6.2 Submissions and queries
Submissions to SORBS can be made for three different lists:

• The Dynamic User/Host List (DUHL). This is an IP-based list, and therefore forms part of
dnsbl.sorbs.net; it is also available separately as dul.dnsbl.sorbs.net. SORBS accepts submissions
to the DUHL only from registered logins whose registered e-mail address matches the WHOIS
record for the domain.

• The Bad DNS Config List. This is a domain-based list (sometimes known as a Right Hand
Side Block List, or RHSBL), and forms part of rhsbl.sorbs.net. It is available separately as
baddns.rhsbl.sorbs.net. This list is explicitly for domains with bad DNS configurations that
can cause real problems for some mail servers. There are two reasons why hosts and domains
could be listed here. The first is that at least one MX record points to 127.0.0.1/32,
0.0.0.0/8 or 255.255.255.0/8. The second is that at least one MX record points to 10.0.0.0/8,
172.16.0.0/12, 192.168.0.0/16 or to any address in 224.0.0.0 - 254.255.255.255, and the domain
does not have an MX record in normal address space.

• The No e-mail from this domain list. Like the previous one, this is a domain-based list that is part of
rhsbl.sorbs.net. It lists hosts and domains that will never be used for sending legitimate e-mail.
For example, the SuperNews admins have indicated that no mail will ever be sent from the domains
*.supernews.net.
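
The two listing conditions given above for the Bad DNS Config List can be sketched in Python as follows. The sketch assumes that the MX records have already been resolved to IPv4 addresses; the function names are hypothetical, and the address ranges are copied as written in the text (with `strict=False`, the range written as 255.255.255.0/8 is read as the whole 255.0.0.0/8 block).

```python
import ipaddress

# First condition: ranges copied from the text. strict=False tolerates the
# host bits in "255.255.255.0/8", which then denotes 255.0.0.0/8.
ALWAYS_BAD = [ipaddress.ip_network(n, strict=False)
              for n in ("127.0.0.1/32", "0.0.0.0/8", "255.255.255.0/8")]

# Second condition: private ranges plus 224.0.0.0 - 254.255.255.255.
PRIVATE = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
HIGH_START = ipaddress.IPv4Address("224.0.0.0")
HIGH_END = ipaddress.IPv4Address("254.255.255.255")

def is_always_bad(ip):
    return any(ip in net for net in ALWAYS_BAD)

def is_outside_normal_space(ip):
    return any(ip in net for net in PRIVATE) or HIGH_START <= ip <= HIGH_END

def has_bad_dns_config(mx_addresses):
    """Apply both conditions to the resolved addresses of a domain's MX records."""
    ips = [ipaddress.IPv4Address(a) for a in mx_addresses]
    # First condition: a single MX in these ranges is enough.
    if any(is_always_bad(ip) for ip in ips):
        return True
    # Second condition: an MX outside normal space and no MX inside it.
    has_outside = any(is_outside_normal_space(ip) for ip in ips)
    has_normal = any(not is_outside_normal_space(ip) for ip in ips)
    return has_outside and not has_normal

print(has_bad_dns_config(["10.0.0.5"]))                # only a private MX
print(has_bad_dns_config(["10.0.0.5", "192.0.2.25"]))  # also a normal MX
```

The second call shows why the second condition is weaker than the first: a private MX is tolerated as long as another MX record lies in normal address space.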

SORBS is queried by supplying the address to be checked. The query produces a return
code that indicates which database the result was obtained from. If the query is made against an aggregate
zone, the return code still identifies the specific zone that produced the result. All return
codes have the form 127.0.0.x. For example, 127.0.0.2 refers to http.dnsbl.sorbs.net and 127.0.0.8 refers
to block.dnsbl.sorbs.net. If an IP address appears in more than one database, all applicable codes are
returned, so a single query reveals every database that contains that
IP address.
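
A query of this kind can be sketched in Python as follows. The sketch assumes the usual DNSBL convention of reversing the IPv4 octets and prepending them to the zone name; the function names are hypothetical, and only the two return codes mentioned above are mapped.

```python
import ipaddress
import socket

# Only the two return codes named in the text are mapped here.
RETURN_CODES = {
    "127.0.0.2": "http.dnsbl.sorbs.net",
    "127.0.0.8": "block.dnsbl.sorbs.net",
}

def dnsbl_query_name(address, zone="dnsbl.sorbs.net"):
    """Build the DNS name to look up: reversed IPv4 octets, then the zone."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"

def decode_return_codes(codes):
    """Map each returned 127.0.0.x address to the zone it identifies."""
    return [RETURN_CODES.get(code, f"unmapped code {code}") for code in codes]

def lookup(address, zone="dnsbl.sorbs.net"):
    """Query the zone over DNS; an empty list means the name did not resolve,
    which is how a DNSBL signals that the address is not listed."""
    try:
        _, _, codes = socket.gethostbyname_ex(dnsbl_query_name(address, zone))
    except socket.gaierror:
        return []
    return decode_return_codes(sorted(codes))

# The pure helpers can be tried without any network access.
print(dnsbl_query_name("192.0.2.10"))
print(decode_return_codes(["127.0.0.2", "127.0.0.8"]))
```

Because a query against the aggregate zone can return several codes at once, `lookup` returns a list rather than a single value.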

6.3 SORBS certificates


SORBS also has its own CA (the SORBS Certificate Authority), a self-signed authority which issues
and signs certificates for e-mail clients, browsers and web servers. This certificate can be freely
downloaded from the SORBS website (www.au.sorbs.net) and used to sign one’s own e-mail messages.

Chapter 7

SpamCop

SpamCop is a free spam reporting service which allows recipients of unsolicited bulk e-mail (UBE)
and unsolicited commercial e-mail (UCE) to report offenses to the senders’ ISPs, and sometimes to their
web hosts. SpamCop uses these reports to compile a DNSBL of computers sending spam, called the
“SpamCop Blocking List” (SCBL), while websites referenced in the spam are used to create the Spam
URI Realtime Blocklists (SURBL) RHSBL. SpamCop has tools for ISPs to manage the reports sent
to them, to see details of individual spam messages, and to mark incidents as resolved.

7.1 SpamCop Blocking List


The SpamCop Blocking List (SCBL) is a list of IP addresses which have transmitted reported email
to SpamCop users, which in turn is used to block and filter unwanted email. The SCBL is a fast and
automatic list of sites sending reported mail, with a number of report sources, including automated
reports and SpamCop user submissions. Being time-based the SCBL also quickly and automatically
delists these sites when reports stop.
The SCBL aims to block spam with minimal blocking or misidentification of wanted email. Given
the power of the SCBL, however, wanted e-mail may still be blocked, and this may happen often; for this
reason the list should always be used together with whitelists containing wanted senders of e-mail.
The SCBL lists IP addresses reported both by SpamCop users and by spamtraps. The system
behind a listed address could be either a direct e-mail source, such as a site’s primary
mail server, or an indirect source, like an open relay or open proxy that has been abused to send spam.
The number of reports referencing an IP is weighted by the SCBL against the total amount of e-mail
sent by that IP. This has a drawback: IPs sending a lot of spam may never be
listed if they also send a large amount of non-spam e-mail. To estimate how much e-mail each IP
transmits, SpamCop monitors traffic through sites that use its SCBL, since the list is queried at every
SMTP transaction: the total number of queries for each IP address is counted, and the presence of that
IP on the SCBL is checked. When a sampled site queries the SCBL about an IP address
sending mail which is not reported mail, that host is given a reputation point, which is later used in the
listing decision.
Some blocking lists block mail from misconfigured or insecure servers such as open proxies or open
relays, or from certain classes of machines such as machines with dynamically-assigned IP addresses
(see SORBS DUHL). The SCBL does not consider these characteristics. Instead, the SCBL lists only
IP addresses of machines that are sending reported email.

7.1.1 SCBL rules


Timeliness is key to the SCBL’s value. The automated queries result in fast listing of spam, which
increases the accuracy of the SCBL. Also, without any additional reports, a reported address stays on
the SCBL for only 24 hours. This limits the amount of damage if users make a mistake and report
legitimate mail using SpamCop.

The listing system operates based on the following rules, taking into account the reputation points
and number of reports.

• The SCBL lists IP addresses with a large number of reports relative to reputation points. The
threshold is set manually by the SpamCop team in order to make the list as accurate as possible.

• Reports are weighted by freshness, that is, by how recently the e-mail was received:

– the most recently received reports are counted 4 : 1;
– reports for e-mail 48 hours old or older are counted 1 : 1, with a linear sliding scale between
the most recent and 48 hours past;
– reports for e-mail more than one week old are ignored.

• Total reports are weighted with respect to spamtrap report scores in the following way: for
spamtrap scores less than 6, the number of spamtrap reports is multiplied by 5; for spamtrap
scores of 6 or more, this number is squared. These scores are then added to the total of reports.
For example:

– an IP address with 2 spamtrap reports and 3 SpamCop user reports will have a weighted
score of (2 ∗ 5) + 3 = 13;
– a host with 7 spamtrap reports and 3 manual reports will have a weighted score of (7 ∗ 7) + 3 = 52.

• The SCBL does not count reports regarding URLs or addresses in the body of the email. Therefore,
the SCBL does not list websites or email addresses used to receive replies in reported email,
unless that IP is also used to send the mail.

• The SCBL will not list an IP address with only one report filed.

• With only two reports against an IP address, the SCBL will list the IP address for a maximum
of 12 hours after the most recent reported mail was sent.

• The SCBL will not list an IP address if there are no reports against it within 24 hours.

• If a server sends bounces to an SCBL spamtrap in sufficient quantity to meet the listing criteria,
the SCBL will list that server. This situation arises because some mail servers do not reject mail
during the SMTP transaction, but rather accept the mail and then send a bounce message later.
Viruses and spam often carry a forged From: field. If the e-mail is rejected or blocked during
the SMTP transaction, the bounce goes to the connecting IP; if the bounce is generated after the
mail has been accepted for delivery, it goes to the address in the From: field. Viruses
and spam often use addresses from the list of recipients to populate the From: field, and sometimes
these addresses are spamtraps.
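
The freshness and spamtrap weighting rules listed above, together with the report-count and time limits, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the squaring is assumed to start at 6 spamtrap reports (the worked example squares a count of 7), and since the rules do not say how the freshness weight combines with the spamtrap weighting, the two are kept as separate functions. The manually set threshold against reputation points is not modelled.

```python
RECENT_WEIGHT = 4.0     # the most recent reports count 4:1
BASE_WEIGHT = 1.0       # reports 48 hours old or older count 1:1
SLIDE_HOURS = 48.0      # length of the linear sliding scale
MAX_AGE_HOURS = 7 * 24  # reports older than one week are ignored
SQUARE_FROM = 6         # assumed boundary between "times 5" and "squared"

def freshness_weight(age_hours):
    """Weight of one report, given the age in hours of the reported e-mail."""
    if age_hours > MAX_AGE_HOURS:
        return 0.0
    if age_hours >= SLIDE_HOURS:
        return BASE_WEIGHT
    # Linear interpolation from 4:1 at age 0 down to 1:1 at 48 hours.
    return RECENT_WEIGHT - (RECENT_WEIGHT - BASE_WEIGHT) * age_hours / SLIDE_HOURS

def weighted_total(spamtrap_reports, user_reports):
    """Combine spamtrap and user reports as in the two worked examples."""
    if spamtrap_reports < SQUARE_FROM:
        trap_score = spamtrap_reports * 5
    else:
        trap_score = spamtrap_reports ** 2
    return trap_score + user_reports

def passes_count_and_time_rules(report_count, hours_since_latest_report):
    """Check only the report-count and time limits, not the threshold."""
    if report_count <= 1:
        return False                            # a single report never lists
    if report_count == 2:
        return hours_since_latest_report <= 12  # at most 12 hours
    return hours_since_latest_report <= 24      # no report within 24 hours

print(weighted_total(2, 3))    # (2 * 5) + 3 = 13
print(weighted_total(7, 3))    # (7 * 7) + 3 = 52
print(freshness_weight(24.0))  # halfway along the scale: 2.5
```

The two `weighted_total` calls reproduce the examples given in the list, which is the only part of the sketch that can be checked directly against the source.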

7.2 Limitations
For first-time SpamCop Reporters, the SpamCop Parsing and Reporting Service requires the reporter
to verify manually that each submission is spam and that the destinations of the spam reports are correct.
People who use tools to report spam automatically, who report e-mail that is not spam, or who report to
the wrong people may be fined or banned. This verification requires extra time and effort. Despite
these steps, reports to innocent bystanders do happen, and ISPs may need to configure SpamCop
not to send further reports if they don’t want to see them again. SpamCop Reporters with a proven
track record are allowed to file Quick Reports, reducing both time and effort.
It is not clear whether reporting spam using SpamCop’s reporting service actually reduces the
amount of spam received, and complaints on SpamCop’s online forum provide anecdotal evidence
to support some skepticism about its effectiveness. Spammers who determine the identity of the
complainants can, by doing so, also verify that the email addresses are still in use. What is clear is
that much spam email is filtered or blocked by the SCBL, which is fed by many SpamCop Reporters
reporting their spam.
That said, SpamCop is effective at helping ISPs, web hosts and email providers identify accounts
that are being abused and shut them down before the spammer finishes operations. Finally, SpamCop
provides information from its reports to third parties who are also working to fight spam, amplifying
the impact of its services beyond its own reach.
It is also remarkable in its own right that SpamCop has survived for so many years, considering
the severity of opposition other anti-spam companies have faced in the past. SpamCop has dealt with
attacks by spammers thus far by hiring services from Akamai, but is still the target of many hackers
and could face serious difficulties if it continues to grow in size and effectiveness. Significant offensive
weapons can be wielded by the criminal syndicates behind spammers. SpamCop views itself as an
attempt to stop spam without the necessity of governmental intervention, but because it lacks the
power of a government or large ISP, it may have greater difficulty dealing with spammers’ expertise
as well as the large “bot” networks that they control and that they could use to perform a massive
DDoS attack.

Chapter 8

Conclusions

We have seen many different anti-spam techniques, in particular some based on honeypots and
spamtraps, and how these techniques are used to create useful blocking lists and databases. The
introduction of these methods, as already noted, has the great merit of having made the abuse of
exploitable network resources harder and riskier for spammers. Besides this, associations
like The Spamhaus Project, which maintain not only lists of plain IP addresses but also databases
with detailed descriptions and evidence of spammers’ attacks and techniques, can be really helpful
if combined with effective legislation and law enforcement by the State.
Furthermore, some interesting aspects come from the listing policies of these DNSBLs. Some are
built solely from feeds from honeypots or trusted third parties, while the SpamCop
SCBL, for example, also accepts feeds from its registered users, which can balance the filtering and listing
with respect to what users actually consider spam. On the other hand, this
method is not always effective, or at least we have no assurance that it is: for example, not all spam
reported by users will be blocked, since the listing criteria are slightly more complicated than that.
Another relevant point about Spamhaus, SORBS, SpamCop and all the other honeypot-based
anti-spam organizations is that there will always be a trade-off between letting some
spam through and blocking legitimate mail; some of them are often considered too aggressive. For this reason
it is very important to have balanced listing criteria, and it is advisable to use whitelists in order
to prevent messages from wanted senders from being blocked. The last point to be considered is the price
in time paid for queries to the DNSBLs and databases, but how much this matters is for each mail server
administrator to judge.
In conclusion, we can say that, despite not being the ultimate anti-spam tool that will defeat
the problem of spam forever, honeypots have had a good impact in fighting spam, and the three
organizations analyzed have for years been a source of trouble for spammers. As I said, they still
require more co-ordination with law enforcement, which is what they were created for, and less tolerance
on the State side, so that spammers would not just be blocked by a few servers, but brought before
a court.

Bibliography

[1] www.wikipedia.org

[2] www.cbsnews.com

[3] www.spamhaus.org

[4] www.au.sorbs.net

[5] www.spamcop.net

[6] Matthew Sullivan Spam and Open Relay Blocking System IETF Internet Draft
