
Entropy Based Analysis of DNS Query Traffic in the Campus Network

Dennis Arturo Ludeña Romaña


Graduate School of Science and Technology, Kumamoto University
Kumamoto 860-8555 JAPAN
and
Yasuo Musashi
Center for Multimedia and Information Technologies, Kumamoto University
Kumamoto 860-8555 JAPAN

ABSTRACT

We carried out an entropy based study of the DNS query traffic from
the campus network in a university through January 1st, 2006 to
March 31st, 2007. The results are summarized as follows: (1) The
source IP address- and query keyword-based entropies change
symmetrically in the DNS query traffic from the outside of the
campus network when detecting spam bot activity on the campus
network. On the other hand, (2) the source IP address- and query
keyword-based entropies change similarly to each other when
detecting big DNS query traffic caused by prescanning or a
distributed denial of service (DDoS) attack from the campus network.
Therefore, we can detect the spam bot and/or DDoS attack bot by only
watching DNS query access traffic.

Keywords: Bot, Bot Worm, Detection, DNS, Entropy

Figure 1. A schematic diagram of a network observed in the present
study.

1. INTRODUCTION

It is of considerable importance to raise the detection rate of bot
worms (BWs), because they not only compromise PC clients but also
hijack the compromised PC clients. After the hijacking, the
BW-infected PC clients become components of the bot networks (bots)
that are used to send a lot of unsolicited mail such as spam,
phishing, and mass mailing (SMTP proxy) and to execute distributed
denial of service attacks [1-4].

Previously, we reported that the entropy based on the frequency of
the DNS query keywords in the DNS query traffic from outside the
campus decreases considerably when the entropy based on the
frequency of the source IP addresses increases [5]; i.e., we can
detect bot worm (BW) activity, especially that of spam bots on the
campus network, by only watching the DNS query traffic from the
other sites on the Internet. However, there appears to be no
investigation that compares the frequency-based entropies for the
traffic from the inside of the campus network.

In this paper, (1) we carry out an entropy based analysis of the DNS
query traffic from the campus network, and (2) we discuss the
differences in the frequency-based entropies for the traffic from
the inside of the campus network.

2. OBSERVATIONS

Network Systems

We investigated the traffic of the DNS query packet access between
the top domain DNS (tDNS) server and the PC clients. Figure 1 shows
the network system observed in the present study and an optional
configuration of the BIND-9.2.6 server program daemon in tDNS. The
DNS server, tDNS, is one of the top level DNS (kumamoto-u) servers
and plays an important role in providing domain name resolution and
subdomain delegation services for many PC clients and the subdomain
network servers in the university, respectively. Its operating
system is CentOS 4.3 Final with kernel-2.6.9, running on an Intel
Xeon 3.20 GHz Quadruple SMP system with 2 GB of core memory and an
Intel 1000 Mbps EthernetPro Network Interface Card.

Capture of DNS Query Packets

In tDNS, the BIND-9.2.6 program package has been employed as the DNS
server daemon [6]. The DNS query packets and their keywords have
been captured and decoded by the query logging option (Figure 1; see
% man named.conf for more detail). The log of DNS query access has
been recorded in the syslog files. All of the syslog files are
updated daily by the crond system. Each line of the syslog message mainly

42 SYSTEMICS, CYBERNETICS AND INFORMATICS VOLUME 6 - NUMBER 5 ISSN: 1690-4524


consists of the content of the DNS query packet, such as a time, the
source IP address of the DNS client, and a query keyword of the fully
qualified domain name type (A and AAAA resource records (RRs) for
IPv4 and IPv6 addresses, respectively), the IP address type (PTR
RR), or the mail exchange type (MX RR).

Estimation of Entropy

We employed Shannon’s function in order to calculate the entropy
(randomness) H(X), as

$$H(X) = -\sum_{i \in X} P(i) \log_2 P(i) \tag{1}$$

where X is the data set of the frequency freq(j) of IP addresses or
that of the DNS query keywords in the DNS query packet traffic from
the outside of the campus network, and the probability P(i) is
defined as

$$P(i) = \frac{\mathrm{freq}(i)}{\sum_{j} \mathrm{freq}(j)} \tag{2}$$

where i and j (i, j ∈ X) represent the source IP address or the DNS
query keywords in the DNS query packet.
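As a minimal sketch of Eqs. (1) and (2), the entropy can be computed
from a frequency file that holds one "count key" pair per line, which
is the layout written by uniq -c. The function name entropy_from_freq
and the four-decimal output format are assumptions made here for
illustration, and the sketch uses POSIX sh with awk rather than tcsh;
it is not the authors' own program.

```shell
#!/bin/sh
# entropy_from_freq FILE  (hypothetical helper, not from the paper)
# FILE holds one "count key" pair per line, as written by "uniq -c".
# Prints H(X) = -sum_i P(i) log2 P(i), with P(i) = freq(i) / sum_j freq(j).
entropy_from_freq() {
    awk '
        # Keep every positive count freq(i) and accumulate sum_j freq(j).
        NF >= 1 && $1 > 0 { freq[++n] = $1; total += $1 }
        END {
            h = 0
            for (i = 1; i <= n; i++) {
                p = freq[i] / total          # Eq. (2)
                h -= p * log(p) / log(2)     # Eq. (1); awk log() is the natural log
            }
            printf "%.4f\n", h
        }
    ' "$1"
}
```

For example, a file holding the two counts 3 and 1 gives P = 3/4 and
1/4, so H = 0.8113 bits, while four equally frequent entries give
H = 2.0000 bits. Calling entropy_from_freq on a daily frequency file
such as freq-sIPaddr would print the entropy for that day.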
The frequencies freq(i) are estimated with the following script
program:

#!/bin/tcsh -f
# frequencies of the unique source IP addresses
cat querylog | grep -v "client 133\.95\." | tr '#' ' ' \
| awk '{print $7}' | sort -r | uniq -c | \
sort -r >freq-sIPaddr

# frequencies of the unique DNS query keywords
cat querylog | grep -v "client 133\.95\." |\
awk '{print $9}' | sort -r | uniq -c |\
sort -r >freq-querykeywords

Chart 1

where “querylog” is a syslog file including the syslog messages of
the BIND-9.2.6 DNS server daemon program [6]. The syslog message
(one line) consists of the keywords “Month”, “Day”,
“hours:minutes:seconds”, “server name”, “named[process
identifier]:”, “client”, “source IP address#source port address:”,
“query:”, and “DNS query keywords”. This script program consists of
three program groups: (1) The first program group is the first line,
which includes only “#!/bin/tcsh -f” and means that this script is a
TENEX C Shell (tcsh) script program. (2) The second program group
estimates the frequencies of the unique source IP addresses; it
consists of the unix commands from “cat” to “sort -r”, because the
backslash “\” connects the line terminated by “\” with the next line
in the tcsh program. In this program group, “cat” shows all the
syslog message-lines from the syslog file “querylog”, the “grep -v”
(or “grep”) command extracts only the message-lines excluding (or
including) the source IP address of “133.95.x.y”, “tr” replaces the
character '#' with a white space ' ', the unix command
“awk '{print $7}'” extracts only the seventh keyword as the “source
IP address” in the message-line, the “sort -r | uniq -c | sort -r”
commands sort the dataset of “source IP addresses” into the dataset
of “unique source IP addresses” and estimate the frequencies of the
unique source IP addresses, and the final results are written into
the file “freq-sIPaddr”. (3) The last program group extracts the DNS
query keywords from the syslog message-lines, sorts the dataset of
“DNS query keywords” into the dataset of “unique DNS query
keywords”, and estimates the frequencies of the unique DNS query
keywords. Finally, the results of the last program group are written
into the file “freq-querykeywords”. In the last program group,
although almost all the commands, arguments, and their options are
the same as in the second program group, the unix command “tr” and
its arguments are removed and a new argument “'{print $9}'” replaces
the argument of the unix command “awk” in the second program group.
Entropy based packet traffic analysis was recently suggested by
Wagner and Plattner [7].

Figure 2. Entropy changes in the DNS query traffic from the outside
(A) and the inside (B) of the campus network to the top domain name
system (tDNS) server through January 1st, 2006 to March 31st, 2007
(day^-1 unit). The solid and dotted lines show entropies based on
the data set of the number of the unique source IP addresses and on
the frequency of the unique DNS query keywords, respectively.

3. RESULTS AND DISCUSSION

Entropy Analysis on DNS Query Traffic

We illustrate the calculated entropy for the frequencies of the
unique source IP addresses and the DNS query



keywords in the DNS traffic from the inside and the outside of the
campus network to the top domain DNS (tDNS) server through January
1st, 2006 to March 31st, 2007, as shown in Figure 2.

In Figure 2A, we can observe several significant peaks on (i)
January 24th, (ii) March 22nd, (iii) April 29th, (iv) May 2nd, (v)
June 5th, (vi) August 19th, (vii) September 5th, 2006, and (viii)
January 24th, 2007. Fortunately, since we received many complaint
E-mails about the described peaks from the other sites, the peaks
have been assigned to security incidents as follows: spam bot
activities for (i)-(vi), a misconfiguration in the campus subdomain
DNS server for (vii), and an unknown cause for (viii). Interestingly,
we can also notice that both entropy curves change symmetrically at
each peak. These results show that security incidents in the
university campus network can be detected by only observing the
frequency of the domain name resolution access from the outside of
the campus network.

In Figure 2B, on the other hand, we can find several peaks on (a)
July 29th, (b) August 20th, (c) September 9th, 2006, (d) January
15th, and (e) March 15th, 2007. These peaks have also already been
assigned, as follows: E-mail spamming activity for (a), a crash of
the local E-mail server caused by big SMTP traffic for (b), a DNS
misconfiguration in the local subdomain DNS servers for (c), and
historically big domain name resolution traffic for (d) and (e), in
which the DNS query traffic was generated by crashes of the
NIS-based authentication systems.

Furthermore, we can obtain new findings when comparing the unique
source IP address- and DNS query keyword-based entropy curves with
each other in Figures 2A and 2B, respectively. The symmetrical
changes emerge when detecting spam bot activity, whereas
simultaneous changes take place when receiving unusually big DNS
query traffic from the campus network because of a DNS-related
misconfiguration at the local subdomain and/or an overload crash at
the local E-mail servers.

As a result, it can be clearly concluded that entropy based analysis
of the DNS query traffic provides important information on the
security incidents in the campus network.

4. CONCLUSIONS

We investigated the DNS query traffic from the campus network in a
university through January 1st, 2006 to March 31st, 2007, employing
an entropy based statistical analysis method. The following
interesting results were obtained: (1) The source IP address- and
query keyword-based entropies change symmetrically in the DNS query
traffic from the outside of the campus network when detecting spam
bot activity on the campus network. (2) The source IP address- and
query keyword-based entropies change similarly to each other when
detecting big DNS query traffic caused by prescanning or a
distributed denial of service (DDoS) attack from the campus network.

From these results, it can be concluded that we can detect the spam
bot and/or DDoS attack bot in the campus network by only watching
DNS query access traffic.

We continue to develop detection technology based on the results of
the present paper and to evaluate the detection rate.

5. REFERENCES

[1] P. Barford and V. Yegneswaran, “An Inside Look at Botnets”,
Special Workshop on Malware Detection, Advances in Information
Security, Springer Verlag, 2006.

[2] J. Nazario, “Defense and Detection Strategies against Internet
Worms”, I Edition; Computer Security Series, Artech House, 2004.

[3] (a) J. Kristoff, “Botnets, detection and mitigation: DNS-based
techniques”, Northwestern University, 2005,
http://www.it.northwestern.edu/bin/docs/bots kristoff_jul05.ppt.
(b) J. Kristoff, “Botnets”, North American Network Operators Group
(NANOG32), Reston, Virginia (2004),
http://www.nanog.org/mtg-0410/kristoff.html

[4] D. Dagon, C. Zou, and W. Lee, “Modeling Botnet Propagation Using
Time Zones”, Proceedings of the Network and Distributed System
Security (NDSS) Symposium 2006;
http://www.isoc.org/isoc/conferences/ndss/06/proceedings/html/2006/

[5] D. A. Ludeña R., H. Nagatomi, Y. Musashi, R. Matsuba, and K.
Sugitani, “A DNS-based Countermeasure Technology for Bot
Worm-infected PC terminals in the Campus Network”, Journal for
Academic Computing and Networking, Vol. 10, No. 1, pp.39-46 (2006)

[6] BIND-9.2.6: Internet Systems Consortium,
http://www.isc.org/products/BIND/

[7] A. Wagner and B. Plattner, “Entropy Based Worm and Anomaly
Detection in Fast IP Networks”, Proceedings of 14th IEEE Workshop on
Enabling Technologies: Infrastructure for Collaborative Enterprises
(WETICE 2005), Linköping, Sweden, pp.172-177, 2005

