ABSTRACT
Keywords: Bot, Bot Worm, Detection, DNS, Entropy

1. INTRODUCTION

Figure 1. A schematic diagram of the network observed in the present study.

2. OBSERVATIONS
Estimation of Entropy

#!/bin/tcsh -f
# Group (2): frequencies of the unique source IP addresses
cat querylog | grep -v "client 133\.95\." | tr '#' ' ' \
| awk '{print $7}' | sort -r | uniq -c | \
sort -rn >freq-sIPaddr
# Group (3): frequencies of the unique DNS query keywords
cat querylog | grep -v "client 133\.95\." |\
awk '{print $9}' | sort -r | uniq -c |\
sort -rn >freq-querykeywords
Chart 1

where “querylog” is a syslog file containing the messages of the BIND-9.2.6 DNS server daemon [6]. Each syslog message (one line) consists of the fields “Month”, “Day”, “hours:minutes:seconds”, “server name”, “named[process identifier]:”, “client”, “source IP address#source port:”, “query:”, and “DNS query keywords”. The script consists of three program groups:

(1) The first program group is the first line, “#!/bin/tcsh -f”, which declares that the script is written for the TENEX C Shell (tcsh).

(2) The second program group estimates the frequencies of the unique source IP addresses. It forms a single pipeline from “cat” to the final “sort”, because in tcsh a backslash “\” at the end of a line joins that line to the next one. In this group, “cat” outputs all syslog message-lines of the file “querylog”; “grep -v” keeps only the message-lines that do not contain a source IP address of the form “133.95.x.y” (plain “grep” would instead keep only the lines that do); “tr” replaces each character ‘#’ with a space ‘ ’, so that the source IP address and the source port become separate fields; “awk '{print $7}'” extracts only the seventh field, the source IP address; and “sort -r | uniq -c | sort -rn” reduces the list of source IP addresses to the unique source IP addresses, counts the frequency of each, and orders them numerically by frequency. The result is written to the file “freq-sIPaddr”.

(3) The last program group extracts the DNS query keywords from the syslog message-lines, reduces them to the unique DNS query keywords, and counts the frequency of each; the result is written to the file “freq-querykeywords”. Its commands and options are the same as in the second group, except that “tr” and its arguments are omitted and the argument of “awk” becomes “'{print $9}'”, because without the ‘#’ substitution the DNS query keyword is the ninth field.

Entropy-based packet traffic analysis was recently proposed by Wagner and Plattner [7].

3. RESULTS AND DISCUSSION
Entropy Analysis on DNS Query Traffic

We illustrate the calculated entropies for the frequencies of the unique source IP addresses and of the DNS query keywords in Figure 2.

Figure 2. Entropy changes in the DNS query traffic to the top domain name system (tDNS) server from outside (A) and inside (B) the campus network, from January 1st, 2006 to March 31st, 2007 (one-day units). The solid and dotted lines show the entropies based on the frequencies of the unique source IP addresses and of the unique DNS query keywords, respectively.
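The paper does not state which entropy estimator produces the curves in Figure 2. The following is a minimal sketch, assuming the Shannon entropy in bits, H = -Σ_i p_i log2 p_i with p_i = n_i / N, where n_i are the counts written by “uniq -c” in Chart 1 and N is their sum. It is written for POSIX sh with awk rather than tcsh so that each step can be defined as a shell function. The function names entropy, query_entropies, and classify, as well as the threshold argument, are hypothetical. query_entropies computes both entropies in one pass over query-log lines in the syslog format described above. classify encodes our reading of the terms used in the discussion that follows, which is an assumption: a “symmetric” change is one in which the two entropies move in opposite directions, and a “simultaneous” change is one in which they move in the same direction.

```shell
#!/bin/sh
# Sketch only, not the authors' code. Assumption: Shannon entropy in bits,
# H = -sum_i p_i * log2(p_i), with p_i = n_i / N.

# entropy: read "count key" lines such as freq-sIPaddr (output of
# "uniq -c") on stdin and print the entropy of the counts.
entropy() {
  awk '$1 > 0 { n[++m] = $1; total += $1 }
       END {
         h = 0
         for (i = 1; i <= m; i++) {
           p = n[i] / total
           h -= p * log(p) / log(2)
         }
         printf "%.4f\n", h
       }'
}

# query_entropies: read BIND query-log lines in the Chart 1 syslog format
# on stdin and print "H(source IP) H(query keyword)". After tr '#' ' ',
# field 7 is the source IP and field 10 the query keyword (field 9 of the
# unmodified line, as used in Chart 1).
query_entropies() {
  tr '#' ' ' | awk '
    $6 == "client" && $9 == "query:" { ip[$7]++; kw[$10]++; n++ }
    END {
      hip = 0; hkw = 0
      for (k in ip) { p = ip[k] / n; hip -= p * log(p) / log(2) }
      for (k in kw) { p = kw[k] / n; hkw -= p * log(p) / log(2) }
      printf "%.4f %.4f\n", hip, hkw
    }'
}

# classify: read chronological "date H_ip H_kw" lines and print the dates
# on which both entropies change by more than the threshold given as $1
# (hypothetical parameter). Opposite-sign changes are labelled
# "symmetric", same-sign changes "simultaneous".
classify() {
  awk -v t="$1" '
    BEGIN { t += 0 }
    NR > 1 {
      dip = $2 - pip; dkw = $3 - pkw
      if ((dip > t || dip < -t) && (dkw > t || dkw < -t)) {
        label = (dip * dkw < 0) ? "symmetric" : "simultaneous"
        print $1, label
      }
    }
    { pip = $2; pkw = $3 }'
}
```

For example, “grep -v "client 133\.95\." querylog | query_entropies” would give the two entropies of the outside traffic for one log file. Because the classic syslog timestamp carries no year, a per-day series spanning 2006 and 2007 would have to be built from daily log files before being passed to classify. This daily rotation is an assumption about how the logs are kept.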
In Figure 2A, we can observe several significant peaks on (i) January 24th, (ii) March 22nd, (iii) April 29th, (iv) May 2nd, (v) June 5th, (vi) August 19th, (vii) September 5th, 2006, and (viii) January 24th, 2007. Fortunately, because we received many complaint E-mails from other sites about these peaks, the peaks could be assigned to security incidents as follows: spam bot activity for (i)-(vi), a misconfiguration of a campus subdomain DNS server for (vii), and an unknown cause for (viii). Interestingly, the two entropy curves also change symmetrically at each peak. These results show that security incidents in the university campus network can be detected by observing only the frequency of domain name resolution accesses from outside the campus network.

In Figure 2B, on the other hand, we find several peaks on (a) July 29th, (b) August 20th, (c) September 9th, 2006, (d) January 15th, and (e) March 15th, 2007. The causes of these peaks have also been identified: E-mail spamming activity for (a), a crash of the local E-mail server under heavy SMTP traffic for (b), a DNS misconfiguration of the local subdomain DNS servers for (c), and historically large domain name resolution traffic for (d) and (e), in which the DNS queries were generated by crashes of the NIS-based authentication systems.

Furthermore, comparing the source IP address-based and the DNS query keyword-based entropy curves with each other in Figures 2A and 2B yields a new finding. Symmetrical changes emerge when spam bot activity is detected. Simultaneous changes, however, take place when unusually large DNS query traffic is received from the campus network because of a DNS-related misconfiguration at a local subdomain and/or an overload crash of the local E-mail servers.

As a result, we conclude that entropy-based analysis of the DNS query traffic provides important information on security incidents in the campus network.

4. CONCLUSIONS
We investigated the DNS query traffic in the campus network of a university from January 1st, 2006 to March 31st, 2007 using an entropy-based statistical analysis method. The following results were obtained: (1) The source IP address- and query keyword-based entropies of the DNS query traffic from outside the campus network change symmetrically when spam bot activity occurs on the campus network. (2) The source IP address- and query keyword-based

From these results, it can be concluded that spam bots and/or DDoS attack bots in the campus network can be detected by watching only the DNS query traffic.

We will continue to develop detection technology based on the results of the present paper and to evaluate its detection rate.

5. REFERENCES
[1] P. Barford and V. Yegneswaran, “An Inside Look at Botnets”, Special Workshop on Malware Detection, Advances in Information Security, Springer Verlag, 2006.
[2] J. Nazario, “Defense and Detection Strategies against Internet Worms”, 1st Edition, Computer Security Series, Artech House, 2004.
[3] (a) J. Kristoff, “Botnets, detection and mitigation: DNS-based techniques”, Northwestern University, 2005, http://www.it.northwestern.edu/bin/docs/botskristoff_jul05.ppt. (b) J. Kristoff, “Botnets”, North American Network Operators Group (NANOG32), Reston, Virginia, 2004, http://www.nanog.org/mtg-0410/kristoff.html
[4] D. Dagon, C. Zou, and W. Lee, “Modeling Botnet Propagation Using Time Zones”, Proceedings of the Network and Distributed System Security (NDSS) Symposium 2006, http://www.isoc.org/isoc/conferences/ndss/06/proceedings/html/2006/
[5] D. A. Ludeña R., H. Nagatomi, Y. Musashi, R. Matsuba, and K. Sugitani, “A DNS-based Countermeasure Technology for Bot Worm-infected PC Terminals in the Campus Network”, Journal for Academic Computing and Networking, Vol. 10, No. 1, pp. 39-46, 2006.
[6] BIND-9.2.6: Internet Systems Consortium, http://www.isc.org/products/BIND/
[7] A. Wagner and B. Plattner, “Entropy Based Worm and Anomaly Detection in Fast IP Networks”, Proceedings of the 14th IEEE Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE 2005), Linköping, Sweden, pp. 172-177, 2005.