Summary
This document is the report for my diploma thesis project. It deals with the Peer-To-Peer Wireless Network Confederation (P2PWNC) and, in particular, with an Administrative Domain's Local Traffic Logging/Accounting and Monitoring System. These terms are discussed more thoroughly in the following chapters.
As stated in [1], [2], [3] and [4], a Peer-To-Peer Wireless Network Confederation (P2PWNC) is a community of WLAN Administrative Domains (ADs) that offer network access to each other's registered users. An AD provides Internet services to P2PWNC users of other ADs to compensate for the services that its own registered users enjoy from other ADs when they roam. This roaming scheme is decentralized, leaving ADs to make their own decisions on the amount of resources they contribute to the P2PWNC system. ADs are composed of modules, such as the WLAN control module, the user authentication module, the P2PWNC module, etc. This document deals with the traffic logging and analysis subsystem, which can be considered part of the WLAN Control Module of an AD. The information this subsystem makes available can, however, be used by other modules too.
The traffic logging subsystem is divided into two parts. The first is a network packet capturing, analysis and logging daemon. It captures packets that pass through the network interface to which a wireless LAN access point is connected and analyses them. The purpose of this analysis is to gather aggregate traffic statistics and application layer protocol information about the P2PWNC users. The second part is an XML-based statistics retrieval and exchange protocol. It is a client-server protocol designed for the retrieval and exchange of the statistics that the packet logging and analysis daemon generates. Clients do not have direct access to the database where the statistics are stored. Instead, they issue properly formed requests (XML documents) to a server, which in turn queries the database and returns the results to the clients in messages of a protocol-specified format. Apart from the specification of the protocol, the implementation of a typical server and client is presented and discussed.
As this is the first version of the system, further improvement is expected to be possible or needed in some areas. Future work must be done on the performance of the packet logging and analysis daemon as well as on its traffic analysis capabilities. As far as statistics retrieval, exchange and presentation are concerned, security issues are of greater importance. Also, extending the statistics exchange protocol and developing servers and clients with more functionality and operating system independence would enhance statistics retrieval and presentation capabilities.
Table of Contents
Chapter 1 INTRODUCTION
1.1 About Peer-To-Peer Networking
1.2 Peer-To-Peer Wireless Network Confederation
1.2.1 System Overview and Terminology
1.2.2 Modules and Subsystems
1.2.2.1 WLAN Control Module
1.2.2.2 Authentication and User Identification Module
1.2.2.3 Local AD Services Module
1.2.2.4 Internet Connectivity Module
1.2.2.5 P2PWNC Management Module
1.2.2.6 Local P2PWNC Policy Module
1.2.3 Administrative Domain's Local Traffic Accounting and Monitoring System
2.5.2.2.2 HTTP Request Tracking Algorithm
2.5.2.3 FTP Tracking
2.4.2.3.1 Typical FTP Scenario
2.4.2.3.2 FTP Connection Tracking Algorithm
2.5.2.4 SMTP Tracking
2.4.2.4.1 Typical SMTP Scenario
2.4.2.4.2 SMTP Connection Tracking Algorithm
2.5.2.5 POP3 Tracking
2.5.2.5.1 Typical POP3 Scenario
2.5.2.5.2 POP3 Connection Tracking Algorithm
2.6 Demonstration
3.5.1 Introduction
3.5.2 XML parsing
3.5.3 Server implementation
3.5.3.1 Architecture of the server program
3.5.3.2 Login and Logout Functions
3.5.3.3 Password changing function
3.5.3.4 Statistics retrieval function
3.5.4 Client Implementation
3.5.4.1 Login Function
3.5.4.2 Logout Function
3.5.4.3 Password changing function
3.5.4.4 Statistics retrieval function
3.6 Demonstration
Conclusions
APPENDIX
Installation of the programs
Setting up the database
Configuration
Execution and usage
REFERENCES
LINKS
Figures
Figure 1. P2PWNC Architecture
Figure 2. Administrative Domain's Linux Box
Figure 3. Traffic Logging Subsystem High Level View
Figure 4. Login State
Figure 5. Logout State
Figure 6. Subsystems IPC communication using Shared Memory
Figure 7. shmAppendUser() Determining new node's position
Figure 8. removeFromList() shifting nodes back
Figure 9. After removeFromList()
Figure 10. Libpcap architecture
Figure 11. Packet Capturing Daemon Architecture
Figure 12. Ethernet Packet Format
Figure 13. FTP command reply sequence
Figure 14. POP3 command reply sequence
Figure 15. FTP stored statistics
Figure 16. SMTP stored statistics
Figure 17. Statistics Client and Server Architecture and Interconnection with the Traffic Logging Subsystem
Figure 18. XML Parser Benchmark Test
Figure 19. Login Form
Figure 20. General Statistics
Figure 21. Aggregate traffic statistics
Figure 22. HTTP statistics
Figure 23. FTP statistics
Figure 24. SMTP statistics
Figure 25. POP3 statistics
Figure 26. Password changing form
Figure 27. Connected administrator information
Figure 28. About box
Chapter 1 INTRODUCTION
1.1 About Peer-To-Peer Networking
A Peer-To-Peer (P2P) network is a network composed of autonomous and equivalent entities. This is a fairly old concept in the area of computer networks and communications. A pure P2P system is decentralized, that is, there is no central entity coordinating communication and interaction between peers. Today, there are numerous examples of the Peer-To-Peer network model. Most of them, like Kazaa (www.kazaa.com) and Gnutella, are file sharing systems.
be rewarded) and the whole community (its function would be more regular and the system's use would be more efficient). Rule enforcement is based on a distributed accounting model, which is briefly described below. Although peers (ADs) are autonomous as to the usage and availability of their resources, the system imposes distributed constraint structures so that peers have an incentive to conform to the community rules. Every time a peer receives or offers service, messages are exchanged and distributed accounting records are updated. The messages exchanged include signed receipts that prove the provision of the service, which makes forging the global accounting statistics of the system harder. Any peer can deduce the rate of consumption of any other peer by inspecting and aggregating these receipts. Although forging the statistics remains possible, distributed accounting makes it possible to gather aggregated opinion about a peer by querying other peers about the services offered to or provided by it, and thus to assess its reputation.
The combination of the peer-to-peer nature of P2PWNC and the set of community-wide rules described above offers the system the following advantages over other solutions:
- Scalability
- Decentralization
- Flexibility and low complexity
- Economic efficiency
The architecture of a P2PWNC is shown in the following figure.
[Figure 1. P2PWNC Architecture: P2P view of AD Black, AD White and AD Grey]
communication between peers and are a distinctive characteristic of a Peer-To-Peer Wireless Network Confederation member (P2PWNC Management Module, Local P2PWNC Policy Module)
AND
ANALYSIS
The traffic logging subsystem is in fact a packet capturing and analysis daemon. This daemon is responsible for capturing packets from a defined network device that belongs to the router and analyzing them so that it can gather some aggregate traffic statistics as well as application-specific information about the P2PWNC users. The information is grouped by application protocol. For example, for the HTTP protocol the statistics available include the HTTP requests users have made and in particular the host name, request method (GET, POST, etc), the user agent (e.g. Mozilla, Microsoft IE, etc) and the request URI. In a similar way, the traffic logging daemon can track down information about other application level protocols (FTP, SMTP, POP3). Data is stored in a MySQL Database.
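The HTTP fields listed above can be pulled out of a request with plain string handling. The following is a minimal sketch, not the daemon's actual tracking algorithm. It assumes that the request headers arrive complete in a single payload that has already been copied into a NUL-terminated buffer; the struct http_req_info and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical record for the HTTP fields named in the text. */
struct http_req_info {
    char method[16];
    char uri[256];
    char host[128];
    char user_agent[128];
};

/* Copies the value of header `name` (e.g. "Host") into out; returns 1 if found. */
static int copy_header(const char *payload, const char *name,
                       char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *line = strstr(payload, "\r\n");   /* end of the request line */

    while (line != NULL) {
        line += 2;                                /* start of the next header line */
        if (line[0] == '\r' || line[0] == '\0')
            break;                                /* blank line: end of headers */
        if (strncasecmp(line, name, nlen) == 0 && line[nlen] == ':') {
            const char *val = line + nlen + 1;
            size_t len;

            while (*val == ' ' || *val == '\t')
                val++;                            /* skip optional whitespace */
            len = strcspn(val, "\r\n");
            if (len >= outlen)
                len = outlen - 1;                 /* truncate over-long values */
            memcpy(out, val, len);
            out[len] = '\0';
            return 1;
        }
        line = strstr(line, "\r\n");
    }
    out[0] = '\0';
    return 0;
}

/* Returns 1 if the payload starts with a parsable request line, else 0. */
int parse_http_request(const char *payload, struct http_req_info *info)
{
    assert(payload != NULL && info != NULL);
    memset(info, 0, sizeof(*info));
    if (sscanf(payload, "%15s %255s", info->method, info->uri) != 2)
        return 0;
    copy_header(payload, "Host", info->host, sizeof(info->host));
    copy_header(payload, "User-Agent", info->user_agent,
                sizeof(info->user_agent));
    return 1;
}
```

A caller would fill a struct http_req_info for each captured request and then store the four fields alongside the user's other statistics. Requests that span several packets would need reassembly first, which this sketch does not attempt.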
2.1.1 Hardware
The system was developed and tested on an Intel Pentium III (800 MHz) computer with 256 Mbytes of RAM. It was also tested on an Intel Celeron (500 MHz) with 64 Mbytes of RAM. These systems included an 802.11b Access Point, and a Network Interface Card for Internet connectivity. The experiments included two Compaq N610c laptop machines, with 512 Mbytes of RAM and a ZoomAir 4100 802.11b card in ad hoc mode.
The programs were implemented in the C language and compiled with gcc. The editor used was mainly KWrite and the debugger was gdb. For database viewing, phpMyAdmin was used ([7]). phpMyAdmin is a web interface to MySQL written in PHP.
[Figure 2. Administrative Domain's Linux Box: wireless users, access point (A.P.), Internet gateway]
The packet capturing daemon captures every single packet that the network interface that is being watched sends or receives. However, not every packet captured is important for the system. We only care about packets that are sent or received by users registered to the P2PWNC and are online at the moment. This means that there has to be a mechanism of distinguishing which packets are important for accounting reasons and should be further analyzed and which should be ignored by the system. The above are shown in the next figure, which describes at a high level what is happening when a user sends a network packet.
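[Figure 3 diagram: a user (user@domain, IP xxx.xxx.xxx.xxx) sends a packet with source IP xxx.xxx.xxx.xxx; the packet is captured and its source/destination IP is checked; the packet is then either ignored or passed to packet analysis, which stores statistics / info in the database]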
Figure 3. Traffic Logging Subsystem High Level View
The above example involved a user registered to an AD of the P2PWNC who was generating network traffic through the AD. The system determined that a captured packet belonged to that user, and the next step was to analyze the packet further (its header and possibly its payload data) and update the database statistics. For example, if it was an FTP packet, the system would increment the total FTP upload statistics by the length of the packet, and if the packet payload carried extra information about the FTP connection (e.g. the user's FTP account name or password), this information would be tracked and written to the database. Capturing a packet that is to be received by the user is an almost identical case. If the daemon captures a packet that does not belong to any online registered user, the packet is ignored and no further analysis takes place.
There are three discrete user states in this system: the login state, the connected state and the logout state.
any moment will be thoroughly discussed in a following chapter). From now on, the user is identified by the username / IP address pair and enters the connected state. What takes place during the login state is shown in the following figure.
[Figure 4 diagram labels: DB; Updating Database and Online Users List; Traffic Logging Subsystem; Shared Memory; Online Users List]
Figure 4. Login State
the login state. At this time, the user can be identified by the username / IP address pair. During the connected state the user can make use of the Internet services and facilities an Administrative Domain offers. User accounting starts the moment the user has logged in and stops when the user exits the connected state. During this state, every packet the user sends or receives is captured and examined. After any useful information has been examined and extracted, the user statistics are updated, as mentioned before.
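The per-packet decision described above (match the source or destination IP address against the online users, then account the packet or ignore it) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the daemon's code: it assumes untagged Ethernet II frames carrying IPv4, uses a plain array in place of the shared memory list, and the struct online_user, its byte counters and the function account_frame are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_ETH_HDR_LEN     14      /* Ethernet II header length */
#define SKETCH_ETHERTYPE_IPV4  0x0800

/* Hypothetical, simplified stand-in for an online users list entry. */
struct online_user {
    char username[200];
    char ipaddr[20];            /* dotted-quad string, as in the text */
    unsigned long bytes_up;     /* bytes sent by the user */
    unsigned long bytes_down;   /* bytes received by the user */
};

/*
 * Accounts one captured Ethernet frame against the online users.
 * Returns the matching user, or NULL when the packet should be ignored.
 */
struct online_user *account_frame(const uint8_t *frame, size_t len,
                                  struct online_user *users, size_t nusers)
{
    char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];
    uint16_t ethertype;
    size_t i;

    assert(frame != NULL && users != NULL);
    if (len < SKETCH_ETH_HDR_LEN + 20)
        return NULL;                    /* too short for Ethernet + IPv4 */
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != SKETCH_ETHERTYPE_IPV4)
        return NULL;                    /* not IPv4: ignore */

    /* IPv4 source and destination sit at offsets 12 and 16 of the IP header. */
    if (!inet_ntop(AF_INET, frame + SKETCH_ETH_HDR_LEN + 12, src, sizeof(src)) ||
        !inet_ntop(AF_INET, frame + SKETCH_ETH_HDR_LEN + 16, dst, sizeof(dst)))
        return NULL;

    for (i = 0; i < nusers; i++) {
        if (strcmp(users[i].ipaddr, src) == 0) {
            users[i].bytes_up += len;   /* packet sent by an online user */
            return &users[i];
        }
        if (strcmp(users[i].ipaddr, dst) == 0) {
            users[i].bytes_down += len; /* packet addressed to an online user */
            return &users[i];
        }
    }
    return NULL;                        /* no online user involved: ignore */
}
```

Comparing dotted-quad strings mirrors the ipaddr field used in this document; comparing the 32-bit addresses directly would avoid the string conversion for every packet.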
[Figure 5. Logout State: the database and the online users list are updated (the user is set offline and removed from the online users list)]
Pipes can be regarded as files, which can be named or unnamed. A pipe is a one-way communication channel between two processes. Unnamed pipes are used mainly for communication between a parent and a child (forked) process. Named pipes are more appropriate for communication between different programs that share the same file system.
Sockets are a more general and efficient way of IPC than pipes. They can be considered logical files that support two-way communication. Sockets are usually used in network and distributed programming. Some socket types are used for communication between kernel and userspace (e.g. netlink sockets).
Message queues are similar to pipes. However, they allow messages to be tagged with specific message types, so messages of different types can be exchanged. Unlike sockets, they can only be used for communication between processes running on the same machine. Message queues and pipes were mainly used in older UNIX systems, and their use in modern programs has started to decline.
Threads, which are in fact lightweight processes, enable processes to share their fundamental parts, that is, their code, data, stack, file I/O and signal tables. There are both user-level and kernel-level threads.
Finally, IPC in Linux and UNIX environments can be achieved with shared memory. As its name implies, shared memory is a memory segment to which more than one process can have access. The segment is created by one process, and other processes can access it provided that they have access privileges to it. This is a fast way of IPC and it is appropriate when processes need to use a shared resource (memory).
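As a small illustration of the shared memory approach, the sketch below lets a child process write a value into a System V shared memory segment that the parent then reads back. It is only a minimal example of the mechanism and is not taken from the system described in this document; the function name shm_roundtrip is hypothetical, and a private key (IPC_PRIVATE) is used so that the example does not depend on any configured key.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child writes `value` into a shared segment and the parent reads it back.
 * Returns the value seen by the parent, or -1 on failure.
 */
int shm_roundtrip(int value)
{
    int shmid, result = -1;
    int *shared;
    pid_t pid;

    assert(value >= 0);                 /* -1 is reserved for errors */

    shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;
    shared = (int *)shmat(shmid, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    pid = fork();                       /* the child inherits the attachment */
    if (pid == 0) {
        *shared = value;                /* child: writer */
        shmdt(shared);
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);          /* parent: wait, then read */
        result = *shared;
    }
    shmdt(shared);
    shmctl(shmid, IPC_RMID, NULL);      /* remove the segment */
    return result;
}
```

Calling shm_roundtrip(42) is expected to return 42 on a system where System V shared memory is available. Unrelated programs cannot share an IPC_PRIVATE segment, so they would agree on a key instead.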
[Figure 6 diagram labels: shmhandleuser function; Authentication Module]
Figure 6. Subsystems IPC communication using Shared Memory
shmhandleuser is the program that implements the middle level between the two modules. Every time a login or logout event takes place, the authentication module must call this program, which adds the user to, or removes the user from, the list of online users (located in the shared memory segment). This middle level keeps the two subsystems as independent of one another as possible. Every time a user logs in and is assigned an IP address, or logs out and releases an IP address, the authentication module can run the external program shmhandleuser via system, exec or a similar call. The arguments of shmhandleuser are:
- the P2PWNC user's username
- the user's IP address (newly assigned or released)
- a flag (0 or 1) indicating whether a logout or a login event has taken place.
For example, if the authentication module were written in C, a call of the following format would be issued:
    /* authentication program code */
    . . .
    system("shmhandleuser username@domain xxx.xxx.xxx.xxx 1");
    . . .
    /* more code */
The above piece of code adds the user username@domain with the assigned IP address xxx.xxx.xxx.xxx (it is assumed that the authentication module has already carried out all database updates concerning the new user). The authentication program could, of course, be written in another programming language; in that case, the equivalent call should be issued. The generality of this approach lies in the fact that the communication module is not bound to the authentication module. Even if the authentication module were rewritten from scratch, no changes would be needed in the middle level. The programmer would only have to include the shmhandleuser call in the code wherever a login or logout event takes place.
This approach also enables the creation of a central entity that controls the whole AD system. One could create a controlling module that coordinates the authentication, the traffic logging and other AD modules. In such a case, for example, the controlling module could search the database where user information is stored on a regular basis (e.g. every second) to find out whether new users have logged into the system or logouts have taken place. The controlling module would then issue shmhandleuser calls for every user that has entered or left the system.
The list of online users is located in a memory segment that is shared between the packet capturing daemon and the shmhandleuser program. It is implemented as a kind of linked list, composed of nodes that have the following format:
    struct usrnode{            /* User List Node */
        char username[200];
        char ipaddr[20];
        int count;
        int updated;
        int pos;
        struct usrnode *next;
    };
    typedef struct usrnode unode;
usrnode: the struct that represents a node of the online users list
username: the user's username (usually of the format username@domain)
ipaddr: the assigned IP address
count: the number of nodes currently in the list; this field is only meaningful for the head of the list
next: pointer to the next node of the list
The above structure and the list handling functions are defined and implemented in the usrlist.h file. To find out whether a newly captured packet is sent by, or is to be received by, an online user, the pcap daemon searches the user list to check whether the packet's source or destination IP address matches that of any user in the list. This way the packet capturing daemon always knows which users are online and is instantly notified of any change in a user's state (online / offline).
As mentioned before, two processes have access to the shared memory block. Of these, only shmhandleuser actually writes to that block. The packet capturing daemon (the packet_cap process) only reads from that memory area: it searches the list of users located there and does not modify it. shmhandleuser is the process that adds nodes to and removes nodes from the list. Synchronization problems are therefore less serious, because the two processes never try to write to the segment at the same time.
The way the two processes work with the shared memory is as follows.
- The packet capturing daemon (packet_cap) first creates the shared memory segment:
    /*creating mem segment*/
    mid = shmget(M_KEY, MAXUSERS*sizeof(unode), IPC_CREAT|PERMS);
    if (mid == -1) {
        printf("ERROR GETTING MEM..\n");
        exit(1);
    }
[ code taken from packet_cap.c ]
The above function (shmget) takes three parameters. The first is the key of the shared memory segment. The second is the size of the shared memory block to be allocated; in this case, we allocate enough space for the maximum number of users the system permits (MAXUSERS, which in our case is 2000). The third argument holds the flags that control access to the shared memory block. IPC_CREAT indicates that shmget must create a new shared memory segment, whereas PERMS defines the access rights to the block (in our case, 0666). Then packet_cap must map the shared memory block into its own address space. This is achieved with the following call:
usrListHead = (unode*)shmat(mid, NULL, 0);
usrListHead is a pointer to a unode struct, declared as static in another part of the program (static unode* usrListHead;), and represents the head of the users list residing in the shared memory. usrListHead is in fact a dummy node used for access to the users list. Its username and ipaddr members have no value. Most importantly, the count member of usrListHead reports the number of nodes in the list. Its pos member, which indicates the position of a node in the list, is zero. A call to the shmat function attaches the shared memory segment identified by mid to the address space of the calling process and returns a pointer to that memory area. Following that, the packet_cap process searches the database to check whether any users are online. It issues the following SQL query:
SELECT u_username, u_ip_addr FROM users WHERE u_online='y'
The above fields are self-explanatory. The packet capturing daemon then checks the query results and adds the users found online to the online users list in the shared memory. This requires a call to the shmAppendUser function, which is declared in the usrlist.h file.
    shmAppendUser(usrListHead,
                  usrListHead + (usrListHead->count)*sizeof(unode),
                  row[0], row[1]);
The above function adds a new user at the end of the online users list, in the shared memory block. The first argument is the pointer to the head of the user list (usrListHead), as described before. The second is the exact position in the shared memory block where the new node will be placed (the new node has to be placed inside the shared memory block and, in particular, at the end of the list). The third parameter is the new node's username and the fourth the new node's ipaddr. This is the only case in which the packet capturing daemon actually writes to the shared memory segment; in all other cases, the daemon only reads. The packet capturing program has to check the database for online users on startup because the authentication module may already be running when the traffic logging module starts. This means that users may already have been assigned an IP address (as said before, database updates of the users' status are the responsibility of the authentication module).
- Every time a login or logout event takes place, shmhandleuser is called by the authentication module. shmhandleuser gets a descriptor of the shared memory segment where the online users list is located, in a similar way to the packet capturing program:
    mid = shmget(M_KEY, MAXUSERS*sizeof(unode), PERMS);
    if (mid == -1) {
        printf("ERROR GETTING MEM..\n");
        exit(1);
    }
[ code taken from shmhandleuser.c ]
mid is the shared memory descriptor returned by the shmget function, whose parameters were described earlier. Note that here the third argument contains only the access permissions to the memory block, whereas in the previous case it also contained the IPC_CREAT flag, which indicated that shmget was creating the memory segment. Then the shmhandleuser program must obtain a pointer to the shared memory area. This is achieved by calling the shmat function, which was also described earlier. After attaching the memory block to its address space, shmhandleuser decides what to do with the specified user. According to the flag given as its last argument, the program either adds the user to, or removes the user from, the online users list, as shown in the next code fragment:
    if (atoi(argv[3])) {
        shmAppendUser(mem, mem + (mem->count)*sizeof(unode), argv[1], argv[2]);
    } else {
        removeFromList(mem, argv[1]);
    }
[ code taken from shmhandleuser.c ]
If the third argument is non-zero, the user whose username is given by the first command line argument and whose ipaddr is given by the second is appended to the users list. If the third argument (the flag) is zero, the user is removed from the list. The function that removes users is removeFromList. Its first parameter is the pointer to the head of the list in the shared memory block and its second is the username field of the node that is to be removed.
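The code fragment shown earlier reads argv[3] without checking how many arguments were given. A hedged sketch of a validation step that could precede it is shown below. It is not part of shmhandleuser.c: check_args and the two length constants are hypothetical names, the limits are taken from the username[200] and ipaddr[20] fields of the node structure, and IPv4 addresses are assumed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_USERNAME_LEN 200   /* size of the username field in the text */
#define SKETCH_IPADDR_LEN    20   /* size of the ipaddr field in the text */

/*
 * Validates "shmhandleuser <username> <ip> <flag>" style arguments.
 * Returns 0 and stores the flag (0 = logout, 1 = login) on success,
 * or -1 if the arguments cannot be used safely.
 */
int check_args(int argc, char *argv[], int *login_flag)
{
    struct in_addr addr;
    char *end;
    long flag;

    assert(login_flag != NULL);
    if (argc != 4)
        return -1;                              /* wrong argument count */
    if (argv[1][0] == '\0' || strlen(argv[1]) >= SKETCH_USERNAME_LEN)
        return -1;                              /* empty or over-long username */
    if (strlen(argv[2]) >= SKETCH_IPADDR_LEN ||
        inet_pton(AF_INET, argv[2], &addr) != 1)
        return -1;                              /* not a valid IPv4 address */

    errno = 0;
    flag = strtol(argv[3], &end, 10);
    if (errno != 0 || end == argv[3] || *end != '\0' ||
        (flag != 0 && flag != 1))
        return -1;                              /* flag must be exactly 0 or 1 */

    *login_flag = (int)flag;
    return 0;
}
```

Checking the lengths before the list functions run matters because those functions copy the strings into fixed-size fields with strcpy.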
Finally, shmhandleuser must detach the shared memory segment from its address space. This is achieved by a call to shmdt:
    /*detaching mem block..*/
    shmdt(mem);
The parameter shmdt takes is the pointer to the shared memory block, which was acquired by the shmat call.
- Having discussed what happens on packet_cap startup and on every login / logout event, we now turn to the actions the packet capturing daemon performs when it terminates. Normal program termination takes place when the packet capturing daemon is sent the SIGINT or SIGTERM signal, that is, when a user sends the Ctrl-C command or the kill %processid command. In such a case, the signal handling function is called. Its prototype is:
void termhandler(int sig);
The parameter sig is the signal code of the received signal (SIGINT or SIGTERM). The operations termhandler performs, as far as the shared memory is concerned, are shown in the next piece of code:
    shmdt(usrListHead);
    if (shmctl(mid, IPC_RMID, NULL)) {
        printf("ERROR REMOVING MEM...\n");
        exit(1);
    } else {
        printf("Shared memory segment removed successfully. mid: %d\n", mid);
    }
[ code taken from packet_cap.c ]
First the shared memory block is detached from the process's address space and then the block (with the mid descriptor) is removed by calling the function shmctl with the flag IPC_RMID as the second argument. In case of abnormal program termination (e.g. a SIGSEGV signal), a utility program called killmem removes the shared memory segment created by the packet capturing program with the shmget call. The above functions make use of the variables M_KEY and PERMS. These variables are static:
    static int M_KEY;
    static int PERMS;
Their values are read from the packet capturing daemon's configuration file (packet_cap.conf). The function that reads the information stored in the configuration file and assigns values to the appropriate variables is called parseConf. It is a void function with the following prototype:
void parseConf();
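If the configuration file is not found at the default location (PACKET_CAP_CONFFILE), the parseConf function looks for a file named packet_cap.conf using a relative path, that is, in the directory from which the program is run (normally the directory where the application's binary is located). This is shown in the next piece of code, which is part of the body of the parseConf function: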
    if (!(f = fopen(PACKET_CAP_CONFFILE, "r+"))) {
        if (!(f = fopen("packet_cap.conf", "r+"))) {
            printf("ERROR OPENING CONF FILE\n");
            return;
        }
    }
[ code taken from packet_cap.c ]
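The format of packet_cap.conf is not given in this section, so the following is only a sketch of how the values for M_KEY and PERMS might be read once the file has been opened. It assumes a simple NAME=VALUE line format with # comments, which is an assumption and not the documented format; struct shm_conf and parse_conf_stream are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two settings named in the text. */
struct shm_conf {
    long m_key;   /* shared memory key */
    long perms;   /* access permissions, e.g. 0666 */
};

/*
 * Parses "NAME=VALUE" lines (assumed format) from an open stream.
 * Returns the number of recognised settings that were read.
 */
int parse_conf_stream(FILE *f, struct shm_conf *conf)
{
    char line[256];
    int found = 0;

    assert(f != NULL && conf != NULL);
    while (fgets(line, sizeof(line), f) != NULL) {
        char *eq, *end;
        long val;

        if (line[0] == '#' || line[0] == '\n')
            continue;                     /* comment or empty line */
        eq = strchr(line, '=');
        if (eq == NULL)
            continue;                     /* not a NAME=VALUE line */
        *eq = '\0';                       /* split name and value */
        val = strtol(eq + 1, &end, 0);    /* base 0: "0666" is read as octal */
        if (end == eq + 1)
            continue;                     /* no digits after '=' */

        if (strcmp(line, "M_KEY") == 0) {
            conf->m_key = val;
            found++;
        } else if (strcmp(line, "PERMS") == 0) {
            conf->perms = val;
            found++;
        }
    }
    return found;
}
```

Since the configuration file is only read here, opening it with mode "r" would be sufficient; the return value lets the caller detect a file in which one of the two settings is missing.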
second is a unode pointer that points to the place in memory where the new node will be placed. To ensure that the new node's address is inside the allocated shared memory block, we pass as the second parameter of shmAppendUser the address of the first empty place in the shared memory segment where a unode can be placed. The address of this position is:
usrListHead + (usrListHead->count)*sizeof(unode);
where usrListHead is the pointer to the head of the list, a dummy node which stores the number of nodes in the list (usrListHead->count). The following figure will help to better understand this method.
[Figure 7. shmAppendUser() Determining new node's position (diagram labels: head, Unode1, Last unode)]
Note that the list's head is counted in the number of nodes in the list. With this method of adding nodes, the nodes of the list are always placed in consecutive positions. However, when a node is removed, all nodes located after it must be shifted back so that no empty slots exist before the end of the list (and the existing nodes remain in consecutive positions). The code of the node appending function follows.
    void shmAppendUser(unode* usrlist, unode* newnode, char un[], char ipaddr[])
    {
        unode* cur;

        if (!usrlist) {
            return;
        }
        if (usrFindInListByName(usrlist, un)) {
            /* if the node is already in the list then return */
            return;
        }
        cur = usrlist + (usrlist->count - 1)*sizeof(unode);
        strcpy(newnode->username, un);
        strcpy(newnode->ipaddr, ipaddr);
        newnode->updated = 0;
        newnode->next=NULL;
        cur->next = newnode;
        usrlist->count++;
        newnode->pos = usrlist->count - 1;
    }
[ code taken from usrlist.h ]
In order to remove a node from the list, one has to call the removeFromList function, which is the following.
    int removeFromList(unode* usrlist, char username[])
    {
        unode* cur;
        unode* temp;
        unode* prev;
        cur = usrlist;
        int nodepos = 0;
        int i = 0;
        int j = 0;

        if (!strcmp(cur->username, username)) { //removing head node
            if (usrlist->count > 1) {
                (usrlist+sizeof(unode))->count = usrlist->count-1;
                usrlist = usrlist + sizeof(unode);
            } else {
                usrlist = NULL;
            }
            return 1;
        }
        for (i=0;i<usrlist->count;i++) {
            if (!strcmp( (cur + sizeof(unode))->username, username)){
                for (j=(cur+sizeof(unode))->pos;j<usrlist->count; j++){
                    memcpy(cur + sizeof(unode), cur+2*sizeof(unode), sizeof(unode));
                    (cur + sizeof(unode))->pos -= 1;
                    cur += sizeof(unode);
                }
                bzero ( (usrlist + usrlist->count*sizeof(unode)), sizeof(unode));
                usrlist->count--;
                return 1; //found
            }
            cur += sizeof(unode);
            prev = cur;
        }
        if (!strcmp(cur->username, username)) {
            prev->next = NULL;
            bzero(cur, sizeof(unode));
            usrlist->count--;
            return 1; //found and removed
        }
        return 0; //not found
    }
As said before, after removing a node, the nodes that are located after it have to be shifted back by one position so that no empty nodes can be found before the end of the list. The following figures demonstrate node removal.
[Figure 8. removeFromList() shifting nodes back: head, unode1, unode2, unode3, unode4, <empty>; usrListHead points to head]
After the node removal the list will be in the state shown in the figure below
[Figure 9. After removeFromList(): head, unode1, unode3, unode4, <empty>, <empty>; usrListHead points to head]
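The shift-back step illustrated in the two figures can also be expressed with array indexing and a single memmove. The sketch below is an alternative formulation for illustration, not the removeFromList function from usrlist.h. It assumes that the nodes occupy consecutive array slots, that slot 0 is the dummy head, and that the head's count includes the head itself; remove_by_shift is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Node layout as given in the text. */
struct usrnode {
    char username[200];
    char ipaddr[20];
    int count;
    int updated;
    int pos;
    struct usrnode *next;
};
typedef struct usrnode unode;

/*
 * Removes the node named `username` from a contiguous list whose dummy head
 * is nodes[0]. Returns 1 if a node was removed, 0 if it was not found.
 */
int remove_by_shift(unode *nodes, const char *username)
{
    int n, i, j;

    assert(nodes != NULL && username != NULL);
    n = nodes[0].count;                       /* head + user nodes (assumed) */

    for (i = 1; i < n; i++) {
        if (strcmp(nodes[i].username, username) != 0)
            continue;
        /* Shift every node after slot i back by one position. */
        memmove(&nodes[i], &nodes[i + 1],
                (size_t)(n - i - 1) * sizeof(unode));
        memset(&nodes[n - 1], 0, sizeof(unode));   /* clear the freed slot */
        nodes[0].count = n - 1;
        /* Repair positions and next pointers after the shift. */
        for (j = 1; j < n - 1; j++) {
            nodes[j].pos = j;
            nodes[j].next = (j + 1 < n - 1) ? &nodes[j + 1] : NULL;
        }
        nodes[0].next = (n - 1 > 1) ? &nodes[1] : NULL;
        return 1;
    }
    return 0;
}
```

One design point worth noting: next pointers stored inside a shared segment are only meaningful in processes that attach the segment at the same address, which is one reason index-based traversal is the safer choice for a list that lives in shared memory.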
In this implementation of the list of online users, search by user name and by IP address are supported. The search by user name is implemented in the usrFindInListByName function. This function takes as parameters the head of the list and the username of the user. The code for this function is the following.
    unode* usrFindInListByName(unode* usrlist, char username[])
    {
        unode* cur;
        int i = 0;

        cur = usrlist;
        for (i=0;i<usrlist->count;i++) {
            if (!strcmp(cur->username, username)) {
                /* User found */
                return cur;
            }
            cur += sizeof(unode);
        }
        /* user not found */
        return NULL;
    }
[ code taken from usrlist.h ]
The function that performs search based on the IP address of the user is similar. The difference lies in the search criterion: in the IP-based search, the IP address given as an argument is compared to the ipaddr field of the list's unode structures. A closer look at the above pieces of code reveals that list traversal is not performed by following the next pointers. Instead, knowing that nodes are stored serially and have a fixed length, we can determine the next node's position by moving the pointer that points to the current node forward by sizeof(unode) bytes. This list implementation resembles a static array of unode structures. It cannot be considered fully dynamic, as its length cannot exceed MAXUSERS members and its nodes are positioned serially. In the typical linked list implementation, nodes can be physically located anywhere within the address space of the process that has created the list, and list traversal is implemented by following the next pointers of the nodes. In this implementation, though, the typical way of traversing the list is supported too.
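One detail of C is worth keeping in mind when reading the traversal code: adding an integer n to a unode pointer already advances it by n * sizeof(unode) bytes. An expression such as cur += sizeof(unode) therefore moves forward by sizeof(unode) nodes, not by one node. If a stride of exactly sizeof(unode) bytes is intended, as the description above suggests, the index-based form sketched below expresses it directly. This is an illustrative sketch and not the thesis code; stride_bytes and find_by_ip are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Node layout as given in the text. */
struct usrnode {
    char username[200];
    char ipaddr[20];
    int count;
    int updated;
    int pos;
    struct usrnode *next;
};
typedef struct usrnode unode;

/* Byte distance covered by the pointer expression `base + n`. */
size_t stride_bytes(const unode *base, size_t n)
{
    assert(base != NULL);
    return (size_t)((const char *)(base + n) - (const char *)base);
}

/* Index-based search over `count` consecutive slots starting at base. */
unode *find_by_ip(unode *base, int count, const char *ipaddr)
{
    int i;

    assert(base != NULL && ipaddr != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(base[i].ipaddr, ipaddr) == 0)
            return &base[i];          /* base[i] is i * sizeof(unode) bytes in */
    }
    return NULL;
}
```

With this indexing, a segment of MAXUSERS * sizeof(unode) bytes holds exactly MAXUSERS nodes, and every function that appends, removes or searches uses the same stride.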
and
analysis
subsystem
This section describes in detail the implementation of the traffic logging subsystem. Design and development issues are discussed.
Some traffic logging systems or architectures are the following:
- SYSLOG: Kernel-level logging via the iptables LOG target, which writes to the system log (syslog). It can log some information in files of a specific format. It is a relatively old way of logging that provides little information about network traffic and more about the operating system state.
- ULOG: (Links [10], [12]) Userspace logging via the iptables ULOG target. This provides more functionality than SYSLOG: it can do more refined logging and is much more flexible. To make it work, the administrator of the Linux system has to issue the appropriate iptables commands. Then, by running a daemon (the creator of ULOG, Harald Welte, has written such a daemon program, called ulogd), the traffic is filtered according to the criteria specified in the iptables commands and is logged and analyzed. Ulogd lets the administrator choose the level of packet analysis by loading the appropriate plugins. The administrator can also configure ulogd to log data to numerous output types, including MySQL or PostgreSQL databases. Ulogd uses netlink sockets for the communication between kernel and userspace. Kernel / userspace switching is the biggest drawback of ulogd, as it suffers from severe packet loss in high speed networks. Furthermore, because it is a recently emerged logging method whose use is not widespread, there is not much information about it on the web, and its documentation is relatively poor. Finally, ulogd does not offer a straightforward way of examining packet payloads; one has to develop an interpreter plugin to do further packet analysis.
- Direct kernel-level logging using mmap. This works as follows. First, the program maps (via mmap) the network interface onto a circular buffer. Then a loop begins, in which packets are read and analyzed and the extracted information is stored.
- Network monitoring via SNMP. SNMP (Simple Network Management Protocol) is a protocol used for (relatively) low-level monitoring of the traffic of a network interface. It uses MIBs (Management Information Bases), which describe the information to be monitored (e.g. IP traffic volume, open ports, etc.).
- Libpcap-based userspace logging. Libpcap is a system-independent interface for user-level packet capturing. It provides a portable framework for low-level network monitoring. This is the basis for the packet capturing and analysis daemon described in this document.
[Figure: Userspace Application / O.S. Kernel]
The BPF packet filter is a human-readable expression which sets the packet filtering criteria. For example, if the BPF filter is "tcp", then only TCP traffic will be captured. The filter can contain more detailed expressions, including other protocols or port numbers. If no filter is specified, all traffic is captured. The BPF filter is then compiled by libpcap, that is, the filter is evaluated by the library and imposed on intercepted packets.
A program using libpcap generally has to take the following steps:
- First, it has to determine the network interface that is to be watched. The network interface name can be given as a string (e.g. dev = "eth0"), or we can let pcap provide us with the name of an interface. This can be achieved with the pcap_lookupdev function. Its prototype is:
char* pcap_lookupdev(char* errbuf)
- Then, it has to initialize pcap by calling the function pcap_open_live. This is the function where we actually tell pcap which network device is to be sniffed. The prototype of pcap_open_live is as follows:
pcap_t *pcap_open_live(char *device, int snaplen, int promisc, int to_ms, char *errbuf)
device: the device to be sniffed
snaplen: maximum number of bytes to capture
promisc: if set to TRUE, the interface is set in promiscuous mode
to_ms: read timeout in milliseconds
errbuf: buffer to store errors
- The following step is to set the expression to be used for traffic filtering. That is, we have to specify a rule set according to which packets will be filtered. For example, we may want to examine only packets going to port 21 or only TCP packets. The set of rules must be converted to a format that pcap can understand. This task is performed by the function pcap_compile. This function's prototype is as follows:
int pcap_compile(pcap_t *p, struct bpf_program *fp, char *str, int optimize, bpf_u_int32 netmask)
The above function compiles the string str into a filter program, pointed to by fp. fp is a pointer to a bpf_program struct and is filled in by pcap_compile. The next step is to apply the filter, which is the responsibility of the function pcap_setfilter.
- Then, we tell pcap to enter its primary execution loop. Every time a new packet is sniffed, a previously defined callback function is called. Packet analysis and data logging take place in this callback function. The call that tells pcap to enter that loop is pcap_loop, one of whose parameters is the callback function described above.
- The final step is to close the pcap session. However, the loop described in the previous step is endless (it stops only in case of an error). The solution implemented in this packet capturing daemon is to close the session (and to perform other tasks unrelated to pcap, such as freeing global pointers, closing the database handle, etc.) when the program receives the SIGINT or SIGTERM signal. That is, if the program runs in the background and the user/administrator issues a Linux kill command for the process (kill %processid), the process receives the SIGTERM signal, so the above tasks are performed before the program exits.
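The signal-driven shutdown described in the last step can be sketched as follows. This is a minimal illustration and not the daemon's actual code: the names stop_requested, on_terminate and install_shutdown_handlers are hypothetical, and the handler only sets a flag. In a libpcap-based program the handler could additionally call pcap_breakloop on the capture handle so that pcap_loop returns and the cleanup code can run.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler, checked by the main program.
   (Hypothetical name, not taken from packet_cap.c.) */
static volatile sig_atomic_t stop_requested = 0;

/* Signal handler: only async-signal-safe work is done here. */
static void on_terminate(int signo)
{
    (void)signo;
    stop_requested = 1;
    /* A libpcap-based daemon could also call pcap_breakloop(handle)
       here, so that pcap_loop returns and cleanup can take place. */
}

/* Installs the handler for SIGINT and SIGTERM.
   Returns 0 on success, -1 on failure. */
static int install_shutdown_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

The program would call install_shutdown_handlers once before entering the capture loop and, after the loop returns, close the pcap session, free its global pointers and close the database handle.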
Total TELNET Download Volume (Bytes)
Total SSH Upload Volume (Bytes)
Total SSH Download Volume (Bytes)
FTP
- FTP Host the User Connected To
- FTP User Account (FTP User Name)
- FTP Account Password
- FTP Connection Count (to the above host using the specified user account)
HTTP
- HTTP Request Method
- HTTP Request Host
- HTTP Request URI
- HTTP Request User Agent
SMTP
- SMTP Sender
- SMTP Receiver
- SMTP Subject
POP3
- POP3 Server
- POP3 Account (user name)
- POP3 Password (for the above account)
- Using APOP (true if the user uses the APOP authentication scheme)
- POP3 Connection Count (to the above server using the above user name)
- POP3 Message Subject
- POP3 Message Sender
- POP3 Message Length
- POP3 Message ID
- Times the above POP3 message has been retrieved by the user
2.4.5.2 MySQL Database Scheme
The scheme of the database that stores the above information is the following:
#
# `admin` table structure
#
CREATE TABLE admin (
  adm_username varchar(80) NOT NULL default '',
  adm_pass varchar(80) NOT NULL default '',
  adm_logged_in enum('y','n') NOT NULL default 'n',
  adm_real_name varchar(80) NOT NULL default '',
  adm_ipaddr varchar(30) NOT NULL default '',
  adm_last_login date NOT NULL default '0000-00-00',
  PRIMARY KEY (adm_username)
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `ftp` table structure
#
CREATE TABLE ftp (
  f_username varchar(80) NOT NULL default '',
  f_ftp_host varchar(80) NOT NULL default '',
  f_ftp_user_name varchar(80) NOT NULL default '',
  f_ftp_pass varchar(30) NOT NULL default '',
  f_ftp_c_count int(11) NOT NULL default '0',
  PRIMARY KEY (f_username,f_ftp_host,f_ftp_user_name)
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `http` table structure
#
CREATE TABLE http (
  h_id bigint(20) NOT NULL auto_increment,
  h_username varchar(255) default NULL,
  h_host varchar(255) default NULL,
  h_method int(11) default NULL,
  h_uri varchar(255) default NULL,
  h_user_agent varchar(255) default NULL,
  PRIMARY KEY (h_id)
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `pop3` table structure
#
CREATE TABLE pop3 (
  p_id varchar(80) NOT NULL default '',
  p_username varchar(80) NOT NULL default '',
  p_pop3_srv varchar(40) NOT NULL default '',
  p_pop3_user varchar(40) NOT NULL default '',
  p_sender varchar(255) NOT NULL default '',
  p_msg_subject varchar(255) NOT NULL default '',
  p_date varchar(255) NOT NULL default '',
  p_msg_length bigint(20) NOT NULL default '0',
  p_times_retrieved int(11) NOT NULL default '0',
  PRIMARY KEY (p_id,p_username,p_pop3_user)
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `pop3_users` table structure
#
CREATE TABLE pop3_users (
  pu_username varchar(80) NOT NULL default '',
  pu_pop3_srv varchar(80) NOT NULL default '',
  pu_pop3_username varchar(80) NOT NULL default '',
  pu_pop3_pass varchar(40) NOT NULL default '',
  pu_using_apop tinyint(4) NOT NULL default '0',
  pu_pop3_conn_count bigint(20) NOT NULL default '0',
  PRIMARY KEY (pu_username,pu_pop3_srv,pu_pop3_username)
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `smtp` table structure
#
CREATE TABLE smtp (
  sm_username varchar(255) NOT NULL default '',
  sm_smtp_from varchar(255) NOT NULL default '',
  sm_smtp_to varchar(255) NOT NULL default '',
  sm_subject varchar(255) NOT NULL default ''
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `user_stats` table structure
#
CREATE TABLE user_stats (
  ust_username varchar(80) NOT NULL default '',
  ust_real_name varchar(80) NOT NULL default '',
  ust_domain varchar(80) NOT NULL default '',
  ust_online enum('y','n') NOT NULL default 'y',
  ust_priv enum('y','n') NOT NULL default 'y',
  ust_total_ul bigint(20) NOT NULL default '0',
  ust_total_dl bigint(20) NOT NULL default '0',
  ust_total_http_ul bigint(20) NOT NULL default '0',
  ust_total_http_dl bigint(20) NOT NULL default '0',
  ust_total_ftp_ul bigint(20) NOT NULL default '0',
  ust_total_ftp_dl bigint(20) NOT NULL default '0',
  ust_total_smtp_ul bigint(20) NOT NULL default '0',
  ust_total_smtp_dl bigint(20) NOT NULL default '0',
  ust_total_telnet_ul bigint(20) NOT NULL default '0',
  ust_total_telnet_dl bigint(20) NOT NULL default '0',
  ust_total_pop3_ul bigint(20) NOT NULL default '0',
  ust_total_pop3_dl bigint(20) NOT NULL default '0',
  ust_total_ssh_ul bigint(20) NOT NULL default '0',
  ust_total_ssh_dl bigint(20) NOT NULL default '0',
  PRIMARY KEY (ust_username)
) TYPE=MyISAM;
# --------------------------------------------------------
#
# `users` table structure
#
CREATE TABLE users (
  u_username varchar(255) NOT NULL default '',
  u_real_name varchar(255) NOT NULL default '',
  u_ip_addr varchar(255) NOT NULL default '',
  u_mac_addr varchar(255) NOT NULL default '',
  u_online enum('y','n') NOT NULL default 'n',
  PRIMARY KEY (u_username)
) TYPE=MyISAM;
Running this MySQL script creates the database where the traffic logging subsystem stores the information it extracts from the captured packets. A brief description of the tables of this database follows.
- Table ftp
In this table the system stores information about the FTP connections the user has made.
  f_username: The P2PWNC user name of the user
  f_ftp_host: The FTP host the user makes a connection to
  f_ftp_user_name: The user name used to connect to the FTP host
  f_ftp_pass: The most recent password for this FTP account that has been captured by the system
  f_ftp_c_count: The number of connections the user has made to the specified host using the f_ftp_user_name account name
The primary key of this table is the triplet (f_username, f_ftp_host, f_ftp_user_name). Every record of this table represents the connections a user makes to a specific FTP account.
- Table http
This table stores information about the HTTP requests a user has made.
  h_id: An auto-increment field which identifies the table records
  h_username: The P2PWNC user name
  h_host: The host to which the user has made this HTTP request
  h_method: The HTTP request method
  h_uri: The URI of the request
  h_user_agent: The user agent that was used to issue the HTTP request
The primary key of this table is the h_id field. Each record of this table represents a single HTTP request. The field h_method stands for the request method (HEAD, GET, POST, etc.). It takes integer values that stand for method types. The system can deal only with GET, POST and HEAD request types. The method codes are defined in the file httplist.h.
- Table pop3_users
This table stores data about the POP3 connections users make to specific POP3 accounts.
  pu_username: The P2PWNC user name
  pu_pop3_srv: The POP3 server the user has connected to
  pu_pop3_username: The user name of the POP3 account
  pu_pop3_pass: The most recent password for this account that has been captured
  pu_using_apop: Set to 1 if the user is applying the APOP authentication method for this account
  pu_pop3_conn_count: Number of connections this P2PWNC user has made to this account
The primary key of this table is the triplet (pu_username, pu_pop3_srv, pu_pop3_username). Every record of this table represents the connections a user makes to a specific POP3 account.
- Table pop3
This is the second database table that deals with the POP3 protocol. Whereas the previous table includes information about POP3 accounts and connections, this table stores information about the messages themselves.
  p_id: The message ID of the e-mail message, as it can be found in the mail header
  p_username: The P2PWNC user name
  p_pop3_srv: The POP3 server the user has connected to (same as in the pop3_users table)
  p_pop3_user: The user name of the POP3 account (same as in the pop3_users table)
  p_sender: The sender of the e-mail
  p_msg_subject: The subject of the e-mail
  p_date: Date the mail was sent (as it can be found in the mail header)
  p_msg_length: Message length, if available. If it is not available, the value 1 is stored in the database
  p_times_retrieved: How many times this message has been retrieved by the P2PWNC user
The primary key of this table consists of the fields p_id, p_username and p_pop3_srv. This means that every record of this table refers to a specific message (with message ID p_id), as it was retrieved by the P2PWNC user p_username from the POP3 server p_pop3_srv.
- Table smtp
This table holds information about mail messages sent via SMTP.
  sm_username: P2PWNC user name
  sm_smtp_from: Sender of the mail (e-mail address)
  sm_smtp_to: Receiver of the mail (e-mail address)
  sm_subject: Mail subject (if available)
- Table users
This table stores user identification information. It is supposed to be updated by the authentication module. In particular, each time a user logs in to the system, the authentication module is supposed to update the record that refers to the specific user with the newly assigned dynamic IP address and the user's MAC address, and to set the field u_online to y. Also, when the packet capturing daemon starts up, it executes the following SQL statement:
SELECT u_username, u_ip_addr FROM users WHERE u_online='y'
as described in a previous chapter, to determine which users are already logged in to the system.
  u_username: The P2PWNC user name
  u_real_name: The user's real name
  u_ip_addr: The IP address the user has been assigned. This value is of significance mainly when the user is online
  u_mac_addr: The MAC address of the user's NIC
  u_online: A flag indicating the user's state. It has the value y when the user is online. Otherwise it has the value n
The primary key of this table is the user's P2PWNC user name (u_username field).
- Table user_stats
This table, apart from some user identification information, includes the aggregate IP traffic statistics. That is, it stores the volume of the user's traffic by protocol.
  ust_username: P2PWNC user name
  ust_real_name: User's real name
  ust_domain: Administrative Domain the user is registered to
  ust_online: Flag indicating whether the user is online
  ust_priv: Flag indicating whether the user has extra privileges. This field is not used by the system and exists only for possible future use
  ust_total_ul: Total volume of the uploaded IP traffic the user has caused
  ust_total_dl: Total volume of the downloaded IP traffic of the user
  ust_total_http_ul: Total volume of the user's HTTP uploads
  ust_total_http_dl: Total volume of the user's HTTP downloads
  ust_total_ftp_ul: Total volume of the user's FTP uploads
  ust_total_ftp_dl: Total volume of the user's FTP downloads
  ust_total_smtp_ul: Total volume of the user's SMTP uploads
  ust_total_smtp_dl: Total volume of the user's SMTP downloads
  ust_total_telnet_ul: Total volume of the user's TELNET uploads
  ust_total_telnet_dl: Total volume of the user's TELNET downloads
  ust_total_pop3_ul: Total volume of the user's POP3 uploads
  ust_total_pop3_dl: Total volume of the user's POP3 downloads
  ust_total_ssh_ul: Total volume of the user's SSH uploads
  ust_total_ssh_dl: Total volume of the user's SSH downloads
Just like in the users table, the primary key of this table is the P2PWNC user name (ust_username). The volume of traffic is always calculated in bytes. As one can see, the primary keys of the tables users and user_stats have the same semantics, which means that the two tables could be merged. However, for reasons of compatibility with other AD modules and of independence between the modules, having two separate tables was preferred. As described before, it is the duty of the authentication module to update the database with the user's state (online / offline) and identification information. However, the authentication module does not deal with traffic logging issues, and the traffic logging module only reads identification and user state information. That is why the users table is bound to the authentication module, while the user_stats table is bound to the traffic logging and accounting module.
- Table admin
This table holds information useful for the traffic statistics server, which is described in detail in the third chapter of the document.
  adm_username: Administrator's user name
  adm_pass: Administrator's password
  adm_logged_in: A flag indicating whether the administrator is logged in using a client program
  adm_real_name: The administrator's real name
  adm_ipaddr: The current IP address of the client the administrator is using
  adm_last_login: The date of the last time the administrator logged in
Obviously, this table has nothing to do with the packet capturing daemon program. The data it stores concern only the statistics exchange server and client programs, which will be discussed in Chapter 3. One thing that should be mentioned now is that the primary key of this table is the field adm_username. This implies that more than one person can have administrative rights on the statistics server / client.
As a remark on the database scheme, it should be mentioned that the table fields are named after the table they belong to. Specifically, the first few letters of the field name are an abbreviation of the table name, followed by an underscore.
[Figure: flow diagram of the packet handling callback. Steps shown: strip IP header; if the packet does not belong to an online user, the packet is ignored; strip TCP header; TCP source / destination port check for packets sent or received by the user (20 / 21 - FTP, 22 - SSH, 23 - TELNET, 25 - SMTP, 80 / 443 - HTTP/HTTPS, 110 - POP3); the results of the packet analysis and the protocol statistics are stored in the DB.]
The above figure demonstrates the operations that are performed on a captured packet. It can be considered as a flow diagram of the callback function that is called by the libpcap-based daemon every time a packet is captured. These operations will now be described in detail.
int size_ethernet = sizeof(struct ethhdr);
int size_ip = sizeof(struct iphdr);
int size_tcp = sizeof(struct tcphdr);
/* Stripping headers */
ethernet = (struct ethhdr*)(packet);
ip = (struct iphdr*)(packet + size_ethernet);
tcp = (struct tcphdr*)(packet + size_ethernet + size_ip);
payload = (u_char *)(packet + size_ethernet + size_ip + size_tcp);
[ code taken from packet_cap.c ]
packet is a pointer to the captured packet (which is handled as an unsigned char array). The definitions of the Ethernet, IP and TCP headers are located in the linux/if_ether.h, netinet/ip.h and netinet/tcp.h files. A brief explanation of the above method of extracting packet headers follows. As mentioned, the packet is handled as a string of unsigned chars. Packet manipulation takes place in the body of the callback function that is passed as a parameter to pcap_loop. In this program, the call to pcap_loop has the following format:
pcap_loop(handle, 0, (pcap_handler)updateUserStats, NULL);
handle refers to the pcap handle we have acquired by the call to pcap_open_live (the pcap session opening function). updateUserStats is our packet handling callback function. Such a callback function has a specified prototype. In this case, the definition of updateUserStats is as follows:
void updateUserStats(u_char *args, const struct pcap_pkthdr *header, const u_char *packet)
What the packet capturing daemon program is most interested in is the packet parameter. This parameter is an unsigned char array containing all of the data of the sniffed packet. It is a collection of other structures (protocol headers and packet payload) rather than a string. In fact, it is the serialised version of these structures. In order to strip the packet of its headers, the program has to perform some type casts. As mentioned before, packet is a pointer to the start of the packet structure. Making use of the fact that the Ethernet, IP and TCP headers, as defined in linux/if_ether.h, netinet/ip.h and netinet/tcp.h, are of fixed length, we can acquire pointers to the start of the Ethernet, IP and TCP headers. First, we get a pointer to the beginning of the Ethernet header, which points to the start of the captured packet. This pointer is type-cast to a pointer to struct ethhdr:
ethernet = (struct ethhdr*)(packet);
The IP header starts immediately after the end of the Ethernet header. Therefore, a pointer to the beginning of the IP header is exactly sizeof(struct ethhdr) bytes after the start of the Ethernet packet. In a similar way, we can calculate the pointer to the start of the TCP header (if we refer to a TCP / IP packet) by adding the sizes of the Ethernet and IP header structures (in bytes) to the packet start pointer and typecasting the result to a pointer to a tcphdr structure. Finally, we can find the exact position in the packet where the payload data start in the same way and typecast the pointer to u_char*.
ip = (struct iphdr*)(packet + sizeof(struct ethhdr));
tcp = (struct tcphdr*)(packet + sizeof(struct ethhdr) + sizeof(struct iphdr));
payload = (u_char *)(packet + sizeof(struct ethhdr) + sizeof(struct iphdr) + sizeof(struct tcphdr));
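The sizeof-based offsets above locate the payload correctly as long as neither the IP nor the TCP header carries options. A variant that reads the actual header lengths from the IHL field of the IP header and the data offset field of the TCP header could look like the sketch below. It is only an illustration under stated assumptions: the names tcp_offsets and locate_tcp_payload are hypothetical, the frame is assumed to be an untagged Ethernet II frame carrying IPv4, and the EtherType is not checked.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14   /* untagged Ethernet II header */

/* Hypothetical result type: byte offsets into the captured frame. */
struct tcp_offsets {
    size_t ip;       /* start of the IP header   */
    size_t tcp;      /* start of the TCP header  */
    size_t payload;  /* start of the TCP payload */
};

/* Returns 0 on success, -1 if the frame is too short or is not IPv4/TCP. */
static int locate_tcp_payload(const uint8_t *frame, size_t caplen,
                              struct tcp_offsets *out)
{
    size_t ip_hlen, tcp_hlen;

    if (caplen < ETH_HEADER_LEN + 20)
        return -1;
    out->ip = ETH_HEADER_LEN;

    if ((frame[out->ip] >> 4) != 4)                 /* IP version must be 4 */
        return -1;
    ip_hlen = (size_t)(frame[out->ip] & 0x0f) * 4;  /* IHL, in 32-bit words */
    if (ip_hlen < 20)
        return -1;
    if (frame[out->ip + 9] != 6)                    /* protocol 6 is TCP */
        return -1;

    out->tcp = out->ip + ip_hlen;
    if (caplen < out->tcp + 20)
        return -1;
    tcp_hlen = (size_t)(frame[out->tcp + 12] >> 4) * 4;  /* data offset */
    if (tcp_hlen < 20)
        return -1;

    out->payload = out->tcp + tcp_hlen;
    if (caplen < out->payload)
        return -1;
    return 0;
}
```

The caller would pass the packet pointer and the captured length from the pcap packet header, and then read the payload starting at frame + out->payload.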
The next figure demonstrates the format of an Ethernet packet and how one can perform the above steps to get the protocol headers out of the captured packet.
[Figure: layout of a captured packet: Ethernet Header (sizeof(struct ethhdr) bytes), IP Header (sizeof(struct iphdr) bytes), TCP Header (sizeof(struct tcphdr) bytes), Packet Payload]
Then, given the source and destination IP addresses of the packet, the program searches the online users list (which is in the shared memory segment) to find out whether either address (src / dst) corresponds to the IP address of an online user. If such users are located in the list, the packet capturing program goes on to log statistics of the traffic they cause. If no such user is found, both the usrsnd and usrrcv pointers have a NULL value. The calls to the function that searches the users list are the following.
usrsnd = usrFindInListByIp(usrListHead, (char*)inet_ntoa(ip->ip_src));
usrrcv = usrFindInListByIp(usrListHead, (char*)inet_ntoa(ip->ip_dst));
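The implementation of usrFindInListByIp is not shown in this section. A lookup of this kind could be sketched as below, assuming a singly linked list whose nodes store the user name and the IP address as a dotted-quad string. The node layout and the name find_user_by_ip are hypothetical and may differ from the daemon's real data structure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node layout; the real list node may differ. */
struct user_node {
    char username[81];
    char ip_addr[16];            /* dotted-quad IPv4 string */
    struct user_node *next;
};

/* Returns the node whose IP address matches ip,
   or NULL if no online user has that address. */
static struct user_node *find_user_by_ip(struct user_node *head,
                                         const char *ip)
{
    struct user_node *cur;

    for (cur = head; cur != NULL; cur = cur->next) {
        if (strcmp(cur->ip_addr, ip) == 0)
            return cur;
    }
    return NULL;
}
```

A NULL result plays the role of the NULL usrsnd / usrrcv pointers described above: the packet does not involve an online user on that side of the connection.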
If the intercepted packet has been sent by an online user, or is to be received by such a user, or both (that is, there is communication between two users that are simultaneously online), then the logging / accounting procedure takes place. The first information to be logged is the increase in the total volume of the user's IP traffic that this packet has caused. Namely, the IP packet's total length is added to the current IP traffic volume, which updates the user's aggregate IP traffic statistics. The calculation and storage of the updated IP traffic volume is carried out by an UPDATE SQL statement. If the captured packet is to be received, that is, if usrrcv is not NULL, the program has to increment the total IP download volume by as many bytes as the packet's IP header indicates (the field ip_len of the IP header holds the total length of the IP packet). This action is performed in the following way.
sprintf(sqlUpdateStats,
        "UPDATE user_stats SET ust_total_dl=ust_total_dl+%d WHERE ust_username='%s';",
        ntohs(ip->ip_len), usrrcv->username);
if (mysql_query(conn, sqlUpdateStats)!=0) {
    printf("%s\n", mysql_error(conn));
    return;
}
[ code taken from packet_cap.c ]
sqlUpdateStats is a string that stores the UPDATE SQL query that is issued every time a packet is to be received by a user. The sprintf function prints the SQL query into the string. Then, the MySQL C API function mysql_query executes the query passed to it as the second parameter. In a similar way, if the packet has been sent by an online user (usrsnd not NULL), the total IP upload statistics should be updated, so the corresponding function calls are:
sprintf(sqlUpdateStats,
        "UPDATE user_stats SET ust_total_ul=ust_total_ul+%d WHERE ust_username='%s';",
        ntohs(ip->ip_len), usrsnd->username);
if (mysql_query(conn, sqlUpdateStats)!=0) {
    printf("%s\n", mysql_error(conn));
    return;
}
[ code taken from packet_cap.c ]
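Since sprintf does not check the size of the destination buffer, a bounded variant of the query construction may be worth considering. The sketch below is an assumption-laden illustration, not code from packet_cap.c: the helper name build_volume_update and its direction argument are hypothetical, and the user name is assumed to have been escaped already (for example with the MySQL C API function mysql_real_escape_string) before it is placed in the query.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds the UPDATE statement for the total upload ('u') or
   download (any other value) volume of one user.
   username is assumed to be already escaped for use in SQL.
   Returns 0 on success, -1 if the query did not fit into buf. */
static int build_volume_update(char *buf, size_t size, char direction,
                               unsigned int nbytes, const char *username)
{
    const char *col = (direction == 'u') ? "ust_total_ul" : "ust_total_dl";
    int n = snprintf(buf, size,
                     "UPDATE user_stats SET %s=%s+%u WHERE ust_username='%s';",
                     col, col, nbytes, username);

    /* snprintf reports the length it wanted to write; a value that is
       not smaller than size means the output was truncated. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

On success the buffer can be passed to mysql_query exactly as in the code above; on failure the caller can skip the update instead of sending a truncated statement.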
Otherwise, no action is taken. The code piece that comes right after will make this clear.
/* updateUserStats function's code */
. .
if (usrrcv) {
    . .
    /* the packet is to be received by an online user
       so the program must capture and log his traffic */
    /* database update functions as described above */
    . .
}
if (usrsnd) {
    . .
    /* the packet is sent by an online user
       so the program must capture and log his traffic */
    /* database update functions as described above */
    . .
}
/* if usrrcv and usrsnd are NULL no action is taken */
[ code taken from packet_cap.c ]
- POP3 (default port 110)
- TELNET (default port 23)
- SSH (default port 22)
The default ports listed above are the TCP ports where the application server listens for client requests. For example, an HTTP server listens for HTTP requests at port 80. Therefore, if a captured packet has been found to be sent by an online user (usrsnd not NULL) and the destination TCP port of the packet is port 80, the system can deduce that this is an HTTP request packet, so the total volume of the user's HTTP upload traffic must be incremented. The same applies when a user makes a request to an FTP, SMTP, POP3, etc. server. In a similar way, when a packet which is to be received by an online user (usrrcv not NULL) has a TCP source port with the value 80, it means that this packet was sent by an HTTP server to an HTTP client (e.g. a web browser), and thus the total volume of the user's HTTP download traffic must be incremented.
The above examples indicate that there are four cases in which a packet is captured and database updates have to be made.
i. The packet was received by a user and its TCP source port is a well-known one
ii. The packet was received by a user and its TCP destination port is a well-known one
In these cases we have to update the user's total IP download volume and the appropriate protocol's total download volume statistics.
iii. The packet was sent by a user and its TCP source port is a well-known one
iv. The packet was sent by a user and its TCP destination port is a well-known one
In these cases we have to update the user's total IP upload volume and the appropriate protocol's total upload volume statistics. The following piece of code shows how the above four cases are handled:
/* updateUserStats function's code */
[ . . . ]
if (usrrcv) {
    [ . . . ]
    /* total ip traffic volume database updates here */
    [ . . . ]
    switch (ntohs(tcp->th_sport)) { /* case i */
        case 20: /* ftp */ [ . . . ]
        case 21: /* ftp */ [ . . . ]
        case 22: /* ssh */ [ . . . ]
        case 23: /* telnet */ [ . . . ]
        case 25: /* smtp */ [ . . . ]
        case 80: /* http */ [ . . . ]
        case 110: /* pop3 */ [ . . . ]
        case 443: /* https */ [ . . . ]
    }
    switch (ntohs(tcp->th_dport)) { /* case ii */
        case 20: /* ftp */ [ . . . ]
        case 21: /* ftp */ [ . . . ]
        case 22: /* ssh */ [ . . . ]
        case 23: /* telnet */ [ . . . ]
        case 25: /* smtp */ [ . . . ]
        case 80: /* http */ [ . . . ]
        case 110: /* pop3 */ [ . . . ]
        case 443: /* https */ [ . . . ]
    }
}
if (usrsnd) {
    [ . . . ]
    /* total ip traffic volume database updates here */
    [ . . . ]
    switch (ntohs(tcp->th_sport)) { /* case iii */
        case 20: /* ftp */ [ . . . ]
        case 21: /* ftp */ [ . . . ]
        case 22: /* ssh */ [ . . . ]
        case 23: /* telnet */ [ . . . ]
        case 25: /* smtp */ [ . . . ]
        case 80: /* http */ [ . . . ]
        case 110: /* pop3 */ [ . . . ]
        case 443: /* https */ [ . . . ]
    }
    switch (ntohs(tcp->th_dport)) { /* case iv */
        case 20: /* ftp */ [ . . . ]
        case 21: /* ftp */ [ . . . ]
        case 22: /* ssh */ [ . . . ]
        case 23: /* telnet */
        [ . . . ]
Some of the above cases will probably appear only rarely in a wireless network like the one this project deals with. For example, case ii refers to packets that are received by an online registered user and that are destined for one of the user's well-known ports. If this port is, for example, port 80, it would mean that the wireless user runs an HTTP server, which is not a usual case. After determining the application the intercepted packet belongs to, the program has to extract and track some application-specific information that can be found inside the packet. This means that after examining the Ethernet, IP and TCP headers, the program must examine the payload of the packets, where application layer information can be found. Because this is probably the most complex operation the updateUserStats function is responsible for, the next section deals with it in detail.
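The port checks of the four cases above repeat the same mapping from a well-known port to an application protocol. One way to express that mapping once is sketched below. The enum app_proto and the functions proto_from_port and classify_segment are hypothetical names, the ports are expected in host byte order (that is, after ntohs), and counting port 443 together with HTTP is an assumption based on the absence of separate HTTPS columns in the user_stats table.

```c
#include <stdint.h>

/* Hypothetical protocol codes, not the ones used by the daemon. */
enum app_proto {
    APP_NONE, APP_FTP, APP_SSH, APP_TELNET, APP_SMTP, APP_HTTP, APP_POP3
};

/* Maps a well-known TCP port (host byte order) to a protocol. */
static enum app_proto proto_from_port(uint16_t port)
{
    switch (port) {
    case 20:
    case 21:  return APP_FTP;
    case 22:  return APP_SSH;
    case 23:  return APP_TELNET;
    case 25:  return APP_SMTP;
    case 80:
    case 443: return APP_HTTP;   /* assumption: HTTPS counted as HTTP */
    case 110: return APP_POP3;
    default:  return APP_NONE;
    }
}

/* Checks the destination port first and falls back to the source port.
   Returns APP_NONE if neither port is a well-known one. */
static enum app_proto classify_segment(uint16_t sport, uint16_t dport)
{
    enum app_proto p = proto_from_port(dport);
    return (p != APP_NONE) ? p : proto_from_port(sport);
}
```

With such a helper, the usrrcv and usrsnd branches would only differ in whether the download or the upload counter of the returned protocol is incremented.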
Layer Protocol-specific Statistics
not as important as in multiuser systems. For example, in a home PC with a 56K modem, all traffic belongs to one user. In most cases, this user does not keep many open TCP connections on the same default destination port. For example, he does not have more than one FTP connection active at the same time, so any packet arriving with a TCP source port equal to 21 and carrying an FTP reply code belongs to the one open connection. Also, in order to produce aggregate results such as how many FTP connections the user has made, all one has to do is count the Connection Accepted server replies.
If one had to charge more than one user for the number of FTP connections made, though, simply adding up the Connection Accepted server replies would not offer any information about the FTP connections per user. A more refined way of determining which user has made which connections is needed. In multiuser systems, where requests are issued by several users and replies arrive in arbitrary order, it is impossible to distinguish between the open TCP connections, and thus to do accounting on a per-application and per-user basis, without an efficient connection tracking mechanism.
TCP connections are identified by the following four elements:
(IP address1, TCP port1, IP address2, TCP port2)
All the packets within the same TCP connection have the same values for the above four header fields. This fact suggests a way of developing a TCP connection tracking mechanism. The idea is to keep a list of open TCP connections. The nodes of the list carry the four fields described above, as well as other data. To determine whether a packet belongs to an already open TCP connection, it is sufficient to search for a list node that matches the packet's source and destination IP addresses and source and destination TCP ports. These four fields compose a unique key for a TCP connection; no two TCP connections can have the same key value at the same time.
The above concept was adopted in order to create a mechanism for keeping track of TCP connections at the application level. For applications like HTTP, FTP, etc., which use a default port, one of the four elements that identify a connection is always known, namely the server port. For example, for an FTP connection, the four fields could have the following values:
(ftp client address, ftp client TCP port, ftp server address, 21)
In the packet capturing daemon program, a separate linked list is maintained for each of the application layer protocols whose traffic the system logs. As a result, there are linked lists (implemented in C) for the HTTP, FTP, SMTP and POP3 connections. The operations available for these data structures are searching the list, appending a node and removing a node. These lists are declared and implemented in separate C header files. In particular, their implementations are located in the following files:
- httplist.h
- ftplist.h
- pop3list.h
- smtplist.h
The file names indicate which protocol's list implementation each file contains. As to the other protocols the packet capturing system deals with, namely SSH and TELNET, tracking these connections as well seemed less attractive. SSH is a secure protocol whose encrypted payload cannot be inspected by sniffing, whereas TELNET offers no security but is used less and less as time goes by, because people have started to use SSH more extensively. The following sections discuss each protocol separately in more detail.
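The connection list idea described above can be illustrated with the following sketch. It is not the code of httplist.h or the other list files: the type and function names are hypothetical, the nodes carry only the four key fields (the real nodes hold additional per-protocol data), and new nodes are inserted at the head of the list rather than appended, purely to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>

/* The four elements that identify a TCP connection. */
struct conn_key {
    uint32_t client_ip, server_ip;
    uint16_t client_port, server_port;
};

struct conn_node {
    struct conn_key key;
    struct conn_node *next;
};

static int key_equal(const struct conn_key *a, const struct conn_key *b)
{
    return a->client_ip == b->client_ip && a->server_ip == b->server_ip &&
           a->client_port == b->client_port &&
           a->server_port == b->server_port;
}

/* Returns the node that matches k, or NULL if the connection is unknown. */
static struct conn_node *conn_find(struct conn_node *head,
                                   const struct conn_key *k)
{
    for (; head != NULL; head = head->next)
        if (key_equal(&head->key, k))
            return head;
    return NULL;
}

/* Adds a connection unless it is already tracked; returns the new head.
   On allocation failure the list is returned unchanged. */
static struct conn_node *conn_add(struct conn_node *head,
                                  const struct conn_key *k)
{
    struct conn_node *n;

    if (conn_find(head, k) != NULL)
        return head;
    n = malloc(sizeof(*n));
    if (n == NULL)
        return head;
    n->key = *k;
    n->next = head;
    return n;
}

/* Removes the connection that matches k, if any; returns the new head. */
static struct conn_node *conn_remove(struct conn_node *head,
                                     const struct conn_key *k)
{
    struct conn_node **pp = &head;

    while (*pp != NULL) {
        if (key_equal(&(*pp)->key, k)) {
            struct conn_node *dead = *pp;
            *pp = dead->next;
            free(dead);
            break;
        }
        pp = &(*pp)->next;
    }
    return head;
}
```

A node would typically be added when a connection to a well-known server port is first seen and removed when the connection is closed, with one such list kept per application protocol.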
Host: www.google.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830
[ . . . ]
iv. The server receives the above request and if it can serve it, sends back the response which starts with a status line with the response code 200, with the HTML content following the response header.
In case an error occurs, the appropriate status code is returned. For example, if the resource requested is not found on the server, the code 404 is returned. All status codes have 3 digits. The above HTTP description and examples are based on RFC 2616 (HTTP/1.1) ([21]). This RFC was the base for the HTTP connection tracking algorithm that will be described in the next section.
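As a small illustration of the status line described above, the following sketch extracts the three-digit status code from the start of a server response. It is a hedged example rather than the tracking function of the daemon: the name parse_status_code is hypothetical, and the function deliberately works on a length-delimited buffer, since a captured payload is not guaranteed to be NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Returns the 3-digit status code of an HTTP response, or -1 if buf
   does not start with a well-formed status line.
   buf does not need to be NUL-terminated; len is its length in bytes. */
static int parse_status_code(const char *buf, size_t len)
{
    const char *sp;
    int i, code = 0;

    /* Shortest possible prefix: "HTTP/1.1 200" (12 bytes). */
    if (len < 12 || memcmp(buf, "HTTP/", 5) != 0)
        return -1;

    /* The status code follows the first space of the status line. */
    sp = memchr(buf, ' ', len);
    if (sp == NULL || (size_t)(sp - buf) + 4 > len)
        return -1;

    for (i = 1; i <= 3; i++) {
        if (sp[i] < '0' || sp[i] > '9')
            return -1;
        code = code * 10 + (sp[i] - '0');
    }
    return code;
}
```

A caller could map the returned value to its own result codes, treating -1 as "not a response header" so that packets carrying only body data are ignored.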
payload is a pointer to a packet's payload. pldlen is the length of the payload. We need this parameter so that we know where the payload ends: string handling functions like strlen() cannot be used on that byte array, because it is not null-terminated. host, uagent, uri and method are the variables where the HTTP header information will be stored.
Finally, the srv flag indicates whether the packet was sent by an HTTP server (value 1) or by an HTTP client (value 0). Next, the code of the above function is cited and analysed. There are similar functions for the other application layer protocols, all of which follow almost the same concept.
int httpTrack(char* payload, int pldlen, char* host, char* uagent, char* uri, int* method, int srv){
    int i=0, retval=HTTP_NO_ACTION, codeval=0;
    char *p1, *p2, *p3;
    char packetstart[31]; /* first 30 payload bytes plus a terminating null */
    char respcode[4];     /* 3 digits plus a terminating null, as atoi() needs */

    for (i=0;i<30;i++) {
        //sets non-printable ascii characters to .
        if ((*payload < 32)&&(*payload!=13)&&(*payload!=10)) {
            *payload = '.';
        }
        payload++;
    }
    payload-=30;
    bzero(packetstart, sizeof(packetstart));
    strncpy(packetstart, payload, 30);
    if (srv) {
        //If the packet is sent by the http srv
        //there is a check for the status code of the response
        if (p1 = strstr(packetstart, "HTTP")) {
            if (p2 = index(p1, ' ')) {
                strncpy(respcode, p2+1, 3);
                respcode[3] = '\0';
                codeval = atoi(respcode);
                switch (codeval) {
                    case 200: return HTTP_200_OK;
                    case 401: return HTTP_401;
                    case 403: return HTTP_403;
                    case 404: return HTTP_404;
                    case 405: return HTTP_405;
                    default: return HTTP_NO_ACTION;
                }
            }
        } else {
            return HTTP_NO_ACTION;
        }
    } else {
        /* according to rfc 2616, an http request always starts with the
           request name (GET/HEAD/POST/...). Thus, if we encounter the
           GET/HEAD/POST keywords in the start of the packet we keep on
           to track more info about the request */
        if (p1 = strstr(packetstart, "GET")) {
            p1 = strstr(payload, "GET");
            p2 = p1 + 4;
            p3 = index(p2, ' ');
            strncpy(uri, p2, p3 - p2);
            *method = HTTP_METHOD_GET;
            retval = HTTP_REQUEST;
        } else if (p1 = strstr(packetstart, "POST")) {
            p1 = strstr(payload, "POST");
            p2 = p1 + 5;
            p3 = index(p2, ' ');
            strncpy(uri, p2, p3 - p2);
            *method = HTTP_METHOD_POST;
            retval = HTTP_REQUEST;
        } else if (p1 = strstr(packetstart, "HEAD")) {
            p1 = strstr(payload, "HEAD");
            p2 = p1 + 5;
            p3 = index(p2, ' ');
            strncpy(uri, p2, p3 - p2);
            *method = HTTP_METHOD_HEAD;
            retval = HTTP_REQUEST;
        } else {
            return HTTP_NO_ACTION;
        }
    }
    /* if we actually have to do with an http request, we are
       searching for the host and user agent fields */
    char* pldstart = payload;
    p1 = index(payload, '\n');
    if (p1) {
        while (1) {
            p2 = index(p1 + 1, '\n');
            if (!p2) {
                break;
            }
            if (!strncmp(p1 + 1, "User-Agent:", 11)) {
                strncpy(uagent, p1 + 13, p2 - 1 - p1 - 13);
            }
            if (!strncmp(p1 + 1, "Host:", 5)) {
                strncpy(host, p1 + 7, p2 - 1 - p1 - 7);
            }
            p1 = p2;
        }
    }
    payload = pldstart;
    return retval;
}
The above code segment performs the actual HTTP packet examination. Because we do not know whether the packet carries a request, a response or other, probably useless, information, we first check the initial bytes of the packet data and replace non-printable characters with the '.' character. If the srv flag is set, the function searches the payload's starting bytes to find out whether they contain an HTTP reply. If a reply code is found, the corresponding value is returned. Possible return values are defined in the httplist.h file:
#define HTTP_NO_ACTION 0
#define HTTP_REQUEST 1
#define HTTP_200_OK 200
#define HTTP_400 400
#define HTTP_401 401
#define HTTP_403 403
#define HTTP_404 404
#define HTTP_405 405
Of all these values, the most important are HTTP_200_OK, HTTP_REQUEST and HTTP_NO_ACTION. The other values are of no significance to the traffic logging subsystem, because no special action is taken when they are returned. If the srv flag is not set, we are dealing with a packet sent by the client, possibly containing a request. Thus, we search the packet's first bytes for any of the GET, POST or HEAD keywords, which prove that this is an HTTP request. If such a keyword is found, the method parameter, which was passed by reference, takes the appropriate value, which is one of the following, as defined in the httplist.h file.
#define HTTP_METHOD_GET 10
#define HTTP_METHOD_POST 20
#define HTTP_METHOD_HEAD 30
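The keyword check described above can also be written so that it never reads past the end of a payload that is not null-terminated. The following is a minimal sketch under stated assumptions, not the code of packet_cap: parse_request_line() is a hypothetical helper, it expects the method keyword at the very first byte of the payload (the thesis code searches the first 30 bytes instead), and it reuses the method constants listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HTTP_METHOD_GET  10
#define HTTP_METHOD_POST 20
#define HTTP_METHOD_HEAD 30

/*
 * Hypothetical helper, not part of httplist.h.
 * Examines at most len bytes of payload (which need not be NUL-terminated).
 * On success copies the request URI into uri (always NUL-terminated,
 * possibly truncated) and returns the method constant; returns 0 otherwise.
 */
static int parse_request_line(const char *payload, size_t len,
                              char *uri, size_t urisz)
{
    static const struct { const char *name; int id; } methods[] = {
        { "GET ",  HTTP_METHOD_GET  },
        { "POST ", HTTP_METHOD_POST },
        { "HEAD ", HTTP_METHOD_HEAD },
    };
    size_t i, mlen, n;
    const char *start, *end;

    assert(payload != NULL && uri != NULL && urisz > 0);

    for (i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        mlen = strlen(methods[i].name);
        if (len <= mlen || memcmp(payload, methods[i].name, mlen) != 0)
            continue;
        start = payload + mlen;
        /* The URI ends at the next space, searched only inside the payload. */
        end = memchr(start, ' ', len - mlen);
        if (end == NULL)
            return 0;
        n = (size_t)(end - start);
        if (n >= urisz)
            n = urisz - 1;
        memcpy(uri, start, n);
        uri[n] = '\0';
        return methods[i].id;
    }
    return 0;
}
```

Using memcmp() and memchr() with explicit lengths is the design choice here: both stop at the given byte count, so the pldlen value passed to the tracking function is enough to keep the scan inside the captured packet.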
Finally, having found out that the packet carries an HTTP request, we search the request header, if available, to extract the values of the optional Host and User-Agent fields.
Now that we have analysed how HTTP protocol-specific data are extracted from the raw packet, it is time to see how these data and the functions that provide them are used by the packet capturing daemon program. As described in the section that dealt with TCP connection tracking principles, HTTP connections are stored temporarily in a linked list. The implementation of this list can be found in the httplist.h file. This data structure is implemented in the typical C way. That is, it is composed of C structures (nodes), each one representing an HTTP request / connection. Each list node has a next pointer, pointing to the next list element. The next pointers are used for list traversal. The prototype for this structure is as follows.
struct httpConnListNode {
    char clientAddr[20];
    char srvAddr[20];
    char uri[255];
    char uagent[255];
    char host[255];
    int method;
    int port;
    int count;
    struct httpConnListNode *next;
};
typedef struct httpConnListNode httpNode;
A brief explanation of the structure's fields follows. The clientAddr and srvAddr fields are two strings that hold the IP addresses of the client (e.g. browser) that issued the HTTP request and of the server to which the request was directed. The port field stores the value of the client's TCP port. These three fields are enough to identify a unique HTTP connection (as an HTTP server always listens at port 80). The IP addresses are stored as strings (in dotted decimal representation). The uri field stores the requested URI, uagent refers to the user agent, host refers to the host the request is for and method is an integer that holds the request method type. uagent and host are fields that are extracted from the HTTP request header. Finally, count is the number of nodes in the list. In this system's implementation, this field is significant only for the list's head node. In the other nodes, the count field is ignored.
In the packet capturing daemon program (packet_cap) the list of the HTTP connections / requests is accessible via the httpListHead pointer. This is declared as a static variable of httpNode* type.
static httpNode* httpListHead;
This pointer is initialised in the program's main function. It points to a dummy node, which does not contain information about an actual HTTP request (e.g. the srvAddr and clientAddr fields are empty). However, the node holds the number of nodes (connections) in the list and points to the first valid connection (on startup, its next pointer is set to NULL).
httpListHead = (httpNode*)malloc(sizeof(httpNode));
httpListHead->count = 0;
httpListHead->next = NULL;
[ code taken from packet_cap.c ]
When the packet capturing program terminates, the list is removed from memory using the clearHttpList() function. clearHttpList() is called by the SIGTERM and SIGINT signal handler.
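The list handling described so far (a dummy head node that carries the node count, plus append, search, removal and clearing) can be sketched as follows. This is an illustrative sketch, not the contents of httplist.h: the reduced node type connNode and the function names listCreate(), listAppend(), listFind(), listRemove() and listClear() are hypothetical, and only the three identifying fields are kept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduced, hypothetical node: only the fields that identify a connection. */
struct connNode {
    char clientAddr[20];
    char srvAddr[20];
    int  port;              /* client TCP port */
    int  count;             /* meaningful in the dummy head only */
    struct connNode *next;
};

/* Allocate the zeroed dummy head; returns NULL if memory is exhausted. */
static struct connNode *listCreate(void)
{
    return calloc(1, sizeof(struct connNode));
}

/* Append a new node at the tail. Returns 0 on success, -1 on failure. */
static int listAppend(struct connNode *head, const char *client,
                      const char *srv, int port)
{
    struct connNode *n, *p;

    assert(head != NULL && client != NULL && srv != NULL);
    n = calloc(1, sizeof *n);
    if (n == NULL)
        return -1;
    strncpy(n->clientAddr, client, sizeof n->clientAddr - 1);
    strncpy(n->srvAddr, srv, sizeof n->srvAddr - 1);
    n->port = port;
    for (p = head; p->next != NULL; p = p->next)
        ;
    p->next = n;
    head->count++;
    return 0;
}

/* Return the first node matching the triplet, or NULL. */
static struct connNode *listFind(struct connNode *head, const char *client,
                                 const char *srv, int port)
{
    struct connNode *p;

    for (p = head->next; p != NULL; p = p->next)
        if (p->port == port && strcmp(p->clientAddr, client) == 0 &&
            strcmp(p->srvAddr, srv) == 0)
            return p;
    return NULL;
}

/* Unlink and free the first matching node. Returns 1 if removed, else 0. */
static int listRemove(struct connNode *head, const char *client,
                      const char *srv, int port)
{
    struct connNode *prev, *p;

    for (prev = head, p = head->next; p != NULL; prev = p, p = p->next) {
        if (p->port == port && strcmp(p->clientAddr, client) == 0 &&
            strcmp(p->srvAddr, srv) == 0) {
            prev->next = p->next;
            free(p);
            head->count--;
            return 1;
        }
    }
    return 0;
}

/* Free every node, including the dummy head. */
static void listClear(struct connNode *head)
{
    struct connNode *next;

    while (head != NULL) {
        next = head->next;
        free(head);
        head = next;
    }
}
```

The dummy head means that appending and removing never have to treat an empty list as a special case, because there is always a predecessor node whose next pointer can be updated.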
Every time an HTTP packet is captured by the daemon program, and after determining that the packet was sent or is to be received by a registered online user (otherwise it is ignored anyway), there are two possible cases that are of significance to our system.
i. The packet is sent by a user and it contains an HTTP request. In this case, the httpTrack() function is called to determine the host, method, URI and user agent of the request. After httpTrack() has performed the necessary operations on the packet's payload, a new node is appended to the list of HTTP connections. These steps are shown in the next piece of code.
[ . . . ]
char hostip[20];
char usrip[20];
[ . . . ]
pldlen = ( (ntohs(ip->ip_len) > 3000) ? 3000 : (ntohs(ip->ip_len) - size_ip - size_tcp) );
httpRes = httpTrack((char*)payload, pldlen, host, uagent, uri, &method, 0);
if (httpRes == HTTP_REQUEST) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
    bzero(newHttpNode->clientAddr, 20);
    bzero(newHttpNode->srvAddr, 20);
    bzero(newHttpNode->host, 255);
    bzero(newHttpNode->uri, 255);
    bzero(newHttpNode->uagent, 255);
    strcpy(newHttpNode->clientAddr, usrip);
    strcpy(newHttpNode->srvAddr, hostip);
    strcpy(newHttpNode->host, host);
    strcpy(newHttpNode->uri, uri);
    strcpy(newHttpNode->uagent, uagent);
    newHttpNode->method = method;
    newHttpNode->port = ntohs(tcp->th_sport);
    httpListAppNode(httpListHead, newHttpNode);
}
[ code taken from packet_cap.c ]
As one can see, if httpTrack() returns HTTP_REQUEST, a new structure of httpNode* type is constructed, representing the newly captured HTTP connection. In the end, httpListAppNode(), which is the function that appends nodes to an httpNode list, adds the node to the end of the list pointed to by httpListHead.
ii. The packet is received by a user and it contains an HTTP server response. In this case, httpTrack() is called with the srv flag set to 1. If the packet is found to carry an HTTP 200 OK response, the program searches the list of the HTTP connections to find the request that the captured response answers. The criterion for this search is the srvAddr, clientAddr and port triplet. If the following conditions are all satisfied, the connection that the response belongs to is found.
- The packet's source IP is the same as the node's srvAddr field
- The packet's destination IP is the same as the node's clientAddr field
- The packet's destination TCP port is the same as the node's port field
If all the above conditions are satisfied, the httpListFindConn() function (which carries out the list search operation) returns a pointer to the node that represents this HTTP request. Next, the necessary database insertions have to be performed. An INSERT SQL query is formed, which inserts the request's information (server IP, host name, method, URI and user agent, together with the P2PWNC username) into the MySQL database. After the query has been executed, no other operations are to be performed on this request, so it is removed from the HTTP connections list via a call to the httpListRemoveConn() function. If the packet is found to carry a reply code other than 200 and this reply refers to a connection listed in the httpNode list, the connection is simply removed via the httpListRemoveConn() function. The code that the above description refers to follows.
pldlen = ( (ntohs(ip->ip_len) > 3000) ? 3000 : (ntohs(ip->ip_len) - size_ip - size_tcp) );
httpRes = httpTrack((char*)payload, pldlen, host, uagent, uri, &method, 1);
if (httpRes == HTTP_200_OK) {
    bzero(hostip, 20);
    bzero(usrip, 20);
    strcpy(hostip, (char*)inet_ntoa(ip->ip_src.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    if (tempHttpNode = httpListFindConn(httpListHead, usrip, hostip, ntohs(tcp->th_dport))) {
        bzero (sqlInsertHttpStats, 2048);
        sprintf(sqlInsertHttpStats, "INSERT into http (h_username, h_host, h_method, h_uri, h_user_agent) VALUES('%s', '%s', %d, '%s', '%s');", usrrcv->username, tempHttpNode->host, tempHttpNode->method, tempHttpNode->uri, tempHttpNode->uagent);
        if (mysql_query(conn, sqlInsertHttpStats)!=0) {
            printf("%s\n", mysql_error(conn));
            return;
        }
        httpListRemoveConn(httpListHead, hostip, usrip, ntohs(tcp->th_dport));
    }
} else if (httpRes == HTTP_400 || httpRes == HTTP_401 || httpRes == HTTP_403 || httpRes == HTTP_404 || httpRes == HTTP_405) {
    httpListRemoveConn(httpListHead, hostip, usrip,
    [ . . . ]
ii. After the connection establishment, the user authentication takes place. A user who wishes to connect to an FTP server must provide a user name and a password. Thus, one has to issue the USER command followed by a username and a <CR><LF> sequence.
USER pfrag<CRLF>
Then, the server can respond either with a 331 reply, verifying the username and asking for the password, or with another reply code indicating an error or failure.
331 User name ok, need password<CRLF>
The user's next step is to supply a password using the PASS command, so that he can log in.
PASS mypass<CRLF>
If the password supplied is correct, the server sends a 230 reply which indicates that the user is logged in. In case account information (ACCT command) is needed by the server, it first sends back a 332 reply (need account for login) and when it has received the ACCT command successfully, it finally sends the 230 reply.
230 User logged in<CRLF>
iii. After the login process has been carried out, actual file transfer can take place. In this example FTP scenario, the user wishes to retrieve the file test.txt. He has to issue a RETR <filename> command like this:
RETR test.txt<CRLF>
Then, the server sends a 150 reply indicating that the file status is okay, so it will open a data connection to the user's TCP port.
150 File status okay; about to open data connection<CRLF>
After the file transfer has been completed, the FTP server sends a 226 reply, which shows that the file transfer was carried out successfully and thus the data connection is to be shut down.
226 Closing data connection, file transfer successful<CRLF>
iv. After the user has performed all necessary operations, if he wishes to quit, he has to send the server the QUIT command. The server then closes all control and data connections.
QUIT<CRLF>
Figure 13. FTP command reply sequence
It must be noted that FTP commands are case insensitive, so the commands USER and UseR are equivalent and syntactically correct. In addition, all reply codes are composed of three decimal digits. The above FTP description and examples are based on [19]. This RFC was the basis for the FTP connection tracking algorithm that will be described in the next section.
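The two syntactic properties noted above (case-insensitive commands and three-digit reply codes) are what a packet scanner relies on. A minimal sketch follows; ftp_command_arg() and ftp_reply_code() are hypothetical helpers written for this illustration and are not taken from ftplist.h. strncasecmp() is the POSIX case-insensitive comparison declared in strings.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/*
 * Hypothetical helper, not part of ftplist.h.
 * If the len-byte payload starts with the command cmd (compared without
 * regard to case) followed by a space, copy its argument, up to the
 * terminating CR or LF, into arg and return 1. Otherwise return 0.
 * The argument is truncated if it does not fit in argsz bytes.
 */
static int ftp_command_arg(const char *payload, size_t len, const char *cmd,
                           char *arg, size_t argsz)
{
    size_t clen, i, n = 0;

    assert(payload != NULL && cmd != NULL && arg != NULL && argsz > 0);
    clen = strlen(cmd);
    if (len <= clen || strncasecmp(payload, cmd, clen) != 0 ||
        payload[clen] != ' ')
        return 0;
    for (i = clen + 1; i < len && payload[i] != '\r' && payload[i] != '\n'; i++) {
        if (n + 1 < argsz)
            arg[n++] = payload[i];
    }
    arg[n] = '\0';
    return 1;
}

/* Return the three-digit reply code at the start of the payload, or -1. */
static int ftp_reply_code(const char *payload, size_t len)
{
    if (len < 3 || payload[0] < '0' || payload[0] > '9' ||
        payload[1] < '0' || payload[1] > '9' ||
        payload[2] < '0' || payload[2] > '9')
        return -1;
    return (payload[0] - '0') * 100 + (payload[1] - '0') * 10 +
           (payload[2] - '0');
}
```

For example, a packet carrying `UseR pfrag<CRLF>` yields the argument `pfrag`, and a packet starting with `230` yields the code 230, which is the reply that marks a successful login.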
2.5.2.3.2 FTP Connection Tracking Algorithm
The FTP connection tracking algorithm works in a similar way to the tracking algorithm for HTTP requests. Again, we need to keep track of the FTP connections users make and wait for the appropriate server replies, so as to update the database with the most recently captured data. The function that is responsible for FTP protocol-specific data is called ftpTrack(). Its prototype is shown below.
int ftpTrack(char* payload, int pldlen, char* pass, char* user);
payload is a pointer to a packet's payload data, pldlen is the length of the payload, pass is the variable where the captured password of an FTP session will be stored, while user is the string where the FTP session's username will be placed. The ftpTrack() function scans the payload data in the same way as httpTrack() and the other protocol tracking functions do. If it finds the 230 reply code, it returns the value FTP_CONN_ACCEPTED. Otherwise, if it finds a USER command, it records the user name, and if it finds a PASS command, it records the session's password. As one can observe, the traffic logging subsystem only cares about FTP connection establishment. The other commands the protocol offers, which have to do with file transfer or file management, are of no significance for logging / accounting. After the server's positive reply, the connection is not watched any more. The total volume of FTP traffic is still recorded, but this does not demand much extra effort. The ftpTrack() return values are defined in the ftplist.h file.
#define FTP_NO_ACTION 0
#define FTP_UN_SNIFFED 1
#define FTP_PASS_SNIFFED 2
#define FTP_CONN_ACCEPTED 3
As with the HTTP protocol, the packet capturing program maintains a list of FTP sessions that have not yet received a server reply indicating connection establishment. This list is implemented in a similar way to the HTTP connections list. It is composed of structures of the following format.
struct ftpConnListNode {
    char ftpUserName[30];
    char ftpPass[30];
    char usrIpAddr[20];
    char ftpHostAddr[20];
    int port;
    int count;
    struct ftpConnListNode *next;
};
typedef struct ftpConnListNode ftpNode;
The ftpUserName and ftpPass fields are the FTP session's user name and password. The usrIpAddr field is the FTP client's IP address, while ftpHostAddr is the IP address of the FTP server. The port field represents the TCP port that is open at the client side (the port on the server side is TCP port 20 / 21). Finally, the next member of the structure is a pointer to the next element in the list and is used for list traversal.
The list of the FTP connections is accessible via the ftpListHead pointer. This is declared as a static variable of ftpNode* type.
static ftpNode* ftpListHead;
This pointer is initialised in packet_cap's main function. Like httpListHead, it points to a dummy node that does not contain information about an actual FTP session. However, the node holds the number of nodes (connections) in the list and points to the first valid connection (on startup, its next pointer is set to NULL), just like httpListHead.
When the packet capturing program terminates, the list is removed from memory using the clearFtpList() function. clearFtpList() is called by the SIGTERM and SIGINT signal handler.
Each time an FTP packet is captured and it is deduced that it belongs to an online P2PWNC user, we distinguish two cases.
i. The packet was sent by a user to an FTP server and contains FTP commands. In this case, ftpTrack() is called to track the user's FTP account name and password. If the user name is captured (ftpTrack() returns FTP_UN_SNIFFED), a new ftpNode* node is constructed and appended to the FTP connections list using the ftpListAppNode() function. If the password for the FTP session is captured (ftpTrack() returns FTP_PASS_SNIFFED), it is assumed that the user name of the session has already been recorded. Thus, the function ftpListAddUserPass() (which can be found in the ftplist.h file, like all other FTP list handling functions) has to be called. This function searches the list to find the specified node and then sets its ftpPass field to the value given in the function's parameter. The prototype for this call is:
int ftpListAddUserPass(ftpNode* ftplist, char usrip[], char hostip[], int port, char ftpUserPass[]);
The code that performs the above operations is the following (packet_cap.c file, inside updateUserStats() function).
[ . . . ]
resFtp = ftpTrack((char*)payload, pldlen, pass, ftpuser);
switch (resFtp) {
    case FTP_UN_SNIFFED:
        strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
        strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
        bzero(newnode->ftpUserName, 30);
        strcpy(newnode->ftpUserName, ftpuser);
        bzero(newnode->usrIpAddr, 20);
        strcpy(newnode->usrIpAddr, usrip);
        bzero(newnode->ftpHostAddr, 20);
        strcpy(newnode->ftpHostAddr, hostip);
        newnode->port = ntohs(tcp->th_sport);
        newnode->next = NULL;
        ftpListAppNode(ftpListHead, newnode);
        break;
    case FTP_PASS_SNIFFED:
        strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
        strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
        ftpListAddUserPass(ftpListHead, usrip, hostip, ntohs(tcp->th_sport), pass);
        break;
    case FTP_NO_ACTION:
        break;
    default:
        break;
}
ii. The packet is to be received by a user and probably contains a server reply to the FTP commands of the client. In this case, ftpTrack() is called to determine the server's reply. If the return value is FTP_CONN_ACCEPTED, the node of the list that represents this connection has to be found. The ftpListFindConn() function is used, which returns a pointer to the node found. The search criteria are the same as in the HTTP request list. After finding the node whose connection is finally established, the captured data are recorded in the database. As discussed in an earlier section of the document, the MySQL database table that stores FTP information is the ftp table. The primary key of this table is the triplet
(f_username,f_ftp_host,f_ftp_user_name)
These three values uniquely identify a database record with a P2PWNC user's FTP account information. For example, a P2PWNC user f_username can have multiple accounts on the FTP server f_ftp_host, and an FTP server user f_ftp_user_name can log in to his FTP account using different P2PWNC user names (this may happen if two P2PWNC users share the same FTP account). If the captured FTP user name and FTP server address, together with the user name of the P2PWNC user that made the FTP connection (the fields that compose the database table's primary key), do not exist in the database, an INSERT query must be executed. This case is detected by first executing the UPDATE query. If that does not affect any rows, the record to be updated does not exist in the database and, thus, has to be inserted. Otherwise, the existing record is updated with the new statistics. In particular, the f_ftp_pass field is updated with the (more recent) account password that was captured. Also, the connection counter (f_ftp_c_count in the query below) is incremented by 1. The following piece of code clarifies the above description.
resFtp = ftpTrack((char*)payload, pldlen, pass, ftpuser);
if (resFtp == FTP_CONN_ACCEPTED) {
    bzero(hostip, 20);
    bzero(usrip, 20);
    strcpy(hostip, (char*)inet_ntoa(ip->ip_src.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    thenode = ftpListFindConn(ftpListHead, usrip, hostip, ntohs(tcp->th_dport));
    if (thenode) {
        sprintf(sqlUpdateFtpStats, "UPDATE ftp SET f_ftp_c_count=f_ftp_c_count+1, f_ftp_pass='%s' where f_username='%s' and f_ftp_host='%s' and f_ftp_user_name='%s';", thenode->ftpPass, usrrcv->username, hostip, thenode->ftpUserName);
        sprintf(sqlInsertFtpStats, "INSERT INTO ftp (f_username, f_ftp_host, f_ftp_user_name, f_ftp_pass, f_ftp_c_count) VALUES ('%s', '%s', '%s', '%s', 1);", usrrcv->username, hostip, thenode->ftpUserName, thenode->ftpPass);
        if (mysql_query(conn, sqlUpdateFtpStats)!=0) {
            printf("%s\n", mysql_error(conn));
            return;
        }
        if (!mysql_affected_rows(conn)) {
            /* if the record does not exist in the db, the UPDATE
               statement won't affect any rows, thus we insert the
               record for the first time */
            if (mysql_query(conn, sqlInsertFtpStats)!=0) {
                printf("%s\n", mysql_error(conn));
                return;
            }
        }
        ftpListRemoveConn(ftpListHead, hostip, usrip, ntohs(tcp->th_dport));
    }
}
[ code taken from packet_cap.c ]
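As a possible alternative to the update-then-insert sequence above, MySQL offers the INSERT ... ON DUPLICATE KEY UPDATE statement, which performs both steps in one query when the table has a primary key such as the triplet described earlier. The sketch below only builds the statement text; build_ftp_upsert() is a hypothetical helper that is not part of packet_cap, it uses snprintf() instead of sprintf() so that an oversized statement is detected rather than written past the buffer, and it assumes that the string arguments have already been escaped for use inside SQL string literals.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper. Writes one statement that inserts the record or,
 * if the primary key already exists, updates the password and increments
 * the connection counter.
 * Returns 0 on success, -1 if the statement does not fit in buf.
 * Assumption: the arguments are already escaped for SQL string literals.
 */
static int build_ftp_upsert(char *buf, size_t bufsz, const char *username,
                            const char *host, const char *ftp_user,
                            const char *ftp_pass)
{
    int n;

    assert(buf != NULL && bufsz > 0);
    n = snprintf(buf, bufsz,
                 "INSERT INTO ftp (f_username, f_ftp_host, f_ftp_user_name, "
                 "f_ftp_pass, f_ftp_c_count) VALUES ('%s', '%s', '%s', '%s', 1) "
                 "ON DUPLICATE KEY UPDATE f_ftp_c_count = f_ftp_c_count + 1, "
                 "f_ftp_pass = VALUES(f_ftp_pass);",
                 username, host, ftp_user, ftp_pass);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

The resulting string would be passed to mysql_query() exactly like the statements in the listing above. Whether this form is preferable depends on the MySQL version in use, so it is offered only as a design option.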
HELO is followed by a 250 OK reply.
ii. Afterwards, the sender of the mail issues a MAIL command, which identifies him.
MAIL From:<pfrag@aueb.gr><CRLF>
Then, if the above command is accepted, the receiver returns a 250 OK reply.
250 OK<CRLF>
iii. After that, the sender must specify the recipient via the RCPT command:
RCPT To:<somereceiver@somewhere.gr><CRLF>
Again, the sender must receive a 250 OK reply. However, if the recipient is unknown, the sender gets a 550 Failure reply. The RCPT command can be repeated any number of times.
iv. After the recipient has been specified, the sender issues the DATA command.
DATA<CRLF>
If accepted, the receiver sends a 354 reply and considers all succeeding lines to be message text. The message text is terminated by a line containing only a period (a <CRLF>.<CRLF> sequence). When the end of the text is received and stored, the receiver returns a 250 OK reply.
v. Finally, after the messages have been sent, the sender entity can issue the QUIT command so as to close the connection.
The above description implies that the command sequence is controlled by the sender entity. The typical SMTP scenario discussed, as well as the algorithm for SMTP connection tracking, were based on [18].
2.5.2.4.2 SMTP Connection Tracking Algorithm
SMTP tracking also works in a similar way to that of the two previous protocols. The function that extracts information about SMTP connections by examining packet data is called smtpTrack() and has the following prototype.
unsigned int smtpTrack(char* payload, int pldlen, char* from, char* to, char* subject);
Unlike the corresponding functions of the other protocols, this one returns an unsigned integer that is used as a set of bit flags describing the operations that took place. Because an SMTP client can send in one packet more than one of the field values that the system is interested in capturing (e.g. the RCPT command and the Received field may be in the same packet), the function combines the corresponding return values with a bitwise OR and returns the result. The following piece of code of the smtpTrack() function makes this clearer.
[ . . . ]
unsigned int flags = 0x00;
[ . . . ]
if (p1 = strstr(pld, "received:")) {
    /* received field found */
    [ . . . ]
    flags |= SMTP_RECEIVED;
}
if (p1 = strstr(pld, "mail from:<")) {
    /* mail from command found */
    [ . . . ]
    flags |= SMTP_SNDR_SNIFFED;
}
if (p1 = strstr(pld, "rcpt to:<")) {
    /* rcpt to command found */
    [ . . . ]
    flags |= SMTP_RCPT_SNIFFED;
}
[ . . . ]
return flags;
[ code taken from packet_cap.c ]
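The flag combination above works only if every flag occupies its own bit. The numeric values of the SMTP_* constants are not shown in this section, so the values in the following sketch are assumptions, and smtp_flags() is a hypothetical function that merely mimics how smtpTrack() builds its return value.

```c
#include <assert.h>

/* Assumed values: one distinct bit per event (the real ones are not shown). */
#define SMTP_SNDR_SNIFFED 0x01u
#define SMTP_RCPT_SNIFFED 0x02u
#define SMTP_RECEIVED     0x04u
#define SMTP_RSET_REQ     0x08u

/* Combine the events found in one packet into a single return value. */
static unsigned int smtp_flags(int has_sender, int has_rcpt,
                               int has_received, int has_rset)
{
    unsigned int flags = 0x00u;

    /* The scheme relies on the flag bits not overlapping. */
    assert((SMTP_SNDR_SNIFFED & SMTP_RCPT_SNIFFED) == 0);
    assert((SMTP_RECEIVED & SMTP_RSET_REQ) == 0);

    if (has_sender)
        flags |= SMTP_SNDR_SNIFFED;
    if (has_rcpt)
        flags |= SMTP_RCPT_SNIFFED;
    if (has_received)
        flags |= SMTP_RECEIVED;
    if (has_rset)
        flags |= SMTP_RSET_REQ;
    return flags;
}
```

A caller tests each event independently with a bitwise AND, for example `if (res & SMTP_RCPT_SNIFFED)`, so a packet that carries both a RCPT command and a Received field triggers both handlers.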
This function can capture the mail sender, recipient and subject, when available. The idea of using linked lists for keeping track of the SMTP connections users make is applied here, too. The SMTP list is composed of nodes of the following structure.
struct smtpConnListNode {
    char smtpFrom[40];
    char smtpTo[40];
    char usrIpAddr[20];
    char smtpSrvAddr[20];
    char subject[200];
    int datasent;
    int rsetreq;
    int port;
    int count;
    struct smtpConnListNode *next;
};
typedef struct smtpConnListNode smtpNode;
The smtpFrom and smtpTo fields are the sender and the receiver of the mail. usrIpAddr and smtpSrvAddr are the IP addresses of the user and of the receiving SMTP entity (server). subject refers to the mail subject, datasent is a flag indicating whether the DATA command has been issued, and rsetreq shows whether the RSET command has been issued. port, count and next have the same semantics as in the other protocol connection lists. Actually, this list keeps track of the mail messages sent rather than of the connections established.
In the packet capturing daemon program, this list is accessible via the smtpListHead pointer. This is a static pointer of smtpNode* type, which points to the head of the list. What holds for httpListHead and ftpListHead applies to this node too: it is a dummy node, but it stores the list length.
As with the HTTP and FTP protocols discussed so far, there are two cases when an SMTP packet is captured and it is determined that it was sent or is to be received by a P2PWNC user. The discussion of the two cases follows.
i. The packet was sent by a user to a remote host's port 25. smtpTrack() is called. The return value is subjected to a bitwise AND operation with each of SMTP_SNDR_SNIFFED, SMTP_RCPT_SNIFFED, SMTP_RECEIVED and SMTP_RSET_REQ in turn. If the result of the bitwise operation is non-zero, special handling takes place for that case. Such a non-zero result means that the ANDed flag is part of the return value of smtpTrack(). In particular, if SMTP_SNDR_SNIFFED is part of the return value, a new smtpNode* node is constructed and appended to the list using the function smtpListAppNode(), in a similar way as for the HTTP and FTP lists. If SMTP_RCPT_SNIFFED is set, the smtpListAddRcpt() function is called to add the mail recipient value to the appropriate node in the list. The SMTP_RECEIVED flag indicates that the mail subject has been found (the Received field shows that a mail header is present, from which information about the mail subject can be extracted). Adding the mail subject to the appropriate node of the list is carried out by the smtpListAddSubject() function. Finally, SMTP_RSET_REQ indicates that a user has sent the RSET command. The appropriate list handling function for this case is smtpListAddRsetReq(). The above operations are presented in the next piece of code.
pldlen = ( (ntohs(ip->ip_len) > 1024) ? 1024 : (ntohs(ip->ip_len) - size_ip - size_tcp) );
bzero(from, 200);
bzero(to, 200);
bzero(subject, 200);
smtpres = smtpTrack((char*)payload, pldlen, from, to, subject);
if (smtpres & SMTP_SNDR_SNIFFED) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
    bzero(newSmtpNode->smtpFrom, 40);
    strcpy(newSmtpNode->smtpFrom, from);
    bzero(newSmtpNode->usrIpAddr, 20);
    strcpy(newSmtpNode->usrIpAddr, usrip);
    bzero(newSmtpNode->smtpSrvAddr, 20);
    strcpy(newSmtpNode->smtpSrvAddr, hostip);
    newSmtpNode->port = ntohs(tcp->th_sport);
    newSmtpNode->datasent = 0;
    newSmtpNode->rsetreq = 0;
    newSmtpNode->next = NULL;
    smtpListAppNode(smtpListHead, newSmtpNode);
}
if (smtpres & SMTP_RCPT_SNIFFED) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
    smtpListAddRcpt(smtpListHead, usrip, hostip, ntohs(tcp->th_sport), to);
}
if (smtpres & SMTP_RECEIVED) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
    smtpListAddSubject(smtpListHead, usrip, hostip, ntohs(tcp->th_sport), subject);
}
if (smtpres & SMTP_RSET_REQ) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
    smtpListAddRsetReq(smtpListHead, usrip, hostip, ntohs(tcp->th_sport));
}
[ code taken from packet_cap.c ]
ii. The packet is sent to the user from a remote host's port 25. This packet may contain the response of the receiving SMTP entity. Again, smtpTrack() has to be called. If its return value is SMTP_250_OK, the server entity has confirmed the user's last command. Therefore, we first search the list to find the SMTP connection that the server reply is a part of. Then, we check the node's rsetreq field. If this field / flag is set (value 1), we conclude that the server's 250 reply refers to the RSET request of the user. So, we have to set the rsetreq field to 0, set datasent to 0 and clear the smtpFrom and smtpTo fields, as the RSET command specifies that the current mail transaction is to be aborted and any stored data must be discarded. Otherwise, we check the node's datasent field. If this field has the value 1, the 250 OK reply of the server referred to the successful sending of the mail data. Thus, the appropriate record has to be inserted into the MySQL database and the node must be removed from the list. If, though, the return value of smtpTrack() is SMTP_354, this means, according to the SMTP protocol specification, that the user has just issued a DATA command, so the server is waiting to receive the mail data. In this case, we have to set the datasent field of the node to 1. This is performed by the smtpListAddDataSent() function. The next code fragment deals with the case when a packet is sent to the user from the server's port 25.
pldlen = ( (ntohs(ip->ip_len) > 1024) ? 1024 : (ntohs(ip->ip_len) - size_ip - size_tcp) );
bzero(from, 200);
bzero(to, 200);
smtpres = smtpTrack((char*)payload, pldlen, from, to, subject);
if (smtpres == SMTP_250_OK) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_src.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    tempSmtpNode = smtpListFindConn(smtpListHead, usrip, hostip, ntohs(tcp->th_dport));
    if (tempSmtpNode) {
        if (tempSmtpNode->rsetreq == 1) {
            bzero(tempSmtpNode->smtpFrom, 40);
            bzero(tempSmtpNode->smtpTo, 40);
            tempSmtpNode->datasent = 0;
            tempSmtpNode->rsetreq = 0;
        } else if (tempSmtpNode->datasent == 1) {
            bzero(sqlInsertSmtpStats, 400);
            sprintf(sqlInsertSmtpStats, "INSERT INTO smtp (sm_username, sm_smtp_from, sm_smtp_to, sm_subject) VALUES ('%s', '%s', '%s', '%s');", usrrcv->username, tempSmtpNode->smtpFrom, tempSmtpNode->smtpTo, tempSmtpNode->subject);
            if (mysql_query(conn, sqlInsertSmtpStats)!=0) {
                printf("%s\n", mysql_error(conn));
                return;
            }
            smtpListRemoveConn(smtpListHead, hostip, usrip, ntohs(tcp->th_dport));
        }
    }
} else if (smtpres == SMTP_354) {
    strcpy(hostip, (char*)inet_ntoa(ip->ip_src.s_addr));
    strcpy(usrip, (char*)inet_ntoa(ip->ip_dst.s_addr));
    tempSmtpNode = smtpListFindConn(smtpListHead, usrip, hostip, ntohs(tcp->th_dport));
    smtpListAddDataSent(smtpListHead, usrip, hostip, ntohs(tcp->th_dport));
}
[ code taken from packet_cap.c ]
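The reply handling above amounts to a small state machine driven by the rsetreq and datasent flags. The following sketch isolates that decision logic from the list and database calls. It is an illustration under assumptions: the struct mailState is a reduced, hypothetical version of smtpNode, the enum smtp_action and the function smtp_on_reply() do not exist in packet_cap, and the numeric values given to SMTP_250_OK and SMTP_354 are assumed because their definitions are not shown in this section.

```c
#include <assert.h>
#include <string.h>

#define SMTP_250_OK 250   /* assumed values; the real definitions are not shown */
#define SMTP_354    354

/* What the caller should do after a server reply has been processed. */
enum smtp_action {
    SMTP_ACT_NONE,            /* nothing to do */
    SMTP_ACT_RESET,           /* transaction aborted by RSET */
    SMTP_ACT_LOG_AND_REMOVE,  /* mail accepted: insert record, drop node */
    SMTP_ACT_AWAIT_DATA       /* server is waiting for the message text */
};

/* Reduced, hypothetical version of smtpNode. */
struct mailState {
    char smtpFrom[40];
    char smtpTo[40];
    int  datasent;
    int  rsetreq;
};

/* Update the per-mail state for one server reply code. */
static enum smtp_action smtp_on_reply(struct mailState *m, int reply)
{
    assert(m != NULL);
    if (reply == SMTP_354) {
        m->datasent = 1;
        return SMTP_ACT_AWAIT_DATA;
    }
    if (reply != SMTP_250_OK)
        return SMTP_ACT_NONE;
    if (m->rsetreq) {
        /* The 250 reply confirms RSET: discard the stored transaction data. */
        memset(m->smtpFrom, 0, sizeof m->smtpFrom);
        memset(m->smtpTo, 0, sizeof m->smtpTo);
        m->datasent = 0;
        m->rsetreq = 0;
        return SMTP_ACT_RESET;
    }
    if (m->datasent)
        return SMTP_ACT_LOG_AND_REMOVE;
    return SMTP_ACT_NONE;
}
```

Checking rsetreq before datasent matters: a 250 reply that confirms a RSET must not be mistaken for the reply that confirms the message text.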
In the authorization state, the only valid commands are USER, APOP, PASS and QUIT. Other commands produce an error.
iv. After a user has successfully completed the above state, he enters the transaction state. While in the transaction state, a user can operate on his mailbox. He can get a list of the messages stored there for him, retrieve messages, mark messages for deletion, get information about the mailbox, etc. If he wishes to get a list of the stored messages or information about a specific message, he issues the LIST command. If no parameter is given, a list of all messages is returned. If the number of an existing message is specified, the number and the length of that message in bytes are returned. Otherwise, the client gets an -ERR response.
LIST
+OK 10 messages (35044 octets)
1 230
2 4401
3 1024
[ . . . ]
LIST 1
+OK 1 230
LIST 25
-ERR no such message, only 10 messages in maildrop
If the user wishes to retrieve one of those messages, he has to enter the RETR command, specifying the number of the message to be retrieved. If the number is invalid, the server returns -ERR. The message number must also not refer to a message marked as deleted.
RETR 1
+OK 230 octets
<the server sends message 1 here>
.
The commands permitted in the transaction state are STAT, LIST, RETR, DELE, NOOP, RSET, QUIT, TOP (optional) and UIDL (optional).
v. Finally, when a user issues the QUIT command from the transaction state, the POP3 session enters the update state. In this state, the messages marked for deletion are removed and the TCP connection is closed.
QUIT
+OK sometext
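The session states and the commands permitted in each of them, as listed above, can be summarised in a small lookup routine. The following is a minimal sketch for illustration only; the names pop3_state, pop3_cmd_allowed() and pop3_after_ok() are assumptions and do not appear in packet_cap.c. Commands are compared in upper case only, which is a simplification.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the three POP3 session states described above. */
enum pop3_state { POP3_AUTHORIZATION, POP3_TRANSACTION, POP3_UPDATE };

/* Returns 1 if the command is permitted in the given state, 0 otherwise. */
static int pop3_cmd_allowed(enum pop3_state st, const char *cmd)
{
    static const char *auth_cmds[]  = { "USER", "APOP", "PASS", "QUIT", NULL };
    static const char *trans_cmds[] = { "STAT", "LIST", "RETR", "DELE", "NOOP",
                                        "RSET", "QUIT", "TOP", "UIDL", NULL };
    const char **set;
    int i;

    assert(cmd != NULL);
    if (st == POP3_AUTHORIZATION)
        set = auth_cmds;
    else if (st == POP3_TRANSACTION)
        set = trans_cmds;
    else
        return 0;   /* the update state accepts no further commands */

    for (i = 0; set[i] != NULL; i++)
        if (strcmp(set[i], cmd) == 0)
            return 1;
    return 0;
}

/* State of the session after the server has answered +OK to cmd. */
static enum pop3_state pop3_after_ok(enum pop3_state st, const char *cmd)
{
    assert(cmd != NULL);
    if (st == POP3_AUTHORIZATION &&
        (strcmp(cmd, "PASS") == 0 || strcmp(cmd, "APOP") == 0))
        return POP3_TRANSACTION;   /* successful login */
    if (st == POP3_TRANSACTION && strcmp(cmd, "QUIT") == 0)
        return POP3_UPDATE;        /* deletions are applied, connection closes */
    return st;
}
```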
[Figure: example message exchange between a POP3 client and a POP3 server (PASS mypass; RETR 1, +OK 230 octets, <the server sends message 1 here>; QUIT, +OK; closing of the TCP connection)]
2.5.2.5.2 POP3 Connection Tracking Algorithm
The last protocol connection and data tracking operations to be discussed refer to the Post Office Protocol Version 3. What differentiates this protocol from the previous ones is that here we not only have to keep a list of all POP3 connections but also, for each of these connections, have to maintain a list of the e-mail messages that have been retrieved within it. The function that performs the data tracking operations is called pop3Track(). Its prototype is:
int pop3Track(char* payload, int pldlen, char* user, char* pass,
Some of the parameters are self-explanatory or have been discussed in previous sections. The sn parameter refers to the serial number of the message that a user wishes to retrieve (the number of the message in the user's message list on the server). Into the srvresp parameter the function copies the status indicator (+OK, -ERR) and the text description of a server's response. pop3msg is a pointer to a pop3msg struct (this structure will be described shortly), which holds the information of an e-mail message (e.g. sender, length, etc.). srv is a flag that indicates whether the packet was sent by the server (value 1) or by the user (value 0). The pop3Track() function works like the HTTP and the other protocol tracking functions, which were described in detail before. The return values of pop3Track() in case srv is set to 1 are defined as follows:
#define POP3_NO_ACTION 0
#define POP3_SERVER_OK 1
#define POP3_SERVER_ERR 2
#define POP3_RECEIVED 8
(these can be found in the pop3list.h file) The pop3msg structure which was mentioned before will now be described. It is a C struct with the following format:
struct pop3MsgListNode {
    int sn;
    char subject[255];
    char sender[255];
    char date[255];
    char id[255];
    int length;
    int count;
    int timesretrieved;
    int todelete;
    struct pop3MsgListNode* next;
};
typedef struct pop3MsgListNode pop3msg;
pop3msg is in fact a node in the list of messages that a user has retrieved within a POP3 connection. This linked list works just like the other lists described in the previous sections. It supports the basic list handling operations, which are insertion (only the append method is implemented), node removal and search by the sn field. Its implementation is in the file pop3list.h, together with the implementation of the pop3Node structure.
Most of the fields of the pop3msg struct are also self-explanatory. The length field represents the message's length in bytes. timesretrieved records how many times the message has been retrieved within a POP3 connection. todelete is set to 1 if the message is marked for deletion. The pop3Node structure is as follows:
struct pop3ConnListNode {
    char clientAddr[20];
    char srvAddr[20];
    char pop3user[40];
    char pop3pass[40];
    int port;
    int count;
    int retrrequested;
    int delereq;
    int quitreq;
    int apopused;
    int authstate;
    /* this connections message list */
    struct pop3MsgListNode *msgl;
    struct pop3ConnListNode *next;
};
typedef struct pop3ConnListNode pop3Node;
This data structure represents a node in the POP3 connections list. A brief description of its fields follows.
clientAddr: the POP3 client's IP address
srvAddr: the POP3 server's IP address
port: the client's TCP port
count: list element count
pop3user: POP3 account user name
pop3pass: POP3 account password
retrrequested: the number of the message the user asked to retrieve
delereq: the number of the message the user marked for deletion
quitreq: true if the user issued a QUIT command
apopused: true if the user is using the APOP authentication scheme
authstate: true if the user is in the authorization state
msgl: pointer to the head of the connection's list of messages
next: pointer to the next list element
The POP3 connections list is accessible in the packet_cap program via the pop3ListHead pointer, which is declared as a static variable of type pop3Node*. When a POP3 packet has been intercepted, there are two cases, depending on the origin of the packet.
i. The packet was sent to the POP3 server by the POP3 user (client), carrying client commands. As one might expect, the pop3Track() function is called. If it returns the value POP3_USERNAME_SNIFFED, a new pop3Node is constructed and appended to the list of connections (pop3ListAppNode()). The user is assumed to be in the authorization state, therefore the authstate flag is set. If POP3_PASS_SNIFFED is returned, the user has issued the PASS command and pop3Track() has recorded the session's password. A search in the list is therefore conducted to find the pop3Node that has the proper clientAddr, srvAddr and port values (the values that compose the key that uniquely identifies a connection). If the node is found, its password value is set to the captured password. If, however, the return value is POP3_APOP_USED, the system has concluded that the APOP authentication scheme is used, so the node's password field is to be ignored. When the APOP command is used, the authorization is carried out with one command, while the USER-PASS scheme requires two commands. Thus, in this case, we have to append a new node to the list with the clientAddr, srvAddr and port values set to the packet's source IP, destination IP and source port values. POP3_APOP_USED also indicates that the user name for the POP3 mailbox has been recorded, so it is written to the new node as well. pop3pass is cleared and the flags authstate and apopused are set. Finally, the node is appended to the list pointed to by pop3ListHead using the pop3ListAppNode() function. If the return value of pop3Track() is POP3_RETR_REQ, a user has requested to retrieve a message, whose message number is tracked and stored in the msgsn variable, which is passed to the function by address. A search is then conducted in the way described before, using the pop3ListFindConn() function, and the retrrequested value of the node found is set to msgsn.
Last, if the return value is POP3_QUIT_REQ, the user has issued the QUIT command, so the quitreq field of the appropriate node of the list must be set to 1.
pldlen = ( (ntohs(ip->ip_len) > 3000) ? 3000 : (ntohs(ip->ip_len) - size_ip - size_tcp) );
bzero(pop3user, 40);
bzero(pop3pass, 40);
resPop3 = pop3Track((char*)payload, pldlen, pop3user, pop3pass, &msgsn, srvresp, newPop3msg, 0);
strcpy(hostip, (char*)inet_ntoa(ip->ip_dst.s_addr));
strcpy(usrip, (char*)inet_ntoa(ip->ip_src.s_addr));
if (resPop3 == POP3_USERNAME_SNIFFED) {
    bzero(newPop3Node->pop3user, 40);
    strcpy(newPop3Node->pop3user, pop3user);
    bzero(newPop3Node->clientAddr, 20);
    strcpy(newPop3Node->clientAddr, usrip);
    bzero(newPop3Node->srvAddr, 20);
    strcpy(newPop3Node->srvAddr, hostip);
    newPop3Node->authstate = 1;
    newPop3Node->port = ntohs(tcp->th_sport);
    pop3ListAppNode(pop3ListHead, newPop3Node);
}
if (resPop3 == POP3_PASS_SNIFFED) {
    if (tempPop3Node = pop3ListFindConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_sport))) {
        bzero(tempPop3Node->pop3pass, 40);
        strcpy(tempPop3Node->pop3pass, pop3pass);
    }
}
if (resPop3 == POP3_APOP_USED) {
    bzero(newPop3Node->pop3user, 40);
    strcpy(newPop3Node->pop3user, pop3user);
    bzero(newPop3Node->clientAddr, 20);
    strcpy(newPop3Node->clientAddr, usrip);
    bzero(newPop3Node->srvAddr, 20);
    strcpy(newPop3Node->srvAddr, hostip);
    newPop3Node->port = ntohs(tcp->th_sport);
    bzero(newPop3Node->pop3pass, 40);
    newPop3Node->apopused = 1;
    newPop3Node->authstate = 1;
    pop3ListAppNode(pop3ListHead, newPop3Node);
}
if (resPop3 == POP3_RETR_REQ) {
    if (tempPop3Node = pop3ListFindConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_sport))) {
        tempPop3Node->retrrequested = msgsn;
    }
}
if (resPop3 == POP3_QUIT_REQ) {
    if (tempPop3Node = pop3ListFindConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_sport))) {
        tempPop3Node->quitreq = 1;
    }
}
[ code taken from packet_cap.c ]
ii. The second case is that the user has received a packet whose source TCP port has the value 110, that is, a packet that originates from a POP3 server. In this case, pop3Track() is called with the srv flag set to 1. The possible return values are:
- POP3_SERVER_OK: This indicates that the server has replied with a +OK status. We have to determine, though, which user command the +OK response replies to. For this purpose, we check the user's state, together with some other information. If the node's authstate field is set to 1, the user has issued a USER or APOP command, but the server reply indicating a successful login, which would let the user enter the transaction state, has not arrived yet. If authstate is non-zero and the user has issued the APOP command, the +OK reply means that the user has successfully logged in, so the necessary updates have to be performed. In particular, the appropriate record in the pop3_users table must be updated with an incremented value of the connection count field or, if the record does not exist, inserted via an INSERT statement. If the authstate flag is set and the node's pop3pass field is not of zero length, the PASS command has already been sent, and a +OK reply from the server would mean that the login was successful. However, we have to check the value of quitreq first, because the QUIT command is also valid in the authorization state. If quitreq is set, the +OK status indicator is a reply to the QUIT command, so no updates should be made in the database and the node must simply be removed from the list. Otherwise, the login process has been carried out successfully, so the necessary database operations take place. A server +OK reply is also sent when the user issues a RETR command. The retrrequested field then holds the number of the message to be retrieved. In such a case, the +OK may be followed by text specifying the length of the message:
+OK 12033 octets
(The actual e-mail data will be contained in the next packet.) Now, a new message (pop3msg) struct must be appended to the message list (msgl field) of the connection (pop3Node). The whole server response string is stored in the srvresp parameter of the pop3Track() function. The program examines the srvresp string and, if it can extract length information, stores it in the length field of the new pop3msg structure. Finally, it appends the new pop3msg struct to the connection's list of messages.
- POP3_RECEIVED: This return value indicates that the Received header field has been found in the packet data. Therefore, this packet carries the e-mail content. The pop3Track() function has now extracted the sender, date, message ID and subject of the mail message. The next step is to search the message list of the corresponding node. If the user has asked to retrieve this message for the first time, the extracted information (sender, subject, etc.) is copied to the message node (pop3msg) and the node's timesretrieved field is set to 1. If the message has been retrieved before (within this connection), the timesretrieved counter is incremented. Then, the necessary database insert or update operations take place. In particular, if the pop3 MySQL table already holds a record of a message with this ID, retrieved by this P2PWNC user from this POP3 server, that record is updated. Otherwise, it is inserted for the first time.
- POP3_SERVER_ERR: This means that the server replied with an -ERR status indicator. In this case (if -ERR arrives in the authorization state), the connection (pop3Node struct) is removed from the list of POP3 connections.
All of the above is shown in the next piece of code.
pldlen = ( (ntohs(ip->ip_len) > 3000) ? 3000 : (ntohs(ip->ip_len) - size_ip - size_tcp) );
resPop3 = pop3Track((char*)payload, pldlen, pop3user, pop3pass, &msgsn, srvresp, newPop3msg, 1);
bzero(hostip, 20);
bzero(usrip, 20);
strcpy(hostip, (char*)inet_ntoa(ip->ip_src.s_addr));
strcpy(usrip, (char*)inet_ntoa(ip->ip_dst.s_addr));
if (resPop3 == POP3_SERVER_OK) {
    if (tempPop3Node = pop3ListFindConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_dport))){
        if ( (tempPop3Node->apopused == 1) && (tempPop3Node->authstate) ){
            /* user logged in using APOP */
            [ . . . ]
            /* database updates and other operations*/
        }
        else if (tempPop3Node->authstate && strlen(tempPop3Node->pop3pass) && (!tempPop3Node->quitreq)) {
            /* user logged in using username / pass */
            [ . . . ]
            /* database updates and other operations*/
        }
        if (tempPop3Node->quitreq) {
            /* user QUITTED - remove connection */
            pop3ListRemoveConn(pop3ListHead, hostip, usrip, ntohs(tcp->th_dport));
        }
        if (tempPop3Node->retrrequested) {
            /* user requested to retrieve msg */
            /* find message length (if possible) and append node */
            [ . . . ]
            newPop3msg->sn = tempPop3Node->retrrequested;
            msgListAppNode(tempPop3Node->msgl, newPop3msg, tempPop3Node->retrrequested);
        }
    }
}
}
if (resPop3 == POP3_RECEIVED) {
    if (tempPop3Node = pop3ListFindConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_dport))){
        tempPop3msg = msgListFindNode(tempPop3Node->msgl, tempPop3Node->retrrequested);
        if (tempPop3msg) {
            if (tempPop3msg->timesretrieved <= 1) {
                /* copy message info (sender, subject, etc) to the new message node */
                [ . . . ]
                /* database updates/insertions for the message */
            }
        }
    }
}
if (resPop3 == POP3_SERVER_ERR) {
    if (tempPop3Node = pop3ListFindConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_dport))){
        if ( (tempPop3Node->apopused == 1) && (tempPop3Node->authstate) ){
            /* error in authorization state using apop. */
            /* removing connection */
            pop3ListRemoveConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_dport));
        }
        else if (tempPop3Node->authstate && tempPop3Node->pop3pass) {
            /* error in authorization state using user / pass auth. scheme */
            /* removing connection */
            pop3ListRemoveConn(pop3ListHead, usrip, hostip, ntohs(tcp->th_dport));
        }
    }
}
[ code taken from packet_cap.c ]
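Two steps recur in the handling of server packets: locating a connection by its key (client address, server address, client port) and extracting the message length from a +OK reply to a RETR command. The sketch below illustrates both under simplifying assumptions. It is not the code of pop3list.h; struct conn, find_conn() and parse_octets() are hypothetical names, and the node carries only the key fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced connection node: only the key fields are kept. */
struct conn {
    char clientAddr[20];
    char srvAddr[20];
    int port;               /* client's TCP port */
    struct conn *next;
};

/* Walk the list and return the node matching the three-part key, or NULL. */
static struct conn *find_conn(struct conn *head, const char *client,
                              const char *srv, int port)
{
    struct conn *n;

    assert(client != NULL && srv != NULL);
    for (n = head; n != NULL; n = n->next)
        if (n->port == port &&
            strcmp(n->clientAddr, client) == 0 &&
            strcmp(n->srvAddr, srv) == 0)
            return n;
    return NULL;
}

/* Extract the number that follows "+OK" in a reply such as
   "+OK 12033 octets". Returns it, or -1 if no number follows "+OK". */
static int parse_octets(const char *srvresp)
{
    int len;

    assert(srvresp != NULL);
    if (sscanf(srvresp, "+OK %d", &len) == 1 && len >= 0)
        return len;
    return -1;
}
```

A return value of -1 from parse_octets() corresponds to the case where the length cannot be extracted, so the length field of the new message node would be left unset.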
2.6 Demonstration
In this section some screenshots of the database where traffic statistics are stored will be shown. These statistics were generated by the traffic logging daemon. The MySQL database administration tool in which the statistics are presented is phpMyAdmin. First, the information that was gathered concerning the FTP protocol will be presented, and then the SMTP statistics.
Figure 17. Statistics Client and Server Architecture and Interconnection with the Traffic Logging Subsystem (diagram labels: Database, Statistics Server)
compatible with the database system used so that it can retrieve data. From this point of the document on, by the term database we will mean the data storage system the server uses for storing the statistics and the authorization information, regardless of its type and implementation. Requests and responses are actually XML documents. Consequently, there must be some way of parsing these documents to extract the data they contain (field codes, criteria operators, actual database records, etc.). The parser used by the client and the server is not specified by the protocol either. In this implementation, the libxml library was used. The protocol does not specify the programming language used for implementing it either. Moreover, although this protocol runs on top of TCP, it is not restricted to a particular transport protocol. UDP could be used as well. However, the use of TCP is suggested. It offers mechanisms for guaranteed delivery of packets, which is important since the protocol being described is used for exchanging accounting data. TCP is also more appropriate in the case of fragmented responses, where losing the last one (which signals the end of the stream of response XML documents) might cause problems to the client program. Finally, TCP is closer to the connection-oriented nature of the described protocol.
request-version = <?xml version= DQUOTE version-num DQUOTE ?>
version-num = 1.0
request-header = <request type= DQUOTE request-type DQUOTE >
request-type = LOGIN / LOGOUT / STATS / CHANGEPASS
request-fields = <fields> *10(db-field-tag) </fields>
field-tag = auth-field-tag / db-field-tag
auth-field-tag = <field> auth-field-code </field>
db-field-tag = <field> db-field-code </field>
request-criteria = <crlist> criteria-num-tag criteria-list </crlist>
criteria-num-tag = <crnum> criteria-count </crnum>
criteria-count = 1*DIGIT ; criteria-count is < 10
criteria-list = *(criterion criteria-operator) criterion
criterion = <criteria> db-field-tag operator-tag value-tag </criteria>
operator-tag = <operator> operator-val </operator>
value-tag = <value> val </value>
val = *VCHAR
criteria-operator = <coperator> coperator-val </coperator>
request-close-tag = </request>
db-field-code =
     1 / ; username, concerning the total IP statistics section
     2 / ; user's domain name, concerning the total IP statistics section
     3 / ; user's real name
     4 / ; privileged user
     5 / ; total IP upload volume
     6 / ; total IP download volume
     7 / ; total HTTP upload volume
     8 / ; total HTTP download volume
     9 / ; total FTP upload volume
    10 / ; total FTP download volume
    11 / ; total SMTP upload volume
    12 / ; total SMTP download volume
    13 / ; total TELNET upload volume
    14 / ; total TELNET download volume
    15 / ; total POP3 upload volume
    16 / ; total POP3 download volume
    17 / ; total SSH upload volume
    18 / ; total SSH download volume
    19 / ; user online
    20 / ; username, concerning the users section
    21 / ; realname, concerning the users section
    22 / ; user's assigned IP address
    23 / ; user online, concerning the users section
    24 / ; user's MAC address
    25 / ; user name, concerning the FTP section
    26 / ; ftp host
    27 / ; ftp user name (account name)
    28 / ; ftp password
    29 / ; ftp connection count
    30 / ; username, concerning the SMTP section
    31 / ; SMTP sender
    32 / ; SMTP recipient
    33 / ; SMTP mail subject
    34 / ; username (HTTP section)
    35 / ; HTTP request host
    36 / ; HTTP request method
    37 / ; HTTP request URI
    38 / ; HTTP request user agent
    39 / ; username (POP3 users section)
    40 / ; POP3 server
    41 / ; POP3 username (account name)
    42 / ; POP3 password
    43 / ; POP3 connection count
    44 / ; using the APOP authentication scheme
    45 / ; username (POP3 section)
    46 / ; POP3 server
    47 / ; POP3 user (account name)
    48 / ; POP3 mail sender
    49 / ; mail date (as in the mail header)
    50 / ; mail subject (POP3 section)
    51 / ; POP3 mail message ID
    52 / ; mail message length (in octets)
    53 / ; how many times a mail message has been retrieved
    54 / ; HTTP request ID
    55 / ; administrator's username
    56 / ; administrator's password
    57 / ; administrator's real name
    58 / ; administrator's client IP address
    59 / ; administrator is running a client (logged in)
    60   ; exact date of the administrator's last login
auth-field-code =
    70 / ; client's login request
    71 / ; client's logout request
    72   ; change administrator's password
error-field = 300
operator-val =
    1 / ; Equals
    2 / ; Bigger than
    3 / ; Bigger than or equal
    4 / ; Less than
    5 / ; Less than or equal
    6 / ; Like operator (similarity, as in SQL)
    7   ; Different than (<>)
coperator-val =
    1 / ; AND
    2   ; OR
error-val = 200
The response specification follows. field-tag and value-tag have been defined in the request specification.
RESPONSE =
response-version = <?xml version= DQUOTE version-num DQUOTE ?>
version-num = 1.0
response-header = <response type= DQUOTE response-type DQUOTE SP fragged= DQUOTE fragged-val DQUOTE >
response-type = LOGIN / LOGOUT / STATS / CHANGEPASS
fragged-val = TRUE / FALSE
response-pairs = <pairs> *10(field-value-pair) </pairs>
field-value-pair = <pair> field-tag value-tag </pair>
response-close-tag = </response>
3.4.3.1 REQUEST
A request is made up of some header information, the requested fields and the criteria according to which the server will form its reply.
3.4.3.1.1 Request version and header information
Header information includes the XML version (1.0) and the type of the request. As one can see, there are four types of request. LOGIN, LOGOUT and CHANGEPASS have to do with the authentication procedure between the client and the server, which will be discussed more extensively later. The other type, STATS, refers to a request for data (statistics).
3.4.3.1.2 Request fields
This part of the request refers to the fields whose values are to be sent back by the server. A field (field-tag) is an integer code enclosed between <field> and </field> tags. In a request, a field can only take one of the db-field-code values. These values refer to actual stored data (either traffic data or administrator information such as user name and password), in contrast to auth-field-code values. A db-field-tag can be mapped directly to a database field (for example the total IP downloads, an FTP password, etc.). This mapping is a task the server must carry out. The current version of this protocol permits requests that ask for at most 10 field values.
3.4.3.1.3 Request criteria
The request criteria are used by the server as the criteria for the search in the database. Together with the fields of a request, they have to be translated by the server into some form of database query, so that it can gather the necessary results and send them back to the client. In this implementation, the server communicates with a MySQL database. Therefore, it has to convert the XML request that is sent by the client program to a valid SQL query. The request-fields of the request refer to the database fields that are specified in the first part of a SELECT SQL query, while the request-criteria are interpreted as its WHERE clause. The criteria described above represent query conditions. There may be more than one of them in a query (request). Criteria are put in a block which is enclosed by a <crlist> and a </crlist> tag. Their number is specified by criteria-num-tag, as shown in the protocol syntax. If the list of criteria contains more than one criterion, the boolean operators (criteria-operators) that connect them must be specified. Since criteria are equivalent to conditions, boolean operators may be applied to them, in the same way that different conditions may be ORed, ANDed, etc. in the WHERE clause of an SQL statement. If only one criterion is included in a request, no boolean operators are present. Every criterion is composed of a db-field-tag, an operator-tag and a value-tag. That is, a criterion of a request looks like
<criteria>
    <field>field code</field>
    <operator>operator code</operator>
    <value>criterion value</value>
</criteria>
In the case of a STATS request, the request can be mapped directly to a SELECT SQL query or an equivalent query form. However, if the type is LOGIN, LOGOUT or CHANGEPASS, other operations take place. Namely, the server is not asked to retrieve any traffic statistics, but merely to check the administrator's credentials and perform operations that have to do with user authentication. In this case, the request-fields section is ignored (even if it contains valid field codes, the server will not deal with them). The request-criteria section contains user authentication information such as user name, password, new password (in case the type is CHANGEPASS), etc. However, even though this procedure is different from statistics retrieval, the syntax of a request remains the same. It has to be mentioned that the current version of the protocol permits at most 5 criteria.
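The translation of request criteria into a WHERE clause can be sketched as follows. This is an illustration under assumptions, not the server's code: the struct criterion type, the function names and the column names supplied by the caller are hypothetical, and only the operator-val and coperator-val codes come from the protocol specification. Values are copied into the query unescaped here; a real server would have to escape or parameterise them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one <criteria> element. */
struct criterion {
    const char *column;   /* database column the field code was mapped to */
    int op;               /* operator-val, 1..7 */
    const char *value;    /* contents of the <value> tag */
};

/* Map an operator-val code to its SQL operator; NULL if the code is invalid. */
static const char *sql_operator(int op)
{
    static const char *ops[] = { "=", ">", ">=", "<", "<=", "LIKE", "<>" };
    return (op >= 1 && op <= 7) ? ops[op - 1] : NULL;
}

/* Build "WHERE c1 op 'v1' AND c2 op 'v2' ..." into out.
   coper[i] joins criterion i and criterion i+1 (1 = AND, 2 = OR).
   With n == 0 the result is an empty string (no WHERE clause).
   Returns 0 on success, -1 on an invalid code or too small a buffer. */
static int build_where(const struct criterion *cr, const int *coper, int n,
                       char *out, size_t outlen)
{
    size_t used = 0;
    int i, w;

    assert(cr != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    if (n < 0 || n > 5)   /* the protocol permits at most 5 criteria */
        return -1;
    for (i = 0; i < n; i++) {
        const char *op = sql_operator(cr[i].op);
        const char *join = "WHERE";

        if (op == NULL)
            return -1;
        if (i > 0) {
            if (coper[i - 1] == 1)
                join = "AND";
            else if (coper[i - 1] == 2)
                join = "OR";
            else
                return -1;
        }
        w = snprintf(out + used, outlen - used, "%s%s %s %s '%s'",
                     i > 0 ? " " : "", join, cr[i].column, op, cr[i].value);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```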
3.4.3.2 RESPONSE
A response is made up of header information and field-value pairs, which form the results of the server's operations.
3.4.3.2.1 Response version and header information
As in the request case, the first line contains the XML version used. The response header includes the type of the response (LOGIN, LOGOUT, CHANGEPASS, STATS) and the fragged flag. This flag can take the values TRUE and FALSE, which must be specified in the <response> tag in upper case. The purpose of the fragged attribute is to inform the client that received this response whether or not this XML response is the last one of the stream of XML documents that the server sent as a result of a client's request. A TRUE value indicates that the server's response was fragmented into more than one XML document, so the client must be prepared to receive more of them.
3.4.3.2.2 Response pairs
After the response-header comes the response-pairs segment. This part is composed of field-value pairs. The fields are the codes of the fields that a client requested or server operation-specific fields. The values are those that the server has to send back after querying the database or performing the necessary authorization operations. The values are arbitrary sets of characters. However, in special cases values must be well-defined. This will be described in a later section. For example, if a client has issued a request asking for user someuser@somedomain's real name (field code: 3) and whether he is online or not (field code: 19), the response of the server, in case the user is online and his real name is Some User, will contain the following pairs:
[ . . . ]
<pairs>
    <pair>
        <field>3</field>
        <value>Some User</value>
    </pair>
    <pair>
        <field>19</field>
        <value>y</value>
    </pair>
</pairs>
[ . . . ]
In such a case, the request-fields section is to be ignored by the server. The request-criteria section, though, is very important, as it carries the administrator's login credentials. In particular, in the criteria-num-tag, which holds the number of criteria that will follow in the list of criteria, the value of criteria-count is 2. The first of these criteria refers to the administrator's user name and the second to the administrator's password. The user name field must be put before the password field. The field code of the user name field must be 55 and the field code of the password must be 56. The operator-tag of each of these two criteria, as well as the criteria-operators, are ignored. An example of the request-criteria section for a login request of a user named someadmin with the password somepass would be the following.
[ . . . ]
<crlist>
    <crnum>2</crnum>
    <criteria>
        <field>55</field>
        <operator>1</operator>
        <value>someadmin</value>
    </criteria>
    <coperator>1</coperator>
    <criteria>
        <field>56</field>
        <operator>1</operator>
        <value>somepass</value>
    </criteria>
</crlist>
[ . . . ]
After sending the LOGIN request to the server, the client must expect to receive a confirmation reply from the server. The confirmation reply is a response of LOGIN type that either accepts or rejects the user's request and contains one field-value-pair with the following format:
<pair>
    <field>70</field>
    <value>ACCEPTED</value>
</pair>
The field code 70 indicates that the response refers to a login request and is an auth-field-code as described in the protocol specification.
As in the login function, the request-fields section is not needed and is to be ignored by the server that will receive the request. Also, the same constraints as in a LOGIN request must be taken into consideration when forming the request-criteria section. Thus, the request-criteria section for the logout procedure of someadmin (of the previous example) will be exactly the same as in the LOGIN request.
[ . . . ]
<crlist>
    <crnum>2</crnum>
    <criteria>
        <field>55</field>
        <operator>1</operator>
        <value>someadmin</value>
    </criteria>
    <coperator>1</coperator>
    <criteria>
        <field>56</field>
        <operator>1</operator>
        <value>somepass</value>
    </criteria>
</crlist>
[ . . . ]
After sending the logout request, the client must wait to receive a logout confirmation response, of a format that will be discussed in the section on the server's logout function.
As in all authorization functions of the client, the request-fields section is to be ignored. The request-criteria section is composed of three criteria. The first corresponds to the name of the protocol user (the administrator's user name), the second to the old password and the third to the desired new password value. The criteria-num-tag now has a criteria-count value of 3. The first criterion, which refers to the user name, has field code 55. The other two criteria, as they both refer to password values, have a db-field-tag with field code 56. operator-tags and criteria-operators are ignored in the same way fields are ignored. Again, the order of the criteria must not be violated, which means that the user name criterion comes first, the old password criterion comes second and the one referring to the new password comes third. The example that follows shows the request-criteria section of a request that a client must send, should the connected user, named someadmin, wish to change his password from somepass to foo.
[ . . . ]
<crlist>
    <crnum>3</crnum>
    <criteria>
        <field>55</field>
        <operator>1</operator>
        <value>someadmin</value>
    </criteria>
    <coperator>1</coperator>
    <criteria>
        <field>56</field>
        <operator>1</operator>
        <value>somepass</value>
    </criteria>
    <coperator>1</coperator>
    <criteria>
        <field>56</field>
As in the other authorization functions, the client that sends a CHANGEPASS request must receive and parse the server's confirmation response, which will also be of type CHANGEPASS and carry the ACCEPTED or REJECTED reply.
The presence of the request-fields is crucial, as the requested information must be specified there. As shown in the ABNF protocol description, the maximum number of fields that a client can specify is ten. If a client wishes to retrieve more than 10 field values, it has to issue more than one STATS request.
request-fields = <fields> *10(db-field-tag) </fields>
The requested field codes can take any of the valid values specified in the db-field-code protocol section. The protocol does not forbid requesting the same field more than once in the same request, although this is not encouraged. Also, asking for sensitive private information such as the administrators' passwords is not forbidden by a protocol rule. Protecting this information is the responsibility of the server in this protocol implementation; the server must therefore take care not to offer such information to the clients. Moreover, the server, as of this protocol version, does not offer any data aggregation functions. This task must therefore be performed by the client, if needed. For example, the sum of the IP download volume of all the users that have been served by the AD (if data are stored in a MySQL database like the one designed for this project) would be determined by an SQL statement of the following format.
SELECT SUM(ust_total_ip_dl) FROM user_stats;
However, as such aggregation operations are not specified by the protocol and, thus, are not implemented by the server, they have to be performed by the client: in the above example, the client must gather the total IP download volume of every single user and then add up the values the server sent in order to calculate the sum. As to the request-criteria section of a STATS request, criteria need not be present, in the same manner that an SQL SELECT query does not always contain a WHERE clause.
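A STATS request of the kind discussed next could be assembled by a client along the following lines. This is a hedged sketch and not the client's code: build_stats_request(), struct crit and the put() helper are hypothetical names, the layout follows the ABNF given earlier, omitting the criteria list when no criteria are given is an assumption, and the operator (1, equals) and value (y) used for the online criterion in the usage note are assumptions as well. Values are not XML-escaped.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Output buffer with overflow tracking (hypothetical helper type). */
struct buf { char *p; size_t len, used; int overflow; };

/* Append formatted text; sets b->overflow if it does not fit. */
static void put(struct buf *b, const char *fmt, ...)
{
    va_list ap;
    int w;

    if (b->overflow)
        return;
    va_start(ap, fmt);
    w = vsnprintf(b->p + b->used, b->len - b->used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= b->len - b->used)
        b->overflow = 1;
    else
        b->used += (size_t)w;
}

/* One criterion: db-field-code, operator-val and value text. */
struct crit { int field, op; const char *value; };

/* Write a STATS request into out. coper[i] joins criterion i and i+1.
   Returns 0 on success, -1 on bad counts or too small a buffer. */
static int build_stats_request(const int *fields, int nfields,
                               const struct crit *cr, const int *coper,
                               int ncr, char *out, size_t outlen)
{
    struct buf b = { out, outlen, 0, 0 };
    int i;

    assert(out != NULL && outlen > 0);
    if (nfields < 1 || nfields > 10 || ncr < 0 || ncr > 5)
        return -1;   /* protocol limits: 10 fields, 5 criteria */
    put(&b, "<?xml version=\"1.0\"?>\n<request type=\"STATS\">\n<fields>\n");
    for (i = 0; i < nfields; i++)
        put(&b, "<field>%d</field>\n", fields[i]);
    put(&b, "</fields>\n");
    if (ncr > 0) {   /* assumption: no <crlist> when there are no criteria */
        put(&b, "<crlist>\n<crnum>%d</crnum>\n", ncr);
        for (i = 0; i < ncr; i++) {
            if (i > 0)
                put(&b, "<coperator>%d</coperator>\n", coper[i - 1]);
            put(&b, "<criteria><field>%d</field><operator>%d</operator>"
                    "<value>%s</value></criteria>\n",
                cr[i].field, cr[i].op, cr[i].value);
        }
        put(&b, "</crlist>\n");
    }
    put(&b, "</request>\n");
    return b.overflow ? -1 : 0;
}
```

For instance, calling it with the fields 5 and 6, the criteria (2, 1, aueb) and (19, 1, y) and the criteria-operator 1 yields a request for the upload and download volumes of online users of the domain aueb.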
The above STATS request asks for the total IP upload volume (db-field-code 5) and the total IP download volume (db-field-code 6) of users whose P2PWNC administrative domain name (db-field-code 2) is equal to (operator-val 1) aueb and who are online (db-field-code 19). The two criteria are joined by <coperator>1</coperator>, which denotes logical conjunction (boolean AND). After a client has sent a STATS request, it must wait for the server's reply. The server normally responds with XML documents containing the requested information, or with a response indicating that an error has occurred. If no error has occurred, the client must receive the server's response and parse it to obtain the statistics the server has sent. The first
In case of a failure in logging in, the server must send a response like the following.
<?xml version="1.0"?> <response type="LOGIN" fragged="FALSE"> <pairs> <pair> <field>70</field>
Field code 70 is an auth-field-code, described in the protocol specification. It is used when the type of the request is LOGIN.
If the credentials that the request contains are valid, then the server makes the necessary updates in the database where it stores information about online client users. Afterwards, it issues a confirmation response, indicating that the logout process has been carried out properly. The format of this response is the following.
<?xml version="1.0"?>
<response type="LOGOUT" fragged="FALSE">
  <pairs>
    <pair> <field>71</field> <value>ACCEPTED</value> </pair>
  </pairs>
</response>
In case of a logout response, the field code that the pair contains is 71. This auth-field-code, together with the LOGOUT type, indicates that the response refers to a client's logout request.
If the password changing procedure fails, probably due to an invalid user name or password, the server must send a rejecting reply as shown below.
<?xml version="1.0"?>
<response type="CHANGEPASS" fragged="FALSE">
  <pairs>
    <pair> <field>72</field> <value>REJECTED</value> </pair>
  </pairs>
</response>
The field code shown above is an auth-field-code, which must be 72 to indicate that the response follows a client's CHANGEPASS request.
If the result of the query consists of multiple records, though, the information cannot fit inside a single well-formed XML response. Therefore, it must be divided into more than one XML response, each having the value TRUE in its fragged property, except for the last one, which signals the end of the response stream. For example, if a client issues a request to find out which P2PWNC users (db-field-code 1) are currently logged in (db-field-code 19), the server will execute a database query whose result has many records (one record for every user that is online at the moment). Thus, the server must pack every record of the result in a separate response like the following ones. First result record:
<?xml version="1.0"?>
<response type="STATS" fragged="TRUE">
  <pairs>
    <pair> <field>1</field> <value>pfrag@aueb.gr</value> </pair>
  </pairs>
</response>
As one can see, the fragged property is first set to TRUE, which means that the first two of the above responses are two of the XML documents that comprise a response stream. The last response of that stream has the value FALSE in its fragged property. If the result of the query is an empty set, that is, if no records in the database satisfy the criteria of the client's request, the protocol regards this as an error. The server therefore issues an error response, specifying that the query returned the empty set. An error response is typically composed of a single pair whose field code is 300 (error-field). The value of this pair is a code for the error that occurred (error-val). As of the current protocol specification, the only available error-val is 200, which indicates that the result of the client's request is an empty set. Such a response looks like the following:
<?xml version="1.0"?>
<response type="STATS" fragged="FALSE">
  <pairs>
    <pair> <field>300</field> <value>200</value> </pair>
  </pairs>
</response>
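The packing of result records into a response stream, and the empty-set error response shown above, could be produced by a helper along the following lines. This is only a sketch: format_stats_fragment() is a hypothetical name, not the project's generateXmlResponse(). It handles a single pair per response, and it assumes that the value contains no characters that would need XML escaping. Attribute values are written in quotes, as well-formed XML requires.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not the project's generateXmlResponse()).
 * Writes one single-pair STATS response into buf.
 * is_last selects fragged="FALSE" (end of the stream) or fragged="TRUE".
 * Assumes value needs no XML escaping.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_stats_fragment(char *buf, size_t cap, int field_code,
                          const char *value, int is_last)
{
    int n = snprintf(buf, cap,
        "<?xml version=\"1.0\"?>"
        "<response type=\"STATS\" fragged=\"%s\">"
        "<pairs><pair><field>%d</field><value>%s</value></pair></pairs>"
        "</response>",
        is_last ? "FALSE" : "TRUE", field_code, value);

    if (n < 0 || (size_t)n >= cap)
        return -1;   /* encoding error or output truncated */
    return n;
}
```

With such a helper, every record but the last would be formatted with is_last set to 0, while the empty-set error would be the single call format_stats_fragment(buf, sizeof buf, 300, "200", 1).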
name password pair. This pair is transmitted from the client to the server unencrypted, and it is also stored unencrypted. At the moment the protocol does not specify an encryption method, although a client and a server can of course implement one for data exchange (especially for sensitive data such as passwords) outside the protocol. Nor does the protocol specify a method of storing the data in encrypted form (so as to hide them from database administrators that have nothing to do with the P2PWNC).
The way authorization is carried out in this protocol will now be described. The server must have a means of storing authorization information. This information includes the administrators' user names and passwords, their real names, a flag indicating whether each of them is online, a timestamp of their last login and, finally, the IP address of the client program they used to log in. Storing this information enables the server to know exactly which of these users are online at any time. Every time someone issues a STATS request, the server has to check whether the request comes from an administrator that is currently logged in. If not, the request is not served. This ensures that only requests from machines where clients are known to run are served. When a user issues a login request, he sends his credentials. However, specifying a valid user name and password is not sufficient, so the client's IP address is determined and stored too.
Logging out requires the user's name and password. This is necessary in order to prevent a third party (other than the server and the client that wishes to log out) from logging the user out, which would be possible if the client only had to send the user name (and not the password) in order to log out. In that case the third party could, for example, keep sending logout requests carrying this user's user name, thus not allowing the user to log in.
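The authorization rules described above can be illustrated with a small in-memory model. The sketch below is not the server's implementation, which keeps this information in a database table. The struct admin_record type and the two function names are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the stored authorization data. */
struct admin_record {
    const char *username;
    const char *password;   /* kept unencrypted, as in the current protocol */
    const char *ip_addr;    /* address of the client used to log in */
    int logged_in;          /* 1 if online, 0 otherwise */
};

/* A STATS request is served only if some administrator is currently
 * logged in from the address the request came from. */
int stats_request_allowed(const struct admin_record *admins, size_t count,
                          const char *client_ip)
{
    for (size_t i = 0; i < count; i++)
        if (admins[i].logged_in && strcmp(admins[i].ip_addr, client_ip) == 0)
            return 1;
    return 0;
}

/* Logout succeeds only if user name, password and client address all
 * match, so a third party that knows just the user name cannot log the
 * user out. Returns 1 on success, 0 otherwise. */
int logout_admin(struct admin_record *admins, size_t count,
                 const char *username, const char *password,
                 const char *client_ip)
{
    for (size_t i = 0; i < count; i++) {
        if (admins[i].logged_in &&
            strcmp(admins[i].username, username) == 0 &&
            strcmp(admins[i].password, password) == 0 &&
            strcmp(admins[i].ip_addr, client_ip) == 0) {
            admins[i].logged_in = 0;
            return 1;
        }
    }
    return 0;
}
```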
by means of extra criteria. Namely, adding two more criteria, one for the user name and one for the password, would make it impossible for an unauthorized user to retrieve database information. However, this solution would add extra transfer and processing cost, as the exchanged documents would become larger. Another method that could be implemented is the following. For each new login, a new session begins. Hence, the server could generate a unique session identifier for each session and send it to the client. From that point on, the session identifier would accompany each of the client's requests. This way, the server ensures that only users that have already logged in can be served.
The specification of this protocol does not state that only one client program may be run by a user (administrator) at any moment. However, it is suggested that this policy be applied. Allowing multiple clients to run with the same user name at the same time might cause practical problems concerning the system's function, as well as its security. For example, suppose that a user changes his password during a session. There is no protocol-specified means by which the other clients run by this user can be informed of the change automatically. Thus, when the time comes for the user to log out, those client programs will have to specify the password and will probably not know that a change has taken place. Of course, this problem can be worked around: for example, on logout the user could be required to specify his user name and password explicitly, which is a rather unusual method. Finally, another problem that would arise concerns logging out and was described in the previous section. As described there, the protocol specifies a simple way of avoiding it.
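The session identifier scheme suggested above is not part of the current protocol. Purely as an illustration, a server on a Unix-like system could generate and check such identifiers as sketched below. The names make_session_id() and session_matches(), the 128-bit size and the use of /dev/urandom are all assumptions of this example.

```c
#include <stdio.h>
#include <string.h>

#define SESSION_ID_BYTES 16                      /* 128 random bits (assumed size) */
#define SESSION_ID_LEN   (2 * SESSION_ID_BYTES)  /* length of the hex string */

/* Fills out with a random hexadecimal session identifier.
 * Reads /dev/urandom, so it assumes a Unix-like system.
 * Returns 0 on success, -1 on failure. */
int make_session_id(char out[SESSION_ID_LEN + 1])
{
    unsigned char raw[SESSION_ID_BYTES];
    FILE *f = fopen("/dev/urandom", "rb");

    if (f == NULL)
        return -1;
    size_t got = fread(raw, 1, sizeof raw, f);
    fclose(f);
    if (got != sizeof raw)
        return -1;

    for (size_t i = 0; i < sizeof raw; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned)raw[i]);
    return 0;
}

/* Returns 1 if the identifier presented with a request equals the one
 * stored for the session at login, 0 otherwise. */
int session_matches(const char *stored, const char *presented)
{
    return stored != NULL && presented != NULL &&
           strlen(presented) == SESSION_ID_LEN &&
           strcmp(stored, presented) == 0;
}
```

The server would store the identifier next to the administrator's record at login and call session_matches() for every later request of that client.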
SERVER:
<?xml version="1.0"?>
<response type="LOGIN" fragged="FALSE">
  <pairs>
    <pair> <field>70</field> <value>ACCEPTED</value> </pair>
  </pairs>
</response>
CLIENT:
<?xml version="1.0"?>
<request type="STATS">
  <fields>
    <field>40</field> <field>41</field> <field>42</field> <field>43</field>
  </fields>
  <crlist>
    <crnum>1</crnum>
    <criteria> <field>39</field> <operator>1</operator> <value>pcmkg3@aueb.gr</value> </criteria>
  </crlist>
</request>
SERVER:
<?xml version="1.0"?>
<response type="STATS" fragged="TRUE">
  <pairs>
    <pair> <field>40</field> <value>195.251.250.242</value> </pair>
    <pair> <field>41</field> <value>p3990123@dias.aueb.gr</value> </pair>
    <pair> <field>42</field> <value>3gkmcp</value> </pair>
    <pair> <field>43</field> <value>19</value> </pair>
  </pairs>
</response>
<?xml version="1.0"?>
<response type="STATS" fragged="FALSE">
  <pairs>
    <pair> <field>40</field> <value>195.251.248.176</value> </pair>
    <pair> <field>41</field> <value>pcmkg3@hermes.aueb.gr</value> </pair>
    <pair> <field>42</field> <value>testpass</value> </pair>
    <pair> <field>43</field> <value>8</value> </pair>
  </pairs>
</response>
CLIENT:
<request type="CHANGEPASS">
  <fields> </fields>
  <crlist>
    <crnum>3</crnum>
    <criteria> <field>55</field> <operator>1</operator> <value>pfrag</value> </criteria>
    <coperator>1</coperator>
    <criteria> <field>56</field> <operator>1</operator> <value>somepass</value> </criteria>
    <coperator>1</coperator>
    <criteria> <field>56</field> <operator>1</operator> <value>garfp</value> </criteria>
  </crlist>
</request>
SERVER:
<?xml version="1.0"?>
<response type="CHANGEPASS" fragged="FALSE">
  <pairs>
    <pair> <field>72</field> <value>ACCEPTED</value> </pair>
  </pairs>
</response>
CLIENT:
<request type="LOGOUT">
  <fields> </fields>
  <crlist>
    <crnum>2</crnum>
    <criteria> <field>55</field> <operator>1</operator> <value>pfrag</value> </criteria>
    <coperator>1</coperator>
    <criteria> <field>56</field> <operator>1</operator> <value>garfp</value> </criteria>
  </crlist>
</request>
SERVER:
<?xml version="1.0"?>
<response type="LOGOUT" fragged="FALSE">
  <pairs>
    <pair> <field>71</field> <value>ACCEPTED</value> </pair>
  </pairs>
</response>
interface remains unchanged, the client implementation does not have to change. However, if in the future the statistics server is enhanced so that it can provide more information to the clients (e.g. more detailed statistics or the ability to perform more complicated database queries), the protocol should be adjusted to the new server capabilities and the clients should be updated accordingly. The flexibility of this approach is most evident in the fact that it also provides programming language and operating system independence, to some extent: the protocol specification does not restrict the implementation of clients to a specific programming language. For example, the client can be written in Java and run on Windows, while the server is written in C and runs on RedHat Linux. What makes this interoperability possible is that they communicate via a sequence of messages of a standard format. Furthermore, the protocol is not limited to a specific transport layer protocol either. As said before, the transport layer protocol can be either TCP or UDP. However, if server implementations of both kinds exist, some clients will not work with all servers. The use of TCP is suggested, for reasons that have already been mentioned. This approach, though, has an important disadvantage. Because the client does not communicate with the database directly, the system offers limited functionality as to the queries that can be issued to the database. The problem is that a query must be encoded in the standard XML format described and then the XML document must be parsed by the server and transformed into a valid query for the database system used. Complicated queries are hard to encode in a message of the protocol-specified form, and even harder for the server to translate into a valid query.
However, the limitations imposed on query complexity and on the set of permitted queries can in many cases be overcome by adding extra functionality to the client. An example was described in the section on issuing client STATS requests, where the client sums the values of database fields. Another disadvantage is that placing the statistics server between the client and the database adds processing cost. Also, wrapping queries and results in the XML messages that the protocol specifies adds considerable overhead to the data exchanged between the client and the server, which increases communication cost and results in network transfer delays.
Parsing of XML documents in this project consists of some basic operations.
- Reading the document and creating a tree of the parsed document. The function that carries out this task is called makeXmlDoc(). Its prototype is:
xmlDocPtr makeXmlDoc(int sockfd, size_t L)
The above function returns a pointer to an xmlDoc structure that holds the tree of the parsed document.
The document's data are read from a socket and are sent to the parser in chunks while they are being read. The first four bytes are used to create a parser context for using the XML parser in push mode. This context is created as follows.
xmlParserCtxtPtr ctxt; ctxt = xmlCreatePushParserCtxt(NULL, NULL, chars, res, NULL);
chars are the bytes read from the socket, while res is the number of them. After creating the context, makeXmlDoc reads chunks of data from the socket, pushes them to the parser and parses them using xmlParseChunk() as follows.
xmlParseChunk(ctxt, chars, res, 0);
The last parameter is the last-chunk indicator: if it is set, this is the last chunk to be parsed. After parsing has been carried out, we obtain a pointer to the xmlDoc created this way:
D = ctxt->myDoc;
(D is the xmlDocPtr that has been declared in this function and will be returned) Finally, we have to free the parser context that we created and used.
xmlFreeParserCtxt(ctxt);
propVal is the value of the property that we wish to get. docrootname is the name of the document's root; in our case it can be either request or response. propname is the name of the property whose value we want. To get a node's property we have to call xmlGetProp(). Therefore, we first get a pointer to the document root as shown below.
cur = xmlDocGetRootElement(doc);
(cur is of type xmlNodePtr, a pointer to a node of the parsed document tree.) Then, the function xmlGetProp() returns the value of the property.
propVal = xmlGetProp(cur, (xmlChar*)propname);
- Retrieving the content of an element.
This involves traversing the document tree until we find what we are looking for. The function that carries out this operation is parseDoc(). This is the prototype of parseDoc().
int parseDoc(xmlDocPtr doc, xmlChar** temp, const char* docrootname,
             const char* tagname, const char* tagparent,
             const char* topLevelParent, int level)
temp is the array where the contents of the elements will be placed.
docrootname is the name of the root of the document, as described before.
tagparent is the parent tag of the element whose content we wish to retrieve.
tagname is the element whose content we wish to retrieve.
level is an integer indicating the depth of the element in the document tree relative to topLevelParent.
topLevelParent is the tag from which the calculation of tagname's level starts (topLevelParent might be tagname's parent or another ancestor).
The function works as follows. Starting from the root of the document, the tree is traversed until the element topLevelParent is found. As soon as this node is found, the function parseFields() is called, which actually fills the temp array. The following is the prototype of parseFields().
xmlChar** parseFields(xmlDocPtr doc, xmlChar** retval, xmlNodePtr cur, const char* tagname, const char* tagparent, int* fnum, int level)
This function starts from the node cur (which is in fact the topLevelParent node of parseDoc()) and descends the tree to a depth determined by the level parameter. If level is 0, no descending takes place. If level is 1, the requested element text is found within the tags that are one level deeper than cur. Finally, if level is 2, the requested element is found within the tags that are two levels deeper than cur, which makes the search more complicated. An example of parseDoc() comes next. It is taken from the source code of the server program.
crvlen = parseDoc(doc, crv, "request", "value", "criteria", "crlist", 2);
In the above line, the values of the <value> tags that are inside <criteria> tags are requested. topLevelParent is crlist while tagparent is criteria. When crlist is found by parseDoc(), parseFields() will be called (from the body of parseDoc()) with the following parameters.
parseFields(doc, crv, crlist, value, criteria, num, 2)
parseFields() will descend into the subtree that starts from crlist and, as level is 2, it will descend one level more, into the subtrees that start from criteria. There, the values that could be retrieved are those of the <field>, <value> and <operator> elements. In this case, the values of the <value> tags are requested. Note that the function descends into all subtrees that start from <criteria>.
For a better understanding of the above functions, the interested reader is advised to visit libxml's website, where extensive documentation and tutorials can be found.
In this version of statssrv, up to 5 client requests can be outstanding at a time. They are not served in parallel: while the server serves the first request, the others wait in the listen queue.
listen(sockfd, 5);
Finally, there is the loop where the server accepts and serves client requests.
for (;;) {
    clilen = sizeof(cli_addr);
    newsockfd = accept(sockfd, (struct sockaddr*) &cli_addr, (socklen_t*)&clilen);
    actions_serv(newsockfd);
    close(newsockfd);
}
[ code taken from statssrv.c ]
actions_serv() is the function which actually serves a client request. Apart from the above functions, the server sets the function that handles the SIGINT and SIGTERM signals. This function is sighand and it is responsible for cleaning up the parser (xmlCleanupParser()) and closing the handle to the database connection (mysql_close(conn)).
signal(SIGINT, sighand); signal(SIGTERM, sighand);
The actual request serving takes place in the actions_serv() function. There, as soon as a request arrives, the request size is read from the socket. In this server's implementation, the length of the request message is read before the actual XML document. The client writes it to the socket as a four-character string (which means that the maximum length of a client request that can be read without problems is 9999 bytes). The purpose of this operation is to know how many bytes should be read from the socket for the client request. The XML document proper is then built from the sequence of bytes read from the socket by calling the function makeXmlDoc(), which is defined and implemented in the xmlfunc.h file. makeXmlDoc() returns a pointer to the constructed xmlDoc structure.
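The four-character length prefix described above can be illustrated with the following sketch. The function names are hypothetical, and zero padding of short lengths (e.g. 0312) is an assumption, since the text does not state how lengths below 1000 are written.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define LEN_PREFIX_CHARS 4
#define MAX_MESSAGE_LEN  9999   /* largest value that fits in four digits */

/* Writes len as exactly four decimal characters plus a terminating NUL.
 * Zero padding is an assumption of this sketch.
 * Returns 0 on success, -1 if len does not fit in four digits. */
int encode_length_prefix(size_t len, char out[LEN_PREFIX_CHARS + 1])
{
    if (len > MAX_MESSAGE_LEN)
        return -1;
    snprintf(out, LEN_PREFIX_CHARS + 1, "%04u", (unsigned)len);
    return 0;
}

/* Parses the four characters read from the socket.
 * Returns the message length, or -1 if a character is not a digit. */
int decode_length_prefix(const char prefix[LEN_PREFIX_CHARS])
{
    int len = 0;

    for (int i = 0; i < LEN_PREFIX_CHARS; i++) {
        if (!isdigit((unsigned char)prefix[i]))
            return -1;
        len = len * 10 + (prefix[i] - '0');
    }
    return len;
}
```

The receiver would first read exactly four bytes, decode them, and then read that many bytes of XML from the socket.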
getRootProperty(doc, rtype, (char*) "request", (char*)"type");
The type of the request is stored in the rtype string. According to the type of the request the necessary actions, which will be described in more detail in a following section, are taken.
The first parameter is the pointer to the xmlDoc structure returned by the function makeXmlDoc(). The second is the descriptor of the socket where the request was read from and the third is a flag indicating whether the request type is LOGIN (value 1) or LOGOUT (value 0).
This function performs the necessary database operations when a user wishes to log in to or log out of the server. It uses sockfd to determine the client's IP address, which is obtained by the following call.
getpeername(sockfd, (struct sockaddr*) &clientpeer, (socklen_t*)&peerlen);
getpeername() fills the struct clientpeer with information about the other peer of a socket connection; sockfd is the descriptor of this socket. Then, the function parses the XML document to extract the criteria of the request. As specified, this request must have two criteria: the first is the user name of the administrator that wishes to log in and the second is his password. Parsing is performed by the parseDoc() function as follows.
crvlen = parseDoc(doc, crv, "request", "value", "criteria", "crlist", 2);
crvlen is the number of criteria this request has. crv is the array where the values of the criteria are stored. The other parameters have been described in a previous section. After parsing, an UPDATE SQL query is performed. In case of a LOGIN request, the adm_logged_in field of the admin database table is set to y. Also, the client's IP address field (adm_ipaddr) is updated and the timestamp that records the exact date and time of the login (adm_last_login) is set. The execution of this query is shown below.
sprintf(sqlLogUser,
        "UPDATE admin SET adm_logged_in='y', adm_ipaddr='%s', adm_last_login=NOW() "
        "WHERE adm_username='%s' AND adm_pass='%s' AND adm_logged_in='n'",
        inet_ntoa(clientpeer.sin_addr), (char*)crv[0], (char*)crv[1]);
if (mysql_real_query(conn, sqlLogUser, strlen(sqlLogUser)) != 0) {
    printf("%s\n", mysql_error(conn));  /* mysql_error() takes the connection handle */
    exit(1);
}
[ code taken from xmlfunc.h ]
As demonstrated above, the last condition of the WHERE clause is that the user is not already logged in. This shows that this server implementation does not allow multiple simultaneous logins by the same user. In case the type of the request is LOGOUT, the respective SQL query is executed as follows.
sprintf(sqlLogUser,
        "UPDATE admin SET adm_logged_in='n' "
        "WHERE adm_ipaddr='%s' AND adm_username='%s' AND adm_pass='%s'",
        inet_ntoa(clientpeer.sin_addr), (char*)crv[0], (char*)crv[1]);
if (mysql_real_query(conn, sqlLogUser, strlen(sqlLogUser)) != 0) {
    printf("%s\n", mysql_error(conn));  /* mysql_error() takes the connection handle */
    exit(1);
}
[ code taken from xmlfunc.h ]
As one can see, the only field updated is adm_logged_in. Also, it must be noted that in case of a logout, the password of the user must be supplied for security reasons (this was explained in the section that deals with the protocol's security issues). In both cases, the mysql_affected_rows() function is called to determine whether a database record that satisfies the conditions of the WHERE clause exists. If not, the login / logout is unsuccessful, so the function returns 0. Otherwise, it returns 1. Depending on whether logUser() returns success (value 1) or failure (value 0), the server sends the client an ACCEPTED or a REJECTED response, as specified by the protocol. The response is sent by one of the functions sendLoginResponse() and sendLogoutResponse(), which have the following prototypes:
void sendLoginResponse (int sockfd, int accepted)
void sendLogoutResponse (int sockfd, int accepted)
The first parameter is the descriptor of the socket where the response will be written and the second one is a flag that indicates whether the request was accepted or rejected (1 indicates acceptance and 0 rejection). The above functions generate a response of the appropriate type using the function generateXmlResponse(), having calculated the size of the response using calculateRespLen(). First, they write the calculated length to the socket as a four-character string and then they write the actual response. The format of the response is specified by the protocol and has been described in a previous section. The following piece of code shows what happens inside the function actions_serv() when a login request has been received. Similar actions are taken in case of a logout request.
if (logUser(doc, sockfd, TYPE_LOGIN)) {
    /* Login accepted */
    sendLoginResponse(sockfd, 1);
} else {
    /* Login rejected */
    sendLoginResponse(sockfd, 0);
}
[ code taken from statssrv.c ]
In a similar way to logUser(), changePass() first parses the client's request using parseDoc(). This time, the request contains three criteria: the first is the user name, the second is the old password and the third is the new password. Then, it calls getpeername() to get information about the other end of the socket connection (the client's IP address is what the program is actually interested in). The next step is to execute an update query that stores the new password in the database. The respective SQL query is constructed and executed by the next piece of code.
sprintf(sqlChangePass,
        "UPDATE admin SET adm_pass='%s' "
        "WHERE adm_username='%s' AND adm_pass='%s' AND adm_ipaddr='%s' AND adm_logged_in='y'",
        (char*)crv[2], (char*)crv[0], (char*)crv[1], inet_ntoa(clientpeer.sin_addr));
if (mysql_real_query(conn, sqlChangePass, strlen(sqlChangePass)) != 0) {
    printf("%s\n", mysql_error(conn));  /* mysql_error() takes the connection handle */
    exit(1);
}
[ code taken from xmlfunc.h ]
If the query does not affect any rows in the admin database table, this means that the criteria of its WHERE clause were not satisfied, therefore the password change was not carried out successfully (most probably a wrong old password or user name has been supplied). This check is performed by the mysql_affected_rows() function. On success the function returns 1, otherwise it returns 0. According to the result of changePass() the server sends the client a proper reply indicating whether the password update was accepted or not. This reply is sent by the sendChangePassResponse() function, whose prototype is
void sendChangePassResponse(int sockfd, int accepted)
The way this function works, as well as the semantics of its parameters are the same as in the sendLoginResponse() and sendLogoutResponse() functions. The response sent to the client is specified by the protocol. The code that follows shows what takes place in actions_serv() in case a CHANGEPASS request is received.
if (changePass(doc, sockfd)) {
    /* Passwd changed successfully */
    sendChangePassResponse(sockfd, 1);
} else {
    /* error changing passwd */
    sendChangePassResponse(sockfd, 0);
}
[ code taken from statssrv.c ]
In this server implementation, the only check that is performed is whether there is a user that is online and has the IP address of the client that issued the request. Again, getpeername() has to be called. Afterwards, a SELECT SQL query is executed. This query retrieves the value of the adm_logged_in field for the specified IP address. That is, this query returns a non-empty result set only in case a record where this IP address is the value of the adm_ipaddr field exists in the database. If such a record exists and the value of the respective adm_logged_in field is set to y, the function returns 1. Otherwise, it returns 0. The piece of code that follows is the core of the checkLoggedIn() function.
[ . . . ]
getpeername(sockfd, (struct sockaddr*) &clientpeer, (socklen_t*)&peerlen);
sprintf(sqlCheckLogged, "SELECT adm_logged_in FROM admin WHERE adm_ipaddr='%s'",
        inet_ntoa(clientpeer.sin_addr));
if (mysql_real_query(conn, sqlCheckLogged, strlen(sqlCheckLogged)) != 0) {
    printf("%s\n", mysql_error(conn));
    exit(1);
}
if (!(r = mysql_store_result(conn))) {
    printf("%s\n", mysql_error(conn));
    exit(1);
}
if (!mysql_num_rows(r)) {
    /* not found */
    return 0;
}
if ((row = mysql_fetch_row(r))) {
    if (!strcmp((const char*)row[0], "y")) {
        /* logged in - success */
        return 1;
    } else {
        /* not logged in */
        return 0;
    }
}
[ . . . ]
[ code taken from xmlfunc.h ]
mysql_num_rows(), as its name implies, returns the number of rows in the result of an SQL query. mysql_fetch_row() retrieves the next row of a result set (here, the first one) and moves the current record pointer one position forward. It takes a pointer to a MYSQL_RES structure as its parameter. Its prototype is
MYSQL_ROW mysql_fetch_row(MYSQL_RES *res)
If the server ensures that the STATS request has come from a valid client program, it goes on to serve its request for statistics. The following set of function calls is issued in order to extract the requested field codes, the field codes of the criteria, the operators that each criterion applies, the values of the criteria and the boolean operators between the criteria.
crflen = parseDoc(doc, crf, "request", "field", "criteria", "crlist", 2);
opslen = parseDoc(doc, ops, "request", "coperator", "crlist", "crlist", 1);
flen = parseDoc(doc, temp, "request", "field", "fields", "fields", 1);
crvlen = parseDoc(doc, crv, "request", "value", "criteria", "crlist", 2);
crolen = parseDoc(doc, cro, "request", "operator", "criteria", "crlist", 2);
[ code taken from statssrv.c ]
The variables crflen, opslen, flen, crvlen and crolen hold the number of criteria fields, boolean operators between criteria, requested fields, criteria values and criteria operators respectively. crf, ops, temp, crv and cro are the arrays that store the actual parsed values of the respective sections of a request. After parsing the document, the doc pointer has to be freed by xmlFreeDoc(). As the arrays mentioned previously are of type xmlChar* [], those that hold field codes or operator codes have to be converted into integer arrays (because the function that translates the request into an SQL query takes int[] parameters for field and operator codes). The following code clarifies this conversion. We suppose that crfcodes[], opscodes[], fcodes[] and crocodes[] have been declared to be of int[] type.
for (i=0;i<crflen;i++) { crfcodes[i] = atoi( (char*)crf[i]); }
for (i=0;i<opslen;i++) { opscodes[i] = atoi( (char*)ops[i]); }
for (i=0;i<flen;i++) { fcodes[i] = atoi( (char*)temp[i]); }
for (i=0;i<crolen;i++) { crocodes[i] = atoi( (char*)cro[i]); }
[ code taken from statssrv.c ]
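atoi() cannot report malformed input: a non-numeric code simply becomes 0. As a possible refinement, not present in statssrv.c, the conversion above could validate each code with strtol(). In the sketch below, parse_codes() is a hypothetical name and the strings are plain char* (xmlChar* values would need a cast, as in the loops above).

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical validating replacement for the atoi() loops.
 * Converts count decimal strings into out[].
 * Returns 0 on success, -1 at the first string that is missing, is not a
 * complete non-negative decimal number, or does not fit in an int. */
int parse_codes(const char *const text[], int count, int out[])
{
    for (int i = 0; i < count; i++) {
        char *end = NULL;

        if (text[i] == NULL)
            return -1;
        errno = 0;
        long v = strtol(text[i], &end, 10);
        if (errno != 0 || end == text[i] || *end != '\0' ||
            v < 0 || v > INT_MAX)
            return -1;
        out[i] = (int)v;
    }
    return 0;
}
```

A server using such a check could reject the whole request when the function returns -1, instead of silently treating a malformed field code as 0.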
The next step is to generate the actual query that will be submitted to the MySQL server. This is performed by the prepareQuery() function which is shown below.
void prepareQuery(char** sql, int flist[], int crf[], char** crv, int cro[], int opers[], int n, int cn)
The first parameter of the function refers to the actual SQL query that the function generates. The second argument is the list of codes of the requested fields and the third is the list of field codes of the criteria fields. crv is an array of strings where the values of the criteria are stored. cro is the array that holds the operator code used in every criterion, and opers holds the boolean operators (AND, OR) that are applied between different criteria. The integer values n and cn represent the number of fields (the length of the flist array) and the number of the request's criteria (the length of the crf and cro arrays, which is greater by one than the length of opers). The main task prepareQuery() must carry out is to map the field and operator codes that are specified by the protocol to actual database field names and valid SQL operators such as <, >, AND, OR, etc. For example, if the flist[] array has two elements, 2 and 3, their values should be mapped to user_stats.ust_domain and user_stats.real_name. These two strings are actual field names of the database, prefixed with the name of the table they belong to (user_stats). This is the first step before building a valid SQL query. The matching between field codes specified by the protocol and real database field names is performed by the createFieldMatching() function. Its prototype is
void createFieldMatching(char* dbfields[], char* dbtables[], const char* fname, const char* tabname, int curDbfSize, int* curTabSize)
The arguments of this function will now be described.
dbfields: actual database field names
dbtables: database tables where the above fields are found
fname: database field name to be appended to the dbfields array
tabname: table name to be appended to the dbtables array
curDbfSize: current record count of the dbfields array
curTabSize: current record count of the dbtables array
The following example of the use of this function makes it easier to understand how it works.
createFieldMatching(dbfields, dbtables, "user_stats.ust_online", "user_stats", i, &tabsnum);
The above function call would add the string user_stats.ust_online (third argument) to the dbfields array. This field belongs to the user_stats table, therefore the string user_stats (fourth argument) will be added to the dbtables array. If the string user_stats was not contained in dbtables prior to the function call, the number of elements in that array grows by one, so the value that curTabSize points to is incremented too. The way createFieldMatching() is used by prepareQuery() is the following:
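The behaviour described above can be sketched as follows. This is only an illustration and not the code from xmlfunc.h: it assumes that the caller has allocated dbfields and dbtables with enough room and that entries are stored as heap copies. The helper copyString() is a hypothetical name introduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of s (NULL on failure). */
static char *copyString(const char *s)
{
    char *copy = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Sketch of the behaviour described in the text (assumed, not the original). */
void createFieldMatching(char *dbfields[], char *dbtables[],
                         const char *fname, const char *tabname,
                         int curDbfSize, int *curTabSize)
{
    int i;

    /* Store the field name at the current position of dbfields. */
    dbfields[curDbfSize] = copyString(fname);

    /* If the table is already listed, dbtables stays unchanged. */
    for (i = 0; i < *curTabSize; i++) {
        if (strcmp(dbtables[i], tabname) == 0)
            return;
    }

    /* New table: append it and increment the caller's table count. */
    dbtables[*curTabSize] = copyString(tabname);
    (*curTabSize)++;
}
```

With this sketch, two consecutive calls for fields of the user_stats table leave a single user_stats entry in dbtables, so the table name would appear only once in the FROM section of the query.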
For each of the field codes that are contained in flist[], the function first checks if it is one of the valid field codes specified by the protocol. These field codes are assigned to constants defined in xmlfunc.h as shown below.
    [ . . . ]
    #define FIELD_USER_STATS_USER_NAME 1
    #define FIELD_USER_STATS_DOMAIN 2
    #define FIELD_USER_STATS_REAL_NAME 3
    #define FIELD_USER_STATS_PRIV 4
    [ . . . ]
    #define FIELD_FTP_USERNAME 25
    #define FIELD_FTP_HOST 26
    #define FIELD_FTP_USR 27
    [ . . . ]
    #define FIELD_HTTP_METHOD 36
    #define FIELD_HTTP_URI 37
    #define FIELD_HTTP_UAGENT 38
    [ . . . ]
[ code taken from xmlfunc.h ]
If the field code is valid, the appropriate call to createFieldMatching() is performed. This way, once all the field codes contained in flist[] have been checked, dbfields contains all the actual database field names and dbtables contains all the database tables these fields belong to. The same method is repeated to determine the actual names of the fields and operators that refer to the request-criteria section of the request. For each field code contained in crf[], the function checks whether it is a valid field code (specified by the protocol). If so, the appropriate call to createFieldMatching() is performed. This time, the first argument of createFieldMatching() is an array of strings called dbcritfields, which holds the actual names of the database fields contained in the request's criteria section. The second argument is, again, dbtables. The reason is that dbfields represents the fields whose values are requested, while dbcritfields corresponds to the fields present in the WHERE clause. dbtables will form the FROM section of the query. Once the dbcritfields array has been filled, the function fills the dbcritops and ops arrays with the string representation of the operator codes (the cro[] parameter refers to the operator-vals of the protocol) and of the boolean operator codes among criteria (the opers[] parameter refers to the coperator-vals of the protocol). Operator matching is not performed by a function like createFieldMatching(). It is simpler: for each operator value specified by the protocol, its string equivalent is copied into the respective array via strcpy(). The above operations are implemented via a big switch-case structure, part of which is shown below.
    /* map the requested field codes to database field names */
    for (i = 0; i < n; i++) {
        if (!searchInList(FLIST, flist[i], 60)) {   /* 60 = size of FLIST */
            printf("INVALID FIELD: %d\n", flist[i]);
            return;
        } else {
            switch ( flist[i] ) {
            case FIELD_USER_STATS_USER_NAME:
                createFieldMatching(dbfields, dbtables,
                    "user_stats.ust_username", "user_stats", i, &tabsnum);
                break;
            case FIELD_USER_STATS_DOMAIN:
                createFieldMatching(dbfields, dbtables,
                    "user_stats.ust_domain", "user_stats", i, &tabsnum);
                break;
            [ . . . ] /* one case for each protocol field code */
            } /* end of switch */
        } /* else */
    } /* for */

    /* map the criteria field codes to database field names */
    for (i = 0; i < cn; i++) {
        if (!searchInList(FLIST, crf[i], 60)) {
            printf("INVALID FIELD: %d\n", crf[i]);
            return;
        } else {
            switch ( crf[i] ) {
            case FIELD_USER_STATS_USER_NAME:
                createFieldMatching(dbcritfields, dbtables,
                    "user_stats.ust_username", "user_stats", i, &tabsnum);
                break;
            [ . . . ] /* one case for each protocol field code */
            } /* end of switch */
        } /* else */
    } /* for */

    /* map the operator codes to SQL operators */
    for (i = 0; i < cn; i++) {
        if (!searchInList(OPLIST, cro[i], 7)) {     /* 7 = size of OPLIST */
            printf("INVALID FIELD: %d\n", cro[i]);  /* operator codes are integers */
            return;
        } else {
            switch ( cro[i] ) {
            case OPER_EQ:
                /* +1 leaves room for the terminating '\0' */
                dbcritops[i] = (char*)malloc(strlen("=") + 1);
                strcpy(dbcritops[i], "=");
                break;
            case OPER_BT:
                dbcritops[i] = (char*)malloc(strlen(">") + 1);
                strcpy(dbcritops[i], ">");
                break;
            [ . . . ] /* one case for each operation-val as specified by the protocol */
            } /* end of switch */
        } /* else */
    } /* for */

    /* map the boolean operator codes that join the criteria */
    for (i = 0; i < cn-1; i++) {
        if (!searchInList(BOPLIST, opers[i], 3)) {  /* 3 = size of BOPLIST */
            printf("INVALID FIELD: %d\n", opers[i]);
            return;
        } else {
            switch ( opers[i] ) {
            case BOOL_OPER_AND:
                ops[i] = (char*)malloc(strlen("AND") + 1);
                strcpy(ops[i], "AND");
                break;
            [ . . . ] /* one case for each coperator-val as specified in the protocol */
            } /* switch */
        } /* else */
    } /* for */
[ code taken from xmlfunc.h ]
One thing that has to be explained is the use of the following two functions:
- void createLists()
- int searchInList(int L[], int num, int len)
In this implementation, the valid field codes are stored in the (integer) array FLIST. In a similar way, the valid operators (operator-vals) are put in the OPLIST array, while the valid boolean criteria operators (coperator-vals) are stored in BOPLIST. Below are their definitions:
    static int FLIST[60];
    static int OPLIST[7];
    static unsigned int BOPLIST[3];
[ code taken from xmlfunc.h ]
createLists() is called in the server's main function to initialize the above arrays with the protocol-specified values. searchInList() simply searches the integer array L[] for the presence of num; len is the number of elements that the array holds.
Finally, the actual query string is formed by the function makeSqlQuery(), which uses the arrays that were built in the previous step. First, it appends the elements of dbfields, as a comma-separated list, to the string SELECT. Then, it forms the FROM section of the query by adding the word FROM and the elements of dbtables, also as a comma-separated list. Finally, it builds the WHERE clause by adding the string WHERE and the list of conditions. Each criterion has the form field operator value and is formed by the corresponding elements of dbcritfields, dbcritops and dbcritvals, while the list of conditions has the format criterion ops[0] criterion ops[1] [ . . . ]
After the query string has been generated, the function sendResponseStream() is called. It executes the query, gets the results, creates the response XML documents and writes them to the socket so that they can be sent to the client. The query is executed using the MySQL API function mysql_real_query(). The results of the query are retrieved row by row using mysql_fetch_row(), as described in a previous section. If the number of rows (determined by calling mysql_num_rows()) is zero, then the result set is empty, so a response of type STATS indicating the error must be sent to the client. generateXmlResponse() is called to construct the following reply:
    <?xml version="1.0"?>
    <response type="STATS" fragged="FALSE">
      <pairs>
        <pair>
          <field>300</field>
          <value>200</value>
        </pair>
      </pairs>
    </response>
Field 300 indicates that an error has taken place, while value 200 shows that the error has to do with an empty result set for the issued query. The length of the response is written to the socket, and then the actual response is sent, too. Otherwise, when no error has taken place, a stream of responses is constructed and sent to the client. For each row of the query result, a new response XML document is created, with the fragged property of the root set to TRUE (except for the last document of the stream).
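The query assembly attributed to makeSqlQuery() earlier in this section (SELECT list, FROM list, then the criteria joined by boolean operators) can be sketched as follows. This is an illustration under assumptions, not the function from the server source: buildSelectQuery() and appendStr() are hypothetical names, the arrays are assumed to be already filled as described above, and criteria values are appended exactly as given, so quoting and escaping are left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Appends src to dst if it fits in a buffer of dstSize bytes.
   Returns 0 on success, -1 if there is not enough room. */
static int appendStr(char *dst, size_t dstSize, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);
    if (used + need + 1 > dstSize)
        return -1;
    memcpy(dst + used, src, need + 1);   /* copies the '\0' as well */
    return 0;
}

/* Hypothetical sketch: builds "SELECT f1, f2 FROM t1 WHERE c1 op v1 AND ...".
   Returns 0 on success, -1 if out is too small (out must then be discarded). */
int buildSelectQuery(char *out, size_t outSize,
                     char *fields[], int nFields,
                     char *tables[], int nTables,
                     char *critFields[], char *critOps[], char *critVals[],
                     char *boolOps[], int nCrit)
{
    int i, rc = 0;

    if (outSize == 0)
        return -1;
    out[0] = '\0';

    /* SELECT section: comma-separated list of requested fields */
    rc |= appendStr(out, outSize, "SELECT ");
    for (i = 0; i < nFields; i++) {
        if (i > 0)
            rc |= appendStr(out, outSize, ", ");
        rc |= appendStr(out, outSize, fields[i]);
    }

    /* FROM section: comma-separated list of tables */
    rc |= appendStr(out, outSize, " FROM ");
    for (i = 0; i < nTables; i++) {
        if (i > 0)
            rc |= appendStr(out, outSize, ", ");
        rc |= appendStr(out, outSize, tables[i]);
    }

    /* WHERE section: criterion boolOps[0] criterion boolOps[1] ... */
    if (nCrit > 0)
        rc |= appendStr(out, outSize, " WHERE ");
    for (i = 0; i < nCrit; i++) {
        if (i > 0) {
            rc |= appendStr(out, outSize, " ");
            rc |= appendStr(out, outSize, boolOps[i - 1]);
            rc |= appendStr(out, outSize, " ");
        }
        rc |= appendStr(out, outSize, critFields[i]);
        rc |= appendStr(out, outSize, " ");
        rc |= appendStr(out, outSize, critOps[i]);
        rc |= appendStr(out, outSize, " ");
        rc |= appendStr(out, outSize, critVals[i]);
    }
    return rc == 0 ? 0 : -1;
}
```

Note that nCrit criteria need nCrit - 1 boolean operators, which matches the relation between cn and the length of opers stated for prepareQuery().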
fields, critfields, critops and ops are integer arrays where field and operator codes are stored. The constants that hold the values of the protocol-specified field and operator codes are defined in defs.h.
After constructing the request, the client writes its size (in bytes) to the socket. The size has already been calculated by the function calculateReqLen() and is written as a four-character string. Right afterwards, the actual request is written to the socket. In turn, the client reads the response length and calls the makeXmlDoc() function, which reads the XML document from the socket and parses it, creating the parsed document's tree structure. Then, using the getRootProp() and parseDoc() functions, it extracts the information about the login status of the user. If the server has sent an ACCEPTED response, the client stores the administrator's user name, password and a flag indicating that the administrator is online in class variables (this program was written in C++; these class variables are loginName, loginPass and logged, and they belong to the frmMain class).
The next thing the client does is retrieve the list of P2PWNC users (whether they are online or not) that are using or have used the Internet services of the Administrative Domain. This data retrieval is done by issuing a STATS request asking for the appropriate fields. Response reading and parsing are carried out in the same way. From that point on, the administrator can request user statistics.
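The way a request is sent, a four-character size field followed by the request itself, can be sketched as below. The text does not say how the four characters encode the size, so this sketch assumes a zero-padded decimal number; formatFrame() and parseFrameLength() are hypothetical names and are not taken from the client source.

```c
#include <stdio.h>
#include <string.h>

#define LEN_FIELD 4   /* the size is written as a four-character string */

/* Writes "<4-character length><request>" into out.
   Returns the total number of bytes to send, or -1 if it does not fit. */
int formatFrame(char *out, size_t outSize, const char *request)
{
    size_t len = strlen(request);

    /* Four decimal digits can express at most 9999 bytes (assumed encoding). */
    if (len > 9999 || len + LEN_FIELD + 1 > outSize)
        return -1;
    snprintf(out, outSize, "%04u%s", (unsigned)len, request);
    return (int)(len + LEN_FIELD);
}

/* Reads the four-character length field at the start of header.
   Returns the announced length, or -1 if a character is not a digit. */
int parseFrameLength(const char *header)
{
    int i, len = 0;

    for (i = 0; i < LEN_FIELD; i++) {
        if (header[i] < '0' || header[i] > '9')
            return -1;
        len = len * 10 + (header[i] - '0');
    }
    return len;
}
```

In a real client the formatted buffer would then be written to the connected socket, and the receiving side would read the four characters first in order to know how many bytes of XML to expect.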
the client checks if the administrator has retyped the password incorrectly or if he has left empty fields). Afterwards, the CHANGEPASS request is sent to the server in the way all requests are sent (first the request size is written and then the actual request string). If the server sends back a REJECTED response, the client reports the error to the administrator via a message box. Otherwise, the change has been accepted and the loginPass variable is updated.
Finally, it should be noted that for each new client request, whether it is a LOGIN, LOGOUT, CHANGEPASS or STATS one, the client opens a new socket and connects to the server. The following method is used to carry out this operation (server_addr and sockfd are static variables).
    void frmMain::connectToSrv (char* sa, char* sp) {
        bzero( (char*)&server_addr, sizeof(server_addr) );
        server_addr.sin_family = AF_INET;
        server_addr.sin_port = htons(atoi(sp));
        server_addr.sin_addr.s_addr = inet_addr(sa);

        /* opening a TCP socket */
        if ( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ) {
            QMessageBox::critical(this, "Stats Client", "Can't open stream socket");
            return;
        }

        if (netfunc::connect(sockfd, (struct sockaddr*) &server_addr, sizeof(server_addr)) < 0) {
            QMessageBox::critical(this, "Stats Client", "Can't connect to server");
            return;
        }
    }
[ code taken from frmmain.ui.h ]
In the above function, netfunc is a namespace declared in the defs.h file. This was necessary because Qt has its own connect function, used for connecting signals (e.g. a button press) with specific functions (called slots in Qt Designer). This caused conflicts with the connect call declared in <sys/socket.h>.
3.6 Demonstration
A short demonstration of the statscl program's capabilities follows. A series of screenshots is presented to demonstrate some of the client's functions. First, the login screen is shown.
If the administrator has logged in successfully, the list of the AD's users is retrieved. The users are grouped by their Administrative Domain name. Double-clicking on a user name makes his statistics available. The next figure shows some general information about the selected user, found under the General tab. Switching between the tabs automatically updates the statistics of each category. When the name of a user is double-clicked, the program checks which tab is currently selected and sends the proper request for statistics. When the selected tab changes, another request is sent, so that the statistics of the newly selected tab are retrieved for the last user that was double-clicked.
Figure 20. General Statistics
Next, the Aggregate Statistics tab is shown. There, the total IP traffic statistics are presented for the selected user.
The next figures have to do with HTTP, FTP, SMTP and POP3 traffic information for the selected user. For the HTTP protocol, a list of the hosts the user has sent HTTP requests to is shown, together with the requested URIs for each host. Double-clicking on a URI pops up a window that shows the HTTP method used and the user agent (together with some other information) for the specified HTTP request.
As to the FTP protocol, the server list is available. For each server, the account names are reported. Double-clicking on an account pops up a window containing information about this account. This information includes the name of the P2PWNC user that connected to this account, the name of the FTP host, the FTP account name (user name), the FTP password and the number of times the P2PWNC user has connected to this account.
The POP3 protocol statistics are shown below. Double-clicking on a POP3 account name pops up a window that shows the e-mail message list together with the account information.
Other options statscl offers are password changing and viewing administrator information (the program user must select Options > Administrator > Change Password or Options > Administrator > View Administrator Info). The options for connecting to the server (the server's IP address and TCP port) can be changed by selecting Settings from the Options menu. Also, short information about the program is available by selecting the About option in the Help menu.
lightweight clients is preferable. In most cases of complicated queries, it would be very time-consuming for the client to simulate the query having only the raw data. Therefore, it may be more profitable as far as communication and processing cost is concerned. This issue must be looked into further.
Conclusions
This work on the design and implementation of the traffic monitoring and logging system has led to a number of conclusions.
- So far, the development and testing of the Administrative Domain modules that have been designed (as described in the first chapter) have shown that, from a technical point of view, the P2PWNC system is a feasible solution. The basic operations that an AD's modules have to perform, such as traffic shaping, authentication, traffic logging and analysis, and Internet connectivity, have been implemented and tested on a local scale. The traffic logging subsystem works fairly well in an IEEE 802.11b wireless LAN, which is the testbed for the Administrative Domain experiments that have been conducted so far. Because it can analyse the network traffic of some widely used application layer protocols, such as HTTP, FTP and e-mail (SMTP and POP3), it is appropriate for accounting the usage of these applications. However, some protocol and application traffic is, by its nature, difficult or impossible to track (e.g. SSH). Also, webmail is a very popular application that the traffic logging subsystem does not presently deal with.
- In terms of performance, the traffic logging subsystem is quite efficient in IEEE 802.11b wireless LANs. As mentioned in the relevant section of the document, 802.11b has a theoretical maximum speed of 11 Mbps, but the actual user data throughput is much lower (less than 5 Mbps). In networks of that speed, this module does not encounter problems. However, tests carried out in a 100 Mbps Ethernet LAN have shown that the efficiency of the packet capturing and analysis daemon is seriously affected in high speed networks. This is mainly the case when sudden bursts of high speed traffic take place. For example, the transfer of a document of several megabytes from one workstation to another would not be accounted for correctly. Packet loss in libpcap is considerable in such cases. Hence, if the total IP traffic of P2PWNC users on an AD has to be reported with great accuracy, there would be problems in such a high speed network. Obviously, this matter may affect designers of such systems that use the newer 802.11g technology, which is becoming popular. Although the total traffic volume may be reported erroneously, even in such networks the system works well for information that is exchanged when setting up connections and for data that are sent or received without causing high speed traffic bursts. For example, the information sent by the user when connecting to an FTP server and the replies of the server are tracked normally. Also, HTTP requests and responses are handled without mistakes or packet loss.
- As to storage, MySQL has proved to be a very efficient database management system. Also, the documentation provided on the official MySQL website (www.mysql.com) was fairly extensive. During the development of the system, phpMyAdmin was used for managing the database. This web interface to MySQL is very easy to set up and use and provides much functionality. Choosing to store data in an SQL database proved very convenient. Inserting, retrieving and updating data is much easier using SQL statements. Also, it would be very time-consuming to implement a data storage and retrieval system that would be as efficient.
- XML was found to be a good means of standardising document and message formats. This was made clear during the design and specification of the XML-based statistics exchange protocol. What is more, there were numerous XML parsers available, like libxml. One of the problems that one can come upon when using XML is that of encoding and internationalization. Libxml's default encoding is UTF-8. As the exchange of traffic statistics is carried out through XML documents that encapsulate the actual database query results, problems may be encountered if the database uses a character encoding different from that of the XML documents. These problems can be overcome by properly converting database fields from the database's default encoding to UTF-8 before placing them inside XML documents. Also, care should be taken if the database contains values in a language other than English.
- Finally, a reference to application layer protocol security should be made. Many widely used protocols have, by design, no security whatsoever. For example, it was not very hard to capture FTP user names and passwords or e-mail headers and messages. Sensitive data transferred in plain text are easy to extract from the network packets that carry them.
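The encoding conversion mentioned in the XML conclusion above can be sketched for one simple case. Purely for illustration, the sketch assumes that the database returns ISO-8859-1 (Latin-1) text, whose byte values coincide with the Unicode code points U+0000 to U+00FF; other encodings (a Greek one, for example) would need a conversion table or a conversion library instead. latin1ToUtf8() is a hypothetical name and is not part of the system described here.

```c
#include <stddef.h>

/* Converts a NUL-terminated ISO-8859-1 string to UTF-8.
   Returns the number of bytes written (excluding the final NUL),
   or -1 if the output buffer is too small. */
int latin1ToUtf8(const unsigned char *in, unsigned char *out, size_t outSize)
{
    size_t o = 0;

    for (; *in != 0; in++) {
        if (*in < 0x80) {
            /* ASCII range: one byte, unchanged (keep room for the NUL) */
            if (o + 1 >= outSize)
                return -1;
            out[o++] = *in;
        } else {
            /* U+0080..U+00FF: two bytes, 110000xx 10xxxxxx */
            if (o + 2 >= outSize)
                return -1;
            out[o++] = (unsigned char)(0xC0 | (*in >> 6));
            out[o++] = (unsigned char)(0x80 | (*in & 0x3F));
        }
    }
    if (o >= outSize)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

A field value converted this way could then be placed inside a value element of a response document without violating the UTF-8 encoding the parser expects.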
APPENDIX
Installation of the programs
The packet capturing daemon as well as the statistics client and server are available as gzipped archives, binary RPMs and source RPMs. Installing from the RPMs is fairly easy. Running the RPMs installs the following binaries in the /usr/bin folder:
- packet_cap
- shmhandleuser
- statssrv
- statscl
The configuration and other files are placed in /etc:
- packet_cap.conf
- statssrv.conf
- options.dat
- images/
options.dat is a file that contains the statscl options, namely the IP address and the TCP port where statssrv listens. images is a directory that holds some pictures that statscl uses.
If the use of RPMs is not preferred, the programs can be compiled directly from source. In order to do that, the following actions must be performed:
- Packet capturing daemon
First, the source has to be extracted from the gzipped archive. Then, from within the source directory, the following commands have to be executed. The first compiles the packet capturing program (packet_cap.c) and the second compiles shmhandleuser, the utility application that adds and removes users from the shared memory.
gcc -o packet_cap packet_cap.c -g -lpcap -I/usr/include/mysql -L/usr/lib/mysql -lz -lcrypt -lnsl -lm -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv -lpthread -lmysqlclient
gcc -o shmhandleuser shmhandleuser.c
- Statistics server (statssrv)
After unpacking the source code, the following command must be executed to compile the statistics server program.
gcc -o server server.c -g -L/usr/lib -lxml2 -lz -lm -lpthread -I/usr/include/libxml2 -I/usr/include/mysql -L/usr/lib/mysql -lz -lcrypt -lnsl -lm -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv -lmysqlclient
- Statistics client (statscl)
After unpacking the source, one has to move into the source directory and execute the following commands:
In any of the above cases, where the programs were compiled directly from source, packet_cap.conf, statssrv.conf, options.dat and the images directory should be placed in the same directory as the respective binary files or in the /etc directory.
Configuration
After installing the RPMs (or compiling the source code), it is suggested that all configuration files be checked, to ensure that all program settings are correct. As the configuration files are parsed when the programs start up, any change made to them is applied when the programs are restarted.
packet_cap.conf
###################################################################
# This is the packet capturing daemon configuration file          #
# Comment lines start with '#'                                    #
# conf file MUST end with a newline!!                             #
###################################################################
# DbName: name of the database
DbName statsdb
# DbUserName: db user name
DbUserName pfrag
# DbPass: password for DbName MySQL database
DbPass NULL
# DbHost
DbHost NULL
# DbPortNum
DbPortNum 0
# DbSocketName
DbSocketName NULL
# DbFlags
DbFlags 0
# network device pcap listens to
PcapDev eth0
# BPF filter expression
PcapFilter ""
# Shared memory segment key
ShmKey 333
# Shared memory segment permissions
ShmPerms 0666
statssrv.conf
#################################################
# This is the stats server configuration file   #
# Comment lines start with '#'                  #
# The conf file MUST end with a newline!        #
#################################################
# tcp port the server listens to
ServerPort 9999
# DbName: name of the database
DbName statsdb
# DbUserName: db user name
DbUserName pfrag
# DbPass: password for DbName MySQL database
DbPass NULL
# DbHost
DbHost NULL
# DbPortNum
DbPortNum 0
# DbSocketName
DbSocketName NULL
# DbFlags
DbFlags 0
options.dat
127.0.0.1 9999
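The packet_cap.conf and statssrv.conf files shown above share a simple line format: comment lines start with '#', and every other line holds a key followed by a value. A parser for one such line could look like the sketch below. It is not the parser used by packet_cap or statssrv; parseConfLine(), CONF_MAX and the return codes are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define CONF_MAX 128   /* assumed maximum length of a key or value, including '\0' */

/* Parses one "Key Value" configuration line.
   Returns 1 if a key/value pair was read, 0 for a comment or blank line,
   and -1 if the line holds a key without a value. */
int parseConfLine(const char *line, char key[CONF_MAX], char value[CONF_MAX])
{
    /* Skip leading blanks. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* Comment lines start with '#'; empty lines are ignored too. */
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;

    /* %127s reads at most 127 non-blank characters plus the '\0'. */
    if (sscanf(line, "%127s %127s", key, value) != 2)
        return -1;
    return 1;
}
```

Since the value is read as a single blank-delimited token, a PcapFilter expression that contains spaces would need different handling than this sketch provides. A caller would typically read the file line by line, for instance with fgets(), and pass each line to the function.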
In order to run these programs, the following commands are issued.
- Packet capturing system
packet_cap &
In order to add a user to the shared memory segment, given that this user is called pfrag@aueb.gr and his IP address is 195.251.248.176, the following command must be executed.
shmhandleuser pfrag@aueb.gr 195.251.248.176 1
- Statistics client
This program is run either by double-clicking its icon or as follows:
statscl
REFERENCES
[1] P. Antoniadis, C. Courcoubetis, E.C. Efstathiou, G.C. Polyzos, and B. Strulo, The Case for Peer-to-Peer Wireless LAN Consortia, Proc. IST Mobile & Wireless Communications Summit, Aveiro, Portugal, 2003.
[2] P. Antoniadis, C. Courcoubetis, E.C. Efstathiou, G.C. Polyzos, and B. Strulo, Peer-to-Peer Wireless LAN Consortia: Economic Modelling and Architecture, Proc. Third IEEE International Conference on Peer-to-Peer Computing (P2P 2003), Linkoping, Sweden, 2003.
[3] E.C. Efstathiou and G.C. Polyzos, A Peer-to-Peer Approach to Wireless LAN Roaming, Proc. First ACM International Workshop on Wireless Mobile Applications and Services on WLAN Hotspots (WMASH 2003), San Diego, CA, 2003.
[4] E.C. Efstathiou and G.C. Polyzos, Designing a Peer-to-Peer Wireless Network Confederation, Proc. Third International Workshop on Wireless Local Networks (WLN 2003), Bonn, Germany, 2003.
[5] S. Ioannidis, K. G. Anagnostakis, J. Ioannidis, A. D. Keromytis, xPF: Packet Filtering for Low-Cost Network Monitoring.
[6] Luca Deri, Yuri Francalacci, Passively Monitoring Networks at Gigabit Speeds.
[7] Luca Deri, Stefano Suin, Effective Traffic Measurement Using ntop, IEEE Communications Magazine, May 2000.
[8] OpenSystems.com, Inc, Syslog Messaging for Network Management, Technical Report TR-9002, Rev. A 9/99.
[9] S. McCanne and V. Jacobson, The BSD Packet Filter: A New Architecture for User-level Packet Capture, Proceedings of the Winter 1993 USENIX Conference, pages 259-270, January 1993.
[10] Computer Networks, Andrew S. Tanenbaum, 3rd Edition, Prentice-Hall, Inc.
[11] Notes for the Wireless Networks and Mobile Communications class, G. C. Polyzos, Athens University of Economics and Business, Winter Semester, 2002-2003.
[12] Higher Levels in Computer Networks, T. C. Apostolopoulos, Athens University of Economics and Business Publications, Athens 2001.
[13] Operating Systems, J. C. Kavouras, 4th Edition, Kleidarithmos, Athens 2000.
[14] Data Structures in C++, T. Z. Kalampoukis, Athens University of Economics and Business Publications, Athens 1999.
[15] Database Systems, E. J. Yannakoudakis, Mpenou Publications, 1999.
[16] RFC 791, Internet Protocol, DARPA Internet Program Protocol Specification.
[17] RFC 793, Transmission Control Protocol, DARPA Internet Program Protocol Specification.
[18] RFC 821, Simple Mail Transfer Protocol.
[19] RFC 959, File Transfer Protocol.
[20] RFC 1939, Post Office Protocol - Version 3.
[21] RFC 2616, Hypertext Transfer Protocol - HTTP/1.1.
[22] RFC 2234, Augmented BNF for Syntax Specifications: ABNF.
LINKS
[1] www.mysql.com
[2] www.tcpdump.org
[3] www.xmlsoft.org
[4] www.w3c.org/xml
[5] www.trolltech.de
[6] www.faqs.org/rfcs/
[7] www.phpMyAdmin.net
[8] www.ntop.org
[9] www.citeseer.com
[10] www.netfilter.org
[11] www.redhat.com