of Computer Viruses in the Developing World - by Michael Paik
Project Report by: Mansi Gupta (3013021) and Malavikka Sharma (3013020)
B.Sc. (H) Computer Science (VIth Semester)
Hansraj College, University of Delhi, 2012

INTRODUCTION
What is a computer virus?
A virus is a small piece of software that piggybacks on real programs in order to get executed. Once it is running, it spreads by inserting copies of itself into other executable code or documents.
The Problem
Among all the problems that computer users in the developing world face today, the most pernicious is the prevalence of computer viruses, which carry immediate and unexpected costs.
However, it is difficult to pin down reliable figures on the rates and types of infections, or on the scale of the damage done, because these figures reflect only reports from legally purchased copies of antivirus software running on internet-connected machines. They do not reflect the bulk of the software in the developing world, which is illegally obtained, out of its license period, or operated offline and therefore not updated.
[Figure: Global Infection Rate map, McAfee Labs]
[Figure: Virus infections per million citizens, all viruses]
[Figure: Virus infections per million citizens, top 10 viruses]
While data aggregated at this level is inconclusive, the difference between North America and the developing regions is remarkable: it strongly suggests that the virus types present in the developing world, while high in absolute infection rate, display a different ecology from those in the developed world.
Anecdotal accounts by experts on the ground put infection rates in the developing world at up to 80%, indicating a truly endemic problem. This figure is corroborated by recent surveys conducted in Bangalore, India, by Bhattacharya et al.
The prevalence and impact of viruses are summarized in the following figure:
As the figure shows, 80% of centers experience a moderate to high prevalence of computer viruses, where moderate indicates regular infections that cause considerable problems and high corresponds to continuous, highly detrimental infections. The figure also summarizes average spending on antivirus software, grouped by the severity of the virus problem at each location. Although spending varies widely, it is evident that investment in antivirus software alone does not spare a shop owner from these problems. In addition, malware authors distribute their software in infected versions of popular pirated software: in 2009, the Internet security firm Intego discovered a new Trojan horse in pirated copies of Apple's iWork '09 productivity software that could allow a hacker to take control of an infected computer.
The research attributes infections to USB sticks as well as to Internet websites, and also cites SD cards as a frequent infection vector.
The author of the research paper therefore presented Innoculous, a system consisting of a specially crafted USB key, software and an incentivization strategy aimed at disinfecting machines, creating revenue streams for small businesses and individuals in the developing world, and obtaining rich information about computer virus infections. It was described in the Proceedings of the 5th ACM Workshop on Networked Systems for Developing Regions (NSDR) 2011, Washington DC, June 2011.

DESIGN
Inspiration
Innoculous was inspired by Disk Knight, security software developed by a Bangladeshi student to protect computers against malicious programs that spread via USB memory sticks. Its idea was simple: if a USB key is protected by Disk Knight, the program prevents any other process from launching on the computer and displays a message prompting the user to block or allow the starting process.
However, there was a problem with its implementation. Once installed, Disk Knight copies itself onto every unprotected USB key, making that key protected. Furthermore, when such a newly protected USB key is inserted into another system, Disk Knight runs and installs itself onto that system without the user's consent.
This makes it a computer virus in itself.
Disk Knight has been classified as a PUA (potentially unwanted application).

Environment
Innoculous was designed specifically to address infections on the Windows platform, particularly the XP variant, because the vast majority of virus infections in the wild occur on this platform owing to its popularity. (The Windows family accounts for over 80% of the total market.)
Operating system share of the total market, by month:

2012       Win7     Vista    Win2003  WinXP    Linux    Mac      Mobile
February   48.7%    4.5%     0.7%     30.0%    5.0%     9.1%     1.3%
January    47.1%    4.7%     0.7%     31.4%    4.9%     9.0%     1.3%

2011       Win7     Vista    Win2003  WinXP    Linux    Mac      Mobile
December   46.1%    5.0%     0.7%     32.6%    4.9%     8.5%     1.2%
November   45.5%    5.2%     0.7%     32.8%    5.1%     8.8%     1.0%
October    44.7%    5.5%     0.7%     33.4%    5.0%     8.9%     1.0%
September  42.2%    5.6%     0.8%     36.2%    5.1%     8.6%     0.9%
August     40.4%    5.9%     0.8%     38.0%    5.2%     8.2%     0.9%
July       39.1%    6.3%     0.9%     39.1%    5.3%     7.8%     1.0%
June       37.8%    6.7%     0.9%     39.7%    5.2%     8.1%     0.9%
May        36.5%    7.1%     0.9%     40.7%    5.1%     8.3%     0.8%
April      35.9%    7.6%     0.9%     40.9%    5.1%     8.3%     0.8%
March      34.1%    7.9%     0.9%     42.9%    5.1%     8.0%     0.7%
February   32.2%    8.3%     1.0%     44.2%    5.1%     8.1%     0.7%
January    31.1%    8.6%     1.0%     45.3%    5.0%     7.8%     0.7%

2010       Win7     Vista    Win2003  WinXP    W2000    Linux    Mac
December   29.1%    8.9%     1.1%     47.2%    0.2%     5.0%     7.3%
November   28.5%    9.5%     1.1%     47.0%    0.2%     5.0%     7.7%
October    26.8%    9.9%     1.1%     48.9%    0.3%     4.7%     7.6%
September  24.3%    10.0%    1.1%     51.7%    0.3%     4.6%     7.2%
August     22.3%    10.5%    1.3%     53.1%    0.4%     4.9%     6.7%
July       20.6%    10.9%    1.3%     54.6%    0.4%     4.8%     6.5%
June       19.8%    11.7%    1.3%     54.6%    0.4%     4.8%     6.8%
May        18.9%    12.4%    1.3%     55.3%    0.4%     4.5%     6.7%
April      16.7%    13.2%    1.3%     56.1%    0.5%     4.5%     7.1%
March      14.7%    13.7%    1.4%     57.8%    0.5%     4.5%     6.9%
February   13.0%    14.4%    1.4%     58.4%    0.6%     4.6%     7.1%
January    11.3%    15.4%    1.4%     59.4%    0.6%     4.6%     6.8%

2009       Win7     Vista    Win2003  WinXP    W2000    Linux    Mac
December   9.0%     16.0%    1.4%     61.6%    0.6%     4.5%     6.5%
November   6.7%     17.5%    1.4%     62.2%    0.7%     4.3%     6.7%
October    4.4%     18.6%    1.5%     63.3%    0.7%     4.2%     6.8%
September  3.2%     18.3%    1.5%     65.2%    0.8%     4.1%     6.5%
August     2.5%     18.1%    1.6%     66.2%    0.9%     4.2%     6.1%
July       1.9%     17.7%    1.7%     67.1%    1.0%     4.3%     6.0%
June       1.6%     18.3%    1.7%     66.9%    1.0%     4.2%     5.9%
May        1.1%     18.4%    1.7%     67.2%    1.1%     4.1%     6.1%
April      0.7%     17.9%    1.7%     68.0%    1.2%     4.0%     6.1%
March      0.5%     17.3%    1.7%     68.9%    1.3%     4.0%     5.9%
February   0.4%     17.2%    1.6%     69.0%    1.4%     4.0%     6.0%
January    0.2%     16.5%    1.6%     69.8%    1.6%     3.9%     5.8%

Data Logging
Computer data logging is the process of recording events, with an automated computer program, in a certain scope in order to provide an audit trail that can be used to understand the activity of the system and to diagnose problems.
Since one stated goal of the Innoculous project was to acquire rich data about virus infections, a writable medium was necessary.
After several alternatives were considered, a single self-contained USB key was selected, with additional effort made to ameliorate the infection problem that USB keys themselves pose.

Infection Cleaning
Viruses target various types of transmission media or hosts:
+ Binary executable files
+ Volume Boot Records of floppy disks and hard disk partitions
+ General-purpose script files
+ Application-specific script files
+ System-specific autorun script files
+ Documents that contain macros
+ Arbitrary computer files

One of the primary goals of Innoculous was cleaning virus infections, which necessitates an antivirus solution.
This led to two important design considerations:
+ Innoculous needed a self-contained, preferably scriptable, command-line interface.
+ Measures had to be taken to prevent viruses on the machine being scanned from disabling the antivirus engine or corrupting the logs.
Panda Antivirus fulfilled these requirements and was thus selected; moreover, it was explicitly free to use for not-for-profit and research purposes.

Infection Prevention
Windows versions from Windows 2000 through Windows 7 recognize only the first partition on a USB memory key, and they have no built-in capability to create multiple partitions on such devices. Accordingly, Innoculous was installed on a second partition of the USB stick, behind a dummy 1 megabyte NTFS partition (the minimum size) that is presented to Windows. To partially mitigate USB threats, this 1 megabyte partition is entirely occupied by a dummy file with a known hash, which makes the partition tamper-evident and leaves it too small for many infections with large or advanced payloads.
In addition, the small size of this partition will discourage users from storing their own personal data on these devices.
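The tamper-evidence idea above can be illustrated with a short check. This is a minimal sketch in Python, not the Innoculous code (which is VBScript): it assumes SHA-256 as the "known hash", since the report does not name the algorithm, and it uses a hypothetical file name, filler.bin. A real check would point at the filler file on the mounted 1 megabyte partition; the demo builds a small stand-in instead.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so the whole file never has to sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def filler_is_intact(filler: Path, expected_sha256: str) -> bool:
    """True if the filler file exists and still matches the hash recorded at build time."""
    return filler.is_file() and sha256_of(filler) == expected_sha256.lower()


if __name__ == "__main__":
    # Demo: build a small stand-in filler file, record its hash, then tamper with it.
    with tempfile.TemporaryDirectory() as tmp:
        filler = Path(tmp) / "filler.bin"  # hypothetical file name
        filler.write_bytes(bytes(4096))
        known = sha256_of(filler)
        print(filler_is_intact(filler, known))  # True
        filler.write_bytes(bytes(4095) + b"\x01")
        print(filler_is_intact(filler, known))  # False
```

Any write that modifies or deletes the filler makes the check fail, which is what lets the partition act as a tamper indicator rather than as storage.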
IMPLEMENTATION
Custom Scripting
The Innoculous script is written in VBScript.
It has the following functionality:
1. Displays the key's hardware ID/serial number.
2. Presents the user with an option to replicate a child key.
3. Asks the user for the PIN, ZIP or other postal code of their current location, if available.
4. Presents the user with an option to start a scan. If a scan is started, the script:
 + Records the serial numbers of all hard drives in the system.
 + Begins a scan using Panda Antivirus, storing verbose logs.
 + Deactivates Autorun using the command-line registry editor.
 + Records salient information about the machine, including the Windows serial number, installed patches, etc.
5. If network connectivity is available, the script:
 + Checks for updated virus definitions from a preconfigured IP address.
 + Compresses and uploads any existing scan logs.
 + Records the system time skew against an NTP server.

WinPE
Innoculous is implemented using Windows PE 3.1 (32-bit), which provides a preinstallation environment based on Windows 7 SP1.
Windows Preinstallation Environment (aka Windows PE or WinPE) is a lightweight version of Windows XP, Windows Server 2003, Windows Vista, Windows 7 or Windows Server 2008 R2 that is used for the deployment of workstations and servers. It is intended as a 32-bit or 64-bit replacement for MS-DOS during the installation phase of Windows, and can be booted via PXE, CD-ROM, USB flash drive or hard disk.
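Step 5 of the script records the system time skew against an NTP server. The sketch below shows one way to take that measurement with a single SNTP exchange, using the standard four-timestamp offset formula from NTP. It is an illustration in Python only: Innoculous itself runs VBScript under WinPE, the report does not say how it computes the skew, and the server is left to the caller because the preconfigured address is not given.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32-bit seconds + 32-bit fraction) to Unix time."""
    return seconds - NTP_TO_UNIX + fraction / 2**32


def parse_timestamps(packet: bytes) -> tuple:
    """Return the server's receive (t2) and transmit (t3) times from a 48-byte reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply shorter than 48 bytes")
    # Receive timestamp occupies bytes 32-39, transmit timestamp bytes 40-47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    return ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f)


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP offset estimate; positive means the local clock is behind the server."""
    return ((t2 - t1) + (t3 - t4)) / 2


def measure_skew(server: str, timeout: float = 5.0) -> float:
    """Send one SNTP client request and return the estimated local clock offset in seconds."""
    request = b"\x1b" + bytes(47)  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()  # local time when the request leaves
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        t4 = time.time()  # local time when the reply arrives
    t2, t3 = parse_timestamps(reply)
    return clock_offset(t1, t2, t3, t4)
```

For example, measure_skew("pool.ntp.org") would return the skew in seconds. Averaging the two one-way differences cancels out network delay as long as it is roughly symmetric, which is why both local timestamps are taken around the exchange.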
USB Key Preparation
A USB key of at least 2 GB is required to run Innoculous. It was prepared on a Linux machine using the following steps:
+ Using parted, an NTFS partition is created from 1023kB to 2MB. This creates a 1 megabyte (1024kB) partition, the minimum size supported by any modern filesystem that Windows supports.
+ Using mkntfs, the partition is formatted as NTFS.
+ parted is then used to create and format a FAT32 partition comprising the remainder of the device.
+ A Windows PE image is written onto the FAT32 partition using dd or partimage.
+ install-mbr or another Master Boot Record program is used to install the MBR onto the USB key and point it to the second partition, e.g. install-mbr -p2 -e2 -v /dev/sdb.
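The final preparation step, described next, writes the second partition's start sector into that partition's boot sector. The 4-byte field at offset 0x1C of a FAT32 (or NTFS) boot sector is the BIOS Parameter Block's "hidden sectors" count, i.e. the number of sectors preceding the partition, which boot code typically uses to locate the partition on disk. A minimal Python sketch of the same patch, assuming 512-byte sectors and a start sector taken from fdisk -ul, might look like this; the function names are hypothetical.

```python
import struct

HIDDEN_SECTORS_OFFSET = 0x1C  # BPB field: sectors preceding this partition
SECTOR_SIZE = 512             # assumption: 512-byte sectors


def hidden_sectors_bytes(start_sector: int) -> bytes:
    """Encode the partition start as an unsigned 32-bit little-endian integer."""
    return struct.pack("<I", start_sector)


def patch_hidden_sectors(boot_sector: bytearray, start_sector: int) -> None:
    """Overwrite the hidden-sectors field of a boot sector held in memory."""
    if len(boot_sector) < SECTOR_SIZE:
        raise ValueError("expected a full 512-byte boot sector")
    end = HIDDEN_SECTORS_OFFSET + 4
    boot_sector[HIDDEN_SECTORS_OFFSET:end] = hidden_sectors_bytes(start_sector)


def patch_device(partition_path: str, start_sector: int) -> None:
    """Read, patch and write back the first sector of a partition (requires root)."""
    with open(partition_path, "r+b") as dev:
        sector = bytearray(dev.read(SECTOR_SIZE))
        patch_hidden_sectors(sector, start_sector)
        dev.seek(0)
        dev.write(sector)
```

For instance, if fdisk -ul reported a start sector of 4096 (a hypothetical value), the bytes written would be 00 10 00 00, the byte-reversed form of the 00001000 that printf '%08x' 4096 prints; patch_device("/dev/sdb2", 4096) would apply it, and should only ever be run against the intended device.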
Finally, using the start sector reported by fdisk -ul, the start boundary of the second partition is encoded in hexadecimal (e.g. with printf) and written in little-endian byte order at offset 0x1C of the second partition, i.e. in its boot sector. Any hex editor can do this, e.g. hexedit /dev/sdb2.

Deep Forensics
Because the Innoculous installation, when run, has access to all files on the host machine's drives, it could copy various files from the computer for forensic analysis of behavior. Access to these data, properly redacted, could prove a significant source of insight into infection vectors and browsing habits in the developing world.
This functionality, however, is not currently implemented, given the murky ethics surrounding privacy.

DISTRIBUTION
Replication
The script at the core of Innoculous can also replicate the entire system to another USB key. It does this using the Windows AIK (Automated Installation Kit) builder binaries, together with Windows versions of partitioning tools, to create a direct copy of itself.
During replication, the parent key records the serial number of the USB device it is replicating itself to, and the new key is initialized with the hardware ID of its parent. This creates a bidirectional link, and as keys continue to replicate, these links form a graph of keys.

Incentivization
The graph of keys is critical to the incentivization model, which is essentially a bounty on new virus types encountered and on the number of machines scanned.
To encourage users of the system to replicate their keys and give them to others, a scheme analogous to the one used by the MIT Red Balloon Challenge Team during the DARPA Network Challenge was adopted.
+ The challenge was to be the first to submit the locations of 10 moored, 8-foot, red weather balloons placed at 10 fixed locations in the continental United States.
In this model, a bounty is paid out to the finder first, and then in geometrically smaller proportions to the finder's parent, grandparent, and so on. Explicitly, 1/2 of the bounty goes to the finder, 1/4 to the parent, 1/8 to the grandparent, etc.; for a bounty of Rs.8, that is Rs.4, Rs.2 and Rs.1.
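The parent/child links and the geometric payout can be modelled together. The sketch below is a hypothetical Python illustration, not the Innoculous server code: it assumes keys are identified by string serials, that every key's record is available centrally (for instance, reconstructed from uploaded logs), and a total bounty of Rs.8, which is what the quoted Rs.4/Rs.2/Rs.1 split implies. The report does not say what happens to the unpaid remainder of the geometric series, so the sketch simply leaves it unallocated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Key:
    serial: str
    parent: Optional[str] = None                       # parent's ID, written into the child at replication
    children: List[str] = field(default_factory=list)  # child serials, recorded by the parent


class KeyGraph:
    """Hypothetical central view of the key graph built from replication records."""

    def __init__(self) -> None:
        self.keys: Dict[str, Key] = {}

    def add_first_tier(self, serial: str) -> None:
        self.keys[serial] = Key(serial)

    def replicate(self, parent_serial: str, child_serial: str) -> None:
        # Record the link in both directions, as the report describes.
        self.keys[parent_serial].children.append(child_serial)
        self.keys[child_serial] = Key(child_serial, parent=parent_serial)

    def ancestry(self, serial: str) -> List[str]:
        """Finder first, then parent, grandparent, ... up to a first-tier key."""
        chain = []
        current: Optional[str] = serial
        while current is not None:
            chain.append(current)
            current = self.keys[current].parent
        return chain

    def payouts(self, finder: str, bounty: float) -> Dict[str, float]:
        """1/2 of the bounty to the finder, 1/4 to its parent, 1/8 to the grandparent, ..."""
        return {
            serial: bounty / 2 ** (depth + 1)
            for depth, serial in enumerate(self.ancestry(finder))
        }


graph = KeyGraph()
graph.add_first_tier("A")
graph.replicate("A", "B")
graph.replicate("B", "C")
print(graph.payouts("C", 8))  # {'C': 4.0, 'B': 2.0, 'A': 1.0}
```

Because each key only needs to know its parent, walking parent links from the finder up to a first-tier key yields the whole payout chain.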
While these amounts are small, given the large number of infected machines and the potential for multiple infections per machine, this could represent a notable revenue stream in the developing world.

Controls
To keep the system in check, several controls may optionally be implemented on the keys:
+ A usage-based suicide gene that would wipe the key once n scans had been completed and uploaded from some internet-connected machine.
+ A time-based suicide gene that would wipe the key at a given date, verified against a known NTP server from some internet-connected machine.
+ Generational limits on how many generations of keys may be replicated from the first tier of distributed keys.
+ An invoked self-destruct that, when triggered by the server, causes the key to delete itself on its next check for virus signature updates.
+ An invoked disabling of self-replication, forcing a given key to be a leaf node in the graph.
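The controls listed above reduce to two checks that a key could run at startup. This is a hypothetical Python sketch: the field names, and the reading of the generational limit as "a child may not exceed generation max_generation" (with first-tier keys as generation 0), are assumptions rather than anything the report specifies. Following the report, the expiry is only enforced against NTP-verified time, so the wipe is never triggered by a machine's possibly wrong local clock.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class KeyPolicy:
    max_scans: Optional[int] = None        # usage-based suicide gene: wipe after n uploaded scans
    expiry: Optional[datetime] = None      # time-based suicide gene (UTC)
    max_generation: Optional[int] = None   # generational limit; first-tier keys are generation 0
    remote_wipe: bool = False              # invoked self-destruct, set from the server's update response
    replication_disabled: bool = False     # invoked leaf-only mode


@dataclass
class KeyState:
    generation: int
    scans_uploaded: int


def should_wipe(policy: KeyPolicy, state: KeyState, ntp_now: Optional[datetime]) -> bool:
    """Decide whether the key should erase itself.

    ntp_now is the NTP-verified current time (UTC), or None when no
    internet-connected machine is available; the local clock is never used.
    """
    if policy.remote_wipe:
        return True
    if policy.max_scans is not None and state.scans_uploaded >= policy.max_scans:
        return True
    if policy.expiry is not None and ntp_now is not None and ntp_now >= policy.expiry:
        return True
    return False


def may_replicate(policy: KeyPolicy, state: KeyState) -> bool:
    """Decide whether this key may create a child key."""
    if policy.replication_disabled:
        return False
    if policy.max_generation is not None and state.generation + 1 > policy.max_generation:
        return False
    return True
```

Keeping the policy (set by the server) separate from the state (counted on the key) means a signature-update response only has to carry new policy values.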
ANALYSIS
The data Innoculous provides can be used for various analyses:
+ GEOGRAPHIC SPREAD ANALYSIS: This could illustrate the spread levels and densities of particular virus strains over a region.
[Figure: Geographic distribution of viruses before 2003]
Before viruses turned into money-making machines, they were mostly written in developed nations: Europe, the USA, Canada, Japan and Australia. Today the biggest hotspots are Russia, Ukraine, Kazakhstan, Romania, Moldova, China and South America, especially Brazil, which is the biggest source of banking trojans that steal money during online banking.
[Figure: Geographic distribution of viruses after 2003]
By 2009 there were even more advanced viruses, and the number of infected machines around the world now runs into the millions.
+ STRAIN ANALYSIS: Based on the birth date of each strain of virus, worm or other malware, it is possible to infer the age, spread rate and infection vector of observed viruses.
+ REINFECTION: As some machines will likely be scanned more than once, given a sufficiently large network of Innoculous keys, data will emerge on the re-infection of machines that have previously been cleaned.
+ PIRACY ANALYSIS: Determining what proportion of Windows installations are genuine, and which may have come pre-infected with viruses. Windows XP SP3 systems are infected at nearly ten times the rate of Windows 7 SP1 64-bit systems, and even Windows Vista with its latest service pack reports only half the infection rate of Windows XP.

CONCLUSION
If widely adopted, the Innoculous system will provide the research community with a detailed corpus of data on virus infection rates and types at low cost, while simultaneously providing revenue streams for small business owners and individuals in the developing world and raising awareness of the problems that virus infections present.
As a bonus, it will also provide a social network graph of people in the region(s) in question who are likely to be local computer power users, information that could prove valuable when deploying future projects.