Oleg Mürk
Author: .................... June 2001
Supervisor: ................ June 2001
Head of the Chair: ......... June 2001
Tartu 2001
Contents

Introduction
    Aims of the Thesis
    Acknowledgement
    Notation
1 System Analysis
    1.1 Domain Model
    1.2 Generic Requirements
    1.3 Conventional Elections
    1.4 Trust
    1.5 On Revoking Ballots
    1.6 E-voting Requirements
        1.6.1 Functional Requirements
        1.6.2 Non-functional Requirements
2 System Design
    2.1 Theoretical Basis
        2.1.1 Model of the Real World
        2.1.2 Electronic Voting Scheme
        2.1.3 Public Key Infrastructure
        2.1.4 Time-stamping
        2.1.5 Bulletin Board
        2.1.6 Threshold Encryption and Signature
        2.1.7 Implementations of EVS
        2.1.8 On the Freedom of Choice
    2.2 Designing Framework
        2.2.1 Real World Model
        2.2.2 Computing Device
        2.2.3 Software
        2.2.4 Threshold Trust
        2.2.5 Connection
        2.2.6 PKI
        2.2.7 Time-stamping
        2.2.8 Summary
    2.3 Design for Bulletin Board
        2.3.1 Some Simple Ideas
        2.3.2 Synchronous Environment
        2.3.3 Asynchronous Environment
        2.3.4 Practical Solutions
    2.4 Design Pattern for E-voting System
        2.4.1 Computing Result
        2.4.2 Meta Process
        2.4.3 Design for Single Authority EVS
        2.4.4 Design for Multiple Authority EVS
        2.4.5 Conclusions
Summary
Introduction
Recently, the topic of implementing electronic voting (e-voting) has become very popular: multiple workshops have been held, there are firms that provide corresponding services, real attempts at e-voting have taken place, and the media is eagerly covering the topic. The main purpose of electronic elections is to allow voters to vote from as many locations as possible, ideally from their personal computing devices. An intermediate option would be to deploy specialized computers (kiosks) everywhere, as ATMs (automated teller machines) currently are. The communication medium would probably be the Internet or something similar. The justification is that e-voting would be more convenient, which would increase voter turnout. Also, one might expect that in the future e-voting would become less expensive than conventional voting. At present e-voting is viewed as a complement to conventional elections because, for instance, not all people have access to computers and the Internet (or the skills to use them).

Despite its tempting simplicity, this problem is much more complex than it seems at first. The main issues are security and reliability. The problem of organizing e-voting consists roughly of three parts:
Solving the problem mathematically, which includes formulating a model of the real world (e.g. formalizing the notion of trust), stating requirements, and finally finding a mathematical construction and proving that it satisfies these requirements. Such a construction is called an electronic voting scheme (EVS): a collection of protocols and algorithms that implement e-voting within the formulated model of the real world. I will call this theoretical activity.

Provided there is an EVS, it must be implemented. In particular, the real world model that was used must be implemented. Besides that, an EVS usually has a relatively simple structure (while being mathematically complex), which assumes some inputs and produces some outputs. It does not consider the process of preparing input data and consuming output data. Also, e-voting must somehow be integrated into the existing conventional voting process. A real implementation must consider the whole iterative process of organizing elections. I will call this technical activity.

Finally, e-voting will inevitably differ from conventional elections: the voter must perform different actions, there are different (and probably bigger) security threats, different demographic groups have different levels of access to the Internet, etc. For this reason politicians and sociologists must evaluate the impact of e-voting on the democratic process, decide whether it is useful at all, and suggest what should be changed. Besides that, laws must be changed to accommodate e-voting in the conventional voting process. I will call this political activity.
Theoretical activity belongs to the field of cryptography and has lasted for at least twenty years. The most influential papers in this field are (in my personal opinion): [Cha81], [Ben87], [BT94], [CGS97]. The reader can find a partial overview of this topic in my semester work [Myr00]. Basically, it can be said that there exist solutions of acceptable security and complexity, although there is plenty of room for further advances.

There are a number of firms that provide e-voting solutions. The most well-known of them are probably [VoteHere.net] and [Election.com]. The first of them provides (at least) some description of its technology, which is based on [CGS97], a good cryptographic construction. The second has received more media attention, but does not present any description of its technology on its web site (which is a disadvantage, in my opinion).

A number of workshops have been conducted that concentrated on political and technical aspects: National Workshop on Internet Voting [IPI], Voting Integrity Project [VIP], California Internet Voting Task Force [CIVTF]. Their major finding is that although there is a sufficient theoretical basis for implementing e-voting, technologically it is not possible to make systems secure enough. The biggest problem is the insecurity of conventional personal computers and the Internet. At the same time they propose using e-voting kiosks in the near future.
Although it is clear that the risks of voting from ordinary PCs over the Internet are too high, it is still interesting to design an e-voting system and see where and why these risks come up.
Acknowledgement
I would like to express my gratitude to Helger Lipmaa for introducing me to this subject and motivating me to deal with it and also for pulling me into Estonian e-voting project [LipMy01].
Notation
As a potential reader might have a theoretical computer science (rather than software engineering) background, I will briefly describe the notation used. Two types of UML diagrams are used: static structure and activity. The notation is used and interpreted quite freely, which should be normal from the viewpoint of UML. Some other types of diagrams are also used, but their meaning should be evident or will be explained separately.

A sample static structure diagram is presented in Figure 1. It depicts a Factory (a class) that produces Cars (a dashed line signifies dependency or direction of flow). Cars have names (an attribute) and operations Car::Start() and Car::Stop() (methods). Each car has at most one Owner, and each owner can have many cars (an arrow with 1 and * signifies a one-to-many relationship). A Car consists of Wheels (a rhombus signifies aggregation). A Car is a kind of Beeper, thus it can Beep() (a triangle signifies generalization; an interface signifies a set of methods that some class should implement).

A sample activity diagram is presented in Figure 1. It is supposed to describe the state transitions and flow of a process. The upper black dot signifies the beginning of the process (initial state); the one at the bottom is the final state. Bubbles signify either an activity or a state, and arrows depict transitions. In this diagram Work1 and Work2 are performed in parallel, and the state Complete is reached when both of these activities complete.

A sample system architecture diagram is presented in Figure 1. Here a three-dimensional bar depicts a subsystem, a simple rectangle depicts a process (I also interpret it as a user), a rounded rectangle signifies an object (or data), a cylinder depicts a datastore, and the grey bar between the database and the computer signifies a boundary. Lines and arrows are used freely to signify relationships and directions of dataflow.
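[Figure 1: Sample diagrams. Static structure diagram (Factory, Owner); activity diagram (Prepare, Work1, Work2, Complete); system architecture diagram (Programmer, Computer, Database, Program).]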
Chapter 1
System Analysis
In this chapter I will try to describe the problem of e-voting in detail and formulate requirements for e-voting system. This would be a basis for designing and implementing such system.
[Figure 1.1: Domain model.]

Person: Any person that may participate in some election. I assume that each person has a unique identifier.
Election: A specific election. I assume that elections are identified by a unique name.
Voter: A person participating in an election. Many voters can participate in an election. Each person can be a voter in many elections. A voter is identified by the person's identifier and the name of the election.
Ballot Type: Different voters can be presented with different ballot types at some election. A ballot type consists of some number of options, among which the voter has to select one.
Option: One option belonging to some ballot type. Options are identified by names. Option names are unique within the corresponding ballot type.

At a real election the voter might be required to answer multiple questions. In my model this can be modelled with multiple simultaneous elections. If it is important that the voter answers each question correctly, this can easily be enforced with technical or administrative methods. In addition I present the following definitions:

Definition 1.1.1 Ballot - an option from some ballot type chosen by a voter.
Definition 1.1.2 Tally - a set of ballots from voters of some election. Each voter can have at most one ballot.
Definition 1.1.3 Election Result - calculated from the tally; for each ballot type it states how many times each option was selected.
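The definitions above can be illustrated with a minimal sketch in Python. The class and function names (`Ballot`, `compute_result`) and the sample identifiers are illustrative assumptions, not part of the model itself; the sketch only shows how an election result follows from a tally in which each voter has at most one ballot.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Ballot:
    """An option from some ballot type chosen by a voter (Definition 1.1.1)."""
    person_id: str    # a voter is identified by person identifier + election name
    election: str
    ballot_type: str
    option: str


def compute_result(tally):
    """For each ballot type, count how many times each option was selected."""
    seen = set()
    result = {}
    for ballot in tally:
        voter = (ballot.person_id, ballot.election)
        if voter in seen:
            # Definition 1.1.2: each voter can have at most one ballot.
            raise ValueError(f"voter {voter} has more than one ballot")
        seen.add(voter)
        result.setdefault(ballot.ballot_type, Counter())[ballot.option] += 1
    return result


# Hypothetical sample data.
tally = [
    Ballot("p1", "council-2001", "district-A", "yes"),
    Ballot("p2", "council-2001", "district-A", "no"),
    Ballot("p3", "council-2001", "district-B", "yes"),
]
result = compute_result(tally)
```

Here `result` maps each ballot type to a counter of its options; a second ballot from the same voter in the same election is rejected rather than silently counted.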
Incoercibility: Nobody can learn how a person voted, even with that person's cooperation. This includes that the voter cannot prove how he voted. Of course, the voter could just tell a coercer how he voted, but the coercer would not have any means of verifying this claim. The interested reader can find a longer discussion of this subject in [Myr00].
[Figure 1.2: Conventional elections. One Election Organizer has many Intermediate Organizers; each Intermediate Organizer has many Voting Locations.]

The structure of the system is usually hierarchical. Normally there is one organization that is responsible for organizing the election. I call it the Election Organizer. At the other end there is some number of hierarchy leaves, where people can actually vote. I call them Voting Locations. Voting locations and the election organizer communicate through intermediate organizers, which group some number of voting locations (usually on a geographical basis). Different ballot types can be used at different voting locations. Each voter is assigned to one main voting location (close to his residence), where he can normally vote. In such a setting it is possible to prevent voters from overvoting (voting more than once).

Different voting technologies can be used there. One possibility is to use a paper ballot, where options are presented and the voter has to select one of them. After that the ballot is cast into a sealed box. Later all ballots are taken out of the sealed box and counted. The counting process can be automated, for example, by means of optical scanning. All of the technologies assure that after the voter has cast his ballot, it is not possible to link the voter's identity and his ballot, which ensures privacy. In fact, incoercibility is also guaranteed, because the voter casts his ballot in a private voting booth, which implies that the voter cannot prove how he voted (of course, in the real world such claims are always relative to how determined the adversary is). It is important to point out that after the voter has cast his vote, it is not possible to remove his vote from the tally, although removal might be useful if it later turns out that the voter was not eligible to vote.

If the voter does not want to vote at his main voting location, there are procedures for absentee voting, which allow him to vote from a broader range of locations (e.g. even from home). In this case special measures must be taken to prevent the voter from voting more than once (overvoting). One possible solution is that the voter puts his ballot into a clean envelope, seals it, and then puts this envelope into a second one, on which he writes his identification. After that all envelopes from one voter must arrive at the same place, where it can be ensured that he did not vote more than once. Also, if the voter is not prevented from voting at his main voting location, it must be ensured that he did not use both voting procedures. This implies that the best place for gathering a voter's envelopes is his main voting location. If it is decided that the voter's envelope should be counted, the external envelope is removed and the inner clean envelope (which cannot be linked to the voter's identity) is put into a bigger pool of clean envelopes of other voters. After that the ballots can be processed without compromising voters' privacy. If the voter is obliged to fill in his ballot in a private voting booth, incoercibility is also guaranteed, but if the ballot can be filled in at any place (e.g. at home and then sent by mail), someone could have been watching how the voter filled in his ballot.

Election results at each voting location are passed up the hierarchy to the intermediate organizers and finally to the main organizer (separately for each ballot type). At every level the information passed from lower nodes is summed up and then passed to the parent node.
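The summing of results up the hierarchy can be sketched as a small recursive function. This is only an illustration under assumptions: the nested-dictionary representation of the hierarchy, the key names `counts` and `children`, and the sample numbers are all hypothetical.

```python
from collections import Counter


def aggregate(node):
    """Return the summed result for a node of the election hierarchy.

    A leaf (voting location) has a "counts" entry mapping each ballot type
    to the number of times each option was selected; an inner node
    (intermediate or main organizer) has a "children" list.
    """
    if "counts" in node:
        return {bt: Counter(c) for bt, c in node["counts"].items()}
    total = {}
    for child in node["children"]:
        for ballot_type, counts in aggregate(child).items():
            # Results are kept separately for each ballot type.
            total.setdefault(ballot_type, Counter()).update(counts)
    return total


# Hypothetical hierarchy: one organizer, two intermediate organizers.
hierarchy = {"children": [
    {"children": [
        {"counts": {"A": {"yes": 10, "no": 5}}},
        {"counts": {"A": {"yes": 3, "no": 7}}},
    ]},
    {"children": [
        {"counts": {"B": {"yes": 4, "no": 4}}},
    ]},
]}
national = aggregate(hierarchy)
```

Because each node only sums what its children report, a falsified count at one voting location changes the totals by at most that location's contribution.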
Integrity of the process is ensured by the presence of observers, whose function is to verify that everything is performed as required (votes are counted correctly, voter privacy is maintained). The distributed nature of the system ensures that violations of the voting process at some node will not poison the whole voting system and thus will have only a limited effect on the election result. A separate problem is the compilation of lists of people who can vote at each voting location. Probably the best solution is to have a database of all people, from which lists of voters can be generated, but such a database is not always available.
1.4 Trust
An e-voting system must be trusted by all entities who may be affected by decisions made using it. In the case of a country, the system must be trusted by all citizens, the government, the organizers themselves (hopefully), and it must also be approved by the international community. The problem of verifying that a system corresponds to its requirements is very common in the field of software engineering. Requirements can usually be divided into the following groups:

Functional: Which functions the system should be able to perform.
Efficiency: Limitations on time, memory, and communication channel bandwidth requirements.
Dependability: To what extent one could rely (depend) on the system, including:
    Reliability: Limits on a statistical measure of the frequency of faults. Related to availability.
    Availability: Minimum acceptable percentage of time during which the system performs correctly. Related to reliability.
    Safety: Limits on the measure of loss in the case of big failures.
    Security: Which functions the system should prevent from being performed. In general, such requirements are application specific, but the most usual ones are:
        Access Control: Policy defining which system users can perform which operations.
        Confidentiality: Policy defining which system users are supposed to see which data.
        Integrity: Policy defining how data present in the system can be entered, modified, and deleted.

Most requirements can be assigned meaningful numeric measures, but in practice it is almost never possible to measure them directly. Functional and efficiency requirements can be measured to some extent by testing, but dependability, and security in particular, can hardly be measured directly at all (and it is very rare that systems can be built so that their correctness can be proven formally). A typical solution is to evaluate parameters indirectly, by measuring the quality of the process of creating and maintaining the system, which of course gives very vague results. The following aspects are usually taken into consideration:
Whether best practices and common sense are used.
To what extent the system has been subject to testing and formal verification.
Presence of a continuous process of improvement of the system's quality.
For how long the system has withstood real or theoretical attacks by other interested parties.
Presence of a continuous process of prevention of, detection of, and reaction to possible attacks.
In general, specialists proficient in one field are not able to verify systems from other fields. A general way to ensure the quality of a system is to have specialists other than those who created it review it and express an opinion. In the case of approval, the verifiers would themselves become partially responsible for that system. Figure 1.3 describes the components of a hypothetical e-voting system, and also the specialists who are responsible for them and who must be trusted to some extent.

Theory: Scientists are responsible for developing the corresponding theory, the quality of which must be ensured by peer review.
Hardware and Software: Engineers are supposed to provide hardware and software, which must be certified.
Servers: The e-voting system consists of one or more server computers, which are set up and maintained by the operators. Integrity of the servers could be ensured by the presence of observers.
E-voting system: The e-voting system itself is set up and maintained by the organizers. Integrity of the whole process could also be ensured by the presence of observers.
Network: The network connecting servers and voters' computing devices is set up and maintained by the providers.
Voters' computers: Voters' computing devices are set up and maintained by administrators. In many cases voters are themselves the administrators of their computers.

The careful reader has probably noticed that the network and voters' computers do not have certifiers. This is mostly a reflection of the existing situation. It is very hard to certify in any sensible way a network that spans the whole world (the Internet). Although users' computers could be certified, this is not done on a large scale at present. Also, in the case of a contemporary personal computer (PC), it might not be a very sensible activity, because the voter would be able to misconfigure his computer right after certification. So certification of a voter's computer makes sense only for the voter himself (and maybe for the computer's administrator), but not for the rest of the world.
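[Figure 1.3: Components of the e-voting system and the specialists responsible for them. Theory (Scientists); Hardware and Software (Engineers); Servers (Operators); E-voting system (Organizers); Network (Providers); Voters' computers (Administrators).]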
When the voter is casting his ballot, it is checked whether he has already done so using the other facility.
After the election is over, it is checked whether someone voted using both facilities, and then this voter's ballot must be revoked from one of the two tallies.

As it is not possible to revoke ballots in conventional voting, introducing e-voting would require one of the following:

E-voting to allow revoking ballots.
E-voting and conventional voting to take place not at the same time, but sequentially.
A database where it is recorded for each voter whether he has already voted, together with an online connection between this database and all voting locations.

It is clear that the first option is the most preferable, as it requires the least resources (it does not require a database) and is the most convenient (conventional election and e-voting can take place at the same time).
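The preferred option, checking after the election who used both facilities and revoking their e-ballots, can be sketched as follows. This is a bookkeeping illustration only: the function names and sample identifiers are hypothetical, and the e-ballots are shown as plain voter-to-option pairs, which a real e-voting scheme would not expose in this form.

```python
from collections import Counter


def voters_to_revoke(e_voters, conventional_voters):
    """Voters found in both lists: their e-ballots must be revoked,
    since conventional ballots cannot be removed from the tally."""
    return set(e_voters) & set(conventional_voters)


def count_e_ballots(e_ballots, revoked):
    """Tally e-ballots (voter id -> option), leaving revoked voters uncounted."""
    return Counter(
        option for voter, option in e_ballots.items() if voter not in revoked
    )


# Hypothetical sample data.
e_ballots = {"p1": "yes", "p2": "no", "p3": "yes"}
conventional_voters = ["p3", "p4"]

revoked = voters_to_revoke(e_ballots, conventional_voters)
e_result = count_e_ballots(e_ballots, revoked)
```

No online database is needed during the election: the two voter lists are compared once, after voting has ended.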
Enter election parameters: It must be possible to create a new election record and enter general parameters such as the name and the time period during which the election should take place.
Enter voter list: After that the voter list must be entered somehow. Besides that, some form of voter identification must be provided. This is reasonably simple if there exists a database of all voters, from which this list can be retrieved with a corresponding query. On the other hand, if voters are supposed to register for the election (as in the USA), a separate software module would be required to fulfil this requirement.
Enter ballot types: Further, the system should support entering different ballot types and options.
Map voters to ballot types: Finally, the system should allow creating a mapping between voters and ballot types. Normally it would be possible to derive this mapping from supplementary information about voters, if it is available, for example if ballot types are assigned based on where the voter lives.
Conduct election: Having prepared the election information, it should be possible to conduct the election. It is conceivable that the system would start and stop the election automatically, based on the time period entered when creating the election information.
    Start election
    Pause election
    Stop election
Finish election: After the election has been conducted, conclusions must be drawn.
    Revoke voters: The system must allow the ballots of some voters to be left uncounted. This is an optional but highly desirable requirement, which was discussed in Section 1.5.
    Compute result: After that the result must be computed: for each ballot type it must be calculated how many times each option was selected.
    Output result: Finally, the election result must leave the system and enter the external world. One option is to print it on paper. Another option is to produce a digitally signed document.
Archive election: As the last step, election information must be archived to durable media.
    Save election information: The following information must be archived:
        Election information: voter list, ballot types, mapping between voters and ballot types.
        Binary representation of cast votes or any other information based on which the election result was computed (and can be recomputed), if possible.
        The list of voters whose ballots were revoked when computing the election result.
        The official election result as computed before.
    Verify archived information: It must be possible to verify that all archived information is correct, in particular that the archived election result matches the other archived information.

Figure 1.4 illustrates the main process of the e-voting system from the viewpoint of the organizer.
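The "Verify archived information" requirement can be sketched as recomputing the result from the archived items and comparing it with the archived official result. The archive layout, key names, and sample data below are assumptions made for illustration; in particular, cast votes are shown in the clear, whereas a real archive would hold whatever binary representation the voting scheme produces.

```python
def recompute_result(archive):
    """Recompute the election result from archived votes, skipping revoked voters."""
    result = {
        ballot_type: {option: 0 for option in options}
        for ballot_type, options in archive["ballot_types"].items()
    }
    for voter, option in archive["cast_votes"].items():
        if voter in archive["revoked"]:
            continue
        ballot_type = archive["voter_ballot_type"][voter]
        if option not in result[ballot_type]:
            raise ValueError(f"option {option!r} is not on ballot type {ballot_type!r}")
        result[ballot_type][option] += 1
    return result


def verify_archive(archive):
    """True if the archived official result matches the other archived information."""
    return recompute_result(archive) == archive["official_result"]


# Hypothetical archive.
archive = {
    "ballot_types": {"district-A": ["yes", "no"]},
    "voter_ballot_type": {"p1": "district-A", "p2": "district-A", "p3": "district-A"},
    "cast_votes": {"p1": "yes", "p2": "no", "p3": "yes"},
    "revoked": ["p3"],
    "official_result": {"district-A": {"yes": 1, "no": 1}},
}
archive_ok = verify_archive(archive)
```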
[Figure 1.4: Voting process. Preparing election, Conducting election, Revoking voters, Computing result, Archiving.]

Voter's use cases: The voter must be able to perform the following operations:
    Select election: As there might be many elections taking place simultaneously, the voter must be able to select which election he wants to work with.
    See ballot and make a choice: After having selected an election, the voter must be able to see the options and select one of them.
    Submit ballot: Finally, he must be able to submit his ballot to the system.
    Change or delete ballot: Optionally, it might be allowed to change or even delete a previously submitted ballot.
Access Control: It should be possible to define, for each system operation, which users can perform it.
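The access control requirement amounts to a table from operations to the users allowed to perform them. A minimal sketch follows; the role names, operation names, and the specific assignments are hypothetical and only illustrate the shape of such a policy, with anything not listed denied by default.

```python
# Hypothetical policy: operation name -> roles allowed to perform it.
POLICY = {
    "enter_election_parameters": {"organizer"},
    "revoke_voters": {"organizer"},
    "compute_result": {"organizer"},
    "verify_archive": {"organizer", "observer"},
    "submit_ballot": {"voter"},
}


def is_allowed(role, operation):
    """Deny by default: unknown operations and unlisted roles are refused."""
    return role in POLICY.get(operation, set())


voter_may_vote = is_allowed("voter", "submit_ballot")
voter_may_count = is_allowed("voter", "compute_result")
```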
Election information preparation should behave like a usual application, where software delays should not be longer than (say) 1 second. The process of conducting elections should be efficient enough to ensure the required speed of voters' actions. The final stages of an election, such as computing the result, archiving, or verifying election data, should not last more than a couple of hours.

Freedom of Choice: Besides functional requirements, the system must satisfy the freedom of choice requirement, which was discussed in Section 1.2.
    Privacy: Is considered a minimum requirement and is necessary.
    Incoercibility: In general, incoercibility is very useful, but in the case of e-voting it can never be ensured completely: if people can vote at any possible location (including at home), it is possible that someone will be looking over the voter's shoulder while he is voting. So it is probably enough to ensure that coercion cannot take place on a larger scale than is possible by being physically present when someone is voting.
Dependability: It is clear that an e-voting system must be much more reliable than conventional software. There are two ways in which the system might malfunction:
    Some operations cannot be performed (the voter cannot vote, the result cannot be computed).
    The system appears to work correctly, but does not (the voter's software silently selects a different option, some voters' ballots are silently omitted, the result is computed incorrectly).
Chapter 2
System Design
In this chapter I will describe different options for implementing an e-voting system. I will also try to compare them and evaluate risks. First, an overview of the theoretical basis will be given, and then the design of different parts of the system will be described. It is important to stress that the design of such a system does not mean only software design, but also design of the organization and the process, and possibly also of hardware.

As was already mentioned, the main problem of implementing e-voting is security. Security is almost always relative - it can be broken with some investment of resources (money and/or time). Although the benefits of breaking an e-voting system cannot be completely measured (for instance in money), it can still be argued that an adversary would not spend more resources on breaking e-voting than he would gain from breaking it. This implies that elections of different importance require different minimum levels of security.
called Electronic Voting Scheme (EVS). The following types of actors are defined:

Voter: Actor that will vote in the election.
Authority: Actor that will organize the election.

One and the same actor can belong to more than one set. It is assumed that:

There is one ballot type with L options, which are known to all actors.
All actors know the identities of the authorities and have means of communicating with them (e.g. know the addresses at which they send and receive messages).
All actors know the identities of all voters.
Some of the voters have selected one option and intend to cast it.

An electronic voting scheme is a set of protocols (algorithms) for actors, which allow voters to send their ballot with the selected option to the authorities, and the authorities to compute the election result and make it available to all actors. An EVS must satisfy the following requirements:

Correctness: The election result must be computed correctly based on all ballots submitted by the voters.
Freedom of Choice: Either privacy or incoercibility, as discussed previously.

It is very typical to require an EVS to be verifiable in order to ensure correctness: the computation results of the authorities must be verifiable by any actor. This usually means that the authorities must provide a computational proof of correctness of the election result. Each EVS should define:

Means of actor identification.
To what extent actors must be trusted to perform according to the prescribed protocols. Basically, there are two options:
    Assume that the actor is honest - i.e. follows the protocols.
    Assume threshold trust towards [...] authorities).
and produces a hash of fixed length (say 128 bits). The digest algorithm is supposed to be collision resistant - it should be infeasible to find two different binary sequences having the same hash.
Key Generator::Generate() is an algorithm that randomly generates a pair of two keys of specified bit length (e.g. bits). These keys are used as inputs to the Encryption Algorithm and Signature Algorithm methods.

Encryption Algorithm::Encrypt() takes as input a Public Key and a binary sequence of any length (called plaintext) and produces a binary sequence of comparable length (called ciphertext). Encryption Algorithm::Decrypt() takes as input a Private Key and performs the reverse transformation from ciphertext to plaintext (which was encrypted using the public key from the same key pair). It is required that an actor knowing the public key (but not the private key), some number of ciphertexts, and the corresponding plaintexts would not be able to derive any information about the plaintext corresponding to some other ciphertext.
Signature Algorithm::Sign() takes as input a Private Key and a binary sequence of fixed length and produces another binary sequence of fixed length called a signature. The length of the input and output sequences is comparable to the length of the key. Signature Algorithm::Verify() takes as input a Public Key, a binary sequence, and a signature, and checks that this signature was produced from that binary sequence with the corresponding private key. It is required that an actor knowing the public key (but not the private key), some number of binary sequences, and their corresponding signatures would not be able to produce signatures for any other binary sequence. In order to sign binary sequences of any length, a digest is computed from them and then signed.

There exist cryptographic algorithms that satisfy these interfaces, where it is important to assume that actors have only polynomially limited computational power.²

² It should be stressed that there also exist many variations that do not exactly fit into this description. Also, not all encryption and signature algorithms have a complementary counterpart in the sense of sharing the same key pair. This means that there exist encryption algorithms that do not have a complementary signature algorithm that could use the same key pair, and vice versa.

Now, if two actors A and B each generate a pair of keys, somehow exchange their public keys, and keep their private keys secret, they gain the ability to communicate secretly and with identification: when actor A wants to send a message M to actor B, he signs it with his private key, encrypts it with B's public key, and then sends the resulting message M' to B. No other actor who might learn the message M', but does not know B's private key, can derive any information about the message M. At the same time B is able to verify A's signature, which could have been produced only by some actor knowing A's private key (which is supposed to be kept secret). Some encryption algorithms have an interesting property: A is able to prove to someone else who knows the message M' that he sent the message M, without revealing any information about his private key.

Despite the beauty of such a solution, there is one problem: actors need to exchange their public keys somehow. It is not possible to do this over a public connection, because it is not possible to be sure whom the public key came from. In fact, in such a setting this problem is not solvable at all: at some moment communication with reliable identification is necessary. So at best one may require such communication to take place only once. PKI provides the following construction to deal with this problem:
All actors are assumed to have an identity, which has a unique identifier that can be represented as a binary string.
A special construction called a certificate is introduced. It consists of:
    The subject's public key (in binary representation).
    The subject's identity (binary representation of its identifier).
    Some optional attributes, explained later.
    The issuer's identity.
    A signature of all preceding items, verifiable with the issuer's public key.
A certificate is interpreted as a statement that binds the contained public key to the subject's identity. If someone holding such a certificate has reasons to trust the issuer of the certificate and knows the issuer's public key, he has reason to believe that the specified public key belongs to the subject.
A certificate may allow (trust) or forbid (not trust) the subject to issue certificates himself. An actor who is trusted to issue certificates is called a Certification Authority (CA). A certificate may also be limited to some field of activity. Such information can be recorded in the attributes of the certificate.
If an actor has some number of certificates, one could think of a graph where nodes are identities and arcs signify certificates connecting the issuer to the subject. Besides that, some nodes have associated certificates, which are assumed to belong to the corresponding identities. Different nodes, certificates, and arcs can have different levels of trustworthiness. Derivations can then be made on this graph (transitive closure). The simplest form of such a graph is a tree: there is one root CA, which might certify some number of intermediate CAs, which finally certify all interested actors. It is assumed that everybody trusts the root CA. In order to get into this framework one has to prove his identity to one of the CAs (which cannot be done within our model and so must be done externally) and provide his public key. After that, a certificate is generated that is trusted by everybody.
It is important to point out that this whole framework holds only as long as an actor is willing to keep his private key secret. Nothing prevents him from revealing it to someone else. Also, in real life someone might steal someone else's private key, and for this reason CAs are supposed to provide means of checking whether a certificate is still valid. This can be accomplished, for instance, by providing a certificate database (containing either valid, revoked, or both kinds of certificates) which can be queried online, or by periodically publishing certificate revocation lists (CRLs).
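The certificate tree and the sign/verify interface described above can be sketched in a few lines. This is only an illustration under loud assumptions: the signature scheme is textbook RSA over small Mersenne primes (completely insecure, chosen so that the example needs no third-party library), and the names KeyPair, Certificate, issue, and validate_chain are hypothetical, not taken from any PKI standard. The revoked set is a stand-in for a CRL keyed by subject identifier.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPair:
    n: int  # public modulus
    e: int  # public exponent
    d: int  # private exponent (must be kept secret)

    @property
    def public(self):
        return (self.n, self.e)


def make_keypair(p, q, e=65537):
    # Textbook RSA from two primes; toy sizes, for illustration only.
    return KeyPair(n=p * q, e=e, d=pow(e, -1, (p - 1) * (q - 1)))


def digest(data, n):
    # Sequences of any length are reduced to a digest, which is then signed.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, key):
    return pow(digest(data, key.n), key.d, key.n)


def verify(data, signature, public):
    n, e = public
    return pow(signature, e, n) == digest(data, n)


@dataclass(frozen=True)
class Certificate:
    subject_key: tuple   # subject's public key (n, e)
    subject: str         # subject's identifier
    may_issue: bool      # attribute: is the subject trusted to act as a CA?
    issuer: str          # issuer's identifier
    signature: int = 0   # issuer's signature over all preceding items

    def body(self):
        fields = (self.subject_key, self.subject, self.may_issue, self.issuer)
        return repr(fields).encode()


def issue(issuer, issuer_key, subject, subject_public, may_issue=False):
    unsigned = Certificate(subject_public, subject, may_issue, issuer)
    return Certificate(subject_public, subject, may_issue, issuer,
                       sign(unsigned.body(), issuer_key))


def validate_chain(chain, root, root_public, revoked=frozenset()):
    # chain runs from the certificate issued by the root CA to the end actor.
    issuer, issuer_public = root, root_public
    for position, cert in enumerate(chain):
        if cert.issuer != issuer or cert.subject in revoked:
            return False
        if not verify(cert.body(), cert.signature, issuer_public):
            return False
        if position < len(chain) - 1 and not cert.may_issue:
            return False  # only a CA may certify the next actor
        issuer, issuer_public = cert.subject, cert.subject_key
    return True


root_key = make_keypair(2**89 - 1, 2**107 - 1)
ca_key = make_keypair(2**61 - 1, 2**127 - 1)
voter_key = make_keypair(2**31 - 1, 2**61 - 1)

ca_cert = issue("root", root_key, "ca", ca_key.public, may_issue=True)
voter_cert = issue("ca", ca_key, "voter", voter_key.public)
```

Validation starts from the only thing assumed to be known in advance, the root CA's public key, and walks down the tree; a certificate issued by an actor who is not marked as a CA is rejected even if its signature is valid.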
2.1.4 Time-stamping
Time-stamping is a service complementary to PKI, which allows binding an arbitrary message to a moment in time. This is done by creating an additional time certificate for the message. At least two flavours of time-stamping exist:
Absolute: Allows determining a reasonably small interval within which the message received its time certificate. Such a construction is of use only when actors' clocks are synchronous.
Relative: Allows determining, for any two messages having time certificates, which of them received its time certificate earlier.
A time-stamping service (TSS) is supposed to be implemented by one trusted actor or by actors with threshold trust. Actors forming such a service are called Time Stamping Authorities (TSAs). Ideally, time certificates should allow comparing them without the need to contact the TSS (offline verification).
Observing the time certificate of a message proves that the message was created before the moment in time associated with this certificate. If a signed message incorporates the time certificate of any message (e.g. an empty one), one can conclude that the signed message was created after the moment in time associated with that certificate.
One of the most important applications of time-stamping is the situation when someone's certificate is revoked (e.g. due to a private key leak). In such a situation a time certificate can be used to prove that a message was signed before the certificate was revoked.
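Relative time-stamping can be illustrated with a hash-linking scheme, in which every time certificate includes the link value of the previous one. The sketch below is an assumption-laden illustration rather than the construction used in this work: it models a single trusted TSA, omits signatures, and the names TimeCertificate and LinkingTimeStamper are hypothetical.

```python
import hashlib
from typing import NamedTuple


class TimeCertificate(NamedTuple):
    index: int             # position in the TSA's sequence
    message_digest: bytes  # digest of the time-stamped message
    previous_link: bytes   # link value of the preceding certificate
    link: bytes            # hash binding this message to everything before it


class LinkingTimeStamper:
    """A single trusted TSA issuing hash-linked (relative) time certificates."""

    def __init__(self):
        self._last_link = bytes(32)
        self._issued = 0

    def stamp(self, message):
        message_digest = hashlib.sha256(message).digest()
        link = hashlib.sha256(self._last_link + message_digest).digest()
        self._issued += 1
        cert = TimeCertificate(self._issued, message_digest,
                               self._last_link, link)
        self._last_link = link
        return cert


def matches(message, cert):
    # Offline check that the certificate really belongs to this message.
    message_digest = hashlib.sha256(message).digest()
    expected = hashlib.sha256(cert.previous_link + message_digest).digest()
    return message_digest == cert.message_digest and expected == cert.link


def directly_follows(later, earlier):
    # The later certificate could not have been computed before the earlier one.
    return later.previous_link == earlier.link


def stamped_earlier(a, b):
    return a.index < b.index


tsa = LinkingTimeStamper()
ballot_cert = tsa.stamp(b"signed ballot")
revocation_cert = tsa.stamp(b"certificate revocation notice")
```

In the revocation scenario, showing that the ballot's certificate precedes the certificate of the revocation notice is what proves the ballot was signed while the key was still valid.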
The dynamics of a bulletin board can be described by latency (how long it takes after a message was sent for it to become readable by everyone) and monotonicity (a guarantee that if someone has seen a message on the bulletin board, then at each successive read this message will be visible to every reader). Monotonicity for one specific reader could be called read repeatability. For the purpose of proving time-outs (in order to accuse some actor of not participating) it is very desirable that the bulletin board be able to tell with reasonable precision at which moment a message was sent. Such a property could be called absolute time-stamping (as opposed to the relative ordering provided by the bulletin board in any case).
In a modification of the bulletin board called atomic multicast, the board is supposed to forward messages to some number of subscribers. Note that this is not the same as an actor sending a message to a group of actors himself because, for instance, there is no guarantee that one and the same message will be sent to everybody. If the ordering of messages is not important, the construction is called reliable multicast. It should be clear that atomic multicast can be implemented with a bulletin board by polling it periodically, which might of course not be as efficient.
A simplification of the bulletin board (and atomic multicast) is to maintain a separate single-writer, multiple-reader bulletin board for each actor. In this situation the ordering of messages can be determined for each actor separately. Even in such a setting one actor can prove that his message was sent after some other message by some other actor by including in the former message the hash of the latter.
It is relatively simple to implement a bulletin board if there is one actor whom everybody would trust. Otherwise a system consisting of multiple actors with threshold trust in them must be devised. It should be clear that full-blown time-stamping exists iff a full-blown bulletin board exists. Still, from the viewpoint of efficiency they have different profiles: a bulletin board requires much more storage, while time-stamping might be required to process many more messages and to exist for a longer period of time.
For the purpose of e-voting it is enough to have a single-writer bulletin board for each actor with reasonable latency, repeatable read, and absolute time-stamping. Monotonicity is desirable, but is not a requirement. The number of messages on each bulletin board would be fairly small (say [...]).
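The single-writer bulletin board with repeatable read and absolute time-stamping can be sketched as an append-only log. This is a minimal in-memory illustration, assuming one trusted operator; the class name SingleWriterBoard and the helper reference_to are hypothetical, and a real board would authenticate the writer by signature rather than by name.

```python
import hashlib


class SingleWriterBoard:
    """Append-only board: one writer, any number of readers."""

    def __init__(self, writer):
        self.writer = writer
        self._entries = []  # list of (sent_at, payload); never modified in place

    def post(self, author, payload, sent_at):
        if author != self.writer:
            raise PermissionError("only the owner may write to this board")
        # Absolute time-stamping: the board records when the message was sent.
        self._entries.append((sent_at, payload))

    def read(self):
        # Entries are only ever appended, so earlier reads remain a prefix of
        # later ones (read repeatability).
        return list(self._entries)


def reference_to(payload):
    # Including this hash in a later message proves it was written afterwards.
    return hashlib.sha256(payload).hexdigest().encode()


board_a = SingleWriterBoard("A")
board_b = SingleWriterBoard("B")

board_a.post("A", b"setup information", sent_at=10.0)
seen = board_a.read()[0][1]
board_b.post("B", b"ballot|" + reference_to(seen), sent_at=12.5)
```

The second post shows the cross-board ordering argument from the text: B's message embeds the hash of A's message, so it cannot have been composed before A's message existed.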
share holders decide so. Such a construction is very useful in the context of threshold trust. Figure 2.2 depicts the relevant constructions.
[Figure 2.2: interface Threshold Key Generator, with operations Generate(), Reconstruct Public Key(), and Reconstruct Private Key().]
It is assumed that each of the actors has his own private key and that the corresponding certificate is available to all others. It is also assumed that actors communicate through the bulletin board (or, even better, with atomic multicast). Each of the actors executes Threshold Key Generator::Generate(), which:
- Takes as input the actor's key pair.
- Outputs a Private Key Share (supposed to be kept secret) and also a Public Output with the actor's signature (supposed to be made available to everyone).
Public Outputs can be verified with the Verify() method. Those that do not pass this verification should be ignored further.
Threshold Key Generator::Reconstruct Public Key() can construct a Public Key based on Public Outputs present on the bulletin board. Public key can be used by conventional Encryption Algorithm to encrypt and Signature Algorithm to verify signature.
In principle a conventional Private Key can be reconstructed with help of Threshold Key Generator::Reconstruct Private Key(), although it is not usually used.
Threshold Encryption Algorithm::Encrypt() takes as input a Public Key and works exactly as the conventional Encryption Algorithm. In order to decrypt a ciphertext, actors must execute Threshold Encryption Algorithm::Partial Decrypt(), which:
- Takes as input the ciphertext and the actor's Private Key Share.
- Produces a Partial Decryption.
A Partial Decryption consists of:
- Binary information.
- A computational proof of correctness, which can be verified with Verify(), which takes as input the Public Outputs present on the bulletin board. Partial decryptions that do not pass this verification should be ignored further.
Finally, the plaintext can be reconstructed from the Partial Decryptions of a threshold number of actors with the help of Threshold Encryption Algorithm::Reconstruct().
In order to sign a plaintext, actors must execute Threshold Signature Algorithm::Partial Sign(), which:
- Takes as input the plaintext and the actor's Private Key Share.
- Produces a Partial Signature.
A Partial Signature consists of:
- Binary information.
- A computational proof of correctness, which can be verified with Verify(), which takes as input the Public Outputs present on the bulletin board. Partial signatures that do not pass this verification should be ignored further.
Finally, the signature can be constructed from the Partial Signatures of a threshold number of actors with the help of Threshold Signature Algorithm::Reconstruct(). Threshold Signature Algorithm::Verify() takes as input a Public Key and works exactly as the conventional Signature Algorithm. It is important to point out that:
- Both threshold decryption and threshold signature operations can be performed only by the threshold number of actors or more, or by all actors whose Public Output passes Verify() (which is relevant if their number is less than the threshold).
- The result of a decryption operation cannot be incorrect, owing to the proofs of correctness of partial decryptions (the result of a signature operation can always be verified directly). This can also be used in the case [...].
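The threshold property itself (any sufficiently large group of share holders can act, a smaller group cannot) can be illustrated with Shamir secret sharing over a prime field. This is a deliberately simplified stand-in, not the Threshold Key Generator described above: it assumes a trusted dealer, has no Public Outputs or proofs of correctness, and it reconstructs the secret itself, whereas a threshold cryptosystem combines partial decryptions so that the private key is never assembled. The names split and reconstruct are hypothetical.

```python
import secrets

PRIME = 2**127 - 1  # field modulus (a Mersenne prime); toy parameter


def split(secret, threshold, holders):
    """Deal `holders` shares so that any `threshold` of them recover `secret`."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME)
                               for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, holders + 1)]


def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = numerator * (-x_j) % PRIME
                denominator = denominator * (x_i - x_j) % PRIME
        secret = (secret + y_i * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret


private_key = 123456789
shares = split(private_key, threshold=3, holders=5)
recovered = reconstruct(shares[1:4])  # any three of the five shares suffice
```

With fewer than three shares the interpolation yields a value unrelated to the secret, which mirrors the requirement that fewer than the threshold number of actors learn nothing.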
- Accept ballots from all voters (footnote 3).
- Count them correctly.
- Not use single decrypted ballots (intermediate results of computation) in any other operation, and discard them.
Despite the naivety of such a solution, it can be made reasonably safe in real life. There exist many other more or less secure solutions, but the best of them (such statements are always subjective) follow a pattern that is usually called the Multiple Authority Solution. The algorithms and data structures involved are depicted in Figure 2.3.
Footnote 3: Multiple ballots could be accepted and the latest of them used; this would allow modifying a vote. Also, there could be a ballot of a special form that would require the authority not to count it, which would enable a voter to delete a previously submitted ballot.
[Figure 2.3: Multiple Authority Solution. Interface Voter's Algorithm: Generate Ballot(). Voter's Ballot: encrypted information, proof of correctness, signature, Verify(). Interface Authority's Algorithm: Setup(), Compute(). Authority's Setup Information: information, signature, Verify(). Authority's Output: computation result, proof of correctness, signature, Verify().]
It is assumed that there is a fixed number of authorities and that each actor has a pair of public and private keys and the certificates of all other actors. It is also assumed that actors communicate through the bulletin board (it would be good if authorities could communicate through atomic multicast).
First, the authorities jointly execute a setup phase, during which each authority must execute the algorithm Authority's Algorithm::Setup(), which:
- Takes as input the authority's key pair.
- Might communicate with other authorities.
- Outputs a piece of private information that the authority should keep secret, and also public Authority's Setup Information with the authority's signature. The correctness of setup information can be verified with Authority's Setup Information::Verify(). Setup information that does not pass verification should be ignored further. The resulting setup information must be available to all actors.
Further, during the voting phase each voter can generate his ballot using Voter's Algorithm::Generate Ballot(), which:
- Takes as input the voter's key pair, the selected option, and Authority's Setup Information from all authorities.
- Outputs a Voter's Ballot.
A Voter's Ballot consists of:
- Encrypted information about the voter's choice.
- A computational proof of correctness, which can be used to check that the encrypted information was formed correctly without the need for decrypting it.
- The voter's signature.
A voter's ballot can be verified with Voter's Ballot::Verify(). Ballots that do not pass this verification should be ignored further. All voters' ballots should be made available to all actors.
After that, during the tallying phase each authority executes Authority's Algorithm::Compute(), which:
- Takes as input the authority's key pair, the private information generated during the setup phase, Authority's Setup Information from all authorities, and the voters' ballots.
- Verifies the voters' ballots using Voter's Ballot::Verify() and selects one correct ballot for each eligible voter. All authorities must produce exactly the same list of ballots.
- Produces an Authority's Output.
An Authority's Output consists of:
- A computation result, explained later.
- A proof of correctness of the computation result.
- The authority's signature.
An authority's output can be verified with Authority's Algorithm::Verify(). Outputs that do not pass verification should not be used further.
Finally, every actor can execute Consumer's Algorithm::Compute Result(), which:
- Takes as input Authority's Setup Information from all authorities, and Authority's Output from exactly the threshold number of different authorities.
- Computes the election result, telling how many times each option was selected. This step can take a long time.
This result must be verified with Consumer's Algorithm::Verify Result(), which:
- Takes as input Authority's Setup Information from all authorities, the voters' ballots, Authority's Output from exactly the threshold number of different authorities, and the computed result.
- Verifies the voters' ballots using Voter's Ballot::Verify() and selects one correct ballot for each eligible voter. The selected ballots must be exactly the same as those the authorities selected.
- Verifies the Authority's Output from those authorities using the Verify() method.
- Verifies that the election result was computed correctly from the Authority's Output of those authorities. Result verification is remarkably faster than result computation.
It is of crucial importance that every actor sees the same Authority's Setup Information from every authority and the same Voter's Ballot from every voter (which might not be the case if some actor could send different information to different actors). This condition implies that all information should be posted to the bulletin board. It is important to point out that even if this condition does not hold, a wrong election result cannot be computed, because some verification would fail.
As voters' ballots are sent to the bulletin board, it is possible to allow voters to send multiple ballots, amongst which the latest would be selected. There could also be a ballot of a separate form that would require the authority not to count it. This would allow deleting a previously cast vote, although this might not be secret.
Also the following holds:
- The election result can be computed as long as there are at least the threshold number of authorities that follow the algorithms.
- The election result can never be computed incorrectly (to be more precise, this can happen only with negligibly low probability).
- Information on the choices of individual voters can be extracted if at least the threshold number of authorities that produced correct Authority's Setup Information decide to do that. All authorities that produced correct Authority's Setup Information can together do the same (this is relevant in the case when the number of such authorities is less than the threshold).
All this implies that no actor should proceed further if the number of authorities that produced correct Authority's Setup Information is less than the required number. The careful reader has probably already noticed the similarity of this construction to threshold encryption and signature. An example of such an EVS would be variations of [CGS97] with shared public key (threshold) generation in the setup phase (see [Myr00]). Another option is, for instance, [CFSY96].
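The remark that result computation can take a long time while verification is fast can be made concrete with an additively homomorphic tally. The sketch below uses exponential ElGamal with toy parameters and is only loosely in the spirit of homomorphic schemes such as [CGS97]; it is an assumption-heavy simplification in which a single key pair stands in for the jointly generated threshold key, ballots carry no proofs of correctness or signatures, and the vote is a yes/no choice. All function names are hypothetical.

```python
import secrets

P = 2**127 - 1  # prime modulus; toy parameter, not a secure group choice
G = 3           # base element


def keygen():
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def encrypt_vote(vote, public):
    # vote is 0 or 1; it is encoded in the exponent so ciphertexts can be added.
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), pow(G, vote, P) * pow(public, r, P) % P


def combine(ballots):
    # Component-wise product of ciphertexts encrypts the sum of the votes.
    a, b = 1, 1
    for c1, c2 in ballots:
        a, b = a * c1 % P, b * c2 % P
    return a, b


def decrypt_to_element(ciphertext, private):
    # Yields G**(sum of votes); no single ballot is ever decrypted.
    c1, c2 = ciphertext
    return c2 * pow(c1, P - 1 - private, P) % P


def compute_result(element, max_total):
    # Slow step: search for the exponent (a small discrete logarithm).
    candidate = 1
    for total in range(max_total + 1):
        if candidate == element:
            return total
        candidate = candidate * G % P
    raise ValueError("no total in range matches")


def verify_result(element, claimed_total):
    # Fast step: a single exponentiation.
    return pow(G, claimed_total, P) == element


private, public = keygen()
votes = [1, 0, 1, 1, 0]
ballots = [encrypt_vote(v, public) for v in votes]
element = decrypt_to_element(combine(ballots), private)
result = compute_result(element, max_total=len(votes))
```

Only the aggregate ciphertext is decrypted, which corresponds to the requirement that single decrypted ballots never appear as intermediate results.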
Generally, in an environment with only public (tappable) communication, incoercibility is provably not possible. If PKI is introduced and it is assumed that the voter does not want to reveal any information about his private key, it might be possible to devise an EVS that would satisfy incoercibility, but I am not aware of any such construction. Another option is based on the observation that incoercibility is not achievable because the voter can see intermediate computation results and is able to sign any message. This leads to a solution with a specialized device, which would ask the voter which option to use, perform all computations, sign the result, and then pass it over to the voter, so that he would not be able to see intermediate results and would not be able to sign arbitrary messages. Unfortunately it is very hard to imagine that such a solution would justify itself economically.
Software: The software used must be correct - it must correspond to the algorithms and specifications. Besides that, most people do not write their software on their own. This implies that correct software must somehow be delivered to the computing device.
- Gain access to the computer either virtually or physically (footnote 4).
- Execute software instructions.
A computer cannot be viewed separately from the software running on it (OS, application software, shared libraries), which might have accidental or intentional bugs that enable external entities to manipulate the computer, including executing any software instructions and inspecting permanent or volatile memory. In the context of the contemporary personal computer, the following problems exist:
- General purpose software is written with emphasis on functional requirements and not so much on dependability (because the former, not the latter, gives profit, unless the latter is critical). As a result, contemporary software systems are flooded with security vulnerabilities.
- Another problem is related to contemporary ways of software distribution: lots of programs are installed, and installation programs usually have full access to the system and can modify any feature of it, including introducing backdoors for unauthorized access to the system from outside. Although software firms might not have a motivation to do such things themselves, it is enough for one of their employees to do so.
- Many people can gain virtual or physical access to the computer.
In general the following can be done to ensure that a computing device is correct and untappable:
- Use a minimal, fixed, verified set of necessary software (including the operating system).
- Use a refined access control system, which gives the minimum needed rights to the installed software by default.
- Limit access (including physical access) to relevant personnel only.
Footnote 4: There also exist ideas on how to wiretap a computer from a distance by measuring its magnetic field, etc.
It is generally agreed that the contemporary PC is quite insecure (not sufficiently correct and untappable). At the same time it is probably possible to create a secure computing device, because most of the problems exist for historical reasons or for lack of economic motivation. Also, the smaller and more specialized the system is, the simpler it is to make it secure.
A clear borderline should be drawn between the limited number of computers used by organizers and the almost unlimited number of computers used by the voters. The amount of resources that can be invested into securing organizers' computers is orders of magnitude higher than for voters' computers, which should generally be used as is.
As regards securing voters' computers, two trends should be mentioned. First, a variety of handheld mobile devices have recently become affordable to the masses. Designing such a device from scratch gives a good opportunity to implement robust security from the ground up. Also, some devices could be created with a fixed set of preinstalled unmodifiable software, which would increase the security of such a device a lot. In practice, however, such devices are not fundamentally more secure than ordinary PCs.
Another idea is to have a tamperproof device (called a smartcard) that has a processor and a memory chip but is otherwise not self-contained, and to keep on it some well managed (even better, fixed) software and data (secrets). The latter could be kept secret even from the owner of the card himself (e.g. the private key). Smartcards have an interface through which other devices can communicate with them. Smartcards are activated by entering a PIN code, which is usually passed through the device to which the smartcard is attached. Smartcards definitely have their own security threats (see [Sch99]); the most notable of them is that the device to which the smartcard is attached can, after the PIN code has been entered, manipulate the card in any uncontrolled way (sign any messages) and also reveal the PIN to anyone else. This implies that the device to which the smartcard is attached must be rather secure itself. Another problem is that most existing computers are not equipped with smartcard readers, and it will probably take a long time before they become widely adopted.
Besides being correct and untappable, the computing device has to be randomised. Randomness can be obtained from a special physical device or from a cryptographic primitive (algorithm) called a pseudorandom bit generator, which must still be seeded with a small random piece of information. The seed is usually collected from the behaviour of the computing device, which depends on (unpredictable) actions performed by the user.
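The idea of expanding a small random seed into a long pseudorandom stream can be sketched as follows. This is an illustration only, assuming a simple hash-in-counter-mode construction (the class HashPrg is hypothetical and is not a vetted generator); here the seed comes from the operating system's entropy pool via Python's standard `secrets` module, which in practice should also be used directly instead of a home-made generator.

```python
import hashlib
import secrets


class HashPrg:
    """Expands a short seed into a stream by hashing seed || counter."""

    def __init__(self, seed):
        self._seed = seed
        self._counter = 0

    def next_bytes(self, length):
        output = b""
        while len(output) < length:
            block_input = self._seed + self._counter.to_bytes(8, "big")
            output += hashlib.sha256(block_input).digest()
            self._counter += 1
        return output[:length]


# The seed is the only truly random input; the operating system gathers it
# from unpredictable events on the computing device.
seed = secrets.token_bytes(32)
generator = HashPrg(seed)
nonce = generator.next_bytes(40)
```

The stream is fully determined by the seed, so anyone who learns the seed can reproduce every value; this is why the quality and secrecy of the seed matter as much as the generator.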
The best that can be done to keep computing devices' clocks synchronous is to periodically use the Network Time Protocol ([NTP]) to synchronize them with some time server's clock. Simple NTP (SNTP), suitable for ordinary computers, provides an accuracy of 1 second, which should be sufficient. Care must also be taken to avoid bugs when dealing with time zones.
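The clock correction that an (S)NTP client derives from one request/response exchange can be written down directly. The sketch below applies the standard NTP offset and round-trip delay formulas to four timestamps; the function name and the sample values are illustrative assumptions, and no network traffic is involved.

```python
def clock_offset_and_delay(client_sent, server_received,
                           server_sent, client_received):
    """Standard NTP estimate from one exchange (all values in seconds).

    client_sent and client_received are read from the local clock,
    server_received and server_sent from the time server's clock.
    """
    offset = ((server_received - client_sent)
              + (server_sent - client_received)) / 2
    delay = (client_received - client_sent) - (server_sent - server_received)
    return offset, delay


# Illustrative exchange: the local clock is 5 s behind the server, each
# network direction takes 0.1 s, and the server needs 0.01 s to answer.
offset, delay = clock_offset_and_delay(
    client_sent=100.00,
    server_received=105.10,
    server_sent=105.11,
    client_received=100.21,
)
```

The offset estimate is exact only when the two network directions take equal time; asymmetric paths introduce an error of at most half the round-trip delay.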
2.2.3 Software
Software correctness is a direct consequence of the quality of the software development process. In order to increase trust in the software, the development process, source code, and supplementary documentation must be reviewed and certified by some external trusted parties. Once there exists trusted source code from which the software can be built, the software must be delivered to and deployed on the computing device. A problem arises here: even though the source code of the software was certified, this does not imply in any way that the binary distributable that is received was built from that source. The solution is to require the software publisher and the certifiers to sign the distributable and in this way express their trust in the software with respect to a specified purpose. In this case each certifier would need to receive the source code, inspect it, build the binary distributable, and finally sign it. This implies that all certifiers should be able to produce exactly the same distributable, which means that the build tool (compiler) must be deterministic (which build tools hopefully are, or at least can be made quite easily). It is natural to expect that a framework for expressing trust by signing a binary distributable should be a part of PKI. We will see later to what extent this is supported now. Another option is to require certifiers to sign the source code, distribute it, and expect end-users to build it themselves, which is quite unrealistic and also time-consuming.
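The requirement that all certifiers produce exactly the same distributable can be checked by comparing digests of their independent builds. The following is a minimal sketch under the assumption that builds are available as byte strings; the function names and the sample data are hypothetical, and the signing step itself is left out.

```python
import hashlib


def digest_of(distributable):
    return hashlib.sha256(distributable).hexdigest()


def agreed_digest(builds_by_certifier):
    """Return the common digest if every certifier built identical bytes."""
    digests = {digest_of(build) for build in builds_by_certifier.values()}
    return digests.pop() if len(digests) == 1 else None


def is_certified(received, certified_digest):
    # The end user only needs the digest that the certifiers signed.
    return certified_digest is not None and digest_of(received) == certified_digest


# Hypothetical builds: two certifiers reproduce the publisher's binary exactly.
builds = {
    "publisher": b"\x7fELF...voting-client-1.0",
    "certifier-1": b"\x7fELF...voting-client-1.0",
    "certifier-2": b"\x7fELF...voting-client-1.0",
}
certified = agreed_digest(builds)
```

If the compiler embeds timestamps or build paths, the digests differ and agreed_digest returns None, which is exactly the non-determinism the text warns about.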
where [...] is the probability of failure of one component. For instance if [...], [...], the probability of failure of threshold trust is less than [...] and [...]. Note that [...] cannot be too small, because for instance in the case of [...] the probability of failure would be [...]. On the other extreme, if the events of failure are completely dependent (i.e. if one component fails, all components fail), no advantage is gained as compared to the case [...].
Active Adversary: It must be ensured that each component has to be broken separately from the others, so that an adversary would need proportionally more resources to break enough components.
Colluding Components: Actors that control components (or indeed are the components) should not wish to cooperate with each other to break the service. This is largely a political issue of selecting actors.
The described countermeasures require components to be independent. The following could be done to ensure independence:
- Independent implementations and manufacturers (hardware, OS, libraries, software).
- Independent resources:
  - Physical location
  - Power supply
  - Network
An interesting issue arises in the context of reliability (where the service remains working as long as enough components are functioning correctly and can communicate) when components can be fragmented into two or more segments that cannot communicate with each other (e.g. because of network failures). In such a situation the components in each segment would consider the components in the other segment failed, as it is not possible to decide whether the connection has failed or a component is intentionally not communicating. Now, if there are two segments each containing enough components, each of them could form the service on its own, leading to the situation of split mind. If this is an issue, protocols and algorithms should be designed to avoid or at least to detect such a situation. The simplest way is to require that one can contact more than half of the components.
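Two of the quantitative points in this section can be illustrated numerically: how the failure probability of a threshold construction falls when components fail independently, and why a strict majority rule prevents two disconnected segments from both forming the service. The sketch below is an assumption, not a reconstruction of the formula in the text: it supposes that the service fails when at least a given number of components fail, that failures are independent with equal probability, and the numbers used are purely illustrative.

```python
from math import comb


def prob_at_least_k_fail(components, k, p):
    """Probability that k or more of `components` independent parts fail,
    each with failure probability p (binomial tail)."""
    return sum(comb(components, i) * p**i * (1 - p)**(components - i)
               for i in range(k, components + 1))


def has_quorum(reachable, components):
    # A strict majority can exist in at most one network segment at a time.
    return 2 * reachable > components


single = prob_at_least_k_fail(1, 1, 0.1)       # one component on its own
two_of_three = prob_at_least_k_fail(3, 2, 0.1)  # service dies if 2 of 3 fail
```

With these illustrative values the three-component service fails with probability 0.028 instead of 0.1. If failures are completely dependent the independence assumption breaks and the gain disappears, as noted above.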
2.2.5 Connection
The connection between computing devices must be dependable: it should be possible to send a message from one address to another without a failure. There are two ways of assessing system dependability: how the system acts on average (reliability) and what can happen in the worst case, especially if some entity is interested in bringing the system down (safety, security).
The Internet is the first and probably the only candidate for implementing the connection. It provides functionality to send messages between nodes (computing devices) that have IP addresses. The Internet can be viewed as a collection of interconnected local networks (segments) and consists of the following basic components, which are built one upon another:
Physical link layer: Physical devices providing local packet (message of limited size) sending functionality within one segment, which may have its own address system. (OSI (footnote 5) physical and link layers.)
Network layer: Provides means of sending packets between any IP addresses. Special computing devices called routers are used to join segments and find a suitable path for each packet. The packet sender does not get trustworthy information about whether a packet has reached its destination. Nothing is done if packet sending fails. Packets can be received in an order different from the order of sending. (OSI network layer.)
Transport layer: Provides means of establishing a virtual connection between any two IP addresses, over which messages of any size can be sent in both directions. Messages are split into packets, which are sent separately. The receiver is supposed to send an acknowledgement for each packet received. Packets that do not reach the destination (which are not acknowledged by the receiver) are resent. As a result, the sender has good evidence of whether a message has reached the target. Also, the order of messages (in one direction) is guaranteed to remain the same. (OSI transport layer.)
Domain Name System (DNS): IP addresses are numeric and hard to memorize. To solve this problem, each Internet node can be assigned one or more symbolic names, which are easier to remember. DNS provides a service of mapping symbolic names to IP addresses. The DNS service is implemented as a worldwide distributed hierarchical set of computers, each of which keeps a part of this information and is supposed to know where to get the rest.
Footnote 5: OSI (Open System Interconnect) reference model - an ISO standard defining seven layers of any network implementation. In practice nobody follows it precisely, but it is a good reference model. See [OSI].
Applications: Networked applications making use of the previously described components.
On average the reliability of the Internet can be said to be acceptable, but in the presence of an active adversary the Internet is by no means dependable. Further, the following context will be assumed (although most of the arguments apply to any situation): an application server (implemented by a limited number of nodes) to which multiple clients (nodes) connect. When attacking in the described context, the following direct aims can be set:
- Prevent nodes from communicating with each other (clients from connecting to the server).
- Modify transmitted information.
- Create an illusion of communication with a fake address (either DNS or IP).
The attacks themselves can be classified into the following groups:
- Damage, modify, fake, or overload components of the infrastructure:
  - Links
  - Routers
  - DNS servers
  - Nodes (application server, client)
  Although the Internet was originally designed to resist a nuclear attack, so that there would be multiple independent paths between any nodes without common links, at the current moment the Internet has become mostly hierarchical with a relatively low level of redundancy, both from the viewpoint of segment connections and of DNS. As a result, for most node pairs it is possible to find an intermediate link or server (router or DNS server) whose removal would disconnect these nodes from one another. Also, it should be relatively simple to disconnect a specific node from most of the others by breaking a link or intermediate server close enough to it. In addition, if an attacker has penetrated some link or intermediate server (or the attacker is indeed the operator of the component), he is able to imitate communication with any IP address behind him. Finally, there exist effective methods of overloading the communications infrastructure, known as Denial of Service (DoS) attacks, discussed later. It is important to point out that most components of the infrastructure belong to one specific entity, which must be trusted in order to rely on the connection.
- Damage, modify, or fake data of:
  - Local routing (within one segment)
  - Routers
  - DNS
  As a result packets would be sent to wrong destinations, or would not reach their targets at all.
- Attack protocols of any layer (link, network, transport) in order to break, modify, or fake connections.
The following weaknesses of the infrastructure are usually exploited:
- The ability to gain physical or virtual access to infrastructure components, possibly with the help of so-called social engineering, which targets human beings instead of overcoming technical or computational protection methods.
- Bugs in the underlying software: operating system, networking components, protocol implementations, application software. Probably the most important of them is the so-called buffer overflow error, where the memory area right after a buffer is overwritten when data that is too long is written to the buffer without proper size checking.
- Finally, the basic protocols (ARP, ICMP, RIP, IP, TCP, DNS, etc.) themselves have security vulnerabilities, which can be used against the aims of the infrastructure.
As an example, let us consider an attack that overloads infrastructure components, called Denial of Service (DoS). The general idea is to send more garbage information than a link, an intermediate server, or an application server can process, with the aim of consuming some kind of resource: bandwidth, computational power, or for instance memory. As a result legitimate users would not get through. Attacks can be (and usually are) initiated from multiple nodes whose total resources are greater than those of the victim (in this case the attack is called Distributed DoS). Attacks can be application specific in such a way that the attacker needs far fewer resources to generate the garbage than the victim needs to process it, for instance if the victim tries to decrypt the messages sent. As a result fewer resources are needed to bring the component down.
One could ask why the attacker would have more resources than the victim. On the one hand, a state's elections are an important event, and so a lot of resources could be spent to bring them down. On the other hand, the attacker is always one step ahead of the victim's operators: first they set up some resources, and then the attacker has a chance to gain enough resources to run a DoS attack. Finally, as experience shows, it is relatively simple to break into multiple Internet nodes and manipulate them externally. Protocol vulnerabilities can also be exploited to direct heavy traffic towards a specific node. Although there is no complete cure for this problem, partial solutions and guidelines exist which lead to the attacker needing many more resources to launch the attack, see [Ero00].
In general, the following countermeasures could be taken to prevent the described attacks:
- Proper development process of protocols, software, and hardware.
- Proper infrastructure surrounding Internet components, including a refined access control policy and attack prevention, detection, and response.
- Systematic redundancy of connections (bandwidth, independent paths) and nodes (fail-over and load-balancing clusters).
- Legal measures with severe punishments for network disruption, which implies that it should be possible to trace attacks back to their originators.
The first two of the described items are directly related to the previously discussed problem of correctness of software and the computing device. In general, only global measures can raise the dependability of the Internet substantially.
At the same time, the problem of identification can be completely solved within PKI with the help of such cryptographic constructions as Diffie-Hellman key exchange, message authentication codes, symmetric encryption, and so on. In short, provided two connection endpoints have certificates and are able to establish a connection, it is possible to establish a communication channel which would provide:
Authenticity: It is clear which PKI identity sent the received data.
Integrity: Data cannot be modified between connection endpoints.
Confidentiality: Data cannot be wiretapped between connection endpoints.
Non-repudiation: The ability to prove that received data was sent by the corresponding PKI identity.
There exist implementations of this approach for the network layer ([IPSec]), the transport layer ([TLS]), and DNS ([DNSEXT]). These protocols also help to raise the quality of connection establishment, because non-repudiation helps to prove that some component behaved incorrectly (e.g. provided incorrect information), but only post factum. Also, this measure will be effective only when most Internet nodes start following these protocols, which is not the case at the current moment.
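As a concrete example of such a channel at the transport layer, the following sketch configures a TLS client with Python's standard `ssl` module so that the server must present a certificate that chains to a trusted CA and matches the expected name. The helper name and the file arguments are hypothetical; no connection is opened here.

```python
import ssl


def make_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS client configuration that authenticates the server.

    ca_file: optional path to the trusted CA certificates (hypothetical path);
             the system trust store is used when it is omitted.
    cert_file, key_file: optional client certificate and private key, for the
             case where the server should authenticate the client as well.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file is not None:
        context.load_cert_chain(cert_file, key_file)
    return context


context = make_client_context()
# A connection would then be wrapped roughly like this (not executed here):
#   with socket.create_connection((host, 443)) as raw:
#       with context.wrap_socket(raw, server_hostname=host) as channel:
#           channel.sendall(request)
```

A channel set up this way gives authenticity, integrity, and confidentiality between the endpoints; proving to a third party who sent particular data would additionally require the data itself to be signed.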
Besides that, the network was required to be synchronous: there should exist an upper bound on how long it takes from the moment a message is sent until the message is received. With some simplifications, one segment of a network (one Ethernet segment) can be considered synchronous. The main concern is that the speed of data transmission degrades as a function of network load, which opens the door to effective DoS attacks. This can be relieved to some extent by isolating the segment from the external world, but that does not help against internal adversaries. In the case of the Internet it is not possible to give a useful and sensible upper bound: the Internet is completely asynchronous, and a failing connection can be interpreted as a transmission that lasts especially long. Although it was assumed that the connection is synchronous, most of the constructions described before are not sensitive to that, although they can freeze until the messages are delivered. The most remarkable exception is the bulletin board, which is often used as a replacement for the connection itself.

The conclusion would be that the existing connection is not dependable enough, multiple single points of failure exist, and it can be broken on purpose quite easily. Only global measures can improve the situation substantially. At the same time, if a connection can be established, communication authenticity, integrity, and even privacy and non-repudiation can be achieved.
2.2.6 PKI
PKI is a framework that facilitates establishing correspondence between identities and public keys, the basic functionality of which can be usually split into:
Roughly, PKI consists of three parts:
- Standards defining data formats and algorithms.
- Software supporting these standards (both for authorities and clients).
- Organizations functioning as certification authorities.
Probably the most popular PKI standard is X.509v3 ([X.509]), which defines formats for certificates and CRLs. There exists standard software for managing a certification authority and corresponding client software, which provides the following functionality (the list is not complete):
- Usage of CRLs, which can be automatically retrieved from certificate distribution points (CDPs).
- Certificate verification.
- Encryption and decryption.
- Signing and signature verification.
A hierarchical (tree) topology of CAs is widely adopted. One example of both certification authority and client software is the Microsoft Windows 2000 platform, which has built-in support for PKI. There also exist organizations that act as international certification authorities. They typically issue personal certificates of different levels of trustworthiness (implied by different quality of identity verification), and also DNS name certificates (given to the owner of a DNS name) and code publisher's certificates, which can be used to sign code distributables (see later). One example of such an authority is [VeriSign]. Still, there exist multiple problems:
- There is no universally accepted certification authority.
- There is no uniform, convenient, and accepted naming convention, which would enable assigning a unique identifier to any person or organization in the world.
- Existing standards are rather loose (many features are optional) and as a result there exist interoperability problems between software from different vendors and also between different CAs, including:
  - Not all vendors use a hierarchical topology of CAs.
  - Support for certificate revocation (especially automatic revocation through distribution points) is implemented in many different incompatible ways.
- There exists the problem of how certificates of root CAs reach client computers. Currently they are pre-installed with the operating system, which requires additional trust towards the operating system vendor.
The conclusion would be that existing PKI is rather underdeveloped and further progress is needed.

PKI requires that owners of private keys keep them secret. For most personal computer users this means keeping them on the file system, (hopefully) protected by file system access control and a pass phrase. Also, the private key is loaded into random access memory when it is used. All this works provided the computing device's untappability requirement is satisfied. A much better solution is to keep the private key on a specialized tamperproof device (smart card), which would perform cryptographic operations on its own without revealing the private key to the computer. As was already mentioned, smart cards have their own security concerns, and smart card readers are not widely adopted yet.

In the context of e-voting another problem arises: why would some state (the organizer of the elections) trust a root CA which is located in South Africa or the USA? A solution would be to have a local intermediate certification authority, which would function within that state. Only certificates that are under that authority would be allowed to be involved in the elections, and at the same time interoperability with the external world would be ensured.

Besides conventional PKI functionality, an infrastructure for code deployment to computing devices is desirable:
- A software distributable may carry a signature from the software publisher and also from external parties, who certify that this software is correct with respect to a specified purpose. Certifiers should be able to assign different levels of trust towards software.
- The operating system should provide ways to define a policy of whether to install (run) a code distributable based on its signatures. The simplest way is to ask the user every time some distributable is installed (run), which would become annoying very fast (especially in the case when code is downloaded automatically from the Internet) and would lead to a situation where the user would always say yes without thinking. It is a separate research problem what such a policy would look like and who would create and enter it (it is naive to expect end-users to do that correctly).
It is reasonable to expect such a framework to be an integral part of the operating system. Contemporary code signing is rather underdeveloped: for instance, Microsoft's Authenticode (the code signing framework on the Windows platform) enables a software publisher to sign the code and assert in this way that the software is safe (whatever that means), but there are no means to state correctness with respect to some purpose (at best one can add a free-form string), and there can be only one signature. In short, the publisher's identity is bound to the code without any direct legal implications. Microsoft's forthcoming .NET platform ([.NET], now in beta) promises to provide a more refined policy engine on top of Authenticode. Naturally, other vendors have implemented similar constructions in their platforms, for instance in Java 2. Still, they all appear to lack support for multiple signatures and expression of trust with respect to a specified purpose, i.e. they do not support software certification.
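To make the desired feature more concrete, the sketch below models a distributable that carries several certifications, each naming a purpose and a trust level, together with a policy that decides whether the code may be installed. All names (Certification, Policy, may_install) and the numeric trust scale are hypothetical, and the cryptographic verification of each signature is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certification:
    signer: str       # PKI identity of the publisher or an external certifier
    purpose: str      # purpose the software is certified for
    trust_level: int  # higher means stronger assurance (hypothetical scale)


@dataclass(frozen=True)
class Policy:
    purpose: str                # purpose the local system requires
    trusted_signers: frozenset  # identities this policy accepts
    min_level: int              # lowest acceptable trust level
    min_signers: int            # how many distinct trusted signers are needed


def may_install(certifications, policy):
    """Accept only if enough distinct trusted signers certify the purpose."""
    approving = {
        c.signer
        for c in certifications
        if c.signer in policy.trusted_signers
        and c.purpose == policy.purpose
        and c.trust_level >= policy.min_level
    }
    return len(approving) >= policy.min_signers
```

A policy of this kind would be entered once by an administrator, so the end-user is not asked to make a decision for every distributable.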
2.2.7 Time-stamping
The need for time-stamping was identified years ago. For a long time there were either relatively inefficient solutions (for relative time-stamping) or solutions that required a common unconditionally trusted third party (for absolute time-stamping). Quite recently, in 1998, a practical and efficient solution for relative time-stamping was developed (see [BLLV98]). The solution is still single-authority, but it is not possible to cheat undetectably, although the authority can ignore someone based on his identity. Consequently, the dependability and scalability of the service might not be sufficient for every application. Processing of time certificates in general requires communication with the time-stamping authority (TSA). A topic of ongoing research is implementing a time-stamping service which works with respect to threshold trust. A compendium of information on this subject can be found at the home page of the time-stamping project Cuculus [Cuculus]. As time-stamping is relevant to e-voting mostly in the context of the bulletin board, I shall return to this topic when discussing its design options.
2.2.8 Summary
The existing framework of hardware, software, network, and PKI has grown in an evolutionary way over the years. It is generally agreed that contemporary infrastructure is not dependable enough, although at the same time it is considered good enough to access Internet banks and shops. Many of the existing problems can be circumvented if the framework is rebuilt carefully from scratch with specific requirements in mind. Special attention should be paid to the basic constituents of the framework and the people or organizations who are responsible for them:
- Hardware manufacturing and deployment.
- Software platform development and deployment (one can use code signing only after one has installed software which supports code signing).
- Network components.
- Certification authorities.

Most of the information in this section has been taken from the Internet, and most of the resources are not self-contained enough to cite, so I cite only the most relevant one: [Rub00].
least years. Good overviews of the theoretical meta-facts can be found in [Kest95] and [Fi00], whereas [Fi85] is the paper where some of them were first proven. Relevant keywords are distributed consensus, Byzantine agreement, and reliable broadcast. Quite a lot of practical work has been done by Michael Reiter (see [Reiter]). In the rest of this section I will first consider different implementation options for the synchronous and asynchronous cases and then try to formulate practical solutions to the bulletin board problem from the viewpoint of organizing e-voting.
- Use any form of time-stamping to prove the ordering of the messages.
- When sending the next message, incorporate into it a hash of the previous message. This requires the sender to remember (or retrieve) the last message sent so far.
- If message ordering is up to the sender (the only entity interested in ordering the messages is the sender himself), the message could just incorporate a sequence number (which requires maintaining a counter) or even the current value of the sender's clock.
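The hash-linking and sequence-number options above can be sketched together as follows. This is only an illustration under stated assumptions: messages are plain dictionaries, SHA-256 over a canonical JSON encoding stands in for the hash function, signatures are left out, and the names make_message, digest, and chain_is_consistent are hypothetical.

```python
import hashlib
import json


def digest(message):
    """SHA-256 over a canonical JSON encoding of the message."""
    encoded = json.dumps(message, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_message(seq, body, previous):
    """Build message number `seq` that commits to the previous message."""
    prev_hash = digest(previous) if previous is not None else None
    return {"seq": seq, "body": body, "prev": prev_hash}


def chain_is_consistent(messages):
    """Check sequence numbers and hash links of an ordered message list."""
    previous = None
    for expected_seq, message in enumerate(messages):
        if message["seq"] != expected_seq:
            return False  # a message is missing or out of order
        expected_prev = digest(previous) if previous is not None else None
        if message["prev"] != expected_prev:
            return False  # an earlier message was replaced
        previous = message
    return True
```

Because every message commits to its predecessor, a reader who holds the latest message can detect removal or substitution of any earlier one.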
Based on this it is sensible to concentrate on implementing a reliable store, out of which a single-writer bulletin board can be created.

The second idea is related to absolute time-stamping. Assume that we have an odd number of values in ascending order, fewer than half of which are assumed to be incorrect, but we do not know which. In this case the middle value (the median) belongs to the interval starting with the smallest and ending with the biggest correct number in the sequence. The reasoning is quite simple: if the median is correct, then the implication is trivial; if it is incorrect, then it is surrounded in the sequence by correct numbers, because the incorrect values are too few to fill either side of it. This finishes the proof. This can be applied in the following situations:
Assume that there are several time-stamping servers which append their current time to every message sent to them, sign it, and send it back. If one accesses an odd number of these servers and takes the median of the returned times, one gets a trustworthy absolute time certificate as long as fewer than half of the accessed servers are malicious (or have wrong clocks).
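The interval argument above can be illustrated with a few lines of code. The sketch assumes one particular reading of the argument: exactly 2t+1 time values are collected, at most t of them are wrong, and the middle value is returned. The function name trustworthy_time and the parameter max_faulty (playing the role of t) are hypothetical, and signature checks on the server replies are omitted.

```python
def trustworthy_time(reported_times, max_faulty):
    """Return the median of 2 * max_faulty + 1 reported clock values.

    If at most `max_faulty` of the values are wrong, the median lies between
    the smallest and the largest correct value.
    """
    if len(reported_times) != 2 * max_faulty + 1:
        raise ValueError("need exactly 2 * max_faulty + 1 reported times")
    return sorted(reported_times)[max_faulty]
```

With the three correct readings 100, 101, 102 and two arbitrary wrong ones, the returned value always stays inside the interval from 100 to 102, whichever side the wrong values fall on.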
- Implement a reliable write-once, multiple-read register.
- Maintain an array of such registers.
- In order to send a new message, find the highest index of a filled register and then write to the next register a new signed message containing its index and a hash of the previous message.
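A minimal single-process sketch of this register layout is given below. It only shows the write-once rule and the index-plus-hash content of each message; the reliability that would come from implementing each register with multiple authorities, as well as message signing, is left out. The class and function names are hypothetical.

```python
import hashlib
import json


def record_digest(record):
    """SHA-256 over a canonical JSON encoding of a stored record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class WriteOnceRegister:
    """Accepts exactly one write and any number of reads."""

    def __init__(self):
        self._filled = False
        self._value = None

    def write(self, value):
        if self._filled:
            raise RuntimeError("register is already filled")
        self._value, self._filled = value, True

    def read(self):
        return self._value


class SingleWriterBoard:
    """Bulletin board kept as a growing array of write-once registers."""

    def __init__(self):
        self._registers = []

    def post(self, body):
        index = len(self._registers)  # position after the highest filled register
        prev_hash = record_digest(self._registers[-1].read()) if self._registers else None
        register = WriteOnceRegister()
        register.write({"index": index, "prev_hash": prev_hash, "body": body})
        self._registers.append(register)
        return index

    def read(self, index):
        return self._registers[index].read()
```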
Ideas for implementing such a register by multiple authorities with threshold trust can be found in [MR98], section 6. In order to prove time-outs (i.e. that someone did not write a message during some period of time), time-stamping should be employed. The previously described approach of combining time certificates from several servers could be applied here. Note that bare time-stamping by the writer does not solve the problem: the writer could first ask for a time certificate, wait for an unlimited period of time, and only then send the message. This implies that time-stamping must be performed by the authorities implementing the service. A register implemented with such an approach has the following properties:
- In order to write one message, this message (or at least its digest) must be transmitted over the network multiple times. This limits the throughput of the system quite a lot.
- Quite large cryptographic overhead is needed to perform authenticated communication.
- There is a limit on the number of authorities that may be malicious.

These limitations make this particular construction quite useless. I did not have time for a thorough investigation and so cannot tell whether more practical constructions for synchronous networks exist.
- In order to write one message, this message (or at least its digest) must be transmitted over the network multiple times. This limits the throughput of the system quite a lot.
- Quite large cryptographic overhead is needed to perform authenticated communication.
- The system can survive only a fragmentation that affects a limited number of group members. The system freezes if, for instance, the group is split by a network failure into two equal parts.
- The group of correct authorities can melt down to any size.
- As a group membership change is a relatively expensive operation, an attack could be mounted where an authority would periodically pretend to be unresponsive, get thrown out of the group, then become responsive again, get invited back into the group, and so on.
- In order to raise efficiency, messages are broadcast by one selected group member (for instance the one with the smallest ID), and if he fails, he is voted out. For this reason it makes sense to concentrate attacks on the group member with the smallest ID.
- Quite many issues are left open or need further adaptation to the needs of the bulletin board.
For these reasons this construction seems to be rather impractical for use in the wild Internet, but it might be of interest on a more protected and synchronous local network. I have to acknowledge that these papers are rather complex and many details are left open, so there is always a chance that I just did not quite understand them.
- Setup phase of multiple authority EVS and also threshold key generation (I will refer to them as the setup phase further).
- Collecting voters' ballots (I will call it the collecting phase further).
- Result computation in multiple authority EVS (I will call it the computing phase further).
Despite the previously described semi-theoretical solutions, it appears to me that a practical hack is more appropriate in the current situation.

The setup phase requires a high quality of bulletin board service (atomicity), but it can be restarted without any loss, and its informational throughput is quite low. This phase could be performed on a synchronous network. For this reason I would suggest implementing the bulletin board by a single authority, which should broadcast signed messages to all participants. In the end each participant would sign the transcript of messages delivered to it, and this phase would be considered complete if no participant objects that he did not receive all messages and the signed transcripts match. The setup phase outcome should be made available to voters with the help of administrative methods. Another option is to try to adapt Rampart, although it is questionable whether that is worth the effort.

The collecting phase must be performed on an asynchronous network, and the amounts of data transmitted are measured in gigabytes. A voter should be sure that if he gets confirmation that the operation succeeded, then his ballot will be counted. At the same time he might not always get this confirmation, because the network connection might go down exactly at the moment when the bulletin board sends the last confirmation message to the voter. The ordering of sent ballots is up to the voter. At the same time this service cannot be restarted: it must function without failures. While collecting votes, it is sufficient for the service to be write-only. For this reason I would suggest having several independent authorities, which would collect sent ballots. In order to submit a ballot, the voter would have to contact a required number of these servers and send his ballot there. In the end the ballot lists would be signed by the authorities, written to durable media, and physically transported to the place where the computing phase takes place. The ordering of a voter's sent ballots could be ensured by time-stamping or even by including the voter's computer time in the ballot (probably not such a good idea).

The computing phase does not actually require the bulletin board. All that is needed is to collect the self-verifying output of all authorities. This can be ensured with administrative measures.
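The submission rule of the collecting phase can be sketched as follows. This is a simplified model under stated assumptions: each collecting server is represented by a callable that returns True when it confirms the ballot and raises ConnectionError when it cannot be reached, and the names submit_ballot and min_confirmations are hypothetical.

```python
def submit_ballot(ballot, servers, min_confirmations):
    """Send the ballot to every server; succeed if enough of them confirm.

    `servers` is a list of callables that take the ballot and return True on
    confirmation. A raised ConnectionError models an unreachable server or a
    confirmation that was lost on the way back.
    """
    confirmations = 0
    for send in servers:
        try:
            if send(ballot):
                confirmations += 1
        except ConnectionError:
            continue  # this server did not confirm; try the remaining ones
    return confirmations >= min_confirmations
```

A lost confirmation makes the voter's view pessimistic: the server may have stored the ballot although the voter does not count it, which is consistent with the requirement that a confirmed ballot must be counted, but not the reverse.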
[Figure: system components. Labels: Entry Server, Voter(s), Organizer, Voting BB, EVS, Observer(s), Consumer(s), Archive.]
- Voting client software to use.
- Election information.
- Voting bulletin boards' addresses and public key certificates.
- Observers' public key certificates.
- EVS information, generated at the setup phase.
Revocation Information Management: Subsystem that helps create lists of voter identities that should not be counted.
Archive: A permanent store, where all election information is saved after the election is over.

The thick line in the figure signifies the boundary between the managed and the external world. The process according to which the system is supposed to act is depicted in Figure 2.5. The following steps are present there:

Selecting Observers: The process starts with selecting observers for the specific elections.
Preparing Election Information: Election information is prepared. Once the information is ready, it is signed by the organizer.
Preparing Bulletin Boards: Multiple bulletin board servers are set up, possibly at different geographic locations. Some of the observers are supposed to observe these servers. The servers' public key certificates are collected.
EVS Setup: The electronic voting scheme setup phase is executed, resulting in some public information that must be made available to voters.
Populating Entry Server: The entry server is populated with all relevant data. The integrity of all information is assured by the organizer's signatures.
Initiating Voting: After that, voting can be initiated by activating the bulletin board servers.
Voters Voting: Then for some period of time voters are given an opportunity to vote. This is done as follows:
- The voter contacts the entry server and downloads voting software.
- Voting software is deployed on the voter's computer. Signatures on the software distributable are verified.
- Voting software contacts the entry server again and checks that the voter is eligible to vote, retrieves the voter's ballot type, bulletin board addresses and certificates, and EVS setup information.
- After that the ballot is presented to the voter, where he can choose one option. Finally he can cast his ballot (or cancel the activity).
- Voting software forms the voter's ballot according to EVS algorithms and setup information. After that the ballot is signed with the voter's private key.
- Finally bulletin board servers are contacted and the ballot is submitted to them. There should be a lower limit on how many servers must be successfully contacted. The voter is notified of success or failure.

[Figure: process diagram. Labels: Selecting Observer(s), Preparing BB(s), EVS Setup, Initiating Voting, Voters Voting, EVS Shutdown.]
Stopping Voting: Finally voting is stopped by closing the bulletin board servers.
Preparing Voter Revocation Information: After the election is over, voter revocation information is prepared. Once the information is ready, it is signed by the organizer.
Computing Election Result: The election result is computed with the help of EVS. This will be explained later in a separate subsection.
Observers Approving Election Result: Observers are given an opportunity to verify the correctness of the EVS result. After that they are given a chance to sign the result to express that they are satisfied with everything.
Archiving Election Data: Once the EVS result is approved by a sufficient number of observers, election data can be stored in the permanent archive, where it is accessible to the consumers.
EVS Shutdown: Finally, the electronic voting scheme is shut down. In particular, this means destroying secret information generated during the EVS setup phase that may compromise election security if it leaks. It might be necessary to postpone this phase for a reasonable amount of time.
[Figure. Labels: Entry Server, Voting BB(s), EVS, EVS Output, Observer(s), Signature(s), Full Report, Archive.]

[Figure. Labels: Ballot(s), EVS Output, Observer Signature(s).]
At first EVS takes as input the raw ballot lists from the bulletin board servers (which are additionally signed by the organizer), the list of revoked voters, and also the contents of the entry server (why it is needed will become apparent later) and produces the election result. After that, observers verify and sign the EVS output. Now all data is signed by the organizer once more and finally time-stamped. This forms the complete election report, which can be stored in the archive.
Figure 2.7 depicts components of election data and their interrelationships (here a dashed arrow from A to B means that A includes a hash of B):
- EVS output is supposed to contain hashes of the raw ballot lists, the lists of revoked voters, and the entry server contents which were used in the computation.
- The voter's client software is supposed to include in the ballot hashes of the parts of election data that were used.
- Observers' signatures include a hash of the EVS output by definition.
- Finally, the full report contains hashes of the observers' signatures.
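The first of the hash links listed above can be sketched as follows. The field names of the EVS output record, the function names, and the use of SHA-256 are assumptions made for the illustration; the inputs are treated as opaque byte strings.

```python
import hashlib


def h(data: bytes) -> str:
    """Hex digest used for every hash link in this sketch."""
    return hashlib.sha256(data).hexdigest()


def build_evs_output(result, raw_ballot_lists, revoked_voters, entry_server_contents):
    """Bind the election result to the exact inputs it was computed from."""
    return {
        "result": result,
        "ballot_list_hashes": [h(ballot_list) for ballot_list in raw_ballot_lists],
        "revoked_voters_hash": h(revoked_voters),
        "entry_server_hash": h(entry_server_contents),
    }


def inputs_match(evs_output, raw_ballot_lists, revoked_voters, entry_server_contents):
    """Check that archived inputs are the ones the EVS output commits to."""
    return (
        evs_output["ballot_list_hashes"] == [h(b) for b in raw_ballot_lists]
        and evs_output["revoked_voters_hash"] == h(revoked_voters)
        and evs_output["entry_server_hash"] == h(entry_server_contents)
    )
```

An observer who signs such an output thereby also commits to the inputs, so any later change to an archived ballot list would be detected by inputs_match.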
Such an approach helps glue different parts of election data together: none of the components can be modified after the full report is complete. In addition, such a structure commits the organizer to some decisions: as the bulletin board servers' and observers' certificates are present at the entry server, it can be determined at once which bulletin board server outputs were used and which observers have signed the result and which have not. Such an approach also guarantees that all ballots were formed out of the same entry server contents. Finally, after time-stamping, the organizer's public key can be revoked without any harm.

[Figure 2.8: Meta Process of the Design Pattern. Labels: Setting up Infrastructure, Conducting Elections, Maintaining Archive.]
The system's lifetime starts with setting up the infrastructure. After that follows an iterative process of threshold generation of the organizer's public key, registering it at PKI, conducting some number of elections, and finally revoking the organizer's public key and destroying the private key shares. Threshold key generation should be performed using the bulletin board constructions described in section 2.3. It is sensible to keep private key shares on smart cards. In parallel with that, the data archive should be maintained.
Ballot list creator takes as input the raw ballot lists, the list of revoked voters, and the entry server contents and generates a list of ballots which should be counted. Signatures of ballots must be verified and removed. As a result, the eligible ballot list does not bear any direct links to voters' identities. This operation is deterministic and could be duplicated on multiple computers to ensure that the list is formed correctly. Finally this list should be signed by the organizer.

Result computer takes as input the eligible ballot list and the private key shares. Using them, it is possible to decrypt each ballot separately and compute the election result. In the end the election result is signed using the private key shares. This operation is the weakest point of security: single decrypted ballots should not leak outside the computer. This implies that the computer should be as secure as possible. Also, it should not have any other means of communication with the external world (hard disk, network) besides the one through which the ballot list is entered and the computed result is returned. Even better, this computer could be implemented as a specialized device. This operation is deterministic and so it could also be duplicated on many computers to ensure that the result is computed correctly, although computation correctness cannot be verified directly. Signature generation is not deterministic, so signatures should not be compared, but they can be verified directly.

[Figure. Labels: Result Computer, EVS Output.]

[Figure 2.10: Election Data in Single Authority EVS. Labels: Ballot(s).]

Figure 2.10 presents the data structures involved and their relationships. It is very similar to the one in subsection 2.4.1 and does not need further explanation.
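The ballot list creator described above can be sketched as a pure function, which is what makes it possible to repeat it on several computers and compare the outputs. The sketch rests on several assumptions that the text does not fix: each raw entry is a tuple (voter_id, time, encrypted_ballot, signature), signature checking is delegated to a caller-supplied function, and when one voter has several valid ballots only the one with the latest time is kept. All names are hypothetical.

```python
def create_eligible_ballot_list(raw_ballot_lists, revoked_voters, signature_is_valid):
    """Return the list of encrypted ballots that should be counted.

    The result depends only on the set of input entries, not on the order in
    which the ballot lists or their entries are supplied.
    """
    chosen = {}  # voter_id -> (time, encrypted_ballot)
    for ballot_list in raw_ballot_lists:
        for voter_id, time, encrypted_ballot, signature in ballot_list:
            if voter_id in revoked_voters:
                continue
            if not signature_is_valid(voter_id, time, encrypted_ballot, signature):
                continue
            candidate = (time, encrypted_ballot)
            # Assumed rule: keep the latest ballot of each voter; ties are
            # broken by the ballot bytes so the choice stays deterministic.
            if voter_id not in chosen or candidate > chosen[voter_id]:
                chosen[voter_id] = candidate
    # Drop identities and signatures; sorting removes any trace of arrival order.
    return sorted(ballot for _, ballot in chosen.values())
```

Since the output contains only encrypted ballots in sorted order, two independent runs over the same inputs can be compared byte by byte.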
Each authority takes as input the raw ballot lists, the list of revoked voters, and the entry server contents and produces a signed authority's output. After that the election result can be computed based on the authorities' outputs.
Figure 2.12 again presents the data structures involved and their relationships. It is very similar to the one in subsection 2.4.1 and does not need further explanation.

[Figure. Labels: Authorities, Authority's Outputs, Result Computer, EVS Output, Ballot(s).]
2.4.5 Conclusions
The main conclusion of this section would be that it is possible to build a generic e-voting service design pattern, into which both single authority and multiple authority electronic voting schemes can be easily plugged in a similar way. The advantage of single authority EVS is its speed; the disadvantages are its lack of verifiability and the presence of a single point of failure, which must be heavily secured. The advantage of multiple authority EVS is its complete verifiability and lack of single points of failure: everything is done with respect to threshold trust. The disadvantage is its relative inefficiency. At the same time, probably in many contexts both approaches have comparable and sufficient security. Multiple authority EVS becomes more secure with a much bigger investment of resources.
Chapter 3
Summary
In this work the following tasks have been accomplished:
- Formulated detailed requirements for an e-voting system.
- Described the theoretical basis for e-voting from the viewpoint of software engineering.
- Analysed the feasibility of a generic security framework, upon which an e-voting system could be built.
- Investigated options for designing a bulletin board service.
- Proposed a general design pattern for implementing an e-voting system.
- Described how theoretical single and multiple authority electronic voting schemes fit into this pattern.
The main conclusions would be:
- The only serious security threat is imposed by the existing framework of personal computers and the Internet. Unfortunately it is enough to prevent implementing secure e-voting in the near future.
- Although theoretically the single authority electronic voting scheme is much weaker than its multiple authority counterpart, in practice they are of comparable quality, having different advantages and disadvantages.
- Further development of PKI and code signing.
- Further research in the field of time-stamping and bulletin board construction.
- Refinement of the e-voting system design. Currently the design pattern is very high-level and lots of technical details are missing. Also the whole process of e-voting system maintenance should be much more refined.
- Evaluation of the precise financial and computational resources needed to create and maintain an e-voting system.
Bibliography
[Ben87] J. Benaloh. Verifiable Secret-Ballot Elections. Ph.D. Thesis presented at Yale University, New Haven, CT (Dec. 1987). (Available as TR-561, Yale University, Department of Computer Science, New Haven, CT (Sep. 1987).)

[BLLV98] Ahto Buldas, Peeter Laud, Helger Lipmaa, Jan Villemson. Time-Stamping with Binary Linking Schemes. In Hugo Krawczyk, editor, Advances in Cryptology - CRYPTO '98, volume 1462 of Lecture Notes in Computer Science, pages 486-501. Springer-Verlag, 1998. http://www.tml.hut.fi/~helger/papers/bllv98/

[BT94] J. Benaloh and D. Tuinstra. Receipt-Free Secret-Ballot Elections (extended abstract). In Proc. 26th ACM Symposium on the Theory of Computing (STOC), pp. 544-553. ACM, 1994.

[CIVTF] California Internet Voting Task Force. http://www.ss.ca.gov/executive/ivote/

[CFSY96] R. Cramer, M. Franklin, B. Schoenmakers, M. Yung. Multi-authority secret ballot elections with linear work. In Advances in Cryptology - CRYPTO '96, volume 1070 of Lecture Notes in Computer Science, pages 72-83, Berlin, 1996. Springer-Verlag.

[CGS97] R. Cramer, R. Gennaro, B. Schoenmakers. A Secure and Optimally Efficient Multi-Authority Election Scheme. European Transactions on Telecommunications, 8:481-489, 1997.

[Cha81] D. Chaum. Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms. Communications of the ACM, 24(2):84-86, 1981.

[Cuculus]

[DNSEXT]

[Election.com] http://www.election.com/

[Ero00] Pasi Eronen. Denial of service in public key protocols. In Proceedings of the Helsinki University of Technology Seminar on Network Security (Fall 2000), to appear in TML laboratory report series, December 2000. http://www.cs.hut.fi/~peronen/publications/

[Fi85] M. Fischer, N. Lynch, and M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), pp. 374-382, 1985.

[Fi00] Michael J. Fischer. The Consensus Problem in Unreliable Distributed Systems (A Brief Survey). Proc. Int. Conf. on Foundations of Computations Theory, 2000. http://citeseer.nj.nec.com/326938.html

[LipMy01] Helger Lipmaa, Oleg Mürk. E-valimiste realiseerimisvõimaluste analüüs. In Estonian. http://www.just.ee/oldjust/JM/lipmaamyrk.pdf

[IPI] National Workshop on Internet Voting. Conducted by Internet Policy Institute. Sponsored by the USA National Science Foundation. http://www.netvoting.org/

[IPSec] IETF IP Security Protocol (ipsec) Working Group. http://ietf.org/html.charters/ipsec-charter.html

[Kest95] Lawrence Kesteloot. Fault-Tolerant Distributed Consensus. 1995. http://tofu.alt.net/~lk/290.paper/290.paper.html

[MR98] D. Malkhi and M. Reiter. Byzantine quorum systems. Distributed Computing 11(4):203-213, 1998. A preliminary version appears in Proceedings of the 29th ACM Symposium on Theory of Computing, May 1997. http://www.bell-labs.com/user/reiter/#Quorums

[Myr00] Oleg Mürk. Electronic Voting Schemes. Semester work. http://www.math.ut.ee/~olegm/my_papers.english.html

[.NET]

[NTP]

[OSI]

[Rei94] M. K. Reiter. Secure agreement protocols: Reliable and atomic group multicast in Rampart. In Proceedings of the 2nd ACM Conference on Computer and Communication Security, pages 68-80, November 1994. http://www.bell-labs.com/user/reiter/#Rampart

[Rei95] M. K. Reiter. The Rampart toolkit for building high-integrity services. In Theory and Practice in Distributed Systems (Lecture Notes in Computer Science 938), pages 99-110, Springer-Verlag, 1995. http://www.bell-labs.com/user/reiter/#Rampart

[Rei96] M. K. Reiter. A secure group membership protocol. IEEE Transactions on Software Engineering 22(1):31-42, January 1996. http://www.bell-labs.com/user/reiter/#Rampart

[Reiter] Michael Reiter's Homepage. http://www.bell-labs.com/user/reiter/

[Rub00] Avi Rubin. Security Considerations for Remote Electronic Voting over the Internet. http://avirubin.com/e-voting.security.html

[Sch99] B. Schneier and A. Shostack. Breaking Up Is Hard to Do: Modelling Security Threats for Smart Cards. USENIX Workshop on Smart Card Technology, USENIX Press, 1999, pp. 175-185. http://www.counterpane.com/smart-card-threats.html

[TLS]

[VeriSign] http://www.verisign.com

[VoteHere.net] http://votehere.net/

[VIP] Voting Integrity Project. http://www.voting-integrity.org/projects/votingtechnology/

[X.509] IETF Public-Key Infrastructure (X.509) Working Group. http://ietf.org/html.charters/pkix-charter.html