
William James Yeager and Rita Yu Chen

Second Generation Peer-to-Peer Engineering
Copyright © 2004, 2008 William James Yeager and Rita Yu Chen
Foreword

We are sure that many of you wonder why a book completed in 2004 is finally being published in 2009. To explain, what follows is a short history of the writing of “Second Generation P2P Engineering.”

An editor from Addison-Wesley contacted us in the spring of 2002 after having read a
few of our publications on the Internet. As the editor roughly described it at the time, “I
was mining the Internet for possible authors in the P2P field and found you two.” We
then completed the standard Addison-Wesley book proposal form, which was accepted after both internal and external peer review. We began to write the book that summer. During the writing process, which finished in early 2004, all the completed chapters underwent peer review. These reviews were, without exception, excellent. So, why wasn’t the book published?

The final reason given to us by our third Addison-Wesley editor was that, in the technical book area, it was becoming more and more difficult to capture sales, and that they did not believe our book would sell 30,000 copies. We consequently terminated our contract formally, which reverted to us all rights granted to Addison-Wesley for the manuscript and its publication.

As one can imagine, we were very disappointed, given the effort it took to complete “Second Generation P2P Engineering.” But it is important to note that we do not fault Addison-Wesley, and we understood their decision. The business climate was difficult at that time, and for us it was a case of bad timing.

Our initial editor also felt bad about this and put us in contact with a few other editors.
They all had the same take on the technical book market, and as a result, our book has
remained in a digital archive since that time. We remain friends with our initial editor.

Finally and happily, Dr. Rita Chen discovered scribd.com earlier this year. After a few months of review to decide whether web-based publication was appropriate for a book that had spent five years on a digital bookshelf, we concluded that it is suitable for publication.

It’s always interesting to reread what one has written and to be pleasantly surprised, first, by how well written and approachable the book is, and second, by how much of it is still quite relevant today. The introductory chapters are written to be readable by non-technical persons, and so are the introductions to the more technical chapters.

We hope that those of you who skim, read parts, or attempt to absorb most or all of the technical design details we present will feel the same way.

William Yeager, Rita Yu Chen November 2009


To the memory of Gene Kan, 
a dear friend and colleague. 
Contents

Preface

Chapter 1 Introduction
1.1 The .COM Supernova
1.2 The Changing Internet Marketplace
1.3 What Is a P2P Network?
1.4 Why P2P Now?
1.5 Basic P2P Rights
1.6 Contents of This Book and P2P PSEUDO-Programming Language

Chapter 2 P2P Is the End-Game of Moore’s Law
2.1 The 1980’s
2.2 The 1990’s - The Decade of the Information Highway
2.3 The New Millennium

Chapter 3 Components of the P2P Model
3.1 The P2P Document Space
3.2 Peer Identity
3.3 The Virtual P2P Network
3.4 Scope and Search - With Whom Do I Wish to Communicate?
3.5 How to Connect

Chapter 4 Basic Behavior of Peers on a P2P System
4.1 The P2P Overlay Network Protocols
4.2 P2P Mediator Protocols
4.3 P2P PeerNode-to-PeerNode Protocols
4.4 P2P Connected Community Protocols
4.5 P2P Hashing Algorithms in the Context of Our P2P Overlay Network
4.6 More 4PL Examples

Chapter 5 Security in a P2P System
5.1 Internet Security
5.2 Reputation Based Trust in P2P Systems
5.3 More Building Blocks for P2P Security
5.4 Digital Rights Management on P2P Overlay Networks

Chapter 6 Wireless and P2P
6.1 Why P2P in the Wireless Device Space?
6.2 Introduction to the Wireless Infrastructures
6.3 The Telco Operator/Carrier
6.4 Fixed Wireless Networks
6.5 Mobile Ad-hoc
6.6 P2P for Wireless

Chapter 7 Applications, Administration, and Monitoring
7.1 Java Mobile Agent Technology
7.2 Implementation of Java Mobile Agents on the P2P Overlay Network
7.3 The Management of the P2P Overlay Network with Java Mobile Agents
7.4 Email on the P2P Overlay Network

Appendix

References
Preface

“Peer-to-Peer Engineering” has as one of its goals to provide the technical understanding necessary to enable software architects and engineers to build P2P software products that will have marketplace longevity. One cannot list all the P2P applications and stay current for even one month. Each application may use one of several P2P “techniques,” and for the most part these applications do not communicate with one another. At the same time, the number of academic research papers on P2P is rapidly increasing. Unlike other publications on the topic, this book does not just summarize and list existing P2P implementations and academic research results, but also gives a complete and practical design of a P2P Overlay Network.

This approach leads to a much deeper look at the underlying fundamentals and research. What are the computer-science “fundamentals” of P2P? This book answers that question with engineering solutions. From P2P network components to P2P communication protocols, from security to wireless, this book covers each P2P building block with detailed design specifications as well as down-to-earth implementation suggestions. Most of these are the authors’ original ideas and are intended to solve some of the difficulties faced by P2P software engineers. This book also covers the field with enough breadth to enable readers to compare existing P2P implementations and see their strengths and weaknesses.

To these ends, this book will first give the reader an historical perspective so that (s)he can understand that P2P is not new and has its technical roots in the 1980’s. The reader will see that it is current market forces, along with technological advances, that have caused this latent technology to resurface. In this perspective there will be case studies and examples of algorithms nearly twenty years old that have “sat on the shelf” waiting for such events to motivate their reevaluation.

Then, to make the behavior of the P2P Overlay Network understandable, this book defines, first, the set of components that make up a P2P network, along with the XML documents that describe each component, and then the protocols that control the interaction of the components and their associated documents. To express the behavior of the components and their communication protocols, this book also introduces a P2P Pseudo Programming Language (4PL).

Security is a very important building block of any successful P2P network design. Other books on the subject discuss or allude to security protocols and name algorithms such as RSA, RC4, and SHA-1, but do not give the technical analysis necessary to implement a security solution. This book, in contrast, includes an in-depth analysis of the security requirements for P2P networks and provides a solution for each requirement. Furthermore, it integrates these security solutions into the other building blocks, which makes the P2P Overlay Network design complete and secure.

Also, the Internet is expanding to include small devices such as light switches, automobile GPS systems, mobile telephones, home appliances, etc. These devices are, and will continue to be, an integral part of P2P computing; we have particular expertise in this area and will cover them in breadth and depth. It is important to note that it is now claimed that 30% of the Chinese population will have Internet access by 2005, and that this access will be wireless. P2P will play an extremely important role in bringing services to these 300+ million people. The book will prepare software engineers for this eventuality and will describe how to write a P2P application for mobile devices.

Finally, on top of the P2P Overlay Network, this book discusses a wide range of P2P applications. It uses Java Mobile Agents to build P2P applications as well as P2P network management tools, and gives code samples for these topics.

In summary, this book is not going to be another P2P encyclopedia. Rather, it will help readers to

[1] Acquire an historical understanding of the evolution of P2P and of why now is the right time for the technology to become mainstream.

[2] Improve her/his technical understanding of the components and behavior of P2P networks.

[3] Prepare for and start her/his first or next P2P Overlay Network design and implementation, and make it a success.

[4] Avoid security holes during design and/or implementation.

[5] Understand how to write P2P applications on top of the P2P Overlay Network.

[6] See the importance of writing software that can address the needs of the expanding P2P
device space.

[7] Learn how wireless networks work, how they interplay with wired networks, and how they will affect the future of P2P.

The audience for this book is software architects and engineers designing and writing P2P applications; managers who wish to evaluate the resource requirements for P2P product development; and high-level industry leaders wanting to make sound P2P investment decisions. This book is also appropriate for computer-networking courses, from general networking classes to distributed networking classes. One of the features that makes it ideal for class use is that it covers not only industry trends but also the state of academic research, including algorithms and projects.

Rita Y Chen, William J. Yeager


Chapter 1 Introduction

Although this book is technical in nature, its arrival is not independent of existing economic trends and technical market forces. Thus, up front, before diving headlong into the engineering chapters, let’s take a closer look at the significant events that made it necessary to write a book on P2P engineering at this time.

1.1 The .COM Supernova


The year 2000 was amazing. The new millennium began with fireworks displays, viewed worldwide on television, as successive capital cities greeted the stroke of midnight. The high-tech economy responded in kind, also expanding worldwide to reach new limits, as a .COM frenzy took hold and got to the point where a clever new domain name could attract a few million dollars in startup venture capital. This was clearly economic madness, and it had to end sooner or later. It did, and sooner, when the stock market bubble burst like a sudden .COM supernova that sent hundreds of these Internet companies into bankruptcy. Since then, the .COMs that drove the NASDAQ to new heights, above 4000 points in the middle of 2000, have practically all vanished from the planet. Historically, it is important to note that the bursting of the .COM stock market “bubble” is not a new phenomenon. The same bubble effect accompanied earlier new technologies: electricity in 1902, radio in 1929, and the promise of the mainframe computer in 1966. Initial expectations drove stocks’ P/E ratios to unrealistic heights, only to see them abruptly come tumbling down again. In each case the new technology was sound and produced long-term revenue; the investors’ enthusiasm and the market’s response were not.
One can ask two questions. First, why did the .COMs come into being, and were they and their supporting technologies legitimate? From this point of view, one can try to understand the above events. Second, given the void their disappearance has left, what, if anything, can be done to replace them with products and services that will result in long-term, stable economic growth in the high-tech sector? The recent collapse has shown this sector to be the heart of the cardiovascular system of the new millennium’s economy. There are those who deny this, and who look to the old, gold-standard-like industries in times of economic decline, but the new economy is as dependent on high-tech Internet technologies as the automobile industry is on our highway system.
The birthrate of .COMs was directly tied to the ease with which one can deploy Internet, client/server, web-based architectures, and this ease drove the .COM expansion further than it should have gone between the beginning of 1999 and the summer of 2000. While this ease of deployment is certainly a big plus for the underlying technologies, the resulting digital marketplace could not provide the revenue streams necessary either to support the existing .COMs or to sustain the growth being experienced. One wonders about their business plans, since office space was rented, employees hired, software written, and hardware purchased and connected to the Internet Information Highway, all with speculative venture capital investments. Venture capital support was supposed to be based on a sound business plan as well as on new ideas based on Internet technology. Here it is extremely important to note that the technology is not at fault. The technology is fantastic. Rather, it was the desire to make a quick buck today without worrying about tomorrow’s paycheck that brought the economy to its knees. One senses that a sound business plan was irrelevant, and that a quick and highly inflated IPO was the goal. When the .COMs went supernova and their hardware was reabsorbed into the economy through bankruptcy sales, companies that depended on hardware revenue were brutalized. Similarly, tens of thousands of very talented employees are now jobless. This can only be viewed as a course in “business 101,” as “lessons learned” for the young entrepreneurs and their technical counterparts. Let those who profited from this bubble implosion-explosion, and there are many, take note: the young entrepreneurs will be back in force with their technical teams, wiser and smarter, to make changes in the system that exploited their talent, energy and dreams.
On the technical side, again, the ease with which hundreds of .COMs were deployed is a proof of concept not only of the web-based, distributed computing model but also of the high-tech design, manufacturing, and delivery cycle. High-tech is, and will continue to be, responsive across the board, and clearly the Internet will not go away, either as a way of doing e-business or as a means of enhancing both personal communication and one’s day-to-day life. With respect to personal communication, a beautiful thing happened during this period. Strong partnerships and alliances were created between nearly all parts of the high-tech and telecommunications sectors, because personal communication in all of its forms can be enhanced with Internet connectivity. Yes, there was a rush to add features to mobile phones before the technology was mature, but the i-Mode, Java™ experiment alone proved that the business and technical models are sound, while the WAP experiment proved that without uniform standards across this space and responsive data networks, these services will be abandoned. The partnerships continue to flourish, and the WAP problem was recognized midstream at the WAP Forum and has been corrected. These topics are discussed in detail in Chapter 6.
As a consequence, business opportunities in personal-communication and life-enhancing applications will be a large part of the P2P economic model. Throughout this book we point out how P2P can better address areas like these, and more easily solve the multiple-device problems for which client/server, web-based solutions have not been adequate. In some cases a marriage of these latter technologies with P2P networks will be suitable, e.g., collaborative P2P applications in the enterprise, where the documents produced are proprietary and must be highly secured and regularly checkpointed, with all transactions audited. In others, pure, ad-hoc P2P networks will be the best solution; an example is the exchange of content, like family trip photos, between neighbors connected either on a shared 802.11a/b network or with DSL.
As the .COM rollout proceeded, limitations of the underlying network, its administration, and its services were also revealed: SPAM continues to be an uncontrolled problem; bandwidth is not sufficient in the “last mile”; Internet service to China, India and Africa is minimal to non-existent [Allen]; and denial-of-service attacks and security problems are a constant concern. These are discussed in section 1.3.1.1, and P2P engineering solutions, where applicable, are found throughout the book. For example, security, denial-of-service attacks and SPAM are covered in Chapter 5.
In the final analysis, the .COM supernova was not technology gone wrong, but rather a business failure. Yes, we do have those unemployed, creative engineers, managers, and marketers. Happily, creative beings tend not to be idle beings. Creative energy is restless energy. From this perspective, the “supernova” is a positive event. Just as in its astronomical counterpart, which must obey the laws of thermodynamics, the released energy and matter self-organize to form new stars, one can view the laid-off, post-.COM employees as a huge pool of intellect and creative energy that will not remain dormant like an extinct volcano. Rather, it will regather in small meetings in bars, cafes, homes and garages to create business plans based on surprising and disruptive technologies, some of which will appear to come out of “left field,” and others from a direct analysis of what is required to advance Internet technology. These post-.COM entrepreneurs will not be much older, but they will be much wiser. As a result, the technologies they introduce will be based on sound computer science and on an examination of why a failure of such huge proportions happened so abruptly, and will thus yield products with long-term revenue as a goal rather than a short-term doubling of one’s stock portfolio. Life continues for these individuals; the dream is in place and here to stay, reshaping itself as necessary. The .COM supernova is part of the natural progression of things, a necessary reorganization of a huge amount of energy that cannot be destroyed.

1.2 The Changing Internet Marketplace
Why does a marketplace change? Why are we not all still shopping in large farmers’ markets? Fundamental to both questions are access to the commodities being sold and their distribution and aggregation. The Internet, digital, or e-Commerce marketplace is no different. As we accelerated through the events discussed in the previous section, rules for accessing, distributing and delivering digitally purchased items were put in place. Yet most of what was purchased on the Internet could not be delivered in the same manner. Many examples come to mind; two typical ones are eBay and WebVan. Conversely, many items of e-Commerce value that were not purchased were delivered extremely well on the Internet; here we are referring to Napster. What we are looking for is the right combination.
Access to these items was, for the most part, through a web-based browser interface. If one needs to search for a hotel at a particular price in a given region, then Internet access to this digital information can be extremely tedious and time consuming. While the results are often gratifying, the means of access can certainly be improved. If one wishes to purchase an orange, it should not be necessary to go to Florida. Yet with the HTTP GET mode of access, one almost always returns to the home site for fresh data.
Napster showed us that digital content can be delivered much more effectively using hybrid P2P technology. Yes, Napster no longer exists, but the technology it used is only getting better. There are legal issues with respect to copyright protection, and so digital rights management software is being written to protect digital ownership. Why? Because those who stopped Napster, i.e., the recording industry, realize the huge potential of offloading the infrastructure cost of delivering mpeg files onto the users’ own systems and networks. P2P is sure to become an essential part of the new digital marketplace, because safeguards will be put in place to guarantee both payment for purchased, copyrighted material and its P2P distribution. In [Saroiu02] it is pointed out that about 40-60% of the peers shared only 5-20% of the total number of shared Napster files, i.e., Napster was used as a highly centralized, client/server system with very little incentive for sharing the wealth of music in the system. One can speculate that the resistance to sharing was directly correlated with the fear of getting caught for copyright theft, and that a system of sharing incentives, such as digital coupons leading toward free or low-cost content, will be easy to put in place in a marketplace that discourages digital content theft.
Digital content extends far beyond mpeg and jpeg. There is an enormous marketplace for digital games and all kinds of software applications. P2P is the perfect way to distribute this content, and some form of legal sanity must prevail to permit it to become an accepted part of the e-Commerce marketplace. Similarly, individuals can begin to market their home computing power to computing grids. Millions of people are willing to contribute home computing cycles to SETI@Home [SETI] to aid in the search for extraterrestrial life, and that search is in itself sufficient incentive. This is a noble cause and should be supported in this way. On the other hand, PopularPower [PopularPower], a venture that tried to make a business out of selling these computing cycles, failed. This implies that the companies that spend millions of dollars on supercomputers were not, and are not, willing to spend less money on a perhaps more powerful, home-based computational grid. A firmly in place, secure P2P infrastructure can help provide the necessary motivation to make idle CPU cycles an Internet commodity. This in turn can create a boom in personal computer sales, making even more CPU cycles available to worthwhile non-profit grids like SETI@Home. The possibilities are endless. One of the most important keys is to develop the technology necessary to assure that individuals’ and companies’ property rights are protected. The engineering side of this issue is discussed in Chapter 5. The solution is neither to wage denial-of-service attacks against P2P networks nor to launch legal attacks against P2P startups in the hope that legal costs will drive them out of business before the legal battles have finished. These approaches might give short-term financial protection to multimedia giants, but they will stifle the technical innovation required to bootstrap multimedia e-Commerce. They are motivated by a near-sighted vision that should be stopped in its tracks.
For P2P to become a fundamental building block of the new digital marketplace, the digital content exchanged needs to be aggregated closer to home. While powerful centralized servers are essential for the financial management and initial distribution of this data, always “going to Florida” is not a complete solution. Just as the freeways are filled with semi-trucks during off hours delivering oranges to local markets, the same use of the information highway makes good “bandwidth sense.” The consumer experience will be much improved if data access is closer to home. Ultimately, the equivalent of digital delis is needed. This is thoroughly discussed in Chapter 2.
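The “digital deli” idea amounts to serving popular content from a store near the consumer and contacting the distant origin server only on a miss. The Java sketch below illustrates this under one assumption: a simple read-through cache stands in for the deli. The class names `OriginServer` and `DigitalDeli` and the fetch counter are hypothetical illustrations, not the design developed in Chapter 2.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical distant origin server ("Florida"); counts how often it is contacted. */
class OriginServer {
    private final Map<String, String> catalog = new HashMap<>();
    private int fetches = 0;

    void publish(String item, String content) { catalog.put(item, content); }

    String fetch(String item) {
        fetches++; // every call is a long-haul trip across the network
        return catalog.get(item);
    }

    int fetchCount() { return fetches; }
}

/** Hypothetical local "digital deli": a read-through cache close to the consumer. */
class DigitalDeli {
    private final OriginServer origin;
    private final Map<String, String> shelf = new HashMap<>();

    DigitalDeli(OriginServer origin) { this.origin = origin; }

    String get(String item) {
        String local = shelf.get(item);
        if (local != null) {
            return local; // served close to home, no trip to the origin
        }
        String content = origin.fetch(item); // cache miss: go to the origin once
        if (content != null) {
            shelf.put(item, content); // stock the shelf for the next customer
        }
        return content;
    }
}

class DeliDemo {
    public static void main(String[] args) {
        OriginServer florida = new OriginServer();
        florida.publish("song.mpeg", "mpeg bytes...");
        DigitalDeli deli = new DigitalDeli(florida);
        for (int i = 0; i < 3; i++) {
            deli.get("song.mpeg"); // three local customers ask for the same item
        }
        System.out.println(florida.fetchCount()); // 1
    }
}
```

Only the first request crosses the long-haul network. A real design would also need cache invalidation and, for purchased content, the payment and rights checks discussed above, which this sketch leaves out.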
Finally, as mentioned above, the web-based process of finding what you wish to purchase is tedious and time consuming, and it must be streamlined. One would like a digital aide that, given a template describing a desired purchase, does the shopping. This, together with digital yellow pages for locally available purchases, will create a very attractive e-Commerce marketplace. We discuss this thoroughly in Chapter 7 under the topic of Java Mobile Agents on P2P networks.

1.3 What Is a P2P Network?


What is P2P? That is the real question. Most people believe that P2P means the ability to violate copyright and exchange what amounts to billions of copyrighted mpeg or jpeg files free of charge. The music and motion picture industries are strongly against P2P for this reason. We will discuss the historical origins of P2P in Chapter 2, and this history makes no reference to “Napster-like” definitions of P2P. Rather, it discusses the foundations of distributed, decentralized computing. One also finds that the non-technically initiated have begun to equate decentralized computing with some kind of dark force of the computing psyche. It almost comes down to the old battle of totally centralized versus decentralized governments. Amusingly enough, the United States and the European Union, for example, are organized somewhat like hybrid P2P networks in the sense of the definition we propose below, and capitalism was born out of such an organization of political and legal balance of power. Where does that leave the cabalistic opponents of P2P? We leave that to the reader to decide. But this opposition stems at least partially from a lack of understanding of P2P, and thus a goal of this book is to help shed some light on the issue. Decentralized, distributed computing is a powerful tool for organizing large collections of nodes on a network into cooperative communities, which in turn can cooperate with one another. Yes, one possible instance of this paradigm is anarchy, where each member node of such an organization of computational resources can do what it wishes without concern for others. At the opposite extreme is a dictatorship. An organization of network nodes that leads to either anarchy or a dictatorship has no place in P2P networks as viewed from this book’s perspective. Nearly everything in between does. And, certainly, we need to establish some rules to prevent the violation of others’ rights, whether they are members of society or nodes in a network. From this point of view, Napster was not P2P. Rather, it was a centralized system that encouraged non-cooperation among the member nodes, a subtle form of anarchy, which is almost a self-contradiction. It was centralized because all mpeg distribution began with centralized registration and access to copyright-protected mpeg files from about 160 servers [Saroiu02]. And there was anarchy-like behavior among the nodes because once a node possessed a copy of an mpeg file, the tendency was not to redistribute it, thus ignoring the underlying “share the content” model at the roots of P2P.
Clay Shirky gives us a litmus test for P2P in [Shirky00]:
“1) Does it treat variable connectivity and temporary network addresses as the norm, and 2) does it give the nodes at the edges of the network significant autonomy?”
While this is a test for P2P, from this book’s point of view there will be instances of P2P networks that treat fixed network addresses and 24x7 connectivity as the norm. Our goal is not a purist litmus test that excludes a major segment of the computational network, but rather a less exclusive test. To this end, a collection of nodes forms a P2P overlay network, or P2P network, if the following hold:
1) A preponderance of the nodes can communicate with one another, can run app-services that enable each of them to play the role of both a client and a server, and exhibit a willingness to participate in the sharing of resources; and
2) Nodes may be completely ad-hoc and autonomous, or may use traditional, centralized, client/server technology as necessary.
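Condition 1) of the definition, that a node plays both the client and the server role and is willing to share, can be made concrete in a few lines. The Java sketch below is a minimal illustration under stated assumptions: peers talk through direct in-memory method calls rather than a real transport, and the class and method names (`PeerNode`, `share`, `serve`, `request`) are hypothetical, not the protocols this book defines later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of a peer that plays both roles from the definition:
 * a server (it answers requests for what it shares) and a client (it asks
 * other peers for resources). Peers call each other directly in memory;
 * a real overlay would carry these requests over a network transport.
 */
class PeerNode {
    private final String name;
    // Resources this peer is willing to share, keyed by resource name.
    private final Map<String, String> shared = new HashMap<>();

    PeerNode(String name) { this.name = name; }

    String name() { return name; }

    // Server role: offer a resource to other peers.
    void share(String resource, String content) { shared.put(resource, content); }

    // Server role: answer another peer's request, if we have the resource.
    Optional<String> serve(String resource) { return Optional.ofNullable(shared.get(resource)); }

    // Client role: ask a remote peer for a resource. On success, keep a copy
    // and offer it onward, so the content keeps spreading through the network.
    Optional<String> request(PeerNode remote, String resource) {
        Optional<String> result = remote.serve(resource);
        result.ifPresent(content -> share(resource, content));
        return result;
    }
}

class PeerDemo {
    public static void main(String[] args) {
        PeerNode alice = new PeerNode("alice");
        PeerNode bob = new PeerNode("bob");
        PeerNode carol = new PeerNode("carol");

        alice.share("trip-photos", "jpeg bytes...");
        // bob acts as a client of alice...
        System.out.println(bob.request(alice, "trip-photos").isPresent()); // true
        // ...and then as a server for carol, with alice not involved.
        System.out.println(carol.request(bob, "trip-photos").orElse("none")); // jpeg bytes...
    }
}
```

Because `request` re-shares what it fetches, `carol` is served by `bob` without involving `alice`. A node that skipped this step would show the non-sharing, Napster-like behavior criticized earlier in this section.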
Here one notes the term overlay network. From this book’s point of view, P2P networks are overlay networks on top of the real network transports and physical networks themselves, as shown in Figure 1-1.

Figure 1-1. P2P Overlay Network: the overlay network (node1 through node7) sits above the real network, whose links include TCP/IP, TCP/IP on 802.11b, HTTP, NAT, and a firewall.
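Figure 1-1 separates a node’s identity on the overlay from the transport that currently reaches it on the real network. One way to picture this, assuming a simple lookup table, is a directory that binds each logical peer ID to its current endpoint and rebinds it when the address changes, which is the “variable connectivity and temporary network addresses” case from Shirky’s test. The names `Endpoint` and `OverlayDirectory`, the relay, and the addresses and port below are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: a concrete way to reach a peer on the real network. */
final class Endpoint {
    final String transport; // e.g. "tcp" or "http"
    final String address;   // e.g. "192.168.1.7:9701"

    Endpoint(String transport, String address) {
        this.transport = transport;
        this.address = address;
    }

    @Override
    public String toString() { return transport + "://" + address; }
}

/**
 * Hypothetical sketch: the overlay names peers by stable logical IDs, and this
 * directory binds each ID to whatever real-network endpoint reaches it now.
 * Applications address "node5", never an IP address directly.
 */
class OverlayDirectory {
    private final Map<String, Endpoint> routes = new HashMap<>();

    // Bind or rebind a logical ID, e.g. after the peer gets a new address.
    void bind(String peerId, Endpoint endpoint) { routes.put(peerId, endpoint); }

    // Current endpoint for a logical ID, or null if the peer is unknown.
    Endpoint resolve(String peerId) { return routes.get(peerId); }
}

class OverlayDemo {
    public static void main(String[] args) {
        OverlayDirectory directory = new OverlayDirectory();
        directory.bind("node3", new Endpoint("tcp", "192.168.1.7:9701"));
        // Assumed: node5 sits behind a NAT/firewall and is reached via an HTTP relay.
        directory.bind("node5", new Endpoint("http", "relay.example.net/node5"));
        System.out.println(directory.resolve("node5")); // http://relay.example.net/node5

        // node3's real address changes; its overlay identity does not.
        directory.bind("node3", new Endpoint("tcp", "10.0.0.42:9701"));
        System.out.println(directory.resolve("node3")); // tcp://10.0.0.42:9701
    }
}
```

Keeping the logical ID stable lets the rest of the overlay ignore address churn; only the directory entry changes.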
P2P means mutually supportive nodes on the one hand and, on the other, being able to use as much of the available technology as is necessary to arrive at a network that behaves optimally. A P2P network in an enterprise will be different from a P2P network in a neighborhood, and the two may or may not communicate with one another. The former will in all probability be stable, and the latter most likely ad-hoc and unstable.
The lifetimes of network addresses and connectivity, as well as an autonomous node’s symbolic “Edge” position in the Internet topology, lie at the far end of the very broad P2P spectrum of possibilities offered by the above definition. If P2P is to be a success, then the engineering principles to which it adheres, its domain, must encompass current Internet, centralized, client/server-based app-services and find ways both to interact with them and to improve them. In fact, the appropriate vision is to view the ultimate Internet as a single network of nodes for which P2P provides an underlying fabric that helps assure optimal, and thus maximum, service to each device, limited only by the device’s inherent shortcomings and not by its symbolic position in the network. Yes, an ad-hoc, autonomous, self-organizing network of unreliable nodes is inherently P2P. Yet a highly available cluster of database systems supporting a brokerage firm can also be configured as a P2P network, as can these systems’ clients. The members of such a cluster can be peers in a small P2P network that uses P2P communication to exchange availability and fail-over information; the clients can also be members of the same network, helping to mediate network-wide load balancing and data checkpointing, as well as members of a client P2P network that share content and suspend and resume tasks on one another’s systems. In such a configured P2P network there may be centralized client/server relationships, for example, to ensure authenticated, authorized access. Both this P2P network and the pure, ad-hoc P2P network satisfy the definition, each being a point on the P2P spectrum. Applying the fundamentals in this book will enable one to create such networks. But standard, distributed client/server email and database systems are not P2P, even if the clients keep data locally and can act as auto-servers, either to improve performance or at times of disconnection. These client/server systems do not communicate with one another as peers, and they adhere strictly to their roles as clients and servers. This book does not discuss the fundamentals of these latter systems, but it does describe methods for morphing them toward the P2P paradigm for better performance. Such morphed systems will certainly be hybrid rather than pure P2P, and they are an extremely important step in the evolution of P2P technology.

[Figure: the P2P spectrum, a line running from client/server at one end to ad-hoc at the other.]
Figure 1-2. The P2P Spectrum

The symbolic “Edge” of the network is really better viewed as pre-Columbian network terminology in
the sense that before Columbus, the western world believed the world was flat and had an edge. When
Columbus looked for the edge of the world, he never found it; this fictional point of view was dropped,
and the possibilities for travel have been limitless ever since. If one is at any node in the Internet,
then there is no sense of “I am at the network’s Edge.” One’s services may be limited because of
a slow or poor connection, but this can happen anywhere in the Internet. It is much better to view each
node as located at the “Center” of the network, and then do what is possible to make all such “Centers”
equal. This is closer to the task P2P has set out for itself in this book.

1.4 Why P2P Now?


Internet e-Commerce will be only as successful as users’ acceptance of the applications and services
(app-services) that the digital marketplace offers, that is, their willingness both to use these
app-services on a regular basis and to pay for them. One of the reasons we had a .COM supernova was
that consumers did not accept the app-services offered by the .COM marketplace in the above sense.
Some of the app-services were used some of the time, and very few were used all of the time. Those
very few are the ones that survived. The acceptance one envisions here means much more than,
“Hmm... This is an interesting URL, maybe I’ll try it someday.” Acceptance is expressed by users saying
things like, “This app is so cool I can’t get along without it”, “This service is so compelling that I feel
like I am undercharged for its use”, and “this app is as necessary as my car, my roller blades, and my
hiking boots, and, hey, this is fun!” The app-services must offer a break from the tedium of day-to-day
living. People spend enough time waiting in lines, sitting in traffic, and being overloaded with junk
postal mail, spam and those obnoxious popup advertisements. Each of the above produces revenue, but
why perpetuate pain when it is not necessary? Right, in the last three cases the advertisers neither
think of nor care about the recipients of the advertisements; rather, they use any technique they can to
increase sales revenue. How many times have you, the reader, cursed these advertising methods? As its
primary goal, the kind of acceptance envisioned here requires maximal service with minimal user
hassles. Furthermore, optimal app-service response time is essential to enable the creation of real
sources of revenue rather than relying on bogus nuisances for this purpose. These nuisances can be
eliminated only by making them obsolete.
In order to achieve maximal service with minimal user hassles we must look beyond the current
client/server mode of distributed computing that drives the Internet. We are looking towards a near
future in which billions of devices will be interconnected. Although the client/server structured network
is straightforward and has served us well up to now, even looking up the right server is a painful job for
both clients and servers, as our everyday directory service, the Domain Name System (DNS), becomes
one of the fatal bottlenecks with the sustained growth of the Internet. Moreover, with the appearance
of various directory services, such as Novell Directory Services (NDS), Network Information Service
(NIS) and Windows Active Directory, the difficulty of communication among those services has
triggered the adoption of a standard directory protocol, the Lightweight Directory Access Protocol
(LDAP). This adds another bottleneck for information access.
On the road toward distributed computing on top of the same client/server systems, several
architectures were established to locate applications and allow applications to communicate. Those
architectures include Remote Procedure Call (RPC), Remote Method Invocation (RMI), and Common
Object Request Broker Architecture (CORBA). They all need a centralized registry or directory for
clients to locate the distributed objects. This centralized service always requires high reliability,
accessibility and stability. Again, we have another bottleneck.
For authentication and authorization, centralized, server-based services such as Kerberos are in use.
The Internet security protocols Secure Sockets Layer (SSL) and Transport Layer Security (TLS)
currently require centralized Public Key Infrastructures (PKI) and well known, centralized Certificate
Authorities (CA) to manage X.509 certificate creation and distribution. These systems, which are
required for secure Internet financial transactions, are also facing severe bottlenecks.
Beyond these computational, infrastructure limitations in the client/server model, we are also faced
with a new paradigm: Mobility. One travels with her/his laptop and would like to communicate with
another such client system. Even if name-to-address lookup is possible, it is no longer sufficient to
locate such systems, because their network addresses change as they move. There is, in fact, no notion
of a “home network.” One solution is Mobile IP or a variant. But Mobile IP still requires the device to
have a home address. This does not handle the problem of ad-hoc mobility, where a node appears in a
network, joins, and begins to communicate with other nodes. We are heading towards a collection of
mobile devices with disposable IP addresses and no home address identifier, e.g., a wristwatch. One
can imagine two wristwatches wishing to synchronize calendars. We need solutions for these mobile
devices to discover and communicate with one another.
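One way to picture discovery without a home address is to key everything on a stable peer identity rather than on an IP address. In the Java sketch below, each peer keeps its own cache from peer IDs to the endpoints a peer last announced; when a wristwatch moves and receives a new, disposable address, it simply re-announces, and lookups by ID keep working. The `Endpoint` and `PeerCache` classes, the announcement method, and the example addresses are assumptions made for illustration; how announcements actually propagate between peers is left open.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical endpoint: a transport name plus a transient, disposable address.
final class Endpoint {
    final String transport;
    final String address;

    Endpoint(String transport, String address) {
        this.transport = transport;
        this.address = address;
    }

    @Override public String toString() { return transport + "://" + address; }
}

// Each peer keeps its own cache keyed by stable peer ID, never by IP address.
final class PeerCache {
    private final Map<String, List<Endpoint>> endpointsById = new HashMap<>();

    // Apply an announcement that a peer sends after its address changes;
    // the newest announcement replaces whatever was known before.
    void onAnnouncement(String peerId, List<Endpoint> current) {
        endpointsById.put(peerId, List.copyOf(current));
    }

    // Resolve a peer by identity; empty if it has never been heard from.
    Optional<List<Endpoint>> resolve(String peerId) {
        return Optional.ofNullable(endpointsById.get(peerId));
    }
}
```

The design choice being illustrated is that the identifier and the address are separate: the address may be discarded at any time, while the identifier is what other peers remember.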
To build a reliable computing powerhouse to serve billions of clients and applications, companies,
institutes and governments have, during the past few decades, viewed Moore’s Law as a monarch to
follow, as well as a limit to challenge. Sooner or later, the limit will be reached at the level of each
individual machine, and scientists have already begun to investigate the possibility of building more
powerful computing engines by using more advanced technologies, from optical to quantum, that will
no longer be subjects of this monarch. We are excited about the future; at the same time, we are
worried about the present: idle computers, Internet services wedged like the 5 p.m. traffic on a Los
Angeles freeway, devices no longer able to communicate with one another, the impossibility of secure
communication between any two devices, wasted manpower, and energy outages. Are we solving the
right problem? Are there better solutions already available?
We need P2P now for several reasons. First, with the duplication of information at various locations
closer to the systems accessing it, the information can be retrieved more efficiently, even when one or
more of these sites are disabled, since both the load on, and the access bandwidth to, a centralized
resource are minimized. Second, with direct connections between any two devices identified by unique
IDs virtually achievable, centralized directory lookup will no longer be either a “must-have” component
or a source of bottlenecks in the network, and ad-hoc mobility can be achieved. Third, with mobile agent
services, objects, applications and clients can communicate and share information with each other in
such a way as to minimize the users’ involvement in tedious search tasks, and thus make systems more
responsive to their users. There are many more possibilities brought by P2P technology, and any one of
them can lead us toward the next wave. With respect to timing and the state of current technology, these
possibilities are much closer to realization, and preferable to sitting here waiting for the next revolution
from Physics or Bioinformatics.
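The first of these benefits, retrieving replicated information from a nearby copy while tolerating disabled sites, reduces to a simple selection rule. The Java sketch below assumes each peer already knows which sites hold a copy, whether each one currently answers, and a measured round-trip time. The `Replica` and `ReplicaSelector` names and the lowest-latency policy are illustrative choices, not a method prescribed by this book.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical replica description: where a copy lives, whether it currently
// answers, and a measured round-trip time in milliseconds.
final class Replica {
    final String site;
    final boolean reachable;
    final int rttMillis;

    Replica(String site, boolean reachable, int rttMillis) {
        this.site = site;
        this.reachable = reachable;
        this.rttMillis = rttMillis;
    }
}

final class ReplicaSelector {
    // Choose the closest reachable copy; disabled sites are simply skipped,
    // and an empty result means no copy is currently reachable.
    static Optional<Replica> closestLive(List<Replica> replicas) {
        return replicas.stream()
                .filter(r -> r.reachable)
                .min(Comparator.comparingInt(r -> r.rttMillis));
    }
}
```

Because the centralized origin is just one more replica in the list, it is used only when no closer copy answers, which is how the load on the central resource is reduced.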
We need P2P now because P2P will not only help to optimize a customer’s access to the Internet, but
will also provide a new, unique set of Internet commodities in the form of app-services compelling
enough to attract new users, by sufficiently improving both the user experience and these users’ lives to
keep them coming back for more.
The kind of new app-services one envisions are those for which P2P can play a significant role. They
must be multidimensional: applications will be required to have a mixture of text, audio and video at a
minimum; provide many-way communication paths between individuals; be responsive to the users’
location; be pervasive across the device space; and create opportunities for previously impossible
business ventures. An example of such a venture is in-the-home digital music recording studios using
P2P for distribution, digital rights management software for copyright protection, and guaranteed
payment software such that a secure, parallel P2P payment structure can be put in place as a separate
and supporting business. With a few thousand dollars, several adventurous artists can create a digital
music marketplace where every customer is a potential seller. Recording giants like Columbia Records
will have zero influence, will be able to sue neither the artists nor their methods of distribution out of
business, and will no longer monopolize the revenue. The growth potential of this kind of digital
marketplace is tremendous with P2P in place.
So, why will P2P help optimize Internet access now, and blow away this illusion of the user isolated at
its edge? A short answer to this question is that the current Internet technology without P2P cannot
support the sustained, optimized growth of multidimensional app-services, and the network topology
which P2P implicitly creates will be location independent, and hot with activity everywhere. Let’s look
at why this is true.
As mentioned above, one of the first requirements is app-services that fully support multimedia data.
This means music and video must be delivered on demand. The evidence is already here that
centralized services cannot support the current demand for domain name lookup [Cheriton88], and the
massive exchange of multimedia content is a huge problem in comparison. The bandwidth is not there,
and the centralized, web-services based solution is always to add more bandwidth capacity and larger
servers. Historically, this amounts to keeping up with the demand while providing the same or poorer
quality of service. This is neither acceptable nor successful at providing user satisfaction. The analogy
is adding more lanes to the freeways to “improve” traffic flow, rather than seriously looking at
alternative solutions that will be either convenient or compelling enough for drivers to consider
using them.
How can P2P help now? Napster’s short-lived success proved that hybrid P2P networks can efficiently
deliver billions of copies of MPEG audio files by taking advantage of peers in a P2P network in such a
way as to encourage the independent redistribution of content. As we mentioned above, software is
being written to protect the digital rights of the owners of the content. It is foolish to ignore the content
distribution power of P2P networks if one desires to have optimal, revenue-sustaining digital
marketplaces.
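The hybrid pattern described above can be sketched in a few lines of Java: a central index only answers the question “who has this title?”, while the content itself moves directly between peers, and every successful downloader registers as an additional source. The `CentralIndex` and `SharingPeer` classes are hypothetical, and the sketch deliberately omits the transport, digital rights management, and payment concerns mentioned in this section.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hybrid model sketch: a central index answers "who has it?", but the
// content itself moves directly between peers. All names are hypothetical.
final class CentralIndex {
    private final Map<String, Set<String>> holdersByTitle = new HashMap<>();

    void register(String title, String peerId) {
        holdersByTitle.computeIfAbsent(title, t -> new LinkedHashSet<>()).add(peerId);
    }

    Set<String> holders(String title) {
        return holdersByTitle.getOrDefault(title, Set.of());
    }
}

final class SharingPeer {
    final String id;
    final Map<String, byte[]> library = new HashMap<>();

    SharingPeer(String id) { this.id = id; }

    // Fetch directly from another peer, then re-register with the index so
    // this peer becomes a new source for the same title.
    boolean download(String title, SharingPeer source, CentralIndex index) {
        byte[] content = source.library.get(title);
        if (content == null) return false;
        library.put(title, content.clone());
        index.register(title, id);
        return true;
    }
}
```

Because each download adds a holder, the number of sources for a popular title grows with its demand, which is the redistribution effect attributed to Napster above; the index carries only small lookup messages, never the content.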
The build-out of Wireless LANs (WLANs) based on 802.11a/b networks will arrive sooner rather than
later. In 2002 the revenue for WLAN in South Korea was expected to be $100,000,000 [WLANREVENUE].
As will be discussed in Chapter 6, P2P is a natural fit for optimal content distribution in WLANs.
Section 3.4 points out how P2P will encourage an evolution in the organization of a mixture of network
devices, again leading to an optimal use of bandwidth and services, and eliminating the centralized
bottlenecks reinforced by the pre-Columbian illusion of where the center of the Internet is located.
A second way P2P can optimize the Internet now is by taking advantage of the processing power of
quiescent CPUs. It was projected in 1966 that mainframe computers would revolutionize the world.
Neither the arrival of the now-extinct minicomputer nor that of the microprocessor was anticipated. A
mobile phone’s processor is more powerful than a typical 1966 mainframe’s! Mobile devices included,
there are several hundred million computers out there all connected to the Internet, and most of the
world’s population is not yet connected. The existing available processing power is massive. Using
P2P one can create coordinated, loosely-coupled, distributed computational networks of hundreds of
thousands of nodes. SETI@Home is successful as an administratively centralized computing grid in
which decisions are made by software at SETI@Home’s University of California laboratory. With the
addition of P2P capabilities, SETI@Home will be able to offload some of these administrative tasks by
adding server capabilities to each SETI@Home client. This will help both to lessen the bandwidth used
to and from their laboratory and to speed up the overall grid computationally by, for example,
permitting checkpointed jobs to be offloaded to another client directly.
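Off-loading a checkpointed job to another client directly can be illustrated with standard Java serialization: a work unit runs for a while, its state is serialized as a checkpoint, the bytes are handed to another peer, and that peer resumes where the first left off. The `WorkUnit` class and its toy workload (summing a range of integers) are assumptions chosen so that the result is easy to check; they are not SETI@Home’s actual work-unit format or client software.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical work unit: sums the integers in [next, end). Its fields are the checkpoint.
final class WorkUnit implements Serializable {
    private static final long serialVersionUID = 1L;
    private long next;
    private final long end;
    private long partialSum;

    WorkUnit(long start, long end) {
        this.next = start;
        this.end = end;
    }

    // Run at most `steps` iterations, then return so the state can be checkpointed.
    boolean runFor(long steps) {
        for (long i = 0; i < steps && next < end; i++) {
            partialSum += next++;
        }
        return next >= end; // true when the whole range has been summed
    }

    long result() { return partialSum; }

    // Serialize the current state so it can be handed directly to another peer.
    byte[] checkpoint() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild the work unit on the receiving peer from the transferred bytes.
    static WorkUnit resume(byte[] checkpoint) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(checkpoint))) {
            return (WorkUnit) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this arrangement the central laboratory never sees the intermediate state; only the final result would need to travel back to it.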
In the very near future one’s home can be a fully connected P2P network. This network in turn can be
connected to a laptop, PDA, mobile phone, automobile, or workstation in one’s office behind a
firewall, giving each family its own personal peer community. This is possible now with existing P2P
technology [JXTA]. These latter networks are not as refined as they can and will be, and the time has
arrived to begin the engineering refinement that is necessary. This book presents the fundamentals
sufficient to initiate the process.

1.5 Basic P2P Rights
P2P networks are organized overlays on top of, and independent of, the underlying physical networks.
We have a definition that is general enough to permit almost any device with network connectivity to
be a node in some P2P network. And our definition permits these networks to be both ad-hoc and
autonomous, and their nodes to be anonymous. It also admits the possibility of strongly administered,
well-authenticated networks. And, in either case, both openness and secrecy can and will exist. This
paradigm is frightening to some because at one extreme it is antagonistic to the state in George
Orwell’s novel 1984: Big Brother will not know that you exist and therefore cannot be “watching you.”
It is frightening to others because it also permits Orwellian P2P networks where there is no secrecy, all
communication is both monitored and audited, and all data is in clear text and logged. What’s
important is the freedom of choice P2P offers. This choice has concomitant responsibilities, a P2P
credo if you like: respect one another’s rights, data, and CPU usage, and share the bandwidth fairly;
spread neither viruses nor worms; be tolerant of one another’s resource differences; be a good network
neighbor; do no harm to others. The nature of credos is to be violated. That is understood and part of
human nature. The goal is to minimize non-altruistic P2P behavior by either localizing it to those P2P
networks where it is acceptable, or appropriately punishing it when it leaks into networks where it is
unwanted.
Rightly enough, in the sanctity of one’s home there can be a full-blown P2P network where everything
is connected, everything is private, and only search warrants will permit entry. The United States Bill
of Rights can be viewed as a P2P supporting document, since freedom of assembly, freedom of speech,
and the avoidance of unreasonable search and seizure are at the heart of P2P. And, certainly one can
imagine a well organized “network militia” bearing its software arms to protect the network and the data
resident therein. Freedom of access for network devices and their associated data is at the heart of P2P
networks. The rules for network and data use are decided by the member nodes; they are the member
nodes’ policies. In the P2P world the voice of the minority is equal to that of the majority. Purchase
several devices, create a P2P network and attach them. Then choose your P2P applications carefully.

1.5.1 “Freedom of Assembly” for the Right to Connect Everything


The first decade of the new millennium will see an exponential growth of network-aware devices
capable of sending and receiving data. The list is long and the combinatorics defy one’s imagination.
Along with computers we will have: PDAs, mobile phones, automobiles, TVs, light switches and light
bulb receptors, fans, refrigerators, alarm systems, wristwatches, stoves, dishwashers, ovens, stereos and
all their components, electricity and gas meters, pet licenses, eyeglasses, rings, necklaces, bracelets,
etc. Any combination of these can be interconnected to form ad-hoc P2P networks. One might ask,
“To what end?”
Imagine the following: having dinner in London with several friends and receiving a mobile phone
call from your home, not someone at your home, but your home, telling you that the alarm system had
been triggered. This is a real story told to one of the authors: in 2000 his friend from Sydney
immediately used his mobile phone to scan the alarm log files and found that an alarm on the back
bedroom windows had been triggered. Since this bedroom was supported by stilts and the alarm in
question was on a window overlooking a canyon, he concluded that a bird had flown into the window.
Right, the friend lives in an experimental home. But the experimental possibility can and will be a
reality during this decade.
Similarly, it is easy to place oneself in a scenario of having just left home and worrying whether the
oven or stove was left on. Rather than turning back, one can make a quick check using a simple control
panel on these devices, which are peers in a home P2P network accessible from either a wireless device
in one’s automobile or a mobile phone, both themselves peers in one’s private home network. In fact,
one could launch a mobile agent to do a full integrity check of the home and report back. Ten seconds
later one will receive an “all is well,” or “you left the stove on, shall I turn it off?”
A final scenario is ad-hoc networks of people in coffee houses, railroad stations, sitting in the park, or
in an outdoor cafe. Here, jeweled bracelets or necklaces might light up in similar colors for all of those
who are members of the same ad-hoc P2P network. In the evening, when viewed from a distance,
one can imagine small islands of similar colored lights moving about, towards and away from one
another, beautifully illustrating the ad-hoc social contracts of “Smart Mobs” [Rheingold].
These scenarios are endless, practical, and part of our future. P2P engineering for wireless devices and
networks is discussed in Chapter 6.

1.5.2 “Freedom of Assembly” for Survival of the Species


Eco-systems are self-organizing, self-maintaining and, in case of reasonable injury, self-healing. There
is life and death, and the eco-system evolves using natural selection. This process is continuous, and
new life forms arrive over time as the result of mutation. Eco-systems are great for trial-and-error
testing. The same must be said for P2P overlay networks. Peers come and go, crash during data
transfers, lose their visibility, and are rediscovered. New devices are accepted at face value and
permitted to participate in the network’s activities. P2P networks are spawning grounds, playgrounds for
creative thinkers. In this manner, a P2P network can continue to gather new members, make them as
known as policy permits, and behave much like eco-systems, where diversity leads to survival of the
fittest. Peers are free to assemble with others for the interchange of content. Peers such as
mobile-agents are free to traverse the P2P network, visit those peers where entry is permitted, gather
information, and return it to their originators.
As such, “Freedom of Assembly” is the ultimate P2P right. As the definition of P2P implies, although
each single device is part of a cooperative whole, it is a node in a P2P network that makes its own
decisions and acts independently. A device’s presence is neither required nor denied. Hence, the failure
of a device should not be harmful to other devices. If two devices are connected and one abruptly
crashes, this should be a little hiccup in the P2P network, and there ought to be a redundant device to
take its place. Still, everything has two sides: this independence also means that there might not
initially be anyone who will help this poor, temporarily stranded guy. In a highly available client/server
system, there are always servers behind each client, and backup servers configured for each server, but
they are subject to bottlenecks, resource depletion and denial-of-service attacks. So, for centralized
client/server systems, these self-maintaining, self-healing and self-adaptive features cannot always
relieve the burdens. On the other hand, for a device in a P2P network these features are not only
essential but inherent in the network ecology. Thus, the “poor guy” who was sharing content and
abruptly lost the connection can expect to resume the operation with another node, although this
recovery may not be instantaneous. During its apparent isolation it might receive the welcome visit of a
wandering mobile-agent that is validating the network topology and can redirect the peer to a new point
of contact. Similarly, denial-of-service attacks are difficult because, as in an eco-system, the built-in
redundancy leaves no center to attack.
From a software engineer’s perspective, ideally, P2P software must be written to reside in a
self-healing network topology. Typically, any device uses software to monitor its tasks, schedule its
resources to maximize its performance, and set pointers and re-flush memory for resuming operations
efficiently after a failure. At the higher level, the P2P software should be able to adjust to the new
environment for both recovery and better performance. For example, a device might have dedicated
caches or tables to store its favorite peer neighbors to be used for fast-track connections or network
topology sanity checks. When either a neighboring device fails or one of its buddies is not sufficiently
“trustworthy” for an intolerable period, the P2P software on the device should be able to dynamically
modify its records. In this way, at least the same connectivity can be guaranteed. This is just one of the
most straightforward cases showing that P2P software needs to be self-healing and self-adaptive if the
network is to behave in the same manner, since the P2P network is the “sum of its nodes.” The
engineering dynamics of these scenarios are discussed in detail in later chapters.
Unfortunately, not all devices are capable of self-management; handheld, wireless devices are an
example. Such small devices don’t have enough computing power and memory to have such
sophisticated software or hardware embedded. So, they must rely on other machines for the above
purposes. Although P2P purists hate to use the word “server,” it is true that the small devices require
such “server-like” or surrogate machines, and this fits perfectly with the definition of P2P overlay
networks given above.
As mentioned just above, “Freedom of Assembly” in P2P networks is supportive of a multiplicity of
devices and software “organisms.” They arrive, try to succeed, to find a niche, and either continue to
flourish or die out. Since the early 1990s mobile-agent technology has been looking for the appropriate
execution environment. Mobile-agents can be invasive, pervasive, informative, or directed, and come in
all shapes and sizes. They work best when implemented in Java. Mobile-agents can be written to adapt
to self-healing, ad-hoc network behavior and, in fact, thrive in such an environment. Because they are
mobile by nature, can have self-adapting itineraries and agendas, can be signed and thus secured, and
are opportunistic as they travel, they have always required a network eco-system for their survival and
evolution in mainstream technology. The authors of this book are advocates of mobile-agent
technology as applied to P2P overlay networks. The engineering details are discussed in Chapter 4.
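A mobile-agent’s itinerary can be simulated in Java to show the behavior described in this section: the agent carries its own list of hosts and its findings, each host decides by local policy whether to admit it, and the agent returns its report to its originator. This in-process sketch only simulates travel; a real mobile-agent system would serialize the agent, ship it between peers, and verify its signature. The `Host` and `Agent` classes and their fields are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical host: a peer that decides by local policy which agents may enter,
// and exposes some status information to the agents it admits.
final class Host {
    final String id;
    final Predicate<String> admits; // policy applied to the agent's originator ID
    final String status;

    Host(String id, Predicate<String> admits, String status) {
        this.id = id;
        this.admits = admits;
        this.status = status;
    }
}

// A simulated agent that carries its own itinerary and findings from hop to hop.
// Serialization, transport, and code signing are omitted in this sketch.
final class Agent {
    private final String originator;
    private final Deque<String> itinerary;
    private final List<String> findings = new ArrayList<>();

    Agent(String originator, List<String> itinerary) {
        this.originator = originator;
        this.itinerary = new ArrayDeque<>(itinerary);
    }

    // Travel the itinerary; hosts that refuse entry or cannot be found are skipped.
    List<String> run(Map<String, Host> network) {
        while (!itinerary.isEmpty()) {
            Host host = network.get(itinerary.pollFirst());
            if (host == null || !host.admits.test(originator)) continue;
            findings.add(host.id + ": " + host.status);
        }
        return List.copyOf(findings); // the report carried back to the originator
    }
}
```

For instance, the home integrity check imagined earlier could be an agent launched from the owner’s phone with an itinerary such as the stove and the oven, returning a report line like `stove: on` for each host that admits it.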

1.5.3 “Freedom of Speech” for the Right to Publish Data and Meta-data
As previously mentioned, the data or information published and transferred on the Internet is
multidimensional, and enormous in volume.¹ Thus, brute-force pattern matching techniques for
searching for data are no longer viable; they are dinosaur-like relics from the era when only textual data
was available. A file sharing system which depends on such simple techniques can be easily hacked,
since it only requires data corruption by a virus to destroy data. Now, a description of data, data about
data, or meta-data, is an essential component of the organization of data on the Internet that makes
tasks like search achievable. With meta-data, for example, one can keep signed hashes of the data in
the meta-data that permit one to detect modification attacks. Similar techniques can be used to detect
data that has been modified while being transferred on the Internet. Nodes on a P2P overlay network
have the absolute right to exchange data or meta-data or both.
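Keeping a signed hash of the data in its meta-data can be sketched with the standard Java security APIs: the publisher hashes the data with SHA-256, signs the description together with that digest using ECDSA, and a receiver checks both the signature and the digest. The `SignedMetadata` class and the choice of SHA-256 with ECDSA on a 256-bit curve are assumptions of this example; any suitable hash and signature scheme would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical meta-data record: a description of the data, a SHA-256 digest of
// the data, and the publisher's ECDSA signature over description and digest.
final class SignedMetadata {
    final String description;
    final byte[] digest;
    final byte[] signature;

    private SignedMetadata(String description, byte[] digest, byte[] signature) {
        this.description = description;
        this.digest = digest;
        this.signature = signature;
    }

    static SignedMetadata publish(String description, byte[] data, PrivateKey key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(key);
            signer.update(description.getBytes(StandardCharsets.UTF_8));
            signer.update(digest);
            return new SignedMetadata(description, digest, signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the meta-data is authentic AND the data still matches its digest.
    boolean verify(byte[] data, PublicKey publisherKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(publisherKey);
            verifier.update(description.getBytes(StandardCharsets.UTF_8));
            verifier.update(digest);
            boolean metadataAuthentic = verifier.verify(signature);
            byte[] actual = MessageDigest.getInstance("SHA-256").digest(data);
            return metadataAuthentic && MessageDigest.isEqual(actual, digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generate a publisher key pair on a 256-bit elliptic curve.
    static KeyPair newPublisherKeys() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
            gen.initialize(256);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A peer that receives the data from any source, trusted or not, can run `verify` against the publisher’s public key; a single flipped bit in the data makes the check fail.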
This meta-data can be public or private, freely published and subscribed to by everyone, or absolutely
secret and viewable by a select few. Meta-data can be stored as clear text and posted to a public domain
site for wide distribution of the data described by this meta-data. One of the immediate uses of these
sites is to share research publications among institutes. On the other hand, P2P applications may
choose whether or not to hide meta-data. They can use strong encryption or IP Security (IPsec) so that
data or meta-data being exchanged is impossible to monitor, because well-written security systems can
assure the end-to-end privacy of these “conversations.” Thus, encrypted meta-data can and will be
impossible to decipher on peers’ systems. Also, access to a system’s data directories, i.e., the meta-data
describing the files on that system, can be password protected, and this meta-data can be transmitted as
clear text or as encrypted descriptions of these directories. It may then be visible only to the possessor
of the decryption key, so that, in this case, deciphering it is again impossible. Processing speed is so fast
that encrypting or decrypting a megabyte of data takes only a second or two. Thus, the processing time
required to keep both local and remote user data and meta-data secret is almost not noticeable in human
time. “Freedom” of Internet privacy protection has almost no obstacles, because the cryptography code
which implements the required algorithms is freely available on the Internet
[CRYPTIX, BOUNCYCASTLE, PURETLS, OPENSSL].
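Encrypting meta-data, such as a directory listing, so that only holders of a key can read it takes little code with the standard Java cryptography APIs. The sketch below uses AES-256 in GCM mode, which also detects tampering. The `MetadataVault` class, the cipher choice, and the IV-prefixed output layout are assumptions of this example, and distributing the key to the “select few” is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Encrypts meta-data (e.g., a directory listing) so that only holders of the key
// can read it. AES-256 in GCM mode is one reasonable choice, assumed here.
final class MetadataVault {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Output layout (an assumption of this sketch): 12-byte random IV || ciphertext+tag.
    static byte[] seal(String metadata, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // a fresh IV for every message, as GCM requires
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(metadata.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throws if the key is wrong or the sealed bytes were modified (GCM authentication).
    static String open(byte[] sealed, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot open meta-data", e);
        }
    }
}
```

Anyone without the key sees only random-looking bytes, and an attempt to open the listing with the wrong key fails rather than returning garbage.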
Noting that thirty-three percent of all Internet traffic is directed towards pornographic sites
[TechNewsWorld], will P2P networks be any different with respect to the data and meta-data that is
published? The answer is probably not. “Freedom of Speech” gives one the right to publish data and
meta-data within certain limitations, and the Internet knows fewer and fewer boundaries to data
exchange. The First Amendment to the United States Constitution is being applied world-wide in spite
of resistance from governments that wish to control the information that crosses their borders.
When P2P networks are pervasive, the publication of content will reside more and more on
individuals’ systems. These systems will be much more difficult to locate because their network
addresses will be temporary and often mobile. Still, the permitted and accepted private exchange of data
and meta-data is no different from a telephone conversation. The problem does not reside with the
system that is used to enable the conversation to take place, but rather with the endpoints of the
conversation, the individuals using the system. New technology forges new pathways across all
systems, including legal ones. This always has been and always will be the side-effect of crossing new
frontiers. It’s exciting for some and frightening for others, and one generation’s laws may become the
next generation’s blue laws, i.e., outdated relics of historical interest that retain some support,
diminishing over time. Undeniably, all users will be subject to the current “laws of the land,” and open
to arrest and prosecution for their violation. But P2P technology will create new markets for honest
digital commerce with enormous economic incentives, and will also permit private network
conversations between consenting adults, along with the expected criminal activity. The latter is well
understood human behavior that P2P neither encourages nor discourages, much as a highway neither
encourages a driver to exceed the posted speed limit nor discourages one from doing so. The solution
to these problems is neither to abolish driving nor to stop innovative technological progress because it
can be misused. Clearly such reactions are neither well thought out nor well founded, and will not lead
to long-term, fair and meaningful solutions. The latter will come from the technologists themselves
working with lawmakers and media firms, and always respecting an individual’s Basic P2P Rights.
1. Google searches more than 2 billion publicly accessible web pages as of July, 2002. http://www.google.com
The engineering aspects of data and meta-data are discussed in Chapter 3.

1.6 Contents of This Book and P2P Pseudo-Programming Language


This book is organized as follows:
• Chapter 2 gives a historical perspective on both the evolution of the Internet in two phases,
pre-WWW and post-WWW, and the roots of P2P.
• Chapter 3 covers the essential, engineering components of the generic P2P model. These
include the descriptive language and resulting document space, unique peer identity, the
overlay network, and communication components. The chapter concludes by showing how
these components can be assembled to permit communication on the overlay network given
the limitations of the real, underlying network and physical layers.
• Chapter 4 gives life to the components, and describes protocols used to create an active P2P
network. Here connecting, routing, load balancing, and querying are discussed.
• Given an operational P2P network which is an instance of the documents, components, and
protocols presented in the previous three chapters, and thus, a model of a P2P system, in
Chapter 5 we present the details of how one can implement standards based security in
such a system. We conclude this chapter by applying these security fundamentals to dem-
onstrate how to create secure Java mobile agent P2P services.
• Chapter 6 is a thorough discussion of wireless networks followed by showing how P2P can
enable exciting new applications that are device and bearer network independent, and thus
be a long needed, unifying force between wired and wireless networks. We also describe
what is required to build a Java P2P application for mobile handsets.

1-16
• Chapter 7 explores some possible P2P applications starting with the familiar email, and
chat programs and ending up with less familiar and innovative, almost science fiction like
possibilities.
In order to express the engineering principles in this book explicitly, a P2P Pseudo-Programming Language (4PL) has been devised. The syntax and semantics of 4PL are defined in Appendix I. 4PL permits one both to programmatically define nodes on the P2P overlay network and to describe their interaction, by defining each P2P component we introduce in Chapter 3 as a 4PL data type and creating a set of associated 4PL commands that accept and return typed variables.
As mentioned above, in Chapter 4 we define several overlay network communication protocols. We will use 4PL there to create flow charts that describe the precise protocol behavior. 4PL thus will give a solid logical foundation to the engineering and protocol definitions, and eliminate the possibility of inconsistent behavior, barring any 4PL programming bugs. We suggest that the reader use Appendix I as a reference when reading Chapters 3 and 4.

Chapter 2 P2P Is the End-Game of Moore’s Law

“I recall quite vividly having been awestruck by the processing power and memory size that was available to me when I took it upon myself in 1981 to rewrite my PDP11-05 mini-computer based, multiple protocol router code to take advantage of the mc68000 micro-processor board of a master’s degree student, Andy Bechtolsheim. The latter had a clock speed of 8 megahertz and 256K bytes of chip RAM, while the former had a clock speed of 2 megahertz and 56K bytes of core memory available to run software. Andy’s board was one of the first in a long line that would bring us to where we are at this time.1”
1. Bill Yeager’s recollections of his days at Stanford University, December, 2003.
Today, systems with clock speeds in excess of 2 gigahertz, several gigabytes of high-speed RAM, and more than 100 gigabytes of disk storage are readily available on desktops and laptops. At the same time, better mobile phones have 40 megahertz clock speeds, a megabyte of RAM, and 64 megabytes of flash memory. Thus, twenty-two years or 264 months after 1981, we see the predictive power of Moore’s law: Gordon Moore stated in 1965 that transistor density would double every 12 months. Yes, over the years this slowed to doubling every 18 months. Still, along with the doubling of transistor density we have had a concomitant increase in processor speeds. This is because, first, there are more transistors in a given space, and second, the delay on the transistors and wires is reduced.2 Indeed, since 1981 clock speeds have doubled 8 times, or roughly once every 33 months, yielding processor clock speeds that are now 256 times as fast as they were in 1981: $2^{8} = 256$, and $256 \times 8\ \text{MHz} = 2048\ \text{MHz} \approx 2\ \text{GHz}$. It is assumed that by the end of this decade Moore’s law will no longer apply to silicon-based transistors. Meanwhile, given the current computing resources in the hands of the average user, and the Internet, it is no coincidence that the new millennium was greeted with a rebirth of P2P technology. The potential “computational energy” available boggles the mind. Indeed, we view the emergence of P2P as a means to tap this energy source and, as such, as the final moves, the logical conclusion of the evolution brought about by Moore’s Law, that is to say, “P2P is the End-Game of Moore’s Law.” Decentralize and conquer is the end-game’s winning strategy. Harnessing the latent CPU power of personal systems into communities of users with common interests is inevitable. There is just too much CPU power to be ignored.
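The clock-speed arithmetic in the opening of this chapter can be written out explicitly. This restates only the figures already given there: 264 months from 1981 to 2003, 8 doublings, and the 8 megahertz mc68000 board as the 1981 baseline.

```latex
\[
\frac{264\ \text{months}}{8\ \text{doublings}} = 33\ \text{months per doubling},
\qquad
8\ \text{MHz} \times 2^{8} = 8\ \text{MHz} \times 256 = 2048\ \text{MHz} \approx 2\ \text{GHz}.
\]
```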

2.1 The 1980’s


As the increase in processor speeds began to diligently follow the almost mystical curve of Moore’s law, the initial beneficiaries were the servers in the client/server model of distributed computing. Still, even in the 1980’s there were those who dreamed of harnessing the untapped computational resources of the workstations that began to dominate the desktops of researchers, those who viewed these systems as peerNodes capable of sharing content and computational services. The embryonic beginnings of the P2P technology that would surface at the debut of the new millennium were already in place twenty years before its birth. And, during this twenty-year gestation period, several events put in place exactly the requirements necessary for the successful rebirth and growth of P2P computing during the first ten years of the new millennium. Owen Densmore, formerly of Sun Labs, and now working for Complexity Workshop [ComplexityWorkshop] in Santa Fe, predicts that 2000-2010 will be the “Decade of The Peer,” and we believe, as do many others, that Owen is correct. In this chapter we look at the history of P2P, its initial appearance in the 1980’s, and the historical highlights of the twenty-year gestation period leading to its inevitable and logical reappearance in 2000.
One imagines that most of the software architects and engineers who designed Napster, Gnutella, and Freenet were about thirty years old in 2000. That puts them at the start of their teenage years when the Arpanet switched from NCP, the Arpanet protocols, to IP in 1983, and, from our point of view, the internet became The Internet with the addition of end-to-end IP connectivity. During the 1980’s, IP quickly made obsolete the other network protocols that were in use at that time, for example XNS, DECnet, AppleTalk, LocalTalk, Chaosnet, and LAT. By 1984 IP and its accompanying protocol suites were already well on their way to becoming the global network standards.
2. Delay is proportional to resistance times capacitance, and both resistance and capacitance are reduced as a result of Moore’s law.
So, what does this have to do with P2P? Clearly, the rapid growth of networking technology that took place in the 1980’s was the major impetus, the force that pushed network applications technically to where they are today. Behind these advances, behind the scenes if you like, we also find, first, the effects of Moore’s law: smaller is faster; smaller is more memory; more memory implies network hardware and software with more functionality; better networks imply higher productivity and new and creative ways to communicate. Second, the IETF as a forum for open Internet standards was then, and still is, a major factor. From 1980 until 1990 three hundred and two rfc’s were approved. The 1980’s, indeed, set the stage and put in place the sets and the scenery for as yet unwritten dramas. They would be all about the evolution of the Internet, and P2P would play a major role in the act that began in the year 2000.

2.1.1 LAN’s, WAN’s and the Internet


While some Local Area Networks (LANs) did exist in the 1970’s at places like Xerox PARC, where ethernet was invented, the real upsurge occurred in the 1980’s, and in particular, after 1983. In this context it is important not to forget that the 3mbps ethernet, ethernet “version 1,” using the PARC Universal Packet (PUP) specifications, was officially documented in 1976 in a paper by Bob Metcalfe and David Boggs entitled “Ethernet: Distributed Packet Switching for Local Computer Networks.” We include in Appendix III a copy of the first page of a PARC Inter-Office Memorandum by Ed Taft and Bob Metcalfe, written in June of 1978, which describes the PUP specifications [PUP]. We can certainly assume that the hardware and software discussed in this memo existed well before this date. PUP was used to link together Altos, Lisp Machines, Xerox Star Systems, and other servers at PARC. Bob Metcalfe left Xerox in 1979 to form 3Com and promote ethernet. The Ethernet version 2 standard, or 10mbps ethernet, is specified in IEEE 802.3. The standardization is the result of a concerted effort by Xerox, DEC, and Intel from 1979 to 1983 that was motivated by Bob Metcalfe. Today ethernet is the world’s standard, comprising 85% of the LAN’s.
We now give a brief description of the emergence of the Stanford University LAN. It is by no means a
unique event but one we can describe from Bill Yeager’s deep personal involvement in this effort. And
certainly, as we will see, what happened at Stanford did spearhead the growth of networking on a
world-wide scale.
By means of a grant of hardware and software received in December of 1979 by Stanford University from Xerox PARC, PUP became the original 3mbps ethernet LAN protocol at Stanford University and was primarily used to link D-Machines, Altos, Sun Workstations, VAX’s and TENEX/TOPS20 systems across the University. The first three subnets linked the medical center with the departments of computer science and electrical engineering by means of the first incarnation of Bill’s router, which ran on a PDP11-05 and routed the PUP protocol. These three subnets were the basis for the original Stanford University LAN. IP showed up at Stanford in late 1981, and to support IP Bill invented the multiple protocol, packet switched ethernet router that routed both PUP and IP. He continued to develop the code over the next 5 years. By 1985 the code routed PUP, IP, XNS and CHAOSNET. It was officially licensed by Cisco Systems in 1987, and was the basis for the Cisco Systems router technology. As of late 1981 the hardware for these multiple protocol routers was known as the “bluebox.” The first had a Multibus backplane outfitted with a power supply, a mother board with an mc68000 CPU and 256Kbytes of chip memory, and held up to 4 ethernet interfaces. The bluebox was invented in the Stanford department of computer science. The mother board was Andy Bechtolsheim’s invention [CAREY]. The first Cisco routers used the identical hardware. They ultimately became the Internet router of choice.
10mbps ethernet subnets appeared in early 1982 and, along with IP, began to dominate the LAN, which blossomed from a research LAN with three subnets to one that began to connect all of Stanford’s academic and non-academic buildings. The Stanford LAN had the IP class A internet address 36.0.0.0, and was first connected to the Internet in 1983 by means of a BBN router called the Golden Gateway that was maintained by a graduate student named Jeff Mogul. Jeff has been very active in network research since his graduate student days in the early 1980’s. As a graduate student he co-authored rfc903 on the Reverse Address Resolution Protocol in 1984, and since that time he has joined with others to write an additional fifteen rfc’s. His most notable effort may be rfc2068, which specifies HTTP/1.1. One of his co-authors was Tim Berners-Lee.
By 1990 the Stanford LAN had more than 100 subnets. This growth was accomplished by formally acknowledging the necessity of the Stanford LAN for doing day-to-day business, and by forming in 1985 a department under the management of Bill Yundt to support its further growth. Stanford continued to support the PUP, IP, XNS and Chaosnet protocols into the late 1980’s, since the ongoing research required it. The Stanford LAN service was superb and made possible seminal work in Distributed Systems, which is clearly a forerunner of P2P. This research is discussed in section 2.1.3.
In a similar context, the MIT AI Lab had Chaosnet, which was originally used to link its Lisp machines, and later a large selection of machines at MIT. This was documented by David Moon in 1981 [Moon81]. By the mid-1980’s LAN’s like these were commonplace in Universities, and businesses began to follow suit.
In 1985 the Golden Gateway was retired, and Bill Yundt’s department provided connections to the Internet via the NSF backbone. These were T1 (1.544mbps) networks, formally called the NSFNET. Similarly, a T1 Bay Area Network, BARNET, was created to link up Universities, research institutions and companies in the general Bay Area. BARNET extended to U. C. Davis near Sacramento. Bill Yundt played a major role and was an impetus to the formation of BARNET, which was one of the first Wide Area Networks (WAN). We believe there were no restrictions with respect to who might connect to BARNET. It was a pay-as-you-go network. From 1985 onward, LAN’s and WAN’s popped up everywhere, with the NSFNET providing the Internet connectivity. Networking was the rage.
The Internet grew dramatically with NSFNET as the major motivating force. “NSF had spent approxi-
mately $30 million on NSFNET, complemented by in-kind and other investments by IBM and MCI.
As a result, 1995 saw about 100,000 networks—both public and private—in operation around the
country. On April 30 of that year, NSF decommissioned the NSF backbone. The efforts to privatize the
backbone functions had been successful, announced Paul Young, then head of NSF's CISE Directorate,
and the existing backbone was no longer necessary. [NSF]”
Before we move on, it is worth reflecting on just why networking was the rage. What was behind the rapid adoption of the latest technology? Clearly, one did not spend millions of dollars on a technology because it was something “cool” to do. Rather, those who financed the research and development ultimately required a return on investment (ROI). That return came because networks stirred the imagination of visionaries, researchers, software designers, and systems analysts; of CEO’s and CTO’s; of thousands of students in Universities; and above all, of the users of the network applications. Thus, a job market was created to implement what would become the next industrial revolution. The ability to network applications streamlined business processes, opened the doors to new forms of interactive entertainment, and provided avenues for long-distance collaboration in real time. Expertise became location independent, and so did one’s place of work relative to her or his office. The effective, network-based virtual office was born in this decade because of the network infrastructure and the early network applications.

2.1.2 Early Network Applications


The first, foremost and most popular network application has always been email. It existed in the 1970’s on the ARPANET and became standardized in the 1980’s on the Internet with SMTP, POP1 and IMAP. The rapid information exchange that email provides has always made it a great tool for communication, be it for research, business, or personal use. It also shows that, above all, applications that promote person-to-person communication will always yield a decent return on investment.
The IMAP email service of the late 1980’s was a harbinger of how effective a protocol can be when it is specifically targeted at both the network infrastructure and the computer resources of its day. It had significant advantages over POP at that time, since IMAP permits one to access the properties of messages as well as the messages themselves, and the parsing of messages was done entirely on mail servers. This greatly simplified the writing of client UI code, and made efficient use of the limited network bandwidth. It is important to recall that the NSFNET was T1 based at this time, and that clients were very limited with respect to computational and storage resources. Also, many clients ran on computers that used serial lines to connect to the network. Bill Yeager recalls demonstrating macMM and IMAP at the Apple Corporate building in Cupertino in 1989, reading his email on the Stanford LAN via BARNET, and not being surprised to see no humanly perceptible difference between reading email in Cupertino from the SUMEX-AIM IMAP server some fifteen miles away at Stanford and doing the same thing on his desktop at Stanford. A great deal of this performance was implicit in the original design of the protocol, provided that the clients were well written, as macMM and MM-D both were. While mail is not P2P, it gives the user a sense that it is P2P. Servers in the middle tend to be invisible servants for most of today’s email users. And, just as IMAP was born of the requirements of the 1980’s, we see today’s network infrastructures and computer resources as ready for new email protocols. For a discussion of P2P email, see Chapter 7.
Similarly, one cannot discuss early network applications without mentioning telnet and ftp. Both can be viewed as early P2P applications for those users who had Unix workstations or Lisp machines as desktops. Each system was both a telnet/ftp client and server. Users regularly ran applications on one another’s systems, and exchanged data with ftp. This is discussed further in the next section.
It is also amusing how today’s users believe Instant Messaging and chat rooms are a phenomenon of the new millennium. Chat room applications were available on mainframes before we had networks. One’s buddy list was extracted from a “systat” command, which showed who was logged on at the time. And the chat rooms were supported by dumb terminals like Heathkit Z29’s or Datamedias. Mike Achenbach wrote a chat application exactly like this that ran on TENEX mainframes. The terminal screen was divided into rectangles, one for each member of the chat room. These chat rooms were small, but the functionality was identical to today’s. When networks arrived, we had PUP-based chat on Lisp Machines, with graphics-based UI’s. Finally, the Unix talk command was always networked and used in conjunction with rwho for presence detection. The protocols have evolved over the years, but the ideas came from the 1980’s. Nothing really new has been invented in this arena since that time. Chat and talk were both P2P applications.
Also, LAN and Internet routers used routing protocols to exchange routing information with one another in a P2P manner. Here we are thinking about the Routing Information Protocol (RIP) and inter-domain routing based on the Border Gateway Protocol (BGP). Whether routing information was broadcast (RIP) or supplied over a reliable connection (BGP), the service was symmetric, with each such system behaving as both a client and a server.
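The symmetric, client-and-server behavior of RIP routers can be illustrated with a small distance-vector exchange. This is a minimal sketch under simplifying assumptions: the `Router` class, its methods, and the three-router line topology are hypothetical; only the hop-count metric and RIP’s “infinity” value of 16 come from RIP itself, while timers, split horizon, and real packet formats are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a RIP-style distance-vector exchange. Each Router
// advertises its table (server role) and merges neighbours' tables (client role).
// Timers, split horizon and real RIP packet formats are deliberately omitted.
class Router {
    static final int INFINITY = 16; // RIP treats a metric of 16 as unreachable

    final String name;
    final Map<String, Integer> hops = new HashMap<>();   // network -> hop count
    final Map<String, String> nextHop = new HashMap<>(); // network -> neighbour

    Router(String name) { this.name = name; }

    // A directly attached network is one hop away.
    void attach(String network) {
        hops.put(network, 1);
        nextHop.put(network, name);
    }

    // Server role: hand a copy of the routing table to a neighbour.
    Map<String, Integer> advertise() { return new HashMap<>(hops); }

    // Client role: merge a neighbour's advertisement (one Bellman-Ford step).
    // Returns true if any hop count changed.
    boolean receive(String neighbour, Map<String, Integer> advert) {
        boolean changed = false;
        for (Map.Entry<String, Integer> e : advert.entrySet()) {
            String net = e.getKey();
            int cost = Math.min(e.getValue() + 1, INFINITY);
            Integer current = hops.get(net);
            // Take a shorter path, or believe the neighbour we already route through.
            boolean better = current == null || cost < current;
            boolean fromNextHop = neighbour.equals(nextHop.get(net));
            if (better || (fromNextHop && cost != current)) {
                hops.put(net, cost);
                nextHop.put(net, neighbour);
                changed = true;
            }
        }
        return changed;
    }

    // One symmetric exchange between two neighbouring routers.
    static boolean exchange(Router x, Router y) {
        boolean changed = y.receive(x.name, x.advertise());
        return x.receive(y.name, y.advertise()) | changed;
    }

    public static void main(String[] args) {
        Router a = new Router("A"), b = new Router("B"), c = new Router("C");
        a.attach("net1");
        b.attach("net2");
        c.attach("net3");
        // Topology A <-> B <-> C: keep exchanging until no table changes.
        while (exchange(a, b) | exchange(b, c)) { }
        // prints "A reaches net3 in 3 hops via B"
        System.out.println("A reaches net3 in " + a.hops.get("net3") + " hops via " + a.nextHop.get("net3"));
    }
}
```

No router in the sketch is privileged: every one of them is simultaneously the source and the consumer of routing updates, which is the peer relationship described above.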
Another application that was born in the 1980’s is the network bulletin board. Bulletin boards arrived in many forms. While some were simple digests with moderators, others were fully distributed client/server systems with interest lists to which a user could subscribe, and both post and read messages. Good examples of the former are the SF-Lovers digest and info-mac. The SF-Lovers digest was an ongoing stream of email messages whose daily threads were mailed out as moderated digests to the subscribers. The subject was science fiction and fantasy, and SF-Lovers became extremely popular in the late 1970’s with the release in 1977 of the first Star Wars film, “A New Hope.” Info-mac was all you wanted to know about the Macintosh and was hosted by SUMEX-AIM for more than a decade. What was admirable about such digests was the dedication of the moderators. Keeping such a digest active was a full-time job, and those who moderated these digests did it out of a passion for the subject. It was voluntary. The Network News Transfer Protocol (NNTP) was specified in rfc977 in 1986. “NNTP specifies a protocol for the distribution, inquiry, retrieval, and posting of news articles using a reliable stream (such as TCP) server-client model.” USENET servers on top of NNTP were P2P systems in the sense that they were clients of one another in order to update their news databases. The actual net news clients could run on any system that had TCP/IP as the transport. The client-side protocol is simple and elegant, and the USENET client/server system provided a powerful mechanism for the massive exchange of opinions on almost any topic imaginable.
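How USENET servers acted as clients of one another can be sketched in memory. This is a minimal, hypothetical illustration: a feeding server offers each article’s Message-ID to a peer, in the spirit of NNTP’s IHAVE command, and the peer accepts only articles it has not already seen. The `NewsServer` class and its methods are invented for the example, and no sockets, NNTP response codes, or article headers are modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory sketch of USENET-style peering in the spirit of
// NNTP IHAVE: offer Message-IDs, transfer only what the peer lacks.
class NewsServer {
    final String name;
    final Map<String, String> articles = new LinkedHashMap<>(); // Message-ID -> body

    NewsServer(String name) { this.name = name; }

    // Store an article unless one with the same Message-ID is already present.
    void post(String messageId, String body) { articles.putIfAbsent(messageId, body); }

    // Receiving (server) side of an offer: true means "send it, I want it".
    boolean wants(String messageId) { return !articles.containsKey(messageId); }

    // Feeding (client) side: offer every article to the peer and transfer the
    // ones it wants. Returns how many articles were actually sent.
    int feed(NewsServer peer) {
        int sent = 0;
        for (Map.Entry<String, String> e : articles.entrySet()) {
            if (peer.wants(e.getKey())) {
                peer.post(e.getKey(), e.getValue());
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        NewsServer west = new NewsServer("west"), east = new NewsServer("east");
        west.post("<101@west>", "SF-Lovers digest, volume 1");
        east.post("<202@east>", "info-mac question");
        east.post("<101@west>", "SF-Lovers digest, volume 1"); // arrived via a third site
        System.out.println(west.feed(east)); // prints 0: east already has <101@west>
        System.out.println(east.feed(west)); // prints 1: west lacks <202@east>
    }
}
```

Keying on the Message-ID is what keeps articles from circulating forever among mutually feeding servers: once every peer holds an article, further offers of it are simply declined.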
One might pause and ask, “Where is the ROI? Show me the money.” Even applications as simple as ftp on TCP/IP encouraged the creation of digital data repositories, and thus the rapid exchange of information. People caught on quickly in the 1980’s, and soon other content query, storage, and exchange protocols were placed on top of TCP/IP. Among these were networked SQL and distributed database technology; digital libraries of medical information at NIH; remote medical analysis; networked print services from companies like IMAGEN; and laboratory-to-laboratory research as exemplified by national resources like the Stanford University Medical Experimentation in AI and Medicine (SUMEX-AIM). All of these networked technologies led to huge cost savings and streamlined both research and business processes, thus yielding more profit and ROI.
Finally, “During the late 1980s the first Internet Service Provider companies were formed. Companies like PSINet, UUNET, Netcom, and Portal were formed to provide service to the regional research networks and provide alternate network access (like UUCP-based email and Usenet News) to the public. [HISTINTERNET]”

2.1.3 Workstations and Distributed File Systems


The 1980’s also marked the birth of systems such as the personal Lisp machine, the Unix workstation desktop, the Macintosh, and the PC. These machines, for the first time, gave users their own systems for running applications and network services, and broke away from the approach of “all of your eggs in one basket,” that is to say, a dependency on a serial-line tether to a time-shared mainframe to run applications and store data. As already discussed, routers, on the other hand, inspired the “Let’s connect everything to everything” attitude. They provided the means to this inter-connectivity, be it on a LAN, WAN, or the Internet. An important feature of routers that is often overlooked is that they also form barriers that confine local traffic to its own subnet. Consequently, they permit a great deal of experimentation to take place within a LAN without disrupting the day-to-day business that is conducted through the interaction of many of the hosts connected to the LAN. Thus, in particular, the 1980’s found users and researchers alike in the ideal network environment, where cohabitation was the accepted policy and routers effectively administered the policy. We were at this time clearly on the path towards both centralized client/server and decentralized, distributed computational services. And, as seen below, although not called P2P, the freedom this environment provided encouraged both distributed file sharing and computation.
Since many of these systems (Unix desktops and Lisp Machines in particular) had client as well as server capabilities, using telnet or ftp between them was the norm. Also, mutual discovery was done with DNS. Every host on the Internet could have a fixed IPv4 address, and it was easy to keep track of the unique host names of interest that were bound to those addresses. In this sense, users having symmetric ftp access to one another’s systems is P2P in its generic form. This was as easily done across the Internet as on one’s local subnet or LAN, since each such system had a unique IP address; the true end-to-end connectivity that existed at that time thus yielded P2P in its purest state.
The early 1980’s featured the rise of Unix servers. These servers ran the rdist software that permitted them to share binary updates automatically and nightly. They were peers from the perspective of rdist. Similarly, Lisp machines such as Symbolics systems and Texas Instruments Explorers were extremely popular as research workstations, and they too behaved as both clients and servers, as peers, using their own file sharing applications as well as ftp.
The Network File System (NFS) was introduced by Sun Microsystems in 1984, and its protocol was documented in rfc1094 in 1989. This was quickly followed by the Andrew File System (AFS) from Project Andrew at Carnegie Mellon University. While NFS was restricted to the LAN, AFS was Internet wide. These file systems run on both clients and servers, and permit users to view a distributed file system as a collection of files virtually on their own systems. The Unix “ls” command was location independent. Therefore, to access a file one used the usual local command line interfaces, since drag and drop user interfaces did not yet exist. As long as the connectivity was there, any file for which the user had access privileges could be simultaneously shared as if it were on the local system. This is again an example of P2P file sharing. A major difference between NFS and AFS file sharing and what has become known as file sharing in the current decade is that the latter is done by retrieving a copy and storing it locally, while the distributed file systems worked, and still work, perfectly well as virtual file systems. The file itself need not reside on the local system even if it appears to do so. Thus, a file can be read or written with simultaneous access, with appropriate locking mechanisms to prohibit simultaneous writes. One other difference is the nature of the content. During the 1980’s, shared content was for the most part either text or application binaries, and thus the impetus for massive file sharing did not exist as it does now. The user communities in the 1980’s were initially technical and research based, and evolved to include businesses towards the end of the decade. Still, it is easy to imagine what might have happened if digital music had been available for distribution during that epoch.
We are quite sure that speakers would have appeared on workstations, and distributed virtual file systems like NFS and AFS would have been one of the communication layers beneath the Napster equivalents of the 1980’s. Sure, the audiences would have been smaller, but the technology was there to do what was required for LAN/WAN-wide distribution of digital content, and the Internet connected LAN’s and WAN’s. You get the picture.
Using these distributed file systems for P2P would have been a natural fit for read-only file sharing of multimedia content. Recall that disk drives were limited in size, and that many of the early workstations were diskless. They booted off of the network and ran programs using NFS. Still, peerNodes could have auto-mounted the file systems containing content of interest, and then searched, listed and viewed it as appropriate for the media type. The meta-data for each file could have been cached throughout the P2P Network on small servers behaving much like mediators, and carried with it the file system location where the file resided. The meta-data and content might have migrated with access to be close to the users to whom they were most popular. Scatter-gather techniques are a variation on themes used in the 1980’s both for interleaving memory and for storing files across multiple disk drive platters, so that several disk drive read heads could access them simultaneously to improve performance; coming up with a similar scheme for distributing files more efficiently is, and was, an obvious next step. A file might in fact have existed as several chunks stored across the P2P network thus constructed. The demand would have motivated the innovation. Finally, since the content never needed to be stored on the system that was accessing it, digital rights management software could, if necessary, have been integrated as part of the authentication for access privileges. Thus, P2P content sharing existed in a seminal, pure form in the 1980’s, and the technological and engineering innovations in place today that give us global content sharing on P2P networks are really a tuning/reworking of old ideas, accompanied by the expansion of the Internet, the performance-enhancing corollaries associated with Moore’s law, and drastically increased local disk storage. The authors sincerely believe that careful research for prior art would uncover sufficient examples from the 1980’s to invalidate huge numbers of current software patents.
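The chunked-file idea described above, a file split into pieces that live on different peerNodes and a small meta-data record that says where each piece resides, can be sketched as follows. This is a minimal sketch under stated assumptions: the `ChunkedFile` class, the round-robin placement policy, and the in-memory maps standing in for remote peer storage are all hypothetical illustrations, not a scheme taken from any 1980’s system.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of scatter-gather content distribution: chunks are
// spread round-robin over peers, and meta-data records each chunk's location.
// In-memory maps stand in for the remote peers' storage.
class ChunkedFile {
    // One meta-data entry: the peer that holds chunk number 'index'.
    static final class Location {
        final String peer;
        final int index;
        Location(String peer, int index) { this.peer = peer; this.index = index; }
    }

    final Map<String, Map<Integer, byte[]>> peers = new HashMap<>(); // peer -> chunks
    final List<Location> metaData = new ArrayList<>();

    // Scatter: split the content and place chunk i on peer (i mod #peers).
    ChunkedFile(byte[] content, int chunkSize, List<String> peerNames) {
        if (chunkSize <= 0 || peerNames.isEmpty()) {
            throw new IllegalArgumentException("need a positive chunk size and at least one peer");
        }
        for (String p : peerNames) peers.put(p, new HashMap<>());
        for (int i = 0, off = 0; off < content.length; i++, off += chunkSize) {
            byte[] chunk = Arrays.copyOfRange(content, off, Math.min(off + chunkSize, content.length));
            String peer = peerNames.get(i % peerNames.size());
            peers.get(peer).put(i, chunk);
            metaData.add(new Location(peer, i));
        }
    }

    // Gather: follow the meta-data in order and concatenate the chunks.
    byte[] reassemble() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Location loc : metaData) {
            byte[] chunk = peers.get(loc.peer).get(loc.index);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "content shared among 1980's workstations".getBytes(StandardCharsets.UTF_8);
        ChunkedFile f = new ChunkedFile(data, 8, List.of("alto", "sun", "lispm"));
        for (Location loc : f.metaData) {
            System.out.println("chunk " + loc.index + " -> " + loc.peer);
        }
        System.out.println(new String(f.reassemble(), StandardCharsets.UTF_8));
    }
}
```

Because only the meta-data list ties the pieces together, the chunks themselves could migrate toward interested peers without changing the reassembly logic; a real system would also need replication and integrity checks, which this sketch omits.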
Just as distributed file systems were developed in the 1980’s, so were distributed operating systems. The latter bear a very strong resemblance to P2P systems. In this spirit we next describe the V-System, which was developed at Stanford University.

2.1.4 The V-System


One thing that can be said about the 1980’s is that great efforts were made to support heterogeneous computing environments. We’ve already mentioned the multiple network protocols that were present. Along with these protocols one also found a large assortment of computers. At Stanford University, for example, research was done on systems such as Sun workstations, VAX’s, DEC-20’s and Lisp machines.3 These systems also supported student users. Appropriately enough, one would expect distributed systems research to look at ways to use these machines in a co-operative fashion, and this is exactly what the V-System did under the guidance of computer science professors David Cheriton and Keith Lantz. It brought together the computing power of the above collection of machines in a way that was really P2P. The major goal of the V-System was to distribute processing and resources, and to do so with protocols and API’s that were system independent. Why discuss the V-System in detail? As you will see, the organization of the V-System, its model, and the approach that was taken towards its development were carefully thought out and implemented to satisfy the needs of the user; to separate each system component with API’s and protocols that were machine independent; to yield a system that had user satisfaction and performance as primary goals rather than afterthoughts; and to submit its network protocols to the IETF. The software development practices adhered to by the graduate students were well ahead of their time. All protocols and API’s were carefully documented, and rules for writing consistent C code were strictly followed. And, last but not least, it exhibited many features of P2P systems.
The V-System begins with its user model. Each user had a workstation, and state-of-the-art user inter-
face support was a first principle. “The workstation should function as front end to all available
resources, whether local to the workstation or remote. To do so the V-System adheres to three funda-
mental principles:
1. The interface to the application programs is independent of particular physical devices or
intervening networks.
2. The user is allowed to perform multiple tasks simultaneously.
3. Response to user interaction is fast [V-SYSTEM].”

3. Bill Yeager wrote an Interlisp version of the VGTS in 1985. The VGTS is a V-System component and is explained in this section. The Interlisp VGTS was used to demonstrate the power of remote virtual graphics by communicating with a V-System running on a Sun workstation where the graphics were displayed. The graphics were generated on a Xerox D-machine.
It is refreshing to see the user placed first and foremost in a research project. All processing was parallel, and a “reasonably sophisticated” window system was employed. Applications ran either locally or remotely and, when user interaction was required, were associated with one or more virtual terminals. “The V-System adheres to a server model [V-SYSTEM].” In the V-System, resources are managed by servers and accessed by clients. One can view a server as an API that hides the resource it represents; it is by means of the API that the resource can be accessed and manipulated. The API’s are well defined across servers, thus yielding consistent access. In a sense, the V-System has many properties of today’s application servers, with perhaps the following exception: a server can act as a client when it accesses the resources managed by another server. “Thus, client and server are merely roles played by a process [V-SYSTEM].” And here, we see the P2P aspect of the V-System.
It is easy to imagine the collection of workstations running the V-System all sharing resources in a
symmetric way. The resources can be cpu cycles or content or both. This is again pure P2P. Let’s look
a little more closely to see what else can be revealed.
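The server model described above, in which a process is a server for the resources it manages and a client of every other server, can be sketched in a few lines. This is a hypothetical, in-memory illustration, not V-System code: the `Process` class, its `handle` method and the resource names are assumptions, and direct method calls stand in for the V-Kernel's interprocess communication.

```python
from typing import Dict, List, Optional, Set


class Process:
    """Hypothetical process that plays both the server and the client role."""

    def __init__(self, name: str, resources: Optional[Dict[str, str]] = None):
        self.name = name
        self.resources: Dict[str, str] = dict(resources or {})  # managed locally
        self.servers: List["Process"] = []  # other processes it may call upon

    def handle(self, key: str, visited: Optional[Set[str]] = None) -> Optional[str]:
        """Server role: answer a request for the resource named `key`."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.resources:
            return self.resources[key]
        # Client role: forward the request to another server, skipping any
        # process already on the request path so that cycles terminate.
        for server in self.servers:
            if server.name not in visited:
                value = server.handle(key, visited)
                if value is not None:
                    return value
        return None


# Two symmetric workstations, each serving one resource to the other.
alpha = Process("alpha", {"printer": "laser-1"})
beta = Process("beta", {"file:/notes": "meeting notes"})
alpha.servers.append(beta)
beta.servers.append(alpha)

print(alpha.handle("file:/notes"))  # alpha acts as a client of beta
print(beta.handle("printer"))       # beta acts as a client of alpha
```

Neither process is designated "the server": each answers for what it owns and asks the other for the rest, which is the symmetry the text identifies as the P2P aspect of the design.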
The system is a collection of clients and servers that can be distributed throughout the Internet, and that can access and manipulate resources by means of servers. Access to a resource is identical whether the resource is local or remote, since the same APIs and protocols are used in both cases. This access is said to be “network transparent.” There is a principle of P2P that resources will tend to migrate closer to the peer-nodes that show interest in them. The V-System has a similar feature: V-System clients may influence or determine the location of a resource.
In order to support the server processes the V-System has a distributed kernel which is the collection of
V-Kernels that run on each machine or host in the distributed system. “Each host kernel provides pro-
cess management, interprocess communication, and low-level device management facilities.” Further-
more, there is an Inter-kernel Protocol (IKP) that permits transparent, inter-process communication
between processes running in V-Kernels. Let’s take a quick look at a few of the typical V-Servers:
1. Virtual Graphics Terminal Server: Handles all terminal management functions. There is
one per workstation. An application may manipulate multiple virtual terminals and the Vir-
tual Graphics Terminal Protocol (VGTP) is used for this purpose. The VGTP is an object
oriented protocol where the graphic objects can be recursively defined by other graphic
objects and thus the VGTS supports structured display files which are highly efficient with
respect to both the frequency of communication and amount of data communicated.
2. Internet Server: Provides network and transport level support.
3. Pipe Server: Standard asynchronous, buffered communication.
4. Team Server: Where a team is a collection of processes on a host, the team server provides team management. Because applications can migrate between hosts, the team server also manages this migration and remote execution.
5. Exception Server: Catches process exceptions and manages them appropriately.
6. Storage Server: Manages file storage.

7. Device Server: Interfaces to standard physical devices like terminals, mice, serial lines
and disks.
It is therefore simple to visualize a typical workstation running the V-System, with users running applications that communicate with processes, which in turn form teams, all managed by the distributed V-Kernel’s servers. The symmetry of client/server roles is clear, and symmetry is at the heart of P2P.
Now, suppose that the distributed V-Kernel is active across a LAN on multiple hosts, and that there are
team processes on several of the hosts that have a common goal, or at least a need to share resources.
What is the most efficient way for this communication to take place? First, we need to organize the
teams. In the V-System the teams are organized into host groups. A host group is a collection of servers
on one or more hosts. And, certainly, there can be many host groups active at the same time in the V-
System. They are similar to our connected communities as well as Jxta peer groups. In fact, a host
group can be implemented as a connected community. Again, the computer science roots of P2P reach
back at least to the 1980’s.
In order to communicate efficiently between the distributed host groups, the V-System uses multicast, which was first described in rfc966 and ultimately obsoleted by rfc1112. The authors of rfc966 are David R. Cheriton and Steve Deering. Steve was a computer science graduate student in David’s distributed systems group. The author of rfc1112 is Steve Deering. Rfc1112 is entitled “Host Extensions for IP Multicasting” and is an Internet standard. What follows is an excerpt from rfc1112:
IP multicasting is the transmission of an IP datagram to a “host group”, a set of zero or
more hosts identified by a single IP destination address. A multicast datagram is delivered
to all members of its destination host group with the same “best-efforts” reliability as regu-
lar unicast IP datagrams, i.e., the datagram is not guaranteed to arrive intact at all members
of the destination group or in the same order relative to other datagrams.
The membership of a host group is dynamic; that is, hosts may join and leave groups at any
time. There is no restriction on the location or number of members in a host group. A host
may be a member of more than one group at a time. A host need not be a member of a
group to send datagrams to it. A host group may be permanent or transient.
Indeed, host groups are the forerunners of connected communities. To accommodate host groups, IPv6 has dedicated multicast group addresses.
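The host group semantics quoted from rfc1112 map directly onto the standard socket interface. The following sketch assumes IPv4 and an administratively scoped group address such as 239.1.2.3, chosen purely for illustration; the helper function names are hypothetical. It shows that a receiver becomes a member by joining the group with `IP_ADD_MEMBERSHIP`, while a sender needs no membership at all and simply addresses the group.

```python
import ipaddress
import socket
import struct


def is_host_group(address: str) -> bool:
    """True if `address` is a multicast (host group) address, IPv4 or IPv6."""
    return ipaddress.ip_address(address).is_multicast


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an IPv4 ip_mreq: group address followed by local interface address."""
    if not is_host_group(group):
        raise ValueError(f"{group} is not a host group address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_host_group(group: str, port: int) -> socket.socket:
    """Return a UDP socket that receives datagrams sent to group:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining is what makes this host a member; membership is dynamic and
    # ends when the socket is closed (or IP_DROP_MEMBERSHIP is used).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_host_group(group: str, port: int, payload: bytes, ttl: int = 1) -> None:
    """Send one datagram to every member; the sender need not be a member."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 1 keeps the datagram on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

On a LAN, each member calling `join_host_group("239.1.2.3", 5007).recvfrom(1024)` would, network permitting, receive a single `send_to_host_group("239.1.2.3", 5007, b"hello")`. Delivery is best-effort, exactly as the rfc states.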
It would have been quite simple to implement P2P chat rooms in the V-System given the VGTS. The implementation would have been quite efficient with the use of IP multicasting as it is implemented, because IP multicast datagrams are directed to the subnets on which the host group members reside. On each such subnet a single IP datagram is multicast to all of the host group members, yielding a huge savings in bandwidth. This is very much like the MBone that is used for multicasting video on the Internet. Content sharing would also be straightforward with the storage server and the VGTS. The V-System could also have been used for grid computing, where host groups partition the grid for targeted, host-group-based calculations.

Finally, we are sure that the V-System is not the only example from the 1980’s of a distributed system that is very P2P-like in its behavior. P2P is really a nascent capability that a couple of decades of work have brought to the mainstream. We next look at the 1990’s, a decade in which the ideas from the 1980’s matured with a lot of help from excellent hardware engineering that took advantage of Moore’s Law.

2.2 The 1990’s - The Decade of the Information Highway


Recall from section 2.1.1 that the Internet had been so successful that on April 30, 1995 NSF abandoned the NSFNET backbone in favor of a fully privatized backbone, the Internet having grown to about 100,000 networks in the United States. During the same time the Internet4 was on a global growth path. While universities, research laboratories, governments and companies were discovering a better, more streamlined way of doing business using the Internet, it is clear that the World Wide Web, invented by Tim Berners-Lee and released in 1991, was the real force behind bringing the Internet from where it was then to where it is now, in 2004.
Tim Berners-Lee writes, “Given the go-ahead to experiment by my boss, Mike Sendall, I wrote in
1990 a program called “WorlDwidEweb”, a point and click hypertext editor which ran on the “NeXT”
machine. This, together with the first Web server, I released to the High Energy Physics community at
first, and to the hypertext and NeXT communities in the summer of 1991. Also available was a “line
mode” browser by student Nicola Pellow, which could be run on almost any computer. The specifica-
tions of UDIs (now URIs), HyperText Markup Language (HTML) and HyperText Transfer Protocol
(HTTP) published on the first server in order to promote wide adoption and discussion.[Berners-Lee]”
The first web server, info.cern.ch, was put on-line in 1991 and the access grew by an order of magni-
tude each year up until 1994.
By 1994 the interest in the web was so large in both business and academia that Tim decided to form the World Wide Web Consortium (w3c). Between 1994 and 1996 a series of rfcs in the IETF specified the protocols and definitions:
1. rfc1630 Universal Resource Identifiers in WWW: A Unifying Syntax for the
Expression of Names and Addresses of Objects on the Network as used
in the World-Wide Web. T. Berners-Lee. June 1994.
2. rfc1738 Uniform Resource Locators (URL). T. Berners-Lee, L. Masinter, M.
McCahill. December 1994.
3. rfc1866 Hypertext Markup Language - 2.0. T. Berners-Lee, D. Connolly.
November 1995.
4. rfc1945 Hypertext Transfer Protocol -- HTTP/1.0. T. Berners-Lee, R.
Fielding, H. Frystyk. May 1996.

4. The term “Internet” as we use it includes both the private and public networks. Purists may find this objectionable, but in hindsight that is what the Internet became in the mid-90’s.

The 1990’s also launched the commercial use of the Internet. There was resistance from academics to
the commercialization. “Many university users were outraged at the idea of non-educational use of
their networks. Ironically it was the commercial Internet service providers who brought prices low
enough that junior colleges and other schools could afford to participate in the new arenas of education
and research[HISTINTERNET].”
With the end of commercial restrictions in 1994 the Internet experienced unprecedented growth. ISPs flourished and began to offer both web access and email service. Fibre optic cables were pulled to globally connect major industrial areas, satellite service was added, and the web went mobile-wireless with the introduction of the Wireless Application Protocol (WAP) in the late 1990’s, bringing web access to mobile phones and PDAs. As we are all aware, by the end of the decade web sites flourished, and the .COM era arrived. Perhaps the best measure of this growth is the number of web pages that are indexed: “The first search engine, Lycos, was created in 1993 as a university project. At the end of 1993, Lycos indexed a total of 800,000 web pages[HISTINTERNET].” Google currently indexes 4,285,199,774 web pages. In a little over ten years the increase is more than 5,000-fold!
With respect to standards, during the decade of the 1990’s the IETF was extremely active. 1,679 rfc’s
were published. 380 were published in the previous decade. The intellectual contribution to the IETF
was escalating, and this is a standards body that became extremely formal during the 1990’s. Processes
were put in place to create application areas, working groups and an overall governing body for the
IETF. The WAP specification 1.0 was released in 1999, thus giving a standards foundation for proxied
Internet access by mobile phones. As is discussed in chapter 6, Java standards for these same devices
were put into place with MIDP 1.0 in the Fall of 1999. These standards along with the increase of
Internet bandwidth brought to the users of the Internet the protocols to access a diversity of content
types on a variety of devices. In the 1980’s we had for the most part text based content. The innova-
tions of the 1990’s provided multi-media content to the masses: Text, images, sound, and video. And,
the masses loved it!
Finally, we have the issue of security and in particular ciphers and public key algorithms. With respect to the former, the patent for the DES cipher expired in 1993. Because 56-bit DES can be cracked by a brute-force attack, which makes it obsolete, 3DES was introduced to make up for this shortcoming. Also, Bruce Schneier introduced the Blowfish cipher in 1993 as a public domain DES alternative. With respect to the latter, the Diffie-Hellman patent expired in 1997, and on September 6, 2000, RSA Security made the RSA algorithm publicly available and waived its rights to enforce the RSA patent. Thus, by the end of the 1990’s developers were able to secure their software without concern for royalties, which gave a large boost to e-commerce on the Internet.
As we reflect upon the last few paragraphs, the one salient thing that motivates this growth, beyond the global connectivity provided by the Internet and beyond the hardware, is the drive of people to communicate and to be entertained. Behind it all is a hunger for social interaction and for access to information for education and entertainment. People will pay for the applications that fulfill these needs. And here the stage is set for P2P to enter the scene and play a major role. The Information Highway is in place and ready to handle the traffic!

2.3 The New Millennium


P2P exploded into the public’s eye in 2000 with the flurry of lawsuits against Napster for contributing
to the infringement of copyright by its users or peers. By that time billions of MP3 music files had been
exchanged by users of the Napster network and client. The network was a collection of servers that
indexed music found on users’ systems that ran the Napster client. The Napster client used the servers
to find the indexed music and the peers on which it resided, and the network provided the mechanisms
necessary to permit the music to be shared between peers. Napster was created by Shawn Fanning in
May of 1999 as a content sharing application. It was not P2P in the purest sense. At its peak there were
160 Napster servers at the heart of the Napster network. The lawsuits had the ironic effect of popularizing Napster. In early 2001 a ruling by the U.S. Court of Appeals for the Ninth Circuit upheld an injunction against Napster, thus requiring it to block copyrighted songs.
In June of 1999 Ian Clarke brought us Freenet. Freenet, unlike Napster, is a pure P2P system. Ian is very interested in personal privacy and the freedom of speech. In his article “The Philosophy of Freenet,” Ian states: “Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack [FREENET].” Communications by Freenet nodes are encrypted and are “routed-through” other nodes to make it extremely difficult to determine who is requesting the information and what its content is. Ian’s most recent P2P system is Locutus. Locutus emphasizes security, runs on .NET and is targeted at the enterprise.
A third generic P2P system of this decade is Gnutella, created at Nullsoft by Justin Frankel and Tom Pepper and championed by Gene Kan. Gnutella is an elegant protocol for distributed search with five commands. It too is pure P2P, where each peer plays the role of both a client and a server. A brief description is the following:
Gnutella is a protocol for distributed search. Although the Gnutella protocol supports a
traditional client/centralized server search paradigm, Gnutella’s distinction is its peer-to-
peer, decentralized model. In this model, every client is a server, and vice versa. These so-
called Gnutella servants perform tasks normally associated with both clients and servers.
They provide client-side interfaces through which users can issue queries and view search
results, while at the same time they also accept queries from other servants, check for
matches against their local data set, and respond with applicable results. Due to its distrib-
uted nature, a network of servants that implements the Gnutella protocol is highly fault-tol-
erant, as operation of the network will not be interrupted if a subset of servants goes offline
[GNUTELLA].
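The servant behavior described in the quotation can be simulated in a few lines. The sketch below models only Gnutella's Query and QueryHit exchange, two of the protocol's five commands: each query carries a unique message ID and a time-to-live (TTL), and every servant checks its local files, drops duplicates it has already seen, and forwards the query with the TTL reduced by one. The `Servant` class and the file names are hypothetical; a real servant does this asynchronously over TCP connections and routes QueryHits back along the reverse path, which the recursive return values here only imitate.

```python
import uuid
from typing import Iterable, List, Set, Tuple

Hit = Tuple[str, str]  # (servant name, matching file name)


class Servant:
    """Hypothetical Gnutella-style servant: both a client and a server."""

    def __init__(self, name: str, files: Iterable[str]):
        self.name = name
        self.files = sorted(files)
        self.neighbors: List["Servant"] = []
        self.seen: Set[str] = set()  # IDs of queries already handled

    def query(self, keyword: str, ttl: int = 7) -> List[Hit]:
        """Client side: originate a query and collect the hits."""
        msg_id = uuid.uuid4().hex
        self.seen.add(msg_id)  # never answer or re-forward our own query
        hits: List[Hit] = []
        for neighbor in self.neighbors:
            hits.extend(neighbor.receive(msg_id, keyword, ttl))
        return hits

    def receive(self, msg_id: str, keyword: str, ttl: int) -> List[Hit]:
        """Server side: match locally, then flood onward while TTL remains."""
        if msg_id in self.seen:
            return []  # duplicate that arrived over another path
        self.seen.add(msg_id)
        hits = [(self.name, f) for f in self.files if keyword in f]
        ttl -= 1  # this hop consumed one unit of TTL
        if ttl > 0:
            for neighbor in self.neighbors:
                hits.extend(neighbor.receive(msg_id, keyword, ttl))
        return hits
```

For a chain a - b - c - d in which b, c and d each share a file whose name contains "song", `a.query("song", ttl=2)` reaches only b and c, while `ttl=3` also reaches d. The TTL is what keeps a flooded search from reaching every servant on a large network.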

Gnutella has undergone a huge amount of analysis since it was launched. It had weaknesses, and these weaknesses were part of its strength: they encouraged, and yielded, excellent research in P2P and, as a consequence, improved algorithms. The P2P world is really grateful to Gene for his vision of P2P and his energy as an evangelist of the P2P technology.
Finally, we close our discussion of the history noting the launch of Sun Microsystems’ Project Jxta on April 25, 2001. Jxta is open source and has been under continuous development since that time. Its specifications define a P2P infrastructure that includes both peer nodes and super-peers called rendezvous. The Jxta infrastructure is fully realized with a Java implementation. From the beginning one of the goals of Jxta has been to create a P2P standard. Currently, P2P networks are not interoperable: their differing protocols make them isolated islands. As a direct consequence of this desire by members of the Jxta community to standardize P2P, there is now an Internet Research Task Force Research Group on P2P. Jxta is used world-wide by P2P enthusiasts for creating P2P applications. The web site is http://www.jxta.org.
We have made a conscious choice in writing this book not to be encyclopedic, and thus not to list the remaining P2P applications and networks that now exist. No such list will ever be current until a consensus is reached on a P2P standard for the Internet. What we have now are cul-de-sac protocols that cannot possibly do justice to the possibilities of P2P imagined by its visionaries. Understandably, these dead-end alleyways are driven by the desire to capitalize, to profit on P2P. While this is not bad in itself, since capital is necessary to support research and development, we really want to see the history of P2P come to the point where agreement on a standard is reached.
We hope that this brief history of P2P has given the reader an idea of its roots, some of which do not appear to be P2P on the surface. So much of technical history is filled with side-effects. One cannot always guess what a new idea will bring along with it. The original Internet of the 1980’s had very few worries about security until the end of the decade, when it became global and hackers arrived, to everyone’s surprise, to cause serious problems. The Internet’s age of innocence was short-lived. Still, the energy and creativity of those who continued to build out this amazing infrastructure could not be stopped. Security problems are an impediment that this creative energy continues to overwhelm. Most of the history of P2P is in front of us. Let’s get to work to realize its possibilities.

Chapter 3
Components of the P2P Model
From thirty thousand feet a P2P overlay network appears as a collection of
peer-nodes that manage to communicate with one another. At this altitude it is
sufficient to discuss content sharing, its pros and cons and how it will create a
new Internet digital economy as was done in Chapter 1. In order to come
down out of the clouds and to discuss the ground level engineering concepts it
is necessary to define the real engineering parts of these peer-nodes, the
components that comprise the P2P network, as well as the protocols used for
peer-node to peer-node communication. This is not unlike assembling a
detailed plastic model of a classic automobile, or futuristic spacecraft. Each
part is important, and the rules for assembling them correctly, i.e., the blue-
prints, are indispensable to the process. To this end, in this chapter we first
have a discussion of the P2P document language which is a universal,
descriptive meta-component (a component for describing components). Just
like the final blueprint of a home is always a collection of blueprints, one
describing the plumbing, several for the multiple views, others for each room,
the exterior walls, etc., our final peer-node document will also be a collection
of documents. Thus, as we define our P2P model component by component in
this chapter, by starting with those that are fundamental and using these as
building blocks for more complex components, we will be defining a set of 4PL
types and combinations thereof. These types will then be the grammar of our
document language and permit us to create the multiple blueprints that will be
the engineer’s guide to constructing a peer-node. To help the reader build a
conceptual understanding of the final, assembled model, each section
explains the motivations and behaviors of the components it defines. It is from
these explanations and 4PL that we derive the semantics of the document lan-
guage.

3.1 The P2P Document Space

3.1.1 XML as a Document Language


In any society, to establish communication between people, either everyone needs to speak a common language or to be able to translate their language into a language which can be understood by both
parties. Peer-node to peer-node P2P network communication is not an exception to this rule. For exam-
ple, either for the transfer of content or to initialize connections, one peer-node will send messages to a
target peer-node, the target peer-node will send responses. The “language” in which the messages are
expressed must be understood by all peer-nodes participating in the communication. In the networking
world, the meaning of such a “language” is not the same as that of a programming language in the
computing world. Instead, the former language permits us to define the structure (or format, or syntax)
in which messages are written, and unlike programming languages, this structure is independent of the
message’s semantics, or meaning. This structure should be able to allow messages not only to say
“hello”, but also to permit peer-nodes to negotiate a secure communication channel, and to transfer the
multi-dimensional data along that channel. The semantics of such a negotiation, or data transfer will be
defined by the associated protocols’ behavior and not the underlying document language. A required
feature of the document language is to permit the creation of structured documents with flexibility of
descriptors or names. It is difficult to describe almost arbitrary peer-node components in a document
language whose descriptor set or namespace is fixed. The language of choice must also be an accepted
standard in the Internet community and off-the-shelf, open source parsers must be available. It
should also be simple to write parsers for minimal, application defined namespaces so that it can be
used across the device space.
Extensible Markup Language (XML) [XML] naturally meets the above requirements, is widely deployed on the Internet, is a World Wide Web Consortium (w3c) standard, and for us, is an ideal markup language to create structured documents that describe the various engineering components we require in order to communicate their properties amongst the peer-nodes on a P2P network. Because its format is so regular, XML parsers are easily implemented. Also, the tags used by XML are not fixed as they are in HTML, and therefore, the elements can be defined based on any application’s needs. Each application can have its particular XML namespace that defines both the tags and the text that appears between these tags that the application uses. Because of these properties, HTML can be expressed in XML as XHTML [XHTML]. Given the freedom to choose tags suitable for specific applications or devices, XHTML Basic [XHTMLbasic], which is a subset of XHTML, is used for mobile-phone-like devices. There are XHTML Basic browsers with a footprint of about 60K bytes, which shows the programmability and power of XML. The parsers are a small percentage of this overall browser footprint. In other words, XML is a versatile markup language that is able to represent the nature of both computer and human behavior. For example, one can imagine a small “talk” application expressed in XML. Here, at the highest level, John talks to Sam:

<?xml version="1.0"?>
<talk-behavior>
<run> talk.jar </run>
<from> John </from>
<to> Sam </to>
<text> Hi Sam! </text>
</talk-behavior>

John’s local system analyzes the document, starts the talk program, connects to Sam’s system, and
sends the document. Upon receipt, Sam will see something like:

Message from John: Hi Sam!

The structure is in the document, the behavior is in the talk program. This is admittedly an oversimplification but does express the power of XML. In this case, for example, the talk application might only need to have four or five tags, and XML is used to describe several programmatic actions:
• The “run” tag is interpreted by an application to mean to run a program named talk.jar.
Here, the extension implicitly will invoke Java to accomplish this task.
• The “from” and “to” tags implicitly describe the local and remote ends of a communication
and are explicit parameters for talk.jar.
• Finally, the “text” tag is also a parameter for talk.jar and the program’s code sends this as a
text message to Sam. Notice that the data is included in the scope of <text>... </text>.
Again, it is important to emphasize that the meanings of the tags are not implied by the XML docu-
ment. Our brains have associations with respect to the tag names, and they are named in this manner
because humans have come to understand “run program.” We could have just as well used <u>, <v>,
<w>, <x>, and <y> as tags:

<?xml version="1.0"?>
<u>
<v> talk.jar </v>
<w> John </w>
<x> Sam </x>
<y> Hi Sam! </y>
</u>

The talk program’s code doesn’t care, and can be written to interpret any text string to mean “send the
message to the name bound to the tag pair defined by this text string.” After all, this is just a string
match.
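Because the meaning lives in the program, one reader can consume both documents above if it is told which tag plays which role. The sketch below uses the standard library's `xml.etree.ElementTree`; the role-to-tag tables, `read_talk` and `render` are hypothetical names chosen for this example, not part of any real talk.jar.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The program, not the document, decides what each tag means.
NAMED_TAGS = {"run": "run", "from": "from", "to": "to", "text": "text"}
LETTER_TAGS = {"run": "v", "from": "w", "to": "x", "text": "y"}

NAMED_DOC = """<?xml version="1.0"?>
<talk-behavior>
<run> talk.jar </run>
<from> John </from>
<to> Sam </to>
<text> Hi Sam! </text>
</talk-behavior>"""

LETTER_DOC = """<?xml version="1.0"?>
<u>
<v> talk.jar </v>
<w> John </w>
<x> Sam </x>
<y> Hi Sam! </y>
</u>"""


def read_talk(document: str, tags: Dict[str, str]) -> Dict[str, str]:
    """Map each role (run, from, to, text) to the text of its tag."""
    root = ET.fromstring(document)
    return {role: (root.findtext(tag) or "").strip() for role, tag in tags.items()}


def render(message: Dict[str, str]) -> str:
    """What the receiving side would display."""
    return f"Message from {message['from']}: {message['text']}"


print(render(read_talk(NAMED_DOC, NAMED_TAGS)))
print(render(read_talk(LETTER_DOC, LETTER_TAGS)))
```

Both calls print the same line, "Message from John: Hi Sam!", because the only thing that changed is the string match the program performs.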
Now, let’s generalize the above example to explain what is meant by Meta-data, or data about data.

<?xml version="1.0"?>
<behavior>
<from> tcp://John </from>
<to> tcp://Sam </to>

<meta-application
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:name> talk.jar </dc:name>
<dc:type> java </dc:type>
<dc:version> 1.0 </dc:version>
<dc:size> 27691 </dc:size>
</meta-application>

<metadata
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:access-control>
<dc:access> read-only </dc:access>
<dc:path>
file:///home/John/friends-only
</dc:path>
</dc:access-control>
<dc:greeting>
<dc:filename> hello </dc:filename>
<dc:filetype> txt </dc:filetype>
</dc:greeting>
<dc:attachment>
<dc:content-type> image/gif </dc:content-type>
<dc:filename> John.gif </dc:filename>
</dc:attachment>
<dc:attachment>
<dc:content-type> image/jpeg </dc:content-type>
<dc:filename> Hawaii.jpeg </dc:filename>
</dc:attachment>
</metadata>
</behavior>

In the above example “xmlns:dc” identifies the namespace with a Uniform Resource Identifier (URI) [RFC2396] “http://purl.org/dc/elements/1.1/.” This latter URI name need only have uniqueness and persistence, and is not intended to reference a document. There are several examples of meta-data: the application version, size, and type, the access control fields, and the attachments’ content types. Because meta-data is such a powerful tool, many efforts have been made to standardize its format, such as the open forum Dublin Core Metadata Initiative (DCMI), the w3c standardization community and the knowledge representation community. Out of the w3c efforts we have the Resource Description Framework (RDF) [RDF]. The goal of RDF is not only to specify what kind of tags are needed, but also to enable the creation of relationships between these tags, i.e., RDF is explicitly about semantics, and uses XML syntax to specify the semantics of structured documents. For example, here’s a relationship: Holycat is the creator of the resource www.holycat.com.
RDF will indicate this relationship as the mapping to the proper tags: Holycat as the creator in an RDF “Description about” www.holycat.com. The relevant part of the metadata is below:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.holycat.com">
<dc:title>
Holy Cat .com
</dc:title>
<dc:creator>
holycat
</dc:creator>
</rdf:Description>
</rdf:RDF>
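To make the relationship concrete, the sketch below reads the RDF description with Python's `xml.etree.ElementTree` and turns each property into a (subject, predicate, object) triple, writing predicates as full namespace URIs so that `dc:creator` becomes `http://purl.org/dc/elements/1.1/creator`. It assumes the standard RDF namespace `http://www.w3.org/1999/02/22-rdf-syntax-ns#` and handles only this simple, flat form of RDF/XML; a real application would use a full RDF parser.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

DOCUMENT = """<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <rdf:Description rdf:about="http://www.holycat.com">
  <dc:title> Holy Cat .com </dc:title>
  <dc:creator> holycat </dc:creator>
 </rdf:Description>
</rdf:RDF>"""


def triples(document: str) -> Iterator[Tuple[str, str, str]]:
    """Yield one (subject, predicate, object) triple per property element."""
    root = ET.fromstring(document)
    # ElementTree names qualified elements and attributes "{namespace-uri}local".
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # "{uri}name" -> "uriname", i.e. the full predicate URI.
            predicate = prop.tag[1:].replace("}", "")
            yield subject, predicate, (prop.text or "").strip()


for triple in triples(DOCUMENT):
    print(triple)
```

The prefix `dc:` disappears in the output: what identifies the creator relationship is the namespace URI plus the local name, which is why the URI need only be unique and persistent.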

In the sections and chapters that follow we require a markup language for the structured documents
we define to describe the P2P components of our overlay network. We are selecting XML for this pur-
pose. As mentioned above, it is a w3c standard with wide and growing support in the high-tech indus-
try; permits us to create our own namespace to clearly express the concepts of each component; and
the engineers who wish to implement a system modeled on these components will have the tools avail-
able to parse the documents [XMLparser], or write their own, application, namespace specific XML
parsers as has been done for many small devices, and existing P2P systems.

3.1.2 Publish and Subscribe: Administrated vs. Ad-hoc


Our P2P document language, XML, will provide structured information describing our components, their network behavior, the content exchanged between peer-nodes, cryptographic data, and much more. When this information is published either locally or remotely, it is crucial to administer this large document space efficiently, and this administration may or may not be centralized. Recall the “P2P Spectrum” introduced in Chapter 1. A P2P network may be configured as hybrid or pure ad-hoc, and each point in the spectrum needs various methods to distribute, publish and subscribe to these documents, as well as the policies that control both publication and subscription. Inside a hybrid P2P network, any document can be stored and managed on centralized nodes. The more centralized the network becomes, the more server-like these peer-nodes become, and consequently, the P2P network imposes much more control upon the peer-nodes’ behavior. For example, the initial copies of the most popular music, digital certificates, the status of critical peer-nodes, and the registration of peer-node names can be published on content, enterprise key-escrow, system status, and naming servers respectively, and access is controlled by means such as passwords and firewalls. On the other hand, in a pure, ad-hoc P2P network, the information is not required to be centrally stored, and the administrative policies that control its access are set between the peer-nodes themselves. In fact, in this latter case, all peer-nodes may have unrestricted read access to all documents.
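The difference between the two ends of the spectrum can be captured by whether a read policy is enforced when a peer-node subscribes. The sketch below is hypothetical: `DocumentSpace`, its methods and the policy function are illustrative names, and a single in-memory dictionary stands in for what would be a centralized server in a hybrid network or a store spread among peer-nodes in an ad-hoc one.

```python
from typing import Callable, Dict, List, Optional

Policy = Callable[[str, str], bool]  # (peer name, document id) -> may read?
Listener = Callable[[str], None]     # receives the published XML text


class DocumentSpace:
    """Hypothetical publish/subscribe space for peer-node XML documents.

    With a read policy it behaves like an administrated registry; with no
    policy every peer-node may read every document, as in a pure ad-hoc
    network.
    """

    def __init__(self, may_read: Optional[Policy] = None):
        self.may_read = may_read
        self.documents: Dict[str, str] = {}
        self.listeners: Dict[str, List[Listener]] = {}

    def publish(self, doc_id: str, xml: str) -> None:
        """Store (or replace) a document and notify its subscribers."""
        self.documents[doc_id] = xml
        for notify in self.listeners.get(doc_id, []):
            notify(xml)

    def subscribe(self, peer: str, doc_id: str, notify: Listener) -> None:
        """Register interest in a document, subject to the read policy."""
        if self.may_read is not None and not self.may_read(peer, doc_id):
            raise PermissionError(f"{peer} may not read {doc_id}")
        self.listeners.setdefault(doc_id, []).append(notify)
        if doc_id in self.documents:  # deliver the current version at once
            notify(self.documents[doc_id])
```

For example, `DocumentSpace()` lets any peer-node subscribe, whereas `DocumentSpace(may_read=lambda peer, doc: peer in staff)`, for some administered set `staff`, rejects every other peer-node with a `PermissionError`.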
3.1.2.1 Administrated Registries
There are already existing, well-administered registries in use on the Internet. In 1983 the concept of domain names was introduced by Jon Postel [RFC811]. This was accompanied by the full specifications [RFC882] by Paul Mockapetris as well as implementation specifications [RFC883] and an implementation schedule [RFC897, RFC891]. Soon afterwards, the Domain Name System (DNS) was in place and was exclusively used for domain-name-to-IP-address lookups, or conversely. With time the DNS implementation has become a general, administered Internet database. Thus, we can manage XML documents in such a fashion. While it is possible to store an entire XML document in DNS servers, this is impractical given the billions of possible peers and their associated documents. On the other hand, several fields of these documents must be unique, and in some cases their creation must be controlled by those who administer the P2P network in question. Given the right algorithms for their generation, the probability of a collision among some of the fields that require uniqueness is effectively zero. Other such fields will be text strings, for example, a peer’s name, and as such have a high probability of collision. These will certainly be administered and controlled in enterprise deployments of P2P systems, and DNS may be appropriate for these names. The problem with DNS is that this service is already overloaded, and adding even millions of additional entries is not a good idea.
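How close to zero "effectively zero" is can be checked with the birthday bound: if n identifiers are drawn uniformly at random from a space of 2^b values, the probability that any two collide is approximately 1 - exp(-n(n-1)/2^(b+1)). The sketch below assumes, for illustration only, that unique fields are 128-bit random values from Python's standard `secrets` module; the function names are hypothetical.

```python
import math
import secrets


def new_unique_id(bits: int = 128) -> str:
    """A random identifier of `bits` bits, written as hexadecimal."""
    return secrets.token_hex(bits // 8)


def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday-bound probability that n random `bits`-bit IDs collide."""
    exponent = n * (n - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x


# A billion peers with 128-bit IDs versus the same peers with 32-bit IDs.
print(f"{collision_probability(10**9):.1e}")
print(f"{collision_probability(10**9, bits=32):.3f}")
```

The first line prints roughly 1.5e-21 and the second 1.000: with enough random bits collisions can be ignored, while a small name space, like the space of human-chosen names, collides almost surely and must be administered.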
The Lightweight Directory Access Protocol (LDAP) [RFC1777] provides naming and directory services to read and write an X.500 Directory in a very efficient way. Its operations can be grouped into three basic categories: binding/unbinding, which starts/terminates a protocol session between a client and server; reading the directory, including searching and comparing directory entries; and writing to the directory, including modifying, adding to and deleting entries from the directory. Such a simple protocol can be used to store and access our P2P documents. LDAP has the advantage over DNS that the administration and scope of its directories are more flexible. A small company, or even a neighborhood in a community, can decide to put in place and administer its own LDAP directory. For a highly centralized P2P network it is appropriate to store entire XML documents or selected fields from these documents in LDAP directories. Tagged fields can have URIs referencing LDAP directory entries. In certain cases, it will be necessary to authenticate peers so that they can access, for example, private data, and their login name and password can be validated through an LDAP search.
Here is a hypothetical example of using LDAP to access XML fields from an LDAP directory. Assume that an LDAP server, P2PLDAPServer, exists with host name P2PCommerce.com. Furthermore, assume that the organization P2PCommerce.com supports shopping in department stores. Now, a client will make a query to search for Anne-Sophie Boureau, who works for a department store in Paris, France. The “Grands Magasins” (department stores) in Paris are well organized and have an XML document for each employee:

<?xml version="1.0"?>

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.holycat.com">
<dc:title>
Holy Chat .com
</dc:title>
<dc:creator>
holychat
</dc:creator>
<dc:dn>
<dc:uid> Anne-Sophie </dc:uid>
<dc:company> holychat.com </dc:company>
</dc:dn>
<dc:cn>
Anne-Sophie Boureau
</dc:cn>
<dc:gn>
Anne-Sophie
</dc:gn>
<dc:sn>
Boureau
</dc:sn>
<dc:email>
Anne-Sophie.Boureau@holychat.com
</dc:email>

</rdf:Description>
</rdf:RDF>

So, in the following code the client creates a query, then makes a connection to the Nth server. After the
connection succeeds, the client can perform the search and save the result:

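<?php
$n = 0;                                   // index of the Nth server
$ldap_name[$n]    = "P2PLDAPServer";
$ldap_server[$n]  = "P2PCommerce.com";
$ldap_root_dn[$n] = "ou=DepartmentStores,o=P2PCommerce.com";
// A search filter is a parenthesized (attribute=value) expression, not a DN;
// the "c" attribute holds the two-letter ISO 3166 country code.
$ldap_query = "(&(l=Paris)(c=FR)(cn=Anne-Sophie Boureau))";
$connect_id = ldap_connect("ldap://" . $ldap_server[$n]);
ldap_bind($connect_id);                   // anonymous bind starts the session
$search_id = ldap_search($connect_id, $ldap_root_dn[$n], $ldap_query);
$result = ldap_get_entries($connect_id, $search_id);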

In the result, we will have several items: common name (cn), distinguished name (dn), given name (gn), surname (sn), and email address (email).

result["cn"] = "Anne-Sophie Boureau"
result["dn"] = "uid=Anne-Sophie,company=holychat.com"
result["gn"] = "Anne-Sophie"
result["sn"] = "Boureau"
result["email"] = "Anne-Sophie.Boureau@holychat.com"

3.1.2.2 Ad-hoc Registries


Peers will be required to have a set of unique XML documents to describe the components of which they are comprised. Since the overlay network is really the join of these components, the network itself will then be consistent, with no conflicts from duplicate documents. The details of each of these XML documents are described later in this chapter and in Chapter 4, and are not important for this discussion. When a collection of nodes on a network decides to form an ad-hoc, structured P2P overlay network, the associated XML document management cannot rely on centralized “super” machines. Instead, in our model for P2P behavior, in a purely ad-hoc network each peer manages its own document space, as well as conflicts due to duplication, to assure consistency. Suitable algorithms that avoid duplication, that is, that make its probability extremely small, are described in the following sections, and when duplication does arise, code will be present to deal with it. Other degrees of ad-hoc behavior are possible. For example, an ad-hoc P2P network may have one or several peer-nodes whose systems are reliable. In this case one can off-load the real-time registration of these documents to these peer-nodes. Registration of documents remains purely ad hoc, without any administration except for the knowledge of how to connect to these peer-nodes. One advantage of this pseudo-centralization is the presence of systems whose role is to be real-time guardians of document consistency. A system that volunteers for this role is making a commitment to provide solid or near-solid services: being up constantly and having a predictable degree of reliability, so that the problem of duplicate XML documents does not arise.
We create unique documents for components by including unique identifiers (see section 3.2). An enterprise P2P overlay network behind firewalls can guarantee consistency, and so eliminate duplication, by helping to form unique identifiers and by managing documents on enterprise servers. But when these peer-nodes connect to a non-enterprise P2P overlay network, this guarantee is difficult to maintain: there is no global monitoring of unique identifiers, and the algorithms used to generate them may permit the duplication of some of the components’ XML documents, consequently yielding an inconsistent join of these different overlay networks. Thus, even if both of these P2P networks are strictly administered with independent, centralized control to guarantee each one’s consistency, we run into this problem. The same problem exists for the pure ad-hoc network discussed just above: because there is no global registration, joins might be inconsistent. Figure 3-1 shows the problem in both situations.

[Diagram: Sub-network_1, Adhoc, Sub-network_2, Document Administrator]
Figure 3-1. The P2P Join Problem
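The join problem can be made concrete with a small sketch. Here each sub-network’s document space is modeled, hypothetically, as a dictionary mapping a document’s unique identifier to the peer that owns it; a join is consistent only if no identifier is claimed by two different owners. The names `DocumentSpace` and `join` and the sample identifiers are illustrative only, not part of any P2P framework.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a document space maps a unique identifier to the
# peer that owns the document carrying that identifier.
DocumentSpace = Dict[str, str]


def join(a: DocumentSpace, b: DocumentSpace) -> Tuple[DocumentSpace, List[str]]:
    """Join two document spaces, reporting identifiers that collide.

    An identifier present in both spaces with different owners makes the
    join inconsistent; identical entries are simply the same document.
    """
    joined: DocumentSpace = dict(a)
    conflicts: List[str] = []
    for uid, owner in b.items():
        if uid in joined and joined[uid] != owner:
            conflicts.append(uid)  # duplicate identifier, different documents
        else:
            joined[uid] = owner
    return joined, conflicts


# Each sub-network is internally consistent, yet their join is not.
sub_network_1 = {"peer-0001": "laptop-A", "peer-0002": "pda-B"}
sub_network_2 = {"peer-0002": "server-C", "peer-0003": "phone-D"}
merged, clashes = join(sub_network_1, sub_network_2)
print(clashes)  # ['peer-0002']
```

Neither administrator did anything wrong locally; the conflict only appears at join time, which is why globally unique identifiers matter.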

While it is clear that for security reasons enterprises may not want to permit joins of this nature, the join problem can be solved with global, ad-hoc registries. We argue that prohibiting a global P2P address space, and that is what we are describing, is the wrong way to solve such security problems, and that such an address space is a good idea to promote P2P technology and e-Commerce. What would the Internet be like today if the IP address space had been treated in the same manner? Security problems are best solved with security algorithms and related mechanisms. How a global P2P overlay network with globally unique identifiers, and therefore a globally consistent document space, is accomplished is discussed in section 3.2.2 of this chapter.

3.2 Peer Identity

3.2.1 One Peer among Billions


As the Internet evolves during this decade, and billions of devices become Internet enabled, each of these networked devices will be capable of being a peer-node on a P2P overlay network, and each will require a unique identity that is recognizable independently of the device’s user and of its location within this network. While we do not expect most appliances in a home network to be mobile, unique peer identities will be necessary for a large majority of these devices. One peer among billions, like one human being among billions, has its own DNA-like identity that accompanies the node, for example a laptop, PDA, mobile phone, or Bluetooth-enabled necklace, as the former move about the Internet and the latter meanders in a crowd looking for contact. To permit individual peers to have anonymity, a peer should be capable of changing this identity at will, and the new identity must maintain both its uniqueness and mobility properties on the P2P network to which it belongs. You are stuck with your DNA but not with your peer identity. And, as always, policies that permit a change of identity
are set by the appropriate members of the peer network. It is clear that most enterprises, and probably
all governments will want to carefully administer the peer identities that they legally control. But the
large majority of these identities may not be registered, and registration is neither necessary nor always
desirable in a P2P network. A peer should be able to generate a unique identity when it is initially con-
figured, and the engineering fundamentals for building the P2P model described in this book will per-
mit such a peer to become a member of a P2P network, to be discovered by other peer-nodes on this
network, and to communicate with those peer-nodes. Registration of unique identities is also within
the scope of this model. The model is, in fact, indifferent to administrative policies which are decided
and implemented by the P2P network owners.
So, what can one use for such a universal peer identity? Certainly, IP version 4 (IPv4) addresses are out of the question given that their 32-bit address space is nearly exhausted. IPv4 addresses are the current
Internet host address default, and the difficulties this raises for P2P networks are discussed in section
3.5 of this chapter. We can consider using IP version 6 (IPv6) addresses which provide a 128 bit
address space but at this time and for the near future, IPv6 will not be universally deployed. Still, in
anticipation of this deployment, the IPv6 option must be considered from the point of view of format.
If we believe that IPv6 addresses are the inevitable future for the Internet, then we can argue for at least
using the IPv6 format to both minimize the formats the software is required to support, and to stay as
close to the Internet standards as possible with a long term goal of interoperable P2P software. Another
choice is a unique random identifier of 128 or more bits generated by a secure random number generator. We can also “spice” any of these identifiers with cryptographic information to create unique, secure, cryptographically based identities (CBID) [SUCV, CBID]. These and other possibilities are discussed
in the next section.

3.2.2 Unique Identifiers for Peers


As mentioned above, it was recognized by the early 1990’s that the 32-bit IPv4 address space would soon be exhausted. Most of the early innovators of the Internet thought that 4,294,967,296 (2^32) addresses were
sufficient and thus the initial IP specification, RFC760, which was authored by Jon Postel, and pub-
lished by the Department of Defense (DOD) as a DOD standard in January of 1980, allocated two 32
bit fields in the IP packet header for the source and destination IP addresses. As an historical note,
interestingly enough, the XEROX Palo Alto Research Center’s XNS network protocol arrived at the
same time as IP, and had 80 bits of address space, 48 bits for the host and 32 for the network, and the
48 bit host address was usually the 48 bit MAC address. As we will see below, using the MAC address
as part of the IPv6 address is one possible format option. From an historical perspective it is always amusing that so many cool things were done in the 1980’s, and that rather than being abandoned they, as was pointed out in Chapter 2, sit on the shelf until rediscovered. The problem with XNS was that it was XEROX-proprietary at the time, and IP has never been proprietary. They came from two different visions and it is clear which vision won, i.e., open standards, and this is the vision for the future. In any case, in December of 1995, the first specifications for IPv6 were submitted to the Internet Engineering Task Force (IETF) in RFC1883 and RFC1884.¹ In order to more fully understand the appropriateness of IPv6 addresses for unique identifiers, a very careful reading of RFC2373 and RFC2460, which obsolete the previous two RFC’s, is necessary. We give the reader a good enough overview of these RFC’s in subsection 3.2.2.1. Again, it is important to keep in mind that the immediate goal with respect to IPv6 is, when appropriate, to use its address format to store and publish generated unique identities.
As mentioned in the introduction to this section, IPv6 is not the only possible choice for the format of
what is called a Universally Unique Identifier (UUID). There are many good algorithms that can generate thousands of UUID’s per second; the IPv6 format may not be suitable in some cases; and in this book’s model multiple UUID’s will be necessary for its component set. In subsection 3.2.2.2 these
UUID’s are discussed. Keeping this in mind let’s first move on to the primer on IPv6 addresses.
3.2.2.1 IPv6 Addresses
IPv6 addresses several shortcomings of IPv4. It is designed to improve upon IPv4’s scalability, security, ease of configuration, and network management [King99]. The scalability improvements reflect both an increase in the address space size and a mechanism for scalable Internet routing. We’ve adequately discussed the 32-bit address space limitation in this chapter, which IPv6 eliminates with 128-bit addresses. The resulting 2^128 (about 3.4 × 10^38) addresses form an almost unimaginable number. If we start now and use one billion addresses per second without recycling, then we have enough addresses to last about 10^22 years. Noting that our sun will burn out within billions (10^9) of years, and that if the universe is closed its calculated lifetime is about 10^11 years, IPv6 addresses clearly solve the address space size problem for the foreseeable future. Since IPv4 addresses do not provide a mechanism for hierarchical routing, like, for example, the one the telephone exchange provides for phone calls with country and area codes, the size of IP routers’ routing tables has become problematic as the Internet has grown in a way that was not anticipated by its founders. With the original IPv4 address space formats, the class A, B, and C networks provided no mechanism for hierarchical routing. The classic IPv4 address format, as defined in RFC796, permits 127 class A networks, 16,383 class B networks, and 2,097,151 class C networks. Since this is a flat address space,
to route to every network using this scheme, an entry for each network is required in each router’s rout-
ing table. With the advent of Classless Inter-Domain Routing (CIDR)[RFC1519] in 1993, an hierarchi-
cal means of creating 32 bit IPv4 addresses was devised as a near-term solution to this problem. CIDR
is backward compatible with the older IPv4 addresses, but does not eliminate the already existing leg-
acy networks. A route to each one still must be maintained in the routing tables for all routers that pro-
vide a path to such a network, but they can coexist with the CIDR network addresses. Thus, in spite of
the CIDR near term solution, a true hierarchical addressing scheme is required, and IPv6 provides such
a mechanism.
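The address-lifetime estimate above can be checked with a few lines of arithmetic. This sketch assumes, as the text does, a consumption rate of one billion addresses per second with no recycling, plus a 365.25-day year.

```python
# Back-of-the-envelope check: how long do 2**128 addresses last at one
# billion addresses per second, with no recycling?
ADDRESSES = 2 ** 128
RATE_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year

years = ADDRESSES / RATE_PER_SECOND / SECONDS_PER_YEAR
print(f"{ADDRESSES:.3e} addresses last about {years:.1e} years")
```

The result is roughly 1.1 × 10^22 years, eleven orders of magnitude beyond the 10^11-year lifetime quoted for a closed universe.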
1. For a discussion of the Internet standards process see Appendix II.

Class A Address:  0   | Network (7 bits)  | Local Address (24 bits)
Class B Address:  10  | Network (14 bits) | Local Address (16 bits)
Class C Address:  110 | Network (21 bits) | Local Address (8 bits)
Figure 3-2. IPv4 Address Format
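The classful layout of Figure 3-2 can be decoded mechanically from the leading bits of an address. The sketch below uses Python’s standard `ipaddress` module only to parse dotted-quad notation; the function name `ipv4_class` is hypothetical, and addresses outside classes A, B, and C (multicast and reserved space, not shown in the figure) are simply reported as "D/E".

```python
import ipaddress
from typing import Tuple


def ipv4_class(address: str) -> Tuple[str, int, int]:
    """Return (class, network field, local address field) for a classful IPv4 address."""
    value = int(ipaddress.IPv4Address(address))
    if value >> 31 == 0b0:     # Class A: 0 | 7-bit network | 24-bit local
        return "A", (value >> 24) & 0x7F, value & 0xFFFFFF
    if value >> 30 == 0b10:    # Class B: 10 | 14-bit network | 16-bit local
        return "B", (value >> 16) & 0x3FFF, value & 0xFFFF
    if value >> 29 == 0b110:   # Class C: 110 | 21-bit network | 8-bit local
        return "C", (value >> 8) & 0x1FFFFF, value & 0xFF
    return "D/E", value, 0     # multicast or reserved, not classful unicast


print(ipv4_class("10.1.2.3"))     # ('A', 10, 66051)
print(ipv4_class("192.168.1.1"))  # ('C', 43009, 1)
```

Because the class is encoded in the address itself and the network fields are flat, a router has no way to aggregate these networks, which is the routing-table problem described above.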

IPv6 offers multiple ways to format its 128-bit addresses, and there are three types of addresses: unicast, anycast, and multicast. Since a node on an IP network may have more than one interface attached to that network, addresses identify interfaces: a unicast address is an identifier for a single interface; an anycast address is an identifier for a collection of interfaces, and a packet sent to it is delivered to one and only one of the interface members; a multicast address is an identifier for a collection of interfaces, and a packet sent to it is delivered to all of them. Because an anycast address is syntactically indistinguishable from a unicast address, nodes sending packets to anycast addresses are generally not aware that an anycast address is being used. We will concentrate our explanations on those addresses which are most useful for our P2P UUID purposes. In particular, the IPv6 aggregatable global unicast address [RFC2374] is salient here, since it solves the scalable routing problem and also provides a method to generate globally unique IP addresses when used in conjunction with IPv6 Neighbor Discovery (ND) [RFC2461] and IPv6 stateless address autoconfiguration [RFC2462]. As we see in Figure 3-3, the aggregatable global unicast address permits aggregation in a three-level hierarchy.

FP (3 bits) | TLA ID (13 bits) | RES (8 bits) | NLA ID (24 bits) | SLA ID (16 bits) | Interface ID (64 bits)
Public Topology: FP, TLA ID, RES, NLA ID. Site Topology: SLA ID. Interface Identifier: Interface ID.

FP: Format Prefix (001)
TLA ID: Top-Level Aggregation Identifier
RES: Reserved for future use
NLA ID: Next-Level Aggregation Identifier
SLA ID: Site-Level Aggregation Identifier
INTERFACE ID: Interface Identifier
Figure 3-3. Aggregatable Global Unicast Address Structures
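Because the fields of Figure 3-3 sit at fixed bit positions, splitting an address into them is a matter of shifting and masking. The sketch below assumes the RFC 2374 layout exactly as drawn (3 + 13 + 8 + 24 + 16 + 64 = 128 bits) and uses Python’s standard `ipaddress` module for parsing; the helper names `split_agu` and `join_agu` are hypothetical.

```python
import ipaddress
from typing import Dict

# Field widths from Figure 3-3, most significant field first.
FIELDS = [("fp", 3), ("tla", 13), ("res", 8), ("nla", 24), ("sla", 16), ("iid", 64)]


def split_agu(address: str) -> Dict[str, int]:
    """Split a 128-bit IPv6 address into the Figure 3-3 fields."""
    value = int(ipaddress.IPv6Address(address))
    fields: Dict[str, int] = {}
    shift = 128
    for name, width in FIELDS:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


def join_agu(fields: Dict[str, int]) -> ipaddress.IPv6Address:
    """Reassemble an address from its fields (the inverse of split_agu)."""
    value = 0
    for name, width in FIELDS:
        value = (value << width) | (fields[name] & ((1 << width) - 1))
    return ipaddress.IPv6Address(value)


parts = split_agu("2001:db8:1234:5678::1")
print(parts["fp"] == 0b001)  # True: the aggregatable global unicast format prefix
```

Routers above a site only need the public-topology fields, which is what keeps the default-free routing tables small.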

The Top-Level Aggregation (TLA) identifiers are at the top node in the Internet routing hierarchy, and must be present in the default-free routing tables of all of the top-level routers in the Internet. The TLA ID is 13 bits and thus permits 8,191 such ID’s. This will keep these routing tables within reasonable size limits, and keep to a minimum the number of routes per routing update that a router must process. It is worth noting that in spring 1998 the IPv4 default-free routing table contained approximately 50,000 prefixes. The technical requirement was to pick a TLA ID size that was below, with a reasonable margin, what was being done with IPv4 [RFC2374].
The Next-Level Aggregation (NLA) identifier is for organizations below the TLA nodes and is 24 bits. This permits 16,777,215 flat ID’s, or it can give a hierarchical arrangement of addresses similar to that of IPv4. One could, for example, do something similar to CIDR here. Next we have the Site-Level Aggregation (SLA) identifier for the individual site subnets. This ID is 16 bits, which permits 65,535 subnets at a given site. The low-order 64 bits are for the interface identifier on the local link to which a host with an IPv6 address belongs. This is usually derived from the real MAC address of the host’s interface. It is certainly possible that during certain time windows two hosts may end up with the same such address, and there are means available to resolve these conflicts and to guarantee global uniqueness. These are discussed just below.
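In practice the MAC address is not copied verbatim into the interface identifier: RFC 2373 (Appendix A) derives a 64-bit “modified EUI-64” identifier from a 48-bit MAC by inserting the bytes 0xFF 0xFE between its two halves and inverting the universal/local bit. A minimal sketch of that mapping, with the hypothetical helper name `mac_to_interface_id`:

```python
def mac_to_interface_id(mac: str) -> str:
    """Derive a modified EUI-64 interface identifier from a 48-bit MAC address."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert 0xFFFE between the OUI (first 3 bytes) and the device-specific part.
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    eui64[0] ^= 0x02  # invert the universal/local bit
    # Format as four 16-bit groups, as they appear in an IPv6 address.
    return ":".join(f"{eui64[i] << 8 | eui64[i + 1]:x}" for i in range(0, 8, 2))


print(mac_to_interface_id("00:0a:95:9d:68:16"))  # 20a:95ff:fe9d:6816
```

The mapping is deterministic, so two interfaces with the same MAC produce the same identifier, which is exactly the conflict the next paragraphs deal with.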
The authors of the IPv6 RFC’s clearly understood the burden IPv4 imposed on network administrators. The seemingly simple task of assigning a new IP address is, in fact, not that simple. The address must be unique. Yet often enough there are unregistered IP addresses on a subnet, and in most cases the perpetrator is innocent: the original intent was usually to obtain a temporary address for a test, and the temporary address was never unassigned. The unfortunate side effect is that two systems will receive IP Address Resolution Protocol (ARP) requests from, for example, a router, and both will reply. Which system will receive the packet that initiated the ARP is arbitrary. There is also the assignment of a default router and DNS servers. While most of this is now solved with the Dynamic Host Configuration Protocol (DHCP) [RFC1531], it is still a source of administrative difficulty when hosts change subnets or IP addresses must be renumbered. Mobility (Mobile IP) also adds a new twist to the equation. Most large organizations have a dedicated staff to deal with these administrative issues, which are more often than not a source of painful and costly network bugs. An illustrative example, as recalled by one of the authors, William Yeager, is sufficient here. In the late 1980’s Stanford University had networks that supported multiple protocols, and an organization was put in place to administer the university’s rapidly growing local area network. One afternoon all XEROX Interlisp machines in the Knowledge Systems Laboratory (KSL) went into hard garbage collection loops. These systems were used as desktops as well as for research, and so about one hundred and twenty-five people lost the use of their systems. Rebooting did not solve the problem. William Yeager always watched the network traffic (he kept a network sniffer continuously active in his office), and he noticed a huge, constant upsurge in Xerox Network Systems (XNS) routing table updates; all of the routes being advertised were illegal, constantly changing, and non-repeating. The Lisp machines in question cached XNS routing table entries, and thus were madly updating internal data structures and freeing up entries, resulting in a hard garbage collection loop. At that time, when a router received a new route it always immediately advertised it. These routes were originating on the backbone network from a previously unknown pair of routers. Fortunately, the KSL managed its own routers and the code they ran. William generated an immediate patch, which was added to the appropriate router to firewall the errant routing table advertisements and keep them on the backbone. A phone call to a Stanford network administrator alerted them to the problem. It was their own. They had installed two XNS routers to support some administrative software, and assumed they worked fine. They did on small networks, but when the number of XNS networks exceeded 17, all hell broke loose. The KSL had 17 such networks, and triggered this bug. The routers were shut down until the problem was resolved. Such scenarios are not atypical. They arrive out of nowhere on a daily basis. Anything that can be done to ease the burden on network administrators is important.
To simplify the task of assigning IPv6 addresses, IPv6 autoconfiguration capabilities have also been defined. Both stateful and stateless autoconfiguration are possible. They can be used separately or together, and which one applies is flagged, and thus automated, in periodic IPv6 router updates. If stateful autoconfiguration is used, then a stateful configuration server is contacted, which assigns an IPv6 address from a known, administered list. Even in this case ND, as described below, is used to assure that the server-supplied address is unique. If it isn’t, the address is not assigned, and the appropriate error message is logged.
Stateless autoconfiguration begins with the formation of a link-local address [RFC2462]: the well-known link-local prefix (fe80::/64) combined with a 64-bit interface ID. The interface ID is usually derived from the MAC address, but any unique token will do. Next, the host uses the ND Neighbor Solicitation (NS) message to see if this identifier is unique, a step called Duplicate Address Detection. If no system complains (a neighbor already using a matching token would answer with an ICMPv6 Neighbor Advertisement), the address is assumed to be unique. If it is found not to be unique, then an administrator is required to assign an alternative link-local address. This may appear to be heavy-handed, but it is not: it is important to verify whether there are in fact two identical MAC addresses on the local link. The authors believe that it is sufficient to log the problem and to use a secure random number generator to create 64-bit tokens to be used here in conjunction with ND. These can be created in such a way as not to be in MAC address format. Such a scheme will at least permit a system to autoconfigure and get on-line; a later administrative action can fix the address if necessary. Next, given a unique link-local address, periodic router advertisements contain the necessary prefix information to form a complete 128-bit IPv6 address. A node can request such an advertisement with an ND Router Solicitation. IPv6 addresses have preferred and valid lifetimes, where the valid lifetime is never shorter than the preferred lifetime. An address is preferred if its preferred lifetime has not expired. An address becomes deprecated when its preferred lifetime expires, and it becomes invalid when its valid lifetime expires. A preferred address can be used as the source and destination address in any IPv6 communication. A deprecated address must not be used as the source address in new communications, but can be used in communications that were in progress when the preferred lifetime expired. Since an address is valid if it is either preferred or deprecated, deprecation gives a grace period for an address to pass from preferred to invalid. The interested reader can explore the complete details of autoconfiguration in the RFC’s mentioned in this section. A full list of the IPv6 RFC’s can be found in the bibliography.
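The preferred, deprecated, and invalid life cycle can be summarized as a function of an address’s age. This is a minimal sketch with times in seconds; the function names `address_state` and `may_use_as_source` are hypothetical and not part of any IPv6 stack’s API.

```python
def address_state(age: float, preferred_lifetime: float, valid_lifetime: float) -> str:
    """Classify an autoconfigured IPv6 address by its age (seconds since assignment)."""
    if preferred_lifetime > valid_lifetime:
        raise ValueError("preferred lifetime must not exceed valid lifetime")
    if age < preferred_lifetime:
        return "preferred"   # usable as source and destination
    if age < valid_lifetime:
        return "deprecated"  # still valid, but not a source for new communications
    return "invalid"         # must not be used at all


def may_use_as_source(state: str, existing_session: bool) -> bool:
    """A deprecated address may only source traffic for sessions begun earlier."""
    return state == "preferred" or (state == "deprecated" and existing_session)


print(address_state(10000, 7200, 14400))  # deprecated
```

The window between the two lifetimes is the grace period during which in-progress communications can finish.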
Finally, IPv6 provides an extension header for the Encapsulating Security Payload (ESP) [RFC2406]. This can provide authentication of the data’s origin (anti-source spoofing), integrity checks, confidentiality, and the prevention of replay attacks. The well-tested MD5 and SHA-1 hash algorithms are used, with authentication done through Message Authentication Codes (MACs), i.e., keyed hashes such as HMAC-MD5 and HMAC-SHA-1, and confidentiality provided by symmetric encryption algorithms like 3DES, AES, and Camellia. Sequence numbering is mandatory in ESP. The sequence numbers are monotonically increasing and, to prevent replay attacks, must never wrap.
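How sequence numbers defeat replay can be sketched with the sliding-window check used by IPsec receivers (RFC 2401, Appendix C, gives a reference version). This is a simplified illustration, not a drop-in IPsec component: it assumes 32-bit sequence numbers starting at 1 and a 64-packet window, and a real receiver updates the window only after the packet’s integrity check succeeds. The sender refuses to wrap, since RFC 2406 requires a new security association (SA) before the counter would cycle.

```python
class SequenceSender:
    """Sender side: monotonically increasing 32-bit sequence numbers that never wrap."""

    MAX_SEQ = 2 ** 32 - 1

    def __init__(self) -> None:
        self.last = 0

    def next_number(self) -> int:
        if self.last == self.MAX_SEQ:
            # New keys (a new SA) are needed before the counter could wrap.
            raise OverflowError("sequence space exhausted; establish a new SA")
        self.last += 1
        return self.last


class ReplayWindow:
    """Receiver side: sliding-window replay check over the last `size` numbers."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => (highest - i) has been received

    def accept(self, seq: int) -> bool:
        if seq == 0:
            return False                  # sequence numbers start at 1
        if seq > self.highest:            # new packet ahead of the window
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                  # too old: left of the window
        if self.bitmap & (1 << offset):
            return False                  # already seen: a replay
        self.bitmap |= 1 << offset        # late but new: record it
        return True
```

Because numbers never repeat within an SA, any packet whose number was already recorded, or has fallen behind the window, is rejected.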
The duration or lifetime of an IPv6 address poses a problem for its use as a UUID on a P2P overlay network, which is independent of the underlying real network, in this case an IPv6 network. While the 64-bit interface ID can be assumed to remain unique indefinitely, even if periodic ND checks must be made to assure that this is the case, the router prefixes can expire, and they arrive with preferred and valid lifetimes bound to them. Periodic router updates must be monitored to assure that an address is not deprecated, and if it is, appropriate actions must be taken. These actions are discussed in detail in Chapter 4. As mentioned in the introduction to this section, UUID’s are used to give a unique identity to each peer on the overlay network. These peers are also mobile. Thus, if one takes a laptop from the office to the home, or vice-versa, the IPv6 prefix will most likely change, and a new UUID will be required. Why? If the prefixes are different, which can be discovered from router updates, then there is no way to use ND at the new location to verify that the old address is still unique. While the laptop is at home, another system at the office could well acquire its old IPv6 address, because the absent laptop cannot respond to ND Neighbor Solicitation messages there. That system can then join the P2P overlay network using the IPv6 address as a UUID, and thereby create a conflict. This implies that when a system becomes mobile, it must abandon its old IPv6 address and acquire another for use on the local link as well as for a UUID on the overlay network. Again, this does not impose a management problem on the P2P overlay network, given the mechanisms described in Chapter 4. One thing is clear: if IPv6 addresses as described above are used as UUID’s, then either a system that intends to be mobile must be able to flush any knowledge of itself from the overlay network before it disconnects, or the overlay network must associate time-to-live values with dynamic information so that this information is expunged when it expires.
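Detecting the mobility case just described amounts to checking whether the address currently used as a UUID still falls under the prefix advertised on the local link. A minimal sketch using Python’s standard `ipaddress` module, assuming a /64 advertised prefix; the function name `uuid_needs_renewal` and the sample addresses are hypothetical, and how the old identity is then flushed or expired is left to the mechanisms of Chapter 4.

```python
import ipaddress


def uuid_needs_renewal(uuid_address: str, advertised_prefix: str) -> bool:
    """True if an IPv6-address UUID no longer matches the local router's prefix."""
    address = ipaddress.IPv6Address(uuid_address)
    prefix = ipaddress.IPv6Network(advertised_prefix)
    return address not in prefix


office_uuid = "2001:db8:10:1:20a:95ff:fe9d:6816"
print(uuid_needs_renewal(office_uuid, "2001:db8:10:1::/64"))  # False: still at the office
print(uuid_needs_renewal(office_uuid, "2001:db8:77:2::/64"))  # True: the laptop moved
```

Note that the interface identifier (the low 64 bits) survives the move; only the prefix, and hence the UUID, must change.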
It is important to understand that the IPv6 stateless autoconfiguration protocols are attackable. There are obvious attacks, like a malicious host replying to all ND NS messages, thus denying any new node the ability to autoconfigure. This kind of attack is detectable with a reasonableness heuristic: generate up to five 64-bit interface ID’s using a good pseudo-random number generator. If each of these five is denied as a duplicate, then there is an attack, and measures can be taken to find the attacker. Another equally obvious form of this attack is a node with a duplicate interface address not responding to ND. In this case, a duplicate IPv6 address will be created on the same local link. Also, a node masquerading as a router and generating bogus prefixes, or valid prefixes with incorrect lifetimes, is possible.
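The five-attempt reasonableness heuristic above can be sketched as follows. Interface ID’s come from Python’s `secrets` module, a cryptographically secure generator, and the universal/local bit is cleared, as RFC 3041 does for randomized identifiers, so that a token is never mistaken for a MAC-derived one. The duplicate test `is_duplicate` is a hypothetical stand-in for a real ND Duplicate Address Detection probe, which would need raw ICMPv6 access.

```python
import secrets
from typing import Callable, Optional

UNIVERSAL_LOCAL_BIT = 1 << 57  # bit 6, counting from the most significant of 64 bits
MAX_ATTEMPTS = 5


def random_interface_id() -> int:
    """A 64-bit random token marked as locally administered (not a MAC)."""
    return secrets.randbits(64) & ~UNIVERSAL_LOCAL_BIT


def autoconfigure(is_duplicate: Callable[[int], bool]) -> Optional[int]:
    """Return a unique interface ID, or None if every attempt is refused.

    Five refusals of independent random tokens are far too unlikely to be
    chance, so None signals a probable ND attack that should be logged.
    """
    for _ in range(MAX_ATTEMPTS):
        candidate = random_interface_id()
        if not is_duplicate(candidate):
            return candidate
    return None


# A malicious neighbor that claims every address defeats all five attempts.
print(autoconfigure(lambda token: True))  # None
```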
It is important to understand here that even with these possible attacks, IPv6 is a major step forward,
and can be deployed while solutions to these attacks are in progress. The IETF does not stand still, and
its members are pursuing solutions. Also, IPv4 can be similarly attacked, is being attacked as we write,
and many of the IPv4 attacks are not possible with IPv6. In spite of these security problems, IPv4 has
been tremendously successful, and IPv6 will be even more so.
Finally, there are alternatives to using IPv6 addresses as UUID’s and they are discussed in the next sec-
tion.
3.2.2.2 Universal Unique Identifiers (UUID)
In the previous sections we have given a good overview of IPv6 addresses and their appropriateness as UUID’s on the P2P overlay network. The major problem faced with the IPv6 alternative is deployment. The attacks on IPv6 described in our chapter on security should not slow down its deployment for general use, and they are even less menacing when the IPv6 address format is used for P2P UUID’s. The most serious attack in the P2P space would be theft of peer identity. As dangerous as this sounds, recall that someone attached to the Internet can use almost any IPv4 address they wish if they are clever enough and their connection does not go through an Internet Service Provider (ISP); ISP’s, for example, can refuse to route some addresses. It is all too easy to change one’s IPv4 address with most home systems. IPv6 can be made more difficult to attack if stateless autoconfiguration is used. There is a computational and personal cost, since the user must beware and take the right precautionary measures, and that cost must be weighed against the probability of being hacked, which is minuscule. In any case, we feel that IPv6 gives us a good future solution for UUID’s for several reasons:
1) Built-in global registration,
2) Barring attacks and administrative errors, the possibility of globally unique addresses, and therefore UUID’s,
3) IPv6 addresses can be used as UUID’s when a system is mobile, permitting it to reattach and acquire a new UUID, and here the interface identifier is almost always reusable,
4) The attacks and related security problems are being addressed as we write,
5) Global uniqueness also permits disjoint overlay networks to join, as mentioned in section 3.1.2.2.
Until IPv6 is sufficiently deployed, we can implement a P2P UUID generation strategy that is quite
similar to ND. The interested reader can read Chapter 4, section 4.3.3.4, on mediator prefixed UUID’s.
There are other methods that can be used to generate UUID’s that, given enough bits, are unique with high probability and essentially impossible to predict. One can use a good pseudo-random number generator, or better yet, a secure random number generator, to generate enough random bits per ID to make the probability of duplication essentially zero. If one uses 128-bit UUID’s generated in this way, the probability that any two given identifiers collide is 2^-128, roughly 3 × 10^-39, and even among billions of peers the chance of any collision at all remains negligible. We can never fill up the UUID space. Yes, there will be cheaters who will attempt to create peers with duplicate UUID’s, since these values are public. This problem is currently resolvable with several emerging identifier generation techniques.
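A random 128-bit identifier of this kind, together with the birthday-bound estimate of how unlikely any collision is, can be sketched with Python’s standard library. The `secrets` module supplies cryptographically secure random bits; the population of ten billion peers is an assumed, illustrative figure, and the function names are hypothetical.

```python
import math
import secrets


def new_uuid() -> str:
    """A 128-bit identifier from a cryptographically secure random source."""
    return secrets.token_hex(16)  # 16 bytes = 128 bits, as 32 hex digits


def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday bound: chance that any two of n random IDs coincide."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2 ** bits)  # 1 - exp(-pairs / 2**bits), precise for tiny values


print(new_uuid())
print(f"{collision_probability(10 ** 10):.1e}")  # 1.5e-19 for ten billion peers
```

Random generation handles accidental collisions; deliberate duplication by cheaters is what the cryptographic techniques below address.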
There are Statistically Unique and Cryptographically Verifiable (SUCV) Identifiers [SUCV], and Crypto-Based ID’s (CBID) [CBID], which are referred to as Cryptographically Generated Addresses (CGA) in [Arkko02]. While the security issues discussed in these papers will be covered in Chapter 5, the basic common ideas that play a role in UUID generation will be reviewed here. Where H-x denotes the high-order x bits of the output of hash algorithm H, a host generating a UUID can do the following:
1) Create a public/private key pair using, say, RSA or Diffie-Hellman.
2) Using a hash function H, like SHA-1, generate H(Public Key), the 160-bit SHA-1 hash.
3) For a CGA, use H-64(Public Key) as the IPv6 interface identifier, combined with the high-order 64-bit prefix. This can serve as an IPv6-based UUID.
4) For a UUID, one can also use the 128-bit CBID H-128(Public Key) directly.
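Steps 2 through 4 reduce to hashing and truncating. The sketch below assumes the public key is already available as encoded bytes (generating an RSA key would need a third-party library, so placeholder bytes are used), and `cbid_128` and `cga_style_address` are hypothetical names. The address construction is deliberately simplified: a real CGA, as later specified in RFC 3972, also hashes a modifier and the subnet prefix and reserves bits for a security parameter.

```python
import hashlib
import ipaddress


def high_bits(digest: bytes, bits: int) -> bytes:
    """H-x: the high-order x bits of a hash (x must be a multiple of 8 here)."""
    return digest[: bits // 8]


def cbid_128(public_key: bytes) -> str:
    """Step 4: a 128-bit UUID, H-128(Public Key), from the 160-bit SHA-1 hash."""
    return high_bits(hashlib.sha1(public_key).digest(), 128).hex()


def cga_style_address(prefix: str, public_key: bytes) -> ipaddress.IPv6Address:
    """Step 3 (simplified): a 64-bit prefix followed by H-64(Public Key)."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    digest = hashlib.sha1(public_key).digest()
    interface_id = int.from_bytes(high_bits(digest, 64), "big")
    return network[interface_id]  # network address + interface ID


# Placeholder bytes standing in for an encoded RSA (or other) public key.
key = b"-----placeholder public key bytes-----"
print(cbid_128(key))
print(cga_style_address("2001:db8:1:2::/64", key))
```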
Given such a UUID, a challenge can be used to verify that the owner of the UUID possesses the private key associated with the public key. When peer1 receives a document containing the UUID from peer2, peer1 requests from peer2 a message, signed with peer2’s private key, that contains peer2’s public key and a random session identifier, SID, generated by peer1. The SID prevents peer1 from spoofing peer2’s identity in a communication with peer3: without the SID, peer1 could save the signed message from peer2 and replay it to peer3, thus faking ownership of peer2’s private key. Continuing, peer1 calculates H-128(Public Key), and if the hash matches peer2’s UUID, verifies the signature of the message. The signature can be a straightforward private-key-signed SHA-1 hash of the message. If the signature is correct, then the document indeed belongs to peer2, and peer2’s identity has been established.
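The challenge above can be sketched in Java with java.security.Signature. Here peer2 signs its encoded public key followed by peer1's SID, and peer1 first checks that the offered key hashes to the claimed UUID, then verifies the signature. The message layout (key bytes, then SID) and the class name CbidChallenge are our assumptions; SHA1withRSA follows the text's SHA-1 example, although a new design would likely prefer SHA-256.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.util.Arrays;

class CbidChallenge {
    private static final SecureRandom RNG = new SecureRandom();

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(1536);
            return g.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** H-128(Public Key): the high-order 128 bits of SHA-1 over the encoded key. */
    static byte[] h128(PublicKey key) {
        try {
            return Arrays.copyOf(MessageDigest.getInstance("SHA-1").digest(key.getEncoded()), 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** peer1: a fresh random session identifier for every challenge. */
    static byte[] newSid() {
        byte[] sid = new byte[16];
        RNG.nextBytes(sid);
        return sid;
    }

    /** peer2: signs its encoded public key followed by peer1's SID. */
    static byte[] respond(KeyPair peer2, byte[] sid) {
        try {
            Signature s = Signature.getInstance("SHA1withRSA");
            s.initSign(peer2.getPrivate());
            s.update(peer2.getPublic().getEncoded());
            s.update(sid);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** peer1: the key must hash to the claimed UUID, and the signature must cover key || SID. */
    static boolean verify(byte[] claimedUuid, PublicKey key, byte[] sid, byte[] signature) {
        if (!MessageDigest.isEqual(h128(key), claimedUuid)) {
            return false;                       // this key does not own the UUID
        }
        try {
            Signature s = Signature.getInstance("SHA1withRSA");
            s.initVerify(key);
            s.update(key.getEncoded());
            s.update(sid);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            return false;                       // a malformed signature counts as failure
        }
    }

    public static void main(String[] args) {
        KeyPair peer2 = newKeyPair();
        byte[] uuid = h128(peer2.getPublic());  // the CBID found in peer2's document
        byte[] sid = newSid();                  // chosen by peer1
        byte[] sig = respond(peer2, sid);       // returned by peer2
        System.out.println(verify(uuid, peer2.getPublic(), sid, sig));       // true
        System.out.println(verify(uuid, peer2.getPublic(), newSid(), sig));  // false: replayed response
    }
}
```

The second call in main models the replay: a response saved from an earlier challenge fails as soon as the verifier issues a new SID.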
How can this be attacked? Some worry that the H-64(Public Key) interface identifier can be attacked by brute force. Here, a successful attacker would need to find a different public/private key pair whose public key hashes to exactly the same H-64(Public Key) value, i.e., find another public key that collides with the original one. Let’s assume RSA1536 is used. First, to store a table of all 2^64 values, let’s assume that a disk drive with a 1 inch radius can hold 10 gigabytes of data. We will need 2^64 64-bit, or 8-byte, values, that is, about 1.5 × 10^11 gigabytes, or roughly 1.5 × 10^10 such disks. A back-of-the-envelope calculation says the total single-sided disk surface required to store the collision table is about 11.5 square miles. Alternatively, one can simply compute until a collision is found. It is generous to assume that an RSA1536 public/private key pair can be computed in 1 millisecond, so let’s assume that some time in the future the calculation will take 1 microsecond, or that multiple systems are used in parallel to achieve one public/private key pair per microsecond. In this case, an exhaustive search for a collision will take about 585,000 years. Assuming that only half of the possible values must be tried to find this single collision, this reduces to about 292,000 years. That’s a lot of CPU cycles. Even with Moore’s law, we should not lose sleep over this attack succeeding in the near future. Since each additional bit doubles the work, 128-bit UUID’s are, for all practical purposes, impossible to attack by brute force. Other attacks are possible if care is not taken to prevent them. These are typically “man-in-the-middle” (MITM) attacks.
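The back-of-the-envelope figures can be recomputed directly from the stated assumptions: 2^64 table entries of 8 bytes, 10 GB per single-sided disk of 1-inch radius, and one RSA1536 key pair per microsecond. The class below is only a calculator for those assumptions; change the constants to explore other hardware.

```java
class BruteForceEstimate {
    static final double TABLE_ENTRIES = Math.pow(2, 64);          // one entry per H-64 value
    static final double BYTES_PER_ENTRY = 8;                      // one 64-bit value per entry
    static final double BYTES_PER_DISK = 10e9;                    // 10 GB per disk
    static final double DISK_AREA_SQ_IN = Math.PI * 1.0 * 1.0;    // one surface, radius 1 inch
    static final double SQ_IN_PER_SQ_MILE = Math.pow(5280 * 12, 2);
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 3600;

    /** Recording surface needed for the full collision table, in square miles. */
    static double tableAreaSquareMiles() {
        double disks = TABLE_ENTRIES * BYTES_PER_ENTRY / BYTES_PER_DISK;
        return disks * DISK_AREA_SQ_IN / SQ_IN_PER_SQ_MILE;
    }

    /** Years to generate the given fraction of all 2^64 key pairs at the given rate. */
    static double searchYears(double secondsPerKeyPair, double fractionOfSpace) {
        return TABLE_ENTRIES * fractionOfSpace * secondsPerKeyPair / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("table surface: %.1f square miles%n", tableAreaSquareMiles()); // about 11.5
        System.out.printf("full search:   %.0f years%n", searchYears(1e-6, 1.0));      // about 585,000
        System.out.printf("half search:   %.0f years%n", searchYears(1e-6, 0.5));      // about 292,000
    }
}
```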
There are several ways to prevent MITM attacks. One can use a secure channel like TLS to exchange CBID’s; one can use infrared communication with eye contact between the individuals exchanging the CBID’s; out-of-band verification is also possible, where upon receipt of a 16-byte CBID the source is contacted and asked to verify the value; and a trusted third party can act as a CBID escrow. Finally, MITM attacks are not always a threat. For example, if one is exchanging MPEG or JPEG content in an ad-hoc P2P network where CBID’s are used as UUID’s, then as long as the content satisfies the recipient, there is no real security threat. And a great deal of P2P activity will be the ad-hoc exchange of content. When financial information like credit card numbers is exchanged, it is necessary to use strong security and verifiable CBID’s. This, along with other security details, is covered in Chapter 5.
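Out-of-band verification is easier when the 16-byte CBID is shown in a form people can read aloud or compare at a glance. The sketch below groups the value into eight blocks of four hex digits and compares the received value with the one the source confirms. The grouping and the class name CbidFingerprint are our own illustrative choices; MessageDigest.isEqual is the JDK's constant-time byte-array comparison.

```java
import java.security.MessageDigest;

class CbidFingerprint {
    /** Formats a 16-byte CBID as eight groups of four hex digits, e.g. "AACD-EF68-...". */
    static String toReadable(byte[] cbid) {
        if (cbid.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte CBID");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cbid.length; i++) {
            if (i > 0 && i % 2 == 0) {
                sb.append('-');                    // a separator every two bytes
            }
            sb.append(String.format("%02X", cbid[i] & 0xFF));
        }
        return sb.toString();
    }

    /** True only if the value confirmed by the source matches the CBID that was received. */
    static boolean confirmed(byte[] received, byte[] confirmedBySource) {
        return MessageDigest.isEqual(received, confirmedBySource);
    }

    public static void main(String[] args) {
        byte[] received = new byte[16];
        for (int i = 0; i < 16; i++) {
            received[i] = (byte) i;
        }
        System.out.println(toReadable(received));  // 0001-0203-0405-0607-0809-0A0B-0C0D-0E0F
    }
}
```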
3.2.2.2.1 The BestPeer Example

BestPeer [BESTPEER] is a self-configurable, mobile agent based P2P system. It is highly centralized,
and relies on the Location Independent Global Names Lookup (LIGLO) server to identify the peers
with dynamic IPv4 addresses. When a peer node joins the BestPeer P2P system, it registers with a LIGLO server. The server gives the peer node a unique global ID (BestPeerID). This ID is a combination of the LIGLO server’s IPv4 address and a random number which the server assigned to the peer node. The LIGLO server saves the (BestPeerID, peer node IP address) pair. The LIGLO server also sends the new peer node a list of such (BestPeerID, IP) pairs to which it can connect. When a node has a new IP address, it should update its LIGLO server with this information. These ID’s can be easily spoofed, thus permitting identity theft, because any MITM can watch the network activity to obtain BestPeerID’s and then notify the LIGLO server of a change in IP address associated with these ID’s.
3.2.2.2.2 Microsoft’s Pastry Example

Pastry [PASTRY] is a P2P overlay network performing application-level routing and object locating.
Each peer node in the Pastry network is randomly assigned a 128-bit numeric nodeID. When a peer node joins the Pastry network, its ID can be generated through a cryptographic hash of the node’s public key or of its IP address. The value of the ID plays a crucial role in scalable application-level routing. Applications hash a file’s name and owner to generate a fileID, and replicas of the file are stored on the nodes whose ID’s are numerically closest to the fileID. Given a numeric fileID, a message can be routed to the node whose ID is numerically closest to it. Although there is no mention of CBID’s in [PASTRY], if the hash of the public key is used, then CBID techniques could be used to secure Pastry routes.
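The "numerically closest" rule can be illustrated with a small Java sketch. It is not Pastry's routing algorithm, which reaches the closest node in a logarithmic number of hops without knowing every nodeID. It only shows which node a fileID maps to, given the full list of nodes. The "/" separator between file name and owner, the use of the first 128 bits of SHA-1, and the treatment of the 128-bit ID space as a ring are all our assumptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PastryStyleLookup {
    static final BigInteger RING = BigInteger.ONE.shiftLeft(128);   // 2^128 possible IDs

    /** Hypothetical 128-bit fileID: the high-order 128 bits of SHA-1(fileName + "/" + owner). */
    static BigInteger fileId(String fileName, String owner) {
        try {
            byte[] h = MessageDigest.getInstance("SHA-1")
                    .digest((fileName + "/" + owner).getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, Arrays.copyOf(h, 16));        // unsigned 128-bit value
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Distance on the circular ID space: the shorter way around the ring. */
    static BigInteger distance(BigInteger a, BigInteger b) {
        BigInteger d = a.subtract(b).abs();
        return d.min(RING.subtract(d));
    }

    /** The nodeID numerically closest to the key; ties go to the earlier node in the list. */
    static BigInteger closestNode(BigInteger key, List<BigInteger> nodeIds) {
        BigInteger best = null;
        for (BigInteger id : nodeIds) {
            if (best == null || distance(key, id).compareTo(distance(key, best)) < 0) {
                best = id;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        List<BigInteger> nodes = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            nodes.add(new BigInteger(128, rng));                     // random 128-bit nodeIDs
        }
        BigInteger key = fileId("reviews.xml", "rita");
        System.out.println("fileID " + key.toString(16) + " -> node " + closestNode(key, nodes).toString(16));
    }
}
```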
3.2.2.2.3 The JXTA Example

Project JXTA is another P2P overlay network, and it assigns a UUID, the node’s peerID, to each peer. The peerID is implemented as a 256-bit UUID that is unique to the peer node and independent of the IP address of the node. JXTA permits peers to form groups which are called peer groups [JXTA]. The groups make the overlay network more scalable since all peer activities are restricted to the current peer group of which the peer is a member. Also, all communication between peers on the overlay network is done through pipes. Besides peer nodes, there are UUID’s for peer groups, for data, and for communication pipes. In JXTA a UUID is a URI string, for example:
urn:jxta:uuid-59616261646162614A78746150325033E7D0CCAB80FD4EBB99BB89DD0597D12F03
The UUID’s of the peer and of its current peer group, along with the ID type, are encoded into the above, yielding the 256-bit peerID. CBID’s are also implemented for JXTA peerID’s. In this case the 128 bits of the peer’s UUID are derived from the SHA-1 hash of its X509.v3 root certificate. If peers use X509.v3 certificates for peer group membership authentication, then the peer group’s UUID part of the peerID is also derived from a SHA-1 hash of the peer group root certificate.
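To see how the example URN carries both UUID's, the sketch below splits its 66 hex digits into a 32-digit peer group UUID, a 32-digit peer UUID and a trailing 2-digit ID type. This layout (group first, then peer, then type byte) is our assumption, chosen because it matches the example's length; the JXTA specification is the authority on the actual format. The class name JxtaPeerIdParts is hypothetical.

```java
class JxtaPeerIdParts {
    static final String PREFIX = "urn:jxta:uuid-";

    final String groupUuid;   // assumed: first 16 bytes (32 hex digits)
    final String peerUuid;    // assumed: next 16 bytes
    final String idType;      // assumed: trailing one-byte type code

    JxtaPeerIdParts(String urn) {
        if (!urn.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a JXTA uuid URN: " + urn);
        }
        String hex = urn.substring(PREFIX.length());
        if (!hex.matches("[0-9A-Fa-f]{66}")) {
            throw new IllegalArgumentException("expected 66 hex digits, got: " + hex);
        }
        groupUuid = hex.substring(0, 32);
        peerUuid = hex.substring(32, 64);
        idType = hex.substring(64);
    }

    public static void main(String[] args) {
        JxtaPeerIdParts p = new JxtaPeerIdParts(
                "urn:jxta:uuid-59616261646162614A78746150325033E7D0CCAB80FD4EBB99BB89DD0597D12F03");
        System.out.println("group " + p.groupUuid + ", peer " + p.peerUuid + ", type " + p.idType);
    }
}
```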

3.2.3 Component 1 - The Peer-UUID


We require that every peer node on the overlay network have a UUID which we call the Peer-UUID.
From the above discussion it is clear that we have many options for generating these UUID’s. The fea-
ture we desire given any of the options is global uniqueness. An absolute requirement is uniqueness
within one’s peer overlay network. If an enterprise decides to form an enterprise-wide overlay network, then registration techniques can be used to administer uniqueness. One might consider the SHA-1 hash of each system’s IP address or MAC address. But this can lead to problems if an enterprise decides to renumber its IP addresses, uses IPv6 where IP addresses have a definite lifetime, or if one inadvertently creates two identical MAC addresses programmatically. In ad-hoc networks other techniques are required. In this latter case the best choice is a sufficient number of bits, x, from H-x(public key) or H-x(X509.v3 certificate). If one uses, for example, RSA1536, then public/private key pairs are, with overwhelming probability, unique. Thus if x equals 120, then the probability of a hash collision is close enough to zero to guarantee global uniqueness for all practical purposes, and as discussed in section 3.2.2.2, one can get by with even fewer bits from the hash.
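Why x = 120 is enough can be made concrete with the birthday (union) bound: among n IDs there are n(n-1)/2 pairs, and if each ID behaves like x independent, uniformly random bits, each pair collides with probability 2^-x. The sketch evaluates that bound for a hypothetical population of ten billion peers; the population size and the uniform-bits assumption are ours.

```java
class CollisionBound {
    /**
     * Union (birthday) bound on the probability that any two of n IDs collide,
     * assuming each ID is x independent, uniformly random bits. Capped at 1.
     */
    static double collisionProbability(double n, int x) {
        return Math.min(1.0, n * (n - 1) / 2.0 / Math.pow(2, x));
    }

    public static void main(String[] args) {
        double peers = 1e10;   // hypothetical: ten billion peers
        for (int x : new int[] {64, 96, 120, 128}) {
            System.out.printf("x = %3d bits: P(collision) <= %.3g%n", x, collisionProbability(peers, x));
        }
    }
}
```

With these assumptions, 64 bits is already inadequate for ten billion peers (the bound exceeds 1 and is capped), while 120 bits gives a bound of roughly 4 × 10^-17.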
Therefore, while the choice of UUID is up to the designer of the P2P software, in our discussions we
will assume unique UUID’s within the overlay network, and when security is an issue, CBID
based UUID’s will be used. If one is behind a firewall, and all communication is secure, this may not
be necessary. Still, we cannot overlook the implicit advantage of cryptographic information being
embedded in the peer-UUID.

3.2.3.1 Towards a Standard UUID for Peers
Why do we need a standard? The answer is straightforward. We want to have a single, world-wide
peer-to-peer network for all devices. And, when and if the Internet becomes Interplanetary or even
Intergalactic, we want this to be true. Standards drive a world-wide Internet economy.
What should the standard look like? Our discussion of IPv6 was not meant to waste the reader’s time: IPv6 is clearly the correct approach for standardized Peer-UUID’s. In Chapter 4 we introduce the mediator component. Mediators behave like Internet routers on the overlay network. Therefore, we can introduce protocols similar to IPv6 neighbor discovery and mediator prefix assignment to yield Peer-UUID’s in IPv6 format.
Then, when IPv6 is fully deployed, we can use IPv6 addresses as long as we use CBID’s for the 64-bit interface identifier. The reasons for this latter requirement are discussed in section 3.2.2.1.
Open source cryptographic software is available for public/private key generation and SHA-1 hashing [BOUNCYCASTLE, OPENSSL]. Similar code can be found in versions of JDK 1.2 and higher [JDK].
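Using only JDK classes, a Peer-UUID in IPv6 format can be sketched by joining a 64-bit prefix, such as one a mediator might assign, with H-64(public key) as the interface identifier. This is a simplification under our own assumptions: the prefix below is made up from the 2001:db8::/32 documentation range, and standardized CGAs (RFC 3972) hash a richer input and reserve some interface-identifier bits, which this sketch does not do.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;

class CbidIpv6Sketch {
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(1536);
            return g.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** prefix64 (8 bytes, e.g. assigned by a mediator) followed by H-64(public key). */
    static InetAddress address(byte[] prefix64, PublicKey key) {
        if (prefix64.length != 8) {
            throw new IllegalArgumentException("prefix must be exactly 64 bits");
        }
        try {
            byte[] h = MessageDigest.getInstance("SHA-1").digest(key.getEncoded());
            byte[] addr = new byte[16];
            System.arraycopy(prefix64, 0, addr, 0, 8);   // high-order 64 bits: prefix
            System.arraycopy(h, 0, addr, 8, 8);          // low-order 64 bits: H-64(public key)
            return InetAddress.getByAddress(addr);       // a non-IPv4-mapped 16-byte array is IPv6
        } catch (GeneralSecurityException | UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] prefix = {0x20, 0x01, 0x0d, (byte) 0xb8, 0, 0, 0, 1};   // hypothetical 2001:db8:0:1::/64
        System.out.println(address(prefix, newRsaKeyPair().getPublic()).getHostAddress());
    }
}
```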
3.2.3.2 The PeerIdentity document
It is not an accident that a great deal of what will be described here is an outgrowth of the JXTA peer advertisement. Both of us have worked on project JXTA and helped define its specifications, and, after all, there are generic requirements that cannot be avoided. At the highest level each peer needs an XML document describing several basic properties which are common to all P2P systems. First, a peer will have a human readable peerName, and a peer UUID. The name is usually assigned by the person using the peer node during a configuration phase. Because we wish to use this peerName for applications like P2P Email (see Chapter 7), the allowable characters are restricted by both XML and MIME. For the actual details we refer the reader to the XML and MIME specifications in the references. The MIME Header Extensions specify how the XML characters in the name must be formatted to satisfy email address constraints.
We do expect some peers to have machine-generated peer names. Certainly, the peer name may not be unique. In cases where uniqueness is an absolute requirement, some kind of registration is required, as discussed in section 3.1.2.1. If one were to ask for a listing of all of the peers on the overlay network, then a list of (peer name, peer UUID) pairs would be given. In most ad-hoc networks users of peer nodes will be familiar with the names of peers with whom they regularly communicate, and registration will not be necessary. The names may be unique in the personal peer community in which they are being used. In this case, a peer can restrict its searches to this community and not worry too much about unrecognized peer names. Still, it is possible to have name collisions in such a community. To help with this situation we add an optional peer description field to the document. The description is usually created by the user when the peer is initially configured, and is there to help differentiate peer names in the case of duplication. The description will usually be a simple text string, but almost any digital data is permitted, for example, a photo.gif of the user. Note that it is always important to consider performance, and PeerIdentity documents will be frequently accessed. Consequently, text is a good choice for this field, and in the case of a gif file, or even executable code, a URN should be provided so that it can be accessed only when necessary to disambiguate name collisions. The details of personal peer communities are discussed in section 3.4 of this chapter. A peer’s PeerIdentity document is required to communicate with that peer: one needs its unique identity as well as the other information discussed just below.
When two peerNodes communicate with one another, either directly or by means of a mediator, each peerNode must provide the other, or the mediator, with the possible ways it can communicate. For example, a peerNode may prefer to always use a secure communication channel like TLS when it is available, or may be behind a firewall where a means of traversal such as HTTP or SOCKS is necessary. To this end the PeerIdentity document will contain a list of available protocols for communication.
Communication protocols can be on the real network or on the overlay network. For example, TCP/IP
is a communication protocol on the real network, and requires an IP address as well as a TCP port in its
specification. On the other hand, TLS is between peers on the overlay network and only requires the
peer-UUID in its specification since all communication on this network is between peers, and independent of the real network and underlying physical bearer networks. Thus,
tcp://150.8.11.3.8788
udp://150.8.11.3.9999
and
tls://uuid-AACDEF689321121288877EEFZ9615731
are URI’s describing real and overlay network communication protocols that can be used to contact the
peer that includes them in the special document described just below.
Finally, a physical layer may or may not permit multicast communication. If it does, and the peerNode is configured to take advantage of this functionality, the multicast field is marked TRUE; otherwise it is marked FALSE.
Given this introduction we define the PeerIdentity document as follows:
Document type = PEERIDENTITY
Content tags and field descriptions:

<peername> Restricted Legal XML character string [XML][MIME] </peername>


<peerUUID> uuid-Legal UUID in hexadecimal ascii string </peerUUID>
<description>
<text> Legal XML character string [XML] </text>
<URN> Legal Universal Resource Name </URN>
</description>
<comprotocols>
<real> real protocol URI </real>
<overlay> overlay network URI </overlay>

</comprotocols>
<multicast> TRUE | FALSE </multicast>

There may be multiple protocols specified on both the real and overlay network.
Below is an example of a PeerIdentity document:

<?xml version="1.0"?>
<!DOCTYPE 4PL:PeerIdentity>
<4PL:PeerIdentity xmlns:4PL="http://www.aw.com">
<peername> LucBoureau </peername>
<peerUUID> uuid-AACDEF689321121288877EEFZ9615731 </peerUUID>
<description>
<text> Je suis le mec francais </text>
<URN> http://www.Beaune.org/chateau/grandCru/LB </URN>
</description>
<comprotocols>
<real> tcp://152.70.8.108.9133 </real>
<real> http://152.70.8.108.1111 </real>
<overlay>
tls://uuid-AACDEF689321121288877EEFZ9615731
</overlay>
</comprotocols>
<multicast> FALSE </multicast>
</4PL:PeerIdentity>

Using 4PL we create the above example as follows:

Document pi = new Document(PEERIDENTITY, "LucBoureau");

The other PeerIdentity document fields will be known to the system as part of its boot-time configuration data. These details will vary from implementation to implementation. In some implementations the creation of a PeerIdentity document will automatically publish it. Document publication on the overlay network is described in detail in Chapter 4. To publish a document explicitly in 4PL we use the publish command:

publish(pi);

In the next section we discuss the Virtual P2P Network. For two peers to communicate with one
another on this network they must possess one another’s PeerIdentity document. This, of course,
enables communication on the real, underlying networks, and a P2P system using what we describe
must implement the code necessary to create, update, and analyze the above document as well as
establish communication on these real networks.
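As a minimal sketch of the "create" part of that code, the Java class below serializes the PeerIdentity fields defined above into XML. It deliberately omits the 4PL namespace prefix: XML names, including namespace prefixes, may not begin with a digit, so a conforming parser would reject "4PL:PeerIdentity". The class name, the method signature, and the choice to emit description only when text or a URN is supplied are our assumptions.

```java
import java.util.List;

class PeerIdentityXml {
    /** Escapes the characters that are not allowed verbatim in XML character data. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toXml(String peerName, String peerUuid, String text, String urn,
                        List<String> real, List<String> overlay, boolean multicast) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<PeerIdentity>\n");
        sb.append("  <peername>").append(esc(peerName)).append("</peername>\n");
        sb.append("  <peerUUID>").append(esc(peerUuid)).append("</peerUUID>\n");
        if (text != null || urn != null) {                        // the description is optional
            sb.append("  <description>\n");
            if (text != null) sb.append("    <text>").append(esc(text)).append("</text>\n");
            if (urn != null) sb.append("    <URN>").append(esc(urn)).append("</URN>\n");
            sb.append("  </description>\n");
        }
        sb.append("  <comprotocols>\n");
        for (String r : real) sb.append("    <real>").append(esc(r)).append("</real>\n");
        for (String o : overlay) sb.append("    <overlay>").append(esc(o)).append("</overlay>\n");
        sb.append("  </comprotocols>\n");
        sb.append("  <multicast>").append(multicast ? "TRUE" : "FALSE").append("</multicast>\n");
        return sb.append("</PeerIdentity>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml("LucBoureau", "uuid-AACDEF689321121288877EEFZ9615731",
                "Je suis le mec francais", "http://www.Beaune.org/chateau/grandCru/LB",
                List.of("tcp://152.70.8.108.9133", "http://152.70.8.108.1111"),
                List.of("tls://uuid-AACDEF689321121288877EEFZ9615731"), false));
    }
}
```

Parsing ("analyze") would go the other way, for example with the JDK's javax.xml.parsers.DocumentBuilder.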

3.3 The Virtual P2P Network
Up to this point we have generally discussed the notion of an overlay network. The reader has in mind
a possibly ad-hoc collection of peer nodes with certain rights, and policies for communication. Also,
the requirement that each such peer has a UUID as a means of identification is well understood at this
point. A single UUID is necessary to facilitate communication on this network. It does in fact give a
single point of entry but lacks a means to organize the information that might be communicated. One
might argue that the parameters required to organize data triage can be included in the initial part of the
data. While this is true, first there may be several underlying transports on the real network, and we
want an end-to-end delivery mechanism. Second, including addressing information as part of the data
does not yield a very satisfactory network communication stack and hides what is really going on.
Such a stack has always been a part of networking protocols, and we will build an overlay network stack
in the same spirit.
One can ask, why not just use the IP stack and be done with it? Why go to all of the trouble of inventing yet another network stack on top of an existing one? As we have carefully examined in the sections on IPv6 and UUID’s, reestablishing end-to-end network communication requires UUID’s that are independent of the underlying real networks, which may not in fact use IP. In the case of IPv4, the reader now understands the address space problem, and that IPv4 addresses cannot for this reason be used as UUID’s. Also, with the imminent arrival of literally billions of devices, the ability to create ad-hoc UUID’s is necessary. We have mentioned that IPv6 addresses are possible candidates for UUID’s, but we still have a deployment issue here. Finally, we have Network Address Translators (NAT), firewalls and other real network barriers, as, for example, the prohibition, for good reasons, of propagated multicast, which in turn makes long range, ad-hoc discovery impossible without UUID’s. Thus, in order to establish a viable P2P network topology, a simple network stack where the UUID layer is at the bottom is necessary.
Before we give the details of the overlay network stack, let’s briefly examine the IP network stack.

3.3.1 Hosts on the Internet


There are many network stacks. The most general is probably the Open Systems Interconnection (OSI)
stack which has seven layers ranging from the physical layer on the bottom to the application layer on
the top. A few examples of physical layers are Ethernet, 802.11a/b, GSM, Bluetooth, and wideband CDMA. An IP stack has five layers, and level 1 is the physical layer. The IP stack is used on the Internet, and is shown alongside the OSI layers in Figure 3-4:

OSI layers                            TCP/IP layers
Application, Presentation, Session    Application
Transport                             Transport
Network                               Internet
Data Link                             Network Access
Physical                              Physical

Figure 3-4. The IP Network Stack

Level 2 is the link or network access layer, and this is where the device drivers do their work. On Ethernet, an IPv4 packet is identified to the device driver by its EtherType value, 0x0800 (2048 decimal), and this value is used to dispatch the packet to the next level. IP is at level 3. Other protocols, such as the Address Resolution Protocol (ARP) and the Reverse Address Resolution Protocol (RARP), also operate at this level. The transport is at level 4. Here we have, for example, TCP and UDP; ICMP, although often listed alongside them, is carried directly in IP packets and is usually regarded as part of level 3. Finally, the application is at level 5.
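The driver-level dispatch described above can be sketched by reading the 16-bit EtherType at bytes 12 and 13 of an Ethernet II frame, just after the destination and source MAC addresses. The sketch ignores VLAN tags and IEEE 802.3 length-field frames, and it returns a protocol name where a real driver would hand the payload to a protocol handler.

```java
class EtherTypeDispatch {
    /** Reads the 16-bit EtherType that follows the two 6-byte MAC addresses. */
    static int etherType(byte[] frame) {
        if (frame.length < 14) {
            throw new IllegalArgumentException("frame shorter than an Ethernet header");
        }
        return ((frame[12] & 0xFF) << 8) | (frame[13] & 0xFF);
    }

    /** Names the level 3 protocol that would receive the payload. */
    static String dispatch(byte[] frame) {
        switch (etherType(frame)) {
            case 0x0800: return "IPv4";
            case 0x0806: return "ARP";
            case 0x8035: return "RARP";
            case 0x86DD: return "IPv6";
            default:     return "unknown";
        }
    }

    public static void main(String[] args) {
        byte[] frame = new byte[60];   // minimum-size frame, all zeros
        frame[12] = 0x08;              // EtherType 0x0800
        frame[13] = 0x00;
        System.out.println(etherType(frame) + " -> " + dispatch(frame));   // 2048 -> IPv4
    }
}
```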
There is a multitude of network applications that run at level 5. A few examples are telnet, ftp, imap, pop3, http, smtp, and snmp. Their port numbers are well defined and registered through the Internet Assigned Numbers Authority (IANA). For those interested in the complete list, look at the latest assigned numbers published on the IANA web site [IANA]. As discussed in the next section, the transport protocols at level 4 use these well-defined port numbers to dispatch incoming data to the appropriate applications.
3.3.1.1 Addresses and Ports
Each host on the IP network has an IP address by which it can be contacted or, if the address is not registered, at least responded to. These addresses give hosts end-to-end communication during the lifetime of the hosts’ IP addresses. At the transport layer, to permit application triage, network application port numbers are associated with each transport layer protocol. Looking at the above short list we have:

Application Protocol    TCP Ports

telnet                  23
ftp                     20, 21
smtp                    25
http                    80
pop3                    110
imap                    143
snmp                    161 (normally over UDP)

Thus, a telnet daemon listens on port 23 for incoming TCP/IP telnet connections at a host IP address.
The listening IP-address.port pair can be viewed as a socket which will accept incoming connection
requests to run the application associated with the port number. Not all port numbers are used, and this leaves room for experimentation as well as for assigning a random port number to the host requesting a connection for a particular Internet application’s service. That is to say, if a host with IP address A1 wishes to use IMAP service on the host with IP address A2, then the initiating host chooses as its source port a unique, unassigned port number, PN, for the connection to A2.143, and creates the source socket A1.PN. The combination of A1.PN and A2.143 is a unique connection on the Internet.
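The A1.PN and A2.port pairing can be observed with plain java.net sockets. The sketch uses the loopback address and lets the operating system choose the listening port as well, since binding to a well-known port such as 143 usually requires privileges. It prints each end's view of the connection in this section's address.port notation; both views name the same pair of sockets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class EphemeralPortDemo {
    /** Opens one loopback TCP connection and returns {client's view, server's view} of it. */
    static String[] connectOnce() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);       // "A2.port": OS-chosen listening port
             Socket client = new Socket(loopback, server.getLocalPort());  // "A1.PN": OS-chosen source port
             Socket accepted = server.accept()) {
            String clientView = client.getLocalAddress().getHostAddress() + "." + client.getLocalPort()
                    + " <-> " + client.getInetAddress().getHostAddress() + "." + client.getPort();
            String serverView = accepted.getInetAddress().getHostAddress() + "." + accepted.getPort()
                    + " <-> " + accepted.getLocalAddress().getHostAddress() + "." + accepted.getLocalPort();
            return new String[] {clientView, serverView};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] views = connectOnce();
        System.out.println("client sees: " + views[0]);
        System.out.println("server sees: " + views[1]);
    }
}
```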

3.3.2 Peers on The Overlay P2P Network


As is the case with the IP network, we also define a stack on the overlay network. This stack has three
layers because there is neither a physical nor a link layer. At level 1 is the Overlay Network Protocol
(ONP) which is analogous to IP in the IP stack, and thus, peer-UUID’s play the role of IP addresses.
There is a transport layer at level 2, with two protocols which are discussed in detail later. ONP messages are our equivalent of IP packets, and on top of them the transports are the Application Communication Protocol (ACP), which is a reliable message protocol, and the Universal Message Protocol (UMP), which, like UDP, is not reliable. Hence, for UMP, reliability, when required, is the application’s responsibility. At level 3 applications reside. As with IP, we also require virtual ports for the triage of incoming application data.

Level 3    Application
Level 2    Application Communication Protocol (ACP) | Universal Message Protocol (UMP)
Level 1    Overlay Network Protocol (ONP)

Figure 3-5. The Overlay Network Stack

3.3.2.1 The Virtual Port


Like the peer-UUID, a virtual port is also a UUID that defines a point of contact for a given application. Just like the Peer-UUID, the virtual-port-UUID can be a random number of enough bits, say 128, to guarantee uniqueness, or, as we prefer, a CBID so that cryptographic challenges can be made to verify the application source of the virtual port information. A virtual port can be ad-hoc or registered and has an associated name that usually identifies the application. So, for example, for instant messaging one might have the (name, virtual-port-UUID) pair IMApp.UUID. Again, unlike registered IP ports, the names are not necessarily unique unless some kind of registration is used. Registration will certainly be used in more formal, enterprise P2P networks. Continuing with the IP analogy for TCP, we might have, on either an ad-hoc or a registered network:

Application Protocol ACP Ports


chat UUID1
p2pftp UUID2
chess UUID4
mobileAgents UUID5
p2pEmail UUID6

In the case of ad-hoc networks we will thoroughly describe how peers discover such ports within the
context of their personal peer communities.
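The triage role of virtual ports can be sketched as a table from port UUID to application handler. The hypothetical VirtualPortTable below uses java.util.UUID as a convenient 128-bit value; note that UUID.randomUUID() yields 122 random bits (the remaining bits encode version and variant), and a CBID could be substituted as the text prefers. Names are treated as locally unique here, which is a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class VirtualPortTable {
    private final Map<UUID, Consumer<byte[]>> handlers = new ConcurrentHashMap<>();
    private final Map<String, UUID> byName = new ConcurrentHashMap<>();

    /** Binds an application to a fresh random virtual-port UUID under a human readable name. */
    UUID open(String name, Consumer<byte[]> handler) {
        UUID port = UUID.randomUUID();
        handlers.put(port, handler);
        byName.put(name, port);        // simplification: names are unique on this peer
        return port;
    }

    UUID lookup(String name) {
        return byName.get(name);
    }

    /** Triage: hands an incoming payload to the application bound to its destination port. */
    boolean deliver(UUID destinationPort, byte[] payload) {
        Consumer<byte[]> handler = handlers.get(destinationPort);
        if (handler == null) {
            return false;              // no application on this port: drop
        }
        handler.accept(payload);
        return true;
    }

    public static void main(String[] args) {
        VirtualPortTable table = new VirtualPortTable();
        UUID chat = table.open("chat", p -> System.out.println("chat got " + new String(p, StandardCharsets.UTF_8)));
        System.out.println("chat listens on " + chat);
        table.deliver(chat, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(table.deliver(UUID.randomUUID(), new byte[0]));   // false: no such port
    }
}
```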
3.3.2.2 Level 2 Communication Channel Virtual Port
Once a peer possesses another peer’s PeerIdentity document, it has enough information to communicate with that peer at level 2 on the overlay network. This “operating system” to “operating system” communication is required in order to publish system documents to which other peers subscribe. These system documents enable level 3, or application and service, communication. In a sense, one is distributing the operating system communication primitives across the overlay network, that is to say, we are really dynamically bootstrapping an ad-hoc distributed operating system. We use the Level 2 communication virtual port (L2CVP) and UMP for this purpose. This communication is established using the reserved 128-bit virtual port UUID whose bits are all 1’s.
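With java.util.UUID as the 128-bit port type (our choice, not a requirement of the text), the reserved L2CVP value is simply the UUID with both 64-bit halves set to -1L, that is, all ones. A randomly generated version 4 UUID can never equal it, because its version bits are fixed to 0100.

```java
import java.util.UUID;

class ReservedPorts {
    /** The L2CVP: the reserved 128-bit virtual port UUID with every bit set to 1. */
    static final UUID L2CVP = new UUID(-1L, -1L);   // -1L is sixty-four 1 bits in each half

    /** True if an incoming message is addressed to the level 2 communication virtual port. */
    static boolean isL2cvp(UUID destinationPort) {
        return L2CVP.equals(destinationPort);
    }

    public static void main(String[] args) {
        System.out.println(L2CVP);                        // ffffffff-ffff-ffff-ffff-ffffffffffff
        System.out.println(isL2cvp(UUID.randomUUID()));   // false
    }
}
```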
3.3.2.3 Unicast, Unicast Secure, Multicast and Multicast Secure Virtual Ports
There are two basic types of virtual ports on the overlay network. These are unicast and multicast. A
unicast port permits two peers to establish a unique, bi-directional overlay network connection. By contrast, a multicast port accepts uni-directional input from multiple peers. Each of these ports has a secure counterpart that can ensure the authenticity of the communicating parties, and always guarantees the privacy and integrity of the data that is exchanged. The actual protocols that can secure the overlay network communication in this manner are discussed in Chapter 5. As previously mentioned, a virtual port is identified by a UUID.
3.3.2.4 Component 2: The Name.Virtual-Port-UUID
The Name.virtual-port-UUID is our second component. As with the peerName, the name must be a legal XML character string. It is used by services and applications both to publish and to establish communication channels on the overlay network. This component is published by means of a VirtualPort document.
3.3.2.4.1 The VirtualPort document

Two documents must be published by a peer to establish level 3, application-based communication on the overlay network. The first is the PeerIdentity document as discussed above, and the second is the VirtualPort document. The VirtualPort document is created by applications and services at level 3 and, like the PeerIdentity document, is published and subscribed to at level 2 using the L2CVP and UMP. See section 3.5 for an overview of publication, subscription and how communication is established.
Given this introduction we define our VirtualPort document as follows:
Document type = VIRTUALPORT
Content tags and field descriptions:

<vportname> Legal XML character string [XML] </vportname>

<vportUUID> uuid-Legal UUID in hexadecimal ascii string </vportUUID>
<vportType> unicast | unicastSecure | multicast | multicastSecure </vportType>
<multicastGroup> uuid-Legal UUID in hexadecimal ascii string </multicastGroup>
<expirationDate> MMM DD YYYY HH:MM:SS +/-HHMM </expirationDate>
<sourceExclusive> Right to publish ID - Hexadecimal string </sourceExclusive>
<commHints> <owner> peer UUID of publisher </owner> </commHints>

The <multicastGroup> tag’s field is the UUID of the multicast group bound to the vportUUID. This tag is exclusively for a virtualPort of type multicast, and its functionality is described in Chapter 4, section 4.1.4.
The <expirationDate> is the date after which this port is no longer accessible. If this field is missing,
then the virtualPort has no expiration date. Examples of such a date are:
Aug 03 2032 05:33:33 +1200
Jun 16 2040 05:20:14 -0800
The format is standard. The +/-HHMM is the hours and minutes offset of the time zone from GMT.
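The expiration format maps directly onto a java.time pattern: "MMM dd yyyy HH:mm:ss Z", where MMM is the three-letter English month and Z parses a +HHMM or -HHMM offset. The sketch parses the field and treats a missing field as "never expires", as described above. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ExpirationDate {
    // MMM DD YYYY HH:MM:SS +/-HHMM, e.g. "Aug 03 2032 05:33:33 +1200"
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM dd yyyy HH:mm:ss Z", Locale.ENGLISH);

    static OffsetDateTime parse(String field) {
        return OffsetDateTime.parse(field.trim(), FORMAT);
    }

    /** A missing field (null) means the virtual port never expires. */
    static boolean isExpired(String field, OffsetDateTime now) {
        return field != null && now.isAfter(parse(field));
    }

    public static void main(String[] args) {
        OffsetDateTime now = OffsetDateTime.now();
        for (String s : new String[] {"Aug 03 2032 05:33:33 +1200", "Jun 16 2040 05:20:14 -0800"}) {
            System.out.println(s + " -> " + parse(s) + (isExpired(s, now) ? " (expired)" : " (still valid)"));
        }
    }
}
```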
The <sourceExclusive> field provides a mechanism to prove that the source of this document is its creator. Other peerNodes should not publish this document. Mechanisms for detecting false publication are discussed in Chapter 5. If this field is not present, then the default is FALSE.
The <commHints> tag gives a subscriber to this document a communication hint to aid in contacting its publisher. Thus, by including the peer-UUID, and acquiring its associated PeerIdentity document, a peer may have all that is required to communicate. We say “may” because of possible real network barriers which can prohibit true end-to-end communication on the real network. In this case additional communication hints will need to be added to this document. See section 3.5 and Chapter 4 for the overlay network solutions to this latter problem.
Below is an example of a VirtualPort document:

<?xml version="1.0"?>
<!DOCTYPE 4PL:VirtualPort>
<4PL:VirtualPort xmlns:4PL="http://www.aw.com">
<vportname> MobileAgent </vportname>
<vportUUID> uuid-AACDEF689321121288877EEFZ9615731 </vportUUID>
<vportType> unicastSecure </vportType>
<commHints>
<owner> uuid-61AAC8932DE212F169717E15731EFZ96 </owner>
</commHints>
</4PL:VirtualPort>

The following 4PL commands create and publish the above VirtualPort document:

Document pi = new Document(PEERIDENTITY, "LucBoureau");
Document vp = new Document(VIRTUALPORT, pi, "MobileAgent", unicastSecure);
publish(pi);
publish(vp);

Again, we include the creation of the PeerIdentity document for clarity, since a VirtualPort document is built from it. In most implementations the system will be able to get these values from a peer’s global context. In the next section we discuss the virtual socket which is used to establish a connection between two peers.
3.3.2.5 Component 3: The Virtual Socket
To enable application-to-application communication on the overlay network, we require peer-unique network identifiers on each peer that are analogous to the IP sockets mentioned just above. We call the unique peer-UUID.virtual-port-ID pair a virtual socket on the overlay network. On a system there are two kinds of virtual sockets. The first kind is well known and published by means of the VirtualPort document, and its publication implies that the publishing peer will accept incoming connection requests for this virtual socket. The second is generated by the peer that is trying to establish a connection on the overlay network. While the peer-UUID part uniquely defines that peer, the virtual port number must be unique only to that peer, and can be generated in any way that peer desires as long as it preserves peer-local uniqueness. When we are discussing published virtual sockets, that is to say, published PeerIdentity documents and their accompanying VirtualPort documents, we will refer to them as listening virtual sockets.
Note that on each peer a human readable representation of the listening virtual sockets is given by peer-name.virtual-port-name. This permits a socket-like application programming model, which in turn hides the underlying complexity of the real network’s behavior.
In 4PL, to create and publish a listening virtual socket as well as an outgoing socket on the MobileAgent virtual port, we do the following:

// First we create the PeerIdentity document
Document pi = new Document(PEERIDENTITY, "LucBoureau");
// Second we create a unicast VirtualPort document
Document vp = new Document(VIRTUALPORT, pi, "MobileAgent", unicast);

// We now create the listening virtual socket using the VirtualPort document
VirtualSocket vs = new VirtualSocket(vp);
listen(vs);

// We next publish the PeerIdentity and VirtualPort documents so that
// incoming connections can be established
publish(pi);
publish(vp);

// Create a virtual socket used to establish outgoing connections.
// This virtual socket can then be used to connect to any listening socket.
// Note: The P2P system has the PeerIdentity document stored locally. Thus,
// constructing a VirtualSocket without any parameters will generate a virtual
// socket with a random source virtual port UUID. There is no VirtualPort
// document associated with the virtual port that is used. Rather, it only
// appears in the outgoing messages. Responses can be sent to this socket,
// which is registered in the system until it is closed.
VirtualSocket local_out = new VirtualSocket();

// Imagine we have "discovered" the MobileAgent listening virtual socket,
// remoteMobileAgent.

// We then open a connection as follows (see Chapter 4 for a definition of
// TYPE):
VirtualChannel out = new VirtualChannel(local_out, remoteMobileAgent, TYPE);

Given these basic fundamentals we can describe the peer communication channel which is used for data transfer. It is important to note here that we are giving an overview of the fundamentals required to understand the detailed specifications in Chapter 4.

3.3.3 Putting it all together: Communication on the Virtual P2P Network


Each peer now can create a peerIdentity document, virtual port documents and virtual sockets. These
are the components that are necessary for establishing end-to-end communication between peers on
the overlay network. The medium for end-to-end communication on the overlay network will be called
a channel. Let’s imagine that peer1 and peer2 wish to establish a channel to permit a mobile agent to migrate between them. Peer1 and peer2 are named rita and bill, and each has a personal data repository containing restaurant reviews that they wish to share with others. Both therefore need listening virtual sockets that can accept incoming connection requests to establish either unicast or unicastSecure channels. In any system there must be a service or application that created the initial listening socket, and which will send and receive data using either UMP/ONP or ACP/ONP. Whether or not these applications can accept more than one connection request is implementation dependent, and not important to our discussion.
Thus, we can assume without loss of generality that rita and bill have listening sockets named rita.mobileAgent and bill.mobileAgent, and that each has published its virtual port document and received the other’s. Furthermore, let us assume that rita is contacting bill and vice-versa, that they have created the outgoing virtual sockets rita.338888881 and bill.338888881, and that two channels have been established. We will then have on each of the peers:

Peer   local               remote
----   -----               ------
Bill   bill.mobileAgent    rita.338888881
Bill   bill.338888881      rita.mobileAgent
Rita   rita.338888881      bill.mobileAgent
Rita   rita.mobileAgent    bill.338888881

Note that we intentionally use the same outgoing virtual port number on each peer: the only requirement is that the number 338888881 be unique on the system where it was created, so that each socket pair defines a unique channel. Two channels are required, even though channels are bi-directional, because each peer plays the role of both a client and a server. The listening socket is the server side of any protocol the peers establish between themselves. Certainly, a single channel could be used to send migrating mobile agents in both directions, but then the listening servers would have dual personalities. This leads to code complexity that is best avoided by adhering to a strict separation of client and server roles in applications.
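The uniqueness argument can be made concrete with a small sketch. The Java below is hypothetical: it is neither 4PL nor the specification of Chapter 4, and it simply assumes that a virtual socket is named by its peer and port (as in bill.338888881) and that a channel is keyed by its (local, remote) socket pair. It shows that reusing the port number 338888881 on both peers still yields two distinct channels on Bill.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ChannelTable {
    // Hypothetical virtual socket name: the owning peer plus a port name or number.
    record VSocket(String peer, String port) {
        @Override public String toString() { return peer + "." + port; }
    }

    // A channel is identified by the pair (local socket, remote socket).
    record Channel(VSocket local, VSocket remote) {}

    private final Set<Channel> channels = new LinkedHashSet<>();

    // Returns false if this exact socket pair is already in use.
    boolean open(VSocket local, VSocket remote) {
        return channels.add(new Channel(local, remote));
    }

    int size() { return channels.size(); }

    public static void main(String[] args) {
        VSocket billListen = new VSocket("bill", "mobileAgent");
        VSocket billOut    = new VSocket("bill", "338888881");
        VSocket ritaListen = new VSocket("rita", "mobileAgent");
        VSocket ritaOut    = new VSocket("rita", "338888881");

        ChannelTable bill = new ChannelTable();
        bill.open(billListen, ritaOut);   // rita contacts bill: bill is the server
        bill.open(billOut, ritaListen);   // bill contacts rita: bill is the client

        // The same port number on both peers, yet the two pairs differ.
        System.out.println(bill.size());                    // 2
        System.out.println(bill.open(billOut, ritaListen)); // false: duplicate pair
    }
}
```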
So, what do we have? The fundamental mechanisms that permit peers to connect with one another. But some protocols are still missing at this point. The above scenario is high level and postpones the engineering details until we have established a more complete description of the components that comprise a P2P overlay network, as well as the protocols that manage communication. That is to say, we need to describe the publication and subscription mechanisms in detail, as well as how two peers can discover one another given the complexities of the underlying real networks. The interested reader can skip to Chapter 4, where the engineering description of how to connect is given.

3.4 Scope and Search - With Whom Do I Wish to Communicate?


Imagine a global P2P overlay network with hundreds of millions of devices, ranging from light switches to massively parallel supercomputers. The global network will be a union of home networks, manufacturing assembly line control processors, enterprise-wide networks, military and government networks, etc. This gives us a massive collection of PeerIdentity documents and their associated virtual port documents. P2P communication over such a search space is unmanageable without some means to scope, or limit, search to those peers that a given peer desires to contact. Why should a light switch wish to contact a tank on military maneuvers? Searching the entire space, even with the best of algorithms, is not only too time consuming but also pointless. It would make this P2P topology interesting to researchers in search and routing algorithms, but it would have no practical applications. To attack this problem we need to organize the search space in a way that has a certain logic to it and reflects how humans and machines might really want to group themselves.

3.4.1 The Virtual Address Space


At this point, the above global overlay network can be described as a collection of PeerIdentity and VirtualPort documents. So, given an overlay network with $n$ peers and the PeerIdentity documents $\mathrm{peerIdentity}(i)$, $i = 1, 2, \ldots, n$, then for each such $i$, $\mathrm{virtualPort}(i, j)$, $j = 1, \ldots, m_i$, is the complete collection of virtual ports for $\mathrm{peer}_i$. Thus, in this virtual address space we have $n + m_1 + \cdots + m_n$ total documents. If peer1 and peer2 wish to communicate with one another on a well-known socket, discovery can be both bandwidth and computationally intensive. To minimize the discovery problem we need to minimize the document search space. To this end we use a common sense, real world approach, and recognize that communication is usually based on shared interests, friendship, and other forms of human social interaction. There are lonely hearts clubs, gambling clubs, baseball clubs, families, instant messaging buddy lists, assembly lines, military maneuver squadrons, mission control, forest fire fighting teams, political groups, chess clubs, etc. The list is endless.
Therefore, we define a connected community, CC, to be a subset of the collection $\{\mathrm{peerIdentity}(i) \mid i = 1, 2, \ldots, n\}$ whose peer members have a common interest. A peer can belong to multiple connected communities, i.e., given CC(1) and CC(2), $CC(1) \cap CC(2)$ may be non-empty. Any peer can create a CC, and a peer must always be a member of at least one CC. The default CC should be set when a peer is initially configured. Given our overlay network with its collection of connected communities, $\{CC(i) \mid 1 \le i \le N\}$, let CC(j) be any member of this collection. Then $CCS(j) = \{\text{peer nodes } p \mid p \text{ is a member of } CC(j)\}$ forms an overlay subnetwork with a very special property: for $k \ne l$, if $p_1$ and $p_2$ are nodes on CCS(k) and CCS(l), respectively, then $p_1$’s virtual sockets on CCS(k) cannot connect to $p_2$’s virtual sockets on CCS(l), and vice-versa. Here we are saying that the virtual ports and virtual sockets on CCS(k) are visible only to the nodes on CCS(k). As we will see later, the act of creating and publishing virtual ports and sockets is always restricted to a CCS. Thus, connected communities become secure “walled gardens” in the larger overlay network, with inter-CCS connectivity not permitted, and the set of all CCS’s is pair-wise communication disjoint. Certainly, $p_1$ can be an active member of multiple CC’s and concurrently communicate with members of each such CC on the associated CCS. With this definition the larger overlay network is the union of pair-wise communication disjoint CCS’s.
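One way to picture this visibility rule is an in-memory registry that files every virtual port under the CC in which it was published. This is a minimal sketch under that assumption; CommunityRegistry, join, publish, and lookup are illustrative names, not part of 4PL, and a real implementation would enforce the rule with the security mechanisms of Chapter 5 rather than a simple membership check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class CommunityRegistry {
    // CC UUID -> member peer names
    private final Map<String, Set<String>> members = new HashMap<>();
    // CC UUID -> (virtual port name -> owning peer)
    private final Map<String, Map<String, String>> ports = new HashMap<>();

    void join(String cc, String peer) {
        members.computeIfAbsent(cc, k -> new HashSet<>()).add(peer);
    }

    boolean isMember(String cc, String peer) {
        return members.getOrDefault(cc, Set.of()).contains(peer);
    }

    // Publishing is always restricted to a CC the publisher belongs to.
    void publish(String cc, String owner, String vport) {
        if (!isMember(cc, owner)) throw new IllegalStateException(owner + " is not in " + cc);
        ports.computeIfAbsent(cc, k -> new HashMap<>()).put(vport, owner);
    }

    // A port is visible only to members of the CC it was published in,
    // and only when they search within that CC.
    Optional<String> lookup(String cc, String requester, String vport) {
        if (!isMember(cc, requester)) return Optional.empty();
        return Optional.ofNullable(ports.getOrDefault(cc, Map.of()).get(vport));
    }

    public static void main(String[] args) {
        CommunityRegistry r = new CommunityRegistry();
        r.join("CC1", "peer1"); r.join("CC1", "peer2");
        r.join("CC2", "peer2"); r.join("CC2", "peer3");
        r.publish("CC2", "peer3", "mobileAgent");
        System.out.println(r.lookup("CC2", "peer2", "mobileAgent")); // Optional[peer3]
        System.out.println(r.lookup("CC2", "peer1", "mobileAgent")); // Optional.empty
        System.out.println(r.lookup("CC1", "peer2", "mobileAgent")); // Optional.empty
    }
}
```

Note that peer2, a member of both CC1 and CC2, finds the port when it searches within CC2 but not within CC1, which matches the per-CCS membership described above.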
In Figure 3-6 we illustrate a simple overlay network with four CC’s, each determining a CCS. The elliptical boundaries indicate the pair-wise, communication disjoint attribute of the CCS’s. Note that Peer 1 and Peer 2 are in multiple CC’s.

[Figure 3-6. Connected Community on Overlay Network: the overlay network with four connected communities, CC1 through CC4; Peer_1 and Peer_2 each belong to more than one CC.]

We see a CC isolated to its CCS in this manner as a very powerful concept that will, first, speed up the discovery of virtual sockets on ad-hoc P2P networks by limiting the search to a single CCS; second, minimize the time required to establish connections and route data between its peer node members; and third, simplify implementing CC-based policies that guarantee strong authentication, as well as the privacy and integrity of data that is local, being transmitted on the CCS, or remotely stored.
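A back-of-the-envelope count shows the size of the first speed-up. Assume, purely for illustration (these numbers are not from the text), that every peer publishes the same number $m$ of virtual ports and that CC(j) has $c$ members. Counting PeerIdentity and VirtualPort documents as in section 3.4.1, the search space shrinks from

$$
n + \sum_{i=1}^{n} m_i = n(1 + m)
\qquad\text{to}\qquad
c + \sum_{i:\, \mathrm{peer}_i \in CC(j)} m_i = c(1 + m),
$$

so with $n = 10^8$, $m = 3$, and $c = 50$, a search over $4 \times 10^8$ documents becomes a search over 200.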
A CCS does raise one problem: how can publicly readable information be accessed by any peer? An example of this occurs in the following section, where some of the documents describing connected communities must be publicly available. That is to say, a connected community document for CC0 must be accessible outside of the explicit access control of CC0 if non-CC members are to be able to find it, and thus join CC0. To solve this access problem we define the Public Connected Community (PubCC). All peers are members of this community, and as members they can both publish documents and access all documents published by the PubCC’s members. As a general rule, documents describing CC’s, and meta-data describing publicly accessible data, can be published in the PubCC. The former are like CC bootstrap documents; certainly, other CC documents will be published in particular CC’s, and access to them will be restricted to the scope of those CC’s members. The latter might be meta-data containing URLs that can be used to access information about connected communities, for example, images and other publicity. Thus, the PubCC provides a global peerNode context for publishing some CC documents and meta-data. As is seen in Chapter 4, section 4.2.3.3, the PubCC restricts data access to CC documents.
To bind a virtual port to the CC in which it is created, we require another tag in the VirtualPort document. This will be the CC UUID that is generated by the overlay network’s UUID algorithm, and it is described in the next section.
Let’s now revisit the VirtualPort document:
Document type = VIRTUALPORT
Content tags and field descriptions:

<vportname> Legal XML character string [XML] </vportname>
<vportUUID> uuid-Legal UUID in hexadecimal ascii string </vportUUID>
<vportType> unicast | unicastSecure | multicast | multicastSecure </vportType>
<multicastGroup> uuid-Legal UUID in hexadecimal ascii string </multicastGroup>
<commHints>
<owner> peer UUID of publisher </owner>
<connCom> connected community UUID </connCom>
</commHints>

Below is the revised virtual port example:

<?xml version="1.0"?>
<!DOCTYPE 4PL:VirtualPort>
<4PL:VirtualPort xmlns:4PL="http://www.aw.com">
<vportname> MobileAgent </vportname>
<vportUUID> uuid-AACDEF689321121288877EEFZ9615731 </vportUUID>
<vportType> unicastSecure </vportType>
<commHints>
<owner> uuid-DDEDEF689321121269717EEFZ9615659 </owner>
<connCom> uuid-FBCAEF689321121269717EEFZ9617854 </connCom>
</commHints>
</4PL:VirtualPort>

Given this additional tag, the virtual socket also reflects its CC identity. Let’s assume Bill and Rita are both members of CC1 and CC2, have created mobileAgent ports in each CC, and established connections on the associated CCS’s. Then the connection tables on Bill and Rita could appear as follows:

Table 3-1. Local and Remote Socket with CC Identity.

Peer   local                  remote
----   -----                  ------
Bill   bill.mobileAgent.CC1   rita.338888881.CC1
Bill   bill.338888881.CC1     rita.mobileAgent.CC1
Bill   bill.mobileAgent.CC2   rita.338888881.CC2
Bill   bill.338888881.CC2     rita.mobileAgent.CC2
Rita   rita.338888881.CC1     bill.mobileAgent.CC1
Rita   rita.mobileAgent.CC1   bill.338888881.CC1
Rita   rita.338888881.CC2     bill.mobileAgent.CC2
Rita   rita.mobileAgent.CC2   bill.338888881.CC2

When a CC is created, its description must be published in a connected community document so that other peers can recognize its existence. This document is discussed in the next section.

3.4.2 Component 4: The Connected Community Document
When a connected community is created, it is given a human readable text name to make it recognizable, and a UUID so that it has a unique identity on the overlay network. Admittedly, as we’ve frequently discussed, connected community name collisions are possible in purely ad-hoc networks. Because of this problem, the connected community document contains a description field, provided by the creator, which can give detailed information about the community to help distinguish it from others with the same name. What this field contains is digital data whose format and use are up to the creator. It could be a plain text string, or a gif, mpeg, or jpeg file. If a non-text description is chosen, we suggest using a URN [URN] to reference the data for better performance. Certainly, a URN list can be used to point to multiple locations, such as peers, or even websites. This data must be publicly available, and accessible via the Public Connected Community. Because of virus problems, we caution against, but do not prohibit, the use of executable code. It is possible to create execution environments that confine such code, and the Java sandbox is one example. The fourth field is for policies that moderate community behavior. Examples are authentication, digital rights, content restrictions, data security, etc. These policies may also refer to executable code, and the same cautions apply. We define three membership policy types: ANONYMOUS, REGISTERED, and RESTRICTED:
1. ANONYMOUS - Any peerNode can join a CC and access its associated content.
2. REGISTERED - A peerNode must register with the CC creator. This is a “good faith” registration and is not intended to strictly control access to content. Registration does result in the return of a registration validation stamp that must be presented to other members for content access. On the other hand, no particular attempt is made to prevent forgery of the validation stamp. There is no deep security here.
3. RESTRICTED - Here a secure credential is returned when membership is granted. The credential is used to authenticate the new member, and without the credential, access to content is denied. Such a credential may use strong security or “Webs-of-Trust”-like mechanisms. Descriptions of how to implement this security are discussed in Chapter 5.
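The three policy types can be read as an access decision. The sketch below is an illustrative reading, not a specification: MembershipPolicy and CredentialVerifier are hypothetical names, the REGISTERED branch deliberately checks only that a stamp was presented (matching the “good faith” model above), and the RESTRICTED branch delegates to a verifier that stands in for the mechanisms of Chapter 5.

```java
class MembershipPolicy {
    enum Type { ANONYMOUS, REGISTERED, RESTRICTED }

    // Hypothetical hook for the credential checks described in Chapter 5.
    interface CredentialVerifier {
        boolean verify(String credential);
    }

    // token: the registration stamp or credential the peer presents (may be null).
    static boolean mayAccess(Type type, String token, CredentialVerifier verifier) {
        switch (type) {
            case ANONYMOUS:
                return true;  // any peerNode may join and access content
            case REGISTERED:
                // "Good faith": a stamp must be presented, but nothing here
                // detects a forged one.
                return token != null && !token.isEmpty();
            case RESTRICTED:
                // Without a credential that verifies, access is denied.
                return token != null && verifier.verify(token);
            default:
                throw new IllegalArgumentException("unknown policy " + type);
        }
    }

    public static void main(String[] args) {
        // Placeholder verifier for this demo only; it is not a security mechanism.
        CredentialVerifier demo = c -> c.equals("credential-issued-by-creator");
        System.out.println(mayAccess(Type.ANONYMOUS, null, demo));         // true
        System.out.println(mayAccess(Type.REGISTERED, "any-stamp", demo)); // true
        System.out.println(mayAccess(Type.RESTRICTED, "any-stamp", demo)); // false
    }
}
```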
The fifth field is the optional Email VirtualPort UUID for this connected community. For details see
chapter 7, section 7.4.
The following is the connected community document:
Document type = CONNECTEDCOMMUNITY
Content tags and field descriptions:

<ccName> Restricted Legal XML character string [XML][MIME] </ccName>
<ccUUID> uuid-Legal UUID in hexadecimal ascii string </ccUUID>
<description>
<text> Legal XML character string [XML] </text>
<URN> Legal Uniform Resource Name </URN>
</description>
<policy>
<type> ANONYMOUS | REGISTERED | RESTRICTED </type>
<text> Legal XML character string [XML] </text>
<URN> Legal Uniform Resource Name </URN>
</policy>
<emailVportUUID> uuid-Legal UUID in hexadecimal ascii string </emailVportUUID>

Below is the connected community example:

<?xml version="1.0"?>
<!DOCTYPE 4PL:ConnectedCommunity>
<4PL:ConnectedCommunity xmlns:4PL="http://www.aw.com/p2p/comms/CC">
<ccName> China green tea club </ccName>
<ccUUID> uuid-AACDEF689321121288877EEFZ9615731 </ccUUID>
<description>
<text> We love green tea from China </text>
<URN>
urn:peerIdentity:uuid-DDEDEF689321121269717EEFZ9615659/TeaCup
</URN>
</description>
<policy>
<type> ANONYMOUS </type>
<text> No membership restrictions </text>
<text> GREEN tea related content only </text>
</policy>
<emailVportUUID> uuid-AACDEF689321121288877EEFZ9615731 </emailVportUUID>
</4PL:ConnectedCommunity>
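A peer that finds such a document in the PubCC must read fields like ccName and the policy type before deciding whether to join. The Java sketch below does this with the standard DOM parser (javax.xml.parsers); CcDocumentReader is an illustrative name. Because an XML element name may not begin with a digit, a conforming parser rejects the 4PL: prefix as written, so the sketch assumes a copy of the example with the prefix and the DOCTYPE line removed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CcDocumentReader {
    // Parses an XML string; malformed input becomes an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Returns the trimmed text of the first element with this tag, or null.
    static String first(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // The example above, minus the "4PL:" prefix and the DOCTYPE line.
        String xml = "<ConnectedCommunity>"
                + "<ccName> China green tea club </ccName>"
                + "<ccUUID> uuid-AACDEF689321121288877EEFZ9615731 </ccUUID>"
                + "<policy><type> ANONYMOUS </type>"
                + "<text> No membership restrictions </text></policy>"
                + "</ConnectedCommunity>";
        Document doc = parse(xml);
        System.out.println(first(doc, "ccName")); // China green tea club
        System.out.println(first(doc, "type"));   // ANONYMOUS

        // An element name may not begin with a digit, so this is rejected.
        try {
            parse("<4PL:ConnectedCommunity/>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```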

We now have complete descriptions of four components: the peerIdentity document, the virtualPort document, the virtual socket, and the connectedCommunity document. We have also introduced the concepts and definitions of the overlay network and connected community subnetworks. Along with this we have given high level descriptions of how connections are established on the overlay network, given the restriction to connected community subnetworks. We also noted the need for the Public Connected Community to hold data that must be available to all peer nodes on the overlay network that are active members of this community.
Virtual connectivity exists to hide the underlying issues of real network connectivity and, by means of peer identities, to provide unique and true end-to-end visibility across all nodes on the overlay network in spite of those issues. Yet, a system that provides overlay network functionality must have code that addresses the real underlying issues to permit this simplified form of network access. The implementation of the required software is where the real engineering takes place. In the next section we begin to discuss these real network issues and the programmable solutions that make them transparent to application programmers using the overlay network.

3.5 How to Connect

3.5.1 Real Internet Transport


Imagine yourself in front of your familiar web browser connecting to some website anywhere in the world after just having done a search. What really happens? What Internet protocols are involved when you click on the URL? Let’s take a typical URL, http://www.ietf.org. First of all, the browser code knows that the protocol that will be used to connect to www.ietf.org is http, and that the default TCP port is 80. Before a connection can be made, the real IP address of www.ietf.org must be found. How does that happen? In most cases one has a Domain Name System (DNS) server, which will return the IP address of a valid, registered domain name like www.ietf.org. Therefore, the DNS protocol is used to do the name-to-address translation before the connection is even attempted. But to use the DNS protocol one must be able to locate DNS servers. How is that done (see section 3.5.2.1)? Let’s assume we know a DNS server and the appropriate translation takes place. Then your system can connect to the IP address of www.ietf.org using the http protocol over TCP/IP. But your system is on the Internet and most likely not on the same subnetwork as www.ietf.org. Therefore, your system requires a route from your IP network to the destination IP network where www.ietf.org is hosted. How are routers discovered? Sometimes they are part of your system’s initial configuration, and otherwise there are protocols to find them (see section 3.5.2.1). Let’s assume your system knows the IP address of its router so that an attempt to connect to www.ietf.org can be made. On the local link, however, frames are delivered by MAC address, not by IP address: both your system’s network interface and the router’s interface need the MAC addresses used by the link layer, e.g., Ethernet. Since your system and the router must be on the same subnet, your system uses the Address Resolution Protocol (ARP) [RFC826]: it broadcasts an ARP packet that includes its own IP and MAC addresses and asks the system with the router’s IP address, i.e., the router, to respond with its MAC address. We assume the router is up and its MAC address is found. Given that the network is in good form, IP packets from your system destined for www.ietf.org are sent to the router and forwarded on to the final destination. In this manner the connection will succeed, and http running on top of TCP/IP will retrieve the site’s home page. Needless to say, we have glossed over many details here, and have left out other possible barriers that might have to be traversed to connect to the IETF’s website. But we have pointed out the key steps required to connect to a host somewhere on the Internet. In the following sections we fill in the missing details that your P2P software must know in order to permit any two peers to connect to one another.
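From an application’s point of view, only the first steps of this walk-through are visible; choosing the router and resolving its MAC address with ARP are performed by the operating system. The following Java sketch, using only java.net classes, mirrors that split. ConnectSteps and its method names are illustrative, and the actual connection attempt is left commented out because it needs network access.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.net.UnknownHostException;

class ConnectSteps {
    // Step 1: the URL names the protocol and host; http implies TCP port 80.
    static int portFor(URI uri) {
        if (uri.getPort() != -1) return uri.getPort();   // explicit port in the URL
        if ("http".equalsIgnoreCase(uri.getScheme())) return 80;
        throw new IllegalArgumentException("no default port known for " + uri);
    }

    // Step 2: name-to-address translation; the platform resolver consults
    // the DNS servers the host was configured with (e.g., by DHCP).
    static InetAddress resolve(String host) throws UnknownHostException {
        return InetAddress.getByName(host);
    }

    // Step 3: open the TCP connection. Routing and ARP happen inside the
    // operating system, not in this code.
    static Socket connect(URI uri, int timeoutMillis) throws IOException {
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(resolve(uri.getHost()), portFor(uri)), timeoutMillis);
        } catch (IOException e) {
            s.close();
            throw e;
        }
        return s;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.ietf.org");
        System.out.println(uri.getScheme() + " " + uri.getHost() + " " + portFor(uri)); // http www.ietf.org 80
        // Needs network access, so it is left commented out:
        // try (Socket s = connect(uri, 5000)) { /* send an http GET */ }
    }
}
```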

[Figure 3-7. Connecting on the Internet: to reach http://www.ietf.org, the application uses the http protocol over TCP destination port 80 (from the local registry), the IP address of www.ietf.org (from DNS), the IP address of the router (from DHCP), and the router’s MAC address at the physical layer (from ARP) to reach the destination network.]

3.5.2 Issues
The most basic IP component your system possesses is its IP address. We have previously discussed the IPv4 address limitations, the as yet not fully deployed IPv6 solution, and why it is highly probable that your IP address is not fixed. By this we mean that each time your system boots, or after the lease expiration discussed just below, this address may change.

3.5.2.1 Dynamic Host Configuration Protocol (DHCP)


ISPs and enterprises with more clients than can be accommodated by their assigned IP address space require a mechanism to fairly distribute IP addresses to these clients, reusing any address that is not currently allocated. The assumption is that not every client is up all of the time. DHCP provides a method to allocate shared IP addresses from pools of reusable, administratively assigned addresses. DHCP includes a lease time for address use as well as a way for a client to relinquish the address when it no longer needs it. In fact, DHCP can provide all that is required to auto-configure a client in most situations. The DHCP service must be reachable by broadcast. Below is a short list of the more than one hundred DHCP options for providing information to a host about the network to which it is connected. A complete list can be found at the Internet Assigned Numbers Authority [IANA].

Tag  Name             Data Length  Meaning
---  ----             -----------  -------
0    Pad              0            None
1    Subnet Mask      4            Subnet Mask Value
2    Time Offset      4            Time Offset in Seconds from UTC
3    Router           N            N/4 Router addresses
4    Time Server      N            N/4 Timeserver addresses
5    Name Server      N            N/4 IEN-116 Server addresses
6    Domain Server    N            N/4 DNS Server addresses
7    Log Server       N            N/4 Logging Server addresses
8    Quotes Server    N            N/4 Quotes Server addresses
9    LPR Server       N            N/4 Printer Server addresses
10   Impress Server   N            N/4 Impress Server addresses
11   RLP Server       N            N/4 RLP Server addresses
12   Hostname         N            Hostname string
13   Boot File Size   2            Size of boot file in 512 byte chunks
14   Merit Dump File  N            Client to dump and name the file to dump it to
15   Domain Name      N            The DNS domain name of the client
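Options like these are carried in the options area of a DHCP message, which follows a four-byte magic cookie (99.130.83.99). As defined in RFC 2132, Pad (0) and End (255) are single bytes, and every other option is a tag byte, a length byte, and that many data bytes; this is why the table gives a length N for options that carry N/4 four-byte addresses. The Java sketch below decodes that layout. The class name and sample bytes are made up for illustration, and it is not a complete DHCP client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DhcpOptions {
    static final int PAD = 0;   // single byte, no length field
    static final int END = 255; // single byte, marks the end of the options

    // Parses a DHCP options area (the bytes after the magic cookie) into
    // tag -> value. Every tag other than Pad and End is followed by a
    // one-byte length and then that many value bytes.
    static Map<Integer, byte[]> parse(byte[] opts) {
        Map<Integer, byte[]> result = new LinkedHashMap<>();
        int i = 0;
        while (i < opts.length) {
            int tag = opts[i] & 0xFF;
            if (tag == PAD) { i++; continue; }
            if (tag == END) break;
            if (i + 1 >= opts.length) throw new IllegalArgumentException("missing length for tag " + tag);
            int len = opts[i + 1] & 0xFF;
            if (i + 2 + len > opts.length) throw new IllegalArgumentException("truncated value for tag " + tag);
            byte[] value = new byte[len];
            System.arraycopy(opts, i + 2, value, 0, len);
            result.put(tag, value);
            i += 2 + len;
        }
        return result;
    }

    // Decodes an "N/4 addresses" value (e.g., tag 3 Router, tag 6 Domain Server).
    static List<String> addresses(byte[] value) {
        if (value.length % 4 != 0) throw new IllegalArgumentException("length is not a multiple of 4");
        List<String> out = new ArrayList<>();
        for (int k = 0; k < value.length; k += 4) {
            out.add((value[k] & 0xFF) + "." + (value[k + 1] & 0xFF) + "."
                    + (value[k + 2] & 0xFF) + "." + (value[k + 3] & 0xFF));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] opts = {
            1, 4, (byte) 255, (byte) 255, (byte) 255, 0,   // Subnet Mask 255.255.255.0
            PAD,
            3, 4, (byte) 192, (byte) 168, 1, 1,            // Router
            6, 8, 10, 0, 0, 53, 10, 0, 1, 53,              // two DNS servers
            (byte) END
        };
        Map<Integer, byte[]> parsed = parse(opts);
        System.out.println("routers: " + addresses(parsed.get(3)));     // routers: [192.168.1.1]
        System.out.println("dns servers: " + addresses(parsed.get(6))); // dns servers: [10.0.0.53, 10.0.1.53]
    }
}
```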

On the Internet in general, there is no guarantee that a system’s IP address will remain fixed, and this breaks end-to-end connectivity on the real network. Since end-to-end connectivity is guaranteed on the overlay network by a unique peer-UUID, this guarantee must be reflected in the connectivity on the real network. To do this, the underlying P2P software must recognize a change of IP address and update the PeerIdentity document when it occurs. Let’s assume that the peer LucBoureau relinquished the IP address 152.70.8.108, and that DHCP then assigned the new address 152.70.8.169. This peer’s PeerIdentity document would be updated as follows using 4PL:

String oldStr = "tcp://152.70.8.108.9133";
String newStr = "tcp://152.70.8.169.9133";
replaceField(peerIdentity, "<comprotocols>", oldStr, newStr);

oldStr = "http://152.70.8.108.1111";
newStr = "http://152.70.8.169.1111";
replaceField(peerIdentity, "<comprotocols>", oldStr, newStr);

This results in the updated PeerIdentity document shown below. How such changed documents are redistributed on the overlay network is discussed in the next chapter.

<?xml version="1.0"?>
<!DOCTYPE 4PL:PeerIdentity>
<4PL:PeerIdentity xmlns:4PL="http://www.aw.com">
<peername> LucBoureau </peername>
<peerUUID> uuid-AACDEF689321121288877EEFZ9615731 </peerUUID>
<comprotocols>
<real> tcp://152.70.8.169.9133 </real>
<real> http://152.70.8.169.1111 </real>
<overlay>
tls://uuid-AACDEF689321121288877EEFZ9615731
</overlay>
</comprotocols>
</4PL:PeerIdentity>
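The 4PL replaceField calls above assume the software already knows the old and new addresses. A hypothetical helper that performs the same rewrite on the list of real transports might look like the Java sketch below. It assumes the scheme://address.port notation used in these examples and rewrites an entry only when its address matches exactly; detecting the change (e.g., on a DHCP lease renewal) and redistributing the document are outside its scope.

```java
import java.util.ArrayList;
import java.util.List;

class TransportRebinder {
    // Rewrites "scheme://<oldIp>.<port>" entries to use newIp. The trailing
    // "." in the match keeps 152.70.8.10 from matching 152.70.8.108.
    // Entries without the old address (e.g., tls://uuid-...) pass through.
    static List<String> rebind(List<String> real, String oldIp, String newIp) {
        String oldPart = "://" + oldIp + ".";
        List<String> out = new ArrayList<>();
        for (String entry : real) {
            int i = entry.indexOf(oldPart);
            out.add(i < 0 ? entry
                    : entry.substring(0, i) + "://" + newIp + "." + entry.substring(i + oldPart.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        String published = "152.70.8.108"; // address in the current PeerIdentity document
        String observed = "152.70.8.169";  // address the new DHCP lease assigned
        List<String> real = List.of("tcp://152.70.8.108.9133", "http://152.70.8.108.1111");
        if (!observed.equals(published)) {
            // Only when the address really changed is the document updated.
            System.out.println(rebind(real, published, observed));
            // [tcp://152.70.8.169.9133, http://152.70.8.169.1111]
        }
    }
}
```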

This situation becomes much more complicated with the introduction of Network Address Translators
(NAT).
3.5.2.2 Network Address Translator (NAT)
Again, because of the shortage of IPv4 addresses, Network Address Translators [RFC1631, RFC3022] provide a simple scheme that permits a home user or an enterprise to use internal, Intranet-private addresses and map them to globally unique addresses on the exterior Internet. NAT boxes, as they are called, shield the external Internet from the IP addresses used for internal communication on what is called the “stub network”, and thus permit the massive duplication of internal IP addresses on these disjoint stub networks. This is shown in Figure 3-8, where the stub network host with address 10.0.1.3 is communicating with the Internet host with address 128.47.88.11:

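[Figure 3-8. NAT Network Setup: LAN hosts 10.0.1.2, 10.0.1.3, ..., 10.0.1.11 sit behind a NAT box whose stub network address is 10.0.1.1 and whose WAN address is 192.88.24.7. A packet from 10.0.1.3 with src = 10.0.1.3:2077, dst = 128.47.88.11:23 leaves the NAT box on the WAN side with src = 192.88.24.7:1024, dst = 128.47.88.11:23. (src: source IP address; dst: destination IP address.)]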

Imagine that a small business assigns the ten class A addresses 10.0.1.2, 10.0.1.3, ..., 10.0.1.11 to the ten systems it uses, and that 10.0.1.1 is the stub network address of the NAT box. Also, assume that externally the NAT box has a single, globally unique IP address, which in this case is 192.88.24.7. Furthermore, suppose that the system with IP address 10.0.1.3 wishes to telnet to the external system with address 128.47.88.11. The telnet application will generate a random local TCP port, say 2077, and try to connect to TCP port 23 on host 128.47.88.11. The NAT box does several things on the reception of IP packets from the system on the stub network. First, it replaces the source IP address with the globally unique IP address, 192.88.24.7. Second, it assigns a TCP port to the outgoing connection, say 1024, and changes the source TCP port accordingly. It then maintains a port map, which maps packets received for 192.88.24.7:1024 back to 10.0.1.3:2077. In this manner, if there are simultaneous outgoing TCP connections from multiple systems on the stub network, every connection will have its unique port mapping. Because the IP address is changed, the IP checksum must be recomputed for each outgoing packet. This change of IP address also forces a recomputation of the TCP checksum, because the TCP pseudo-header must be updated to reflect the changed IP address, and the pseudo-header and source port are part of the checksum calculation. Since the checksum calculation also covers all of the TCP data in the current IP packet, a NAT that recomputes it from scratch can negatively affect TCP performance for large data transfers; RFC 1631 and RFC 3022 describe an incremental checksum adjustment that avoids rereading the data. UDP/IP packets also have a port and can be similarly mapped. Finally, NATs cause another restriction: one cannot fragment either TCP or UDP packets on the stub network side of a NAT. Packet fragmentation uses the following IP header fields: each IP packet has an identification field, a more-fragments flag, and a fragment offset. Thus, in every fragment but the last the more-fragments flag is true, and in the last fragment it is false. If two systems on the same stub network happen to be fragmenting outgoing TCP/IP (UDP/IP) packets, and those packets happen to have the same identification field and are being sent to the same destination, then on the global side of the NAT the IP header information that is used to reassemble the packets at the ultimate destination will be identical. Since the TCP and UDP header information is only in the first fragment, all bets are off. Consequently, if one is behind a NAT, fragmentation cannot take place for those packets that have global destinations.
Now, since NAT boxes can also act as DHCP servers, the stub network addresses need not be fixed; they may be reused and dynamically assigned. We can add the extra complication that the globally unique IP address on the Internet side of the NAT is acquired with DHCP and thus changes over time, as discussed in the section above. Furthermore, the stub network systems do not know the globally unique external NAT address. Consequently, the PeerIdentity document will contain real IP addresses that do not reflect the addresses actually seen by other peers outside of the stub network, and these peers may themselves be on their own stub networks. The death blow to end-to-end connectivity is that one cannot reliably initiate connections from the exterior side of the NAT box. How this is resolved so that one has end-to-end connectivity on the overlay network is discussed in section 3.5.3.
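The port map maintained by the NAT box explains both its usefulness and the inbound problem. In the minimal Java sketch below (a simplified model, not the RFC 3022 algorithm; the class name and the sequential port allocation are assumptions), outgoing sources are rewritten to 192.88.24.7 with ports allocated from 1024, and replies are mapped back. A packet arriving at an external port that no outgoing connection created has no mapping, which is why connections cannot reliably be initiated from the exterior side.

```java
import java.util.HashMap;
import java.util.Map;

class NatPortMap {
    record Endpoint(String ip, int port) {
        @Override public String toString() { return ip + ":" + port; }
    }

    private final String externalIp;
    private int nextPort;
    private final Map<Endpoint, Integer> outbound = new HashMap<>(); // stub host -> external port
    private final Map<Integer, Endpoint> inbound = new HashMap<>();  // external port -> stub host

    NatPortMap(String externalIp, int firstPort) {
        this.externalIp = externalIp;
        this.nextPort = firstPort;
    }

    // Rewrites the source of an outgoing packet, allocating a new external
    // port the first time a given internal ip:port is seen.
    Endpoint translateOut(Endpoint internalSource) {
        Integer port = outbound.get(internalSource);
        if (port == null) {
            port = nextPort++;
            outbound.put(internalSource, port);
            inbound.put(port, internalSource);
        }
        return new Endpoint(externalIp, port);
    }

    // Maps a packet arriving at externalIp:port back to the stub network host;
    // null means no outgoing connection created this mapping.
    Endpoint translateIn(int externalPort) {
        return inbound.get(externalPort);
    }

    public static void main(String[] args) {
        NatPortMap nat = new NatPortMap("192.88.24.7", 1024);
        System.out.println(nat.translateOut(new Endpoint("10.0.1.3", 2077))); // 192.88.24.7:1024
        System.out.println(nat.translateOut(new Endpoint("10.0.1.4", 2077))); // 192.88.24.7:1025
        System.out.println(nat.translateIn(1024));                           // 10.0.1.3:2077
        System.out.println(nat.translateIn(4000));                           // null
    }
}
```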
3.5.2.3 Multiple Devices and Physical Layer Characteristics
Besides the headaches coming from firewall and NAT traversal, the growth of the scope and dimension of the Internet is causing more pain: wireless LANs, Bluetooth, sensors, etc. Will each small device have an IP address? Even without these wireless devices, we are running out of IPv4 addresses. How do different species talk to one another? Even now, a GSM cellular phone from the USA may not work on a CDMA network in Japan or China. The two wireless standards, Bluetooth and 802.11a/b, are competing with one another for deployment, and clearly, both are being deployed. It may seem too much to ask for end-to-end communication in this sea of devices, and building a P2P overlay network that includes every single device already sounds like a dream given this reality. However, when we look at the fundamentals of the problems, we see opportunities:
• The problems caused by the growing number of devices are already covered by our careful design of the PeerIdentity document. Not every device has to have an IP address; instead, a peer-UUID can be assigned for identification purposes independent of the underlying physical layer.
• To solve the problems caused by the growing types of devices and their associated communication protocols, we need to find the common transport point between different networks. Then, end-to-end P2P overlay network communication can be established on top of the transport chain using the common transport point, which in our case is the mediator.
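The second bullet can be sketched as a mediator that routes by peer-UUID and keeps the real transport as a per-peer detail. The Java below is a hypothetical illustration only: the mediator itself is described in section 3.5.3.1, and MediatorRoutes, Transport, and the sample UUIDs and addresses are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class MediatorRoutes {
    // How the mediator reaches a peer on that peer's own network.
    record Transport(String kind, String address) {}

    private final Map<String, Transport> routes = new HashMap<>();

    // A peer registers its peer-UUID together with whatever transport it has;
    // it need not have an IP address.
    void register(String peerUuid, Transport transport) {
        routes.put(peerUuid, transport);
    }

    // Overlay messages name only the peer-UUID; the mediator picks the transport.
    Transport resolve(String peerUuid) {
        Transport t = routes.get(peerUuid);
        if (t == null) throw new IllegalArgumentException("unknown peer " + peerUuid);
        return t;
    }

    public static void main(String[] args) {
        MediatorRoutes mediator = new MediatorRoutes();
        mediator.register("uuid-sensor-01", new Transport("bluetooth", "00:11:22:33:44:55"));
        mediator.register("uuid-laptop-07", new Transport("tcp", "10.0.1.3:9133"));
        System.out.println(mediator.resolve("uuid-sensor-01"));
        // Transport[kind=bluetooth, address=00:11:22:33:44:55]
    }
}
```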
3.5.2.4 Changing Identities - How do I Know the Color of a Chameleon?
All of the above issues are from a technological point of view, but the most unpredictable element contributing to the “chaos” of the Internet is its human users. Yes, the carefully designed PeerIdentity documents are perfect for identifying people and the systems that they use. Yet, look back at eBay’s rating engine. A badly behaved user can discard an old identity, apply for a new user ID, and start a new “life” with a new “face”. The same thing can happen on a P2P overlay network.
Although this may sound too idealistic, we personally think one should be given a second chance. Internet life should be more ideal than real life, in a reasonable sense. Going back to the changing identity issue, eBay’s problem can be solved by giving a new account ID the smallest rating value. Hence, even if a badly behaved user starts over, there is not much benefit. This is just one of the simplest examples of how to deal with such problems. As will be pointed out in Chapter 5, there are many possible ways to engineer a nearly-fair overlay network where the evil can be caught and the damage can be limited.
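The remedy of giving a new identity the smallest rating takes only a few lines. This Java sketch assumes a 0 to 100 rating scale and a simple clamped adjustment, neither of which comes from the text; ReputationBook is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

class ReputationBook {
    static final int MIN_RATING = 0;    // assumed scale
    static final int MAX_RATING = 100;

    private final Map<String, Integer> ratings = new HashMap<>();

    // Identities never seen before, including ones just created to shed a
    // bad history, start at the smallest rating.
    int rating(String peerUuid) {
        return ratings.getOrDefault(peerUuid, MIN_RATING);
    }

    // Applies feedback, keeping the rating within [MIN_RATING, MAX_RATING].
    void adjust(String peerUuid, int delta) {
        int r = rating(peerUuid) + delta;
        ratings.put(peerUuid, Math.max(MIN_RATING, Math.min(MAX_RATING, r)));
    }

    public static void main(String[] args) {
        ReputationBook book = new ReputationBook();
        book.adjust("uuid-honest", 40);
        book.adjust("uuid-cheat", 10);
        book.adjust("uuid-cheat", -30);                  // misbehaves
        System.out.println(book.rating("uuid-honest"));  // 40
        System.out.println(book.rating("uuid-cheat"));   // 0
        System.out.println(book.rating("uuid-fresh"));   // 0: starting over gains nothing
    }
}
```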

3.5.3 Solutions
Let’s briefly summarize what we have learned about our P2P overlay network up to this point, and also what obstacles we must overcome to permit P2P communication on this network. The P2P overlay network is a collection of peer-nodes, each having a name, unique identity, and description. Each peer must belong to a connected community in order to communicate with other peers in these communities. Also, since inter-connected community communication is not possible, and we have overlay network information that must be communicated independently of these communities, there is a Public Connected Community to which any peer can belong, making access to this information universal. In particular, a peer must be able to enumerate all publicly available connected communities. It is important to note that “stealth” connected communities are not prohibited, and that their existence would never be publicly available. Other out-of-band means are necessary to discover and join these communities.
For peer-nodes to discover and communicate with one another, we have defined three documents: the PeerIdentity document, the VirtualPort document, and the Connected Community document. As noted just above, the PeerIdentity document identifies peers. The VirtualPort document permits a peer to establish both listening and outgoing virtual sockets, and Connected Community documents create safe “walled gardens” that both minimize the requirements for discovery, content exchange, and routing, and provide a framework upon which privacy and security can more easily be added.
First, as we began to look under the network hood, we found that there are many obstacles that prevent end-to-end communication on the real network. As a means to overcome these obstacles we mentioned the need for systems that we call mediators. Most importantly, mediators make these barriers invisible to the P2P application programmer. Second, the PeerIdentity document contains a description of a peer-node’s real transports. While the overlay network is an abstraction that simplifies communication, as mentioned above, this network must be bound to the real network transports by the underlying P2P software for the abstraction to be realized. In this section we describe in some detail exactly what a mediator does, as well as how this binding happens.
3.5.3.1 Transport Mediator
Reviewing the issues with the real transports, and their current protocols:
1. IP multicast, limited to the local subnet, prevents peer-nodes from discovering one another if they are not on the same subnet,
2. Multicast is either non-existent, as in mobile phone networks, or device discovery is limited to a physical network with a small radius, such as Bluetooth,
3. Non-fixed IP addresses in the IPv4 address space,
4. NAT limitations, which are like (3) but even worse,
5. Small devices with limited memory and processor capabilities requiring both storage and computational aid,
6. Routing our overlay network protocols to give us end-to-end communication,
7. The 100,000,000,000 devices requiring robust search algorithms for the discovery of peer-nodes, their associated documents, and content. This is currently a serious problem even with more than a million peer-nodes,
8. Ad-hoc document registration for these same peer-nodes, i. e., administered registration is no longer feasible.
Our solutions to all of these problems begin with the P2P Mediator. Mediators host peer-nodes, and any peer can be a mediator. For small, ad-hoc P2P overlay networks, a single mediator is sufficient. Certainly, this depends on the processing power, memory size, and disk storage of the mediator. If one imagines a neighborhood network with a maximum of 100 peer-nodes, then any fully equipped home system available today is fine. On the other hand, as we move to networks with thousands to millions of peer-nodes, the ratio of mediators to peer-nodes on a single P2P overlay network should decrease, because the systems hosting peer-nodes must be more powerful, and we also wish to improve response time and stability so that the technology is compelling for users. In particular, there are some subtle computational problems related to routing. We are all familiar with the structure of telephone numbers: local calls are easily differentiated from national and international calls. Our mediators will be organized similarly to simplify routing and minimize routing table memory usage. The primary requirement is that each mediator have a fixed network address that is not NAT/firewall limited, so that it is always reachable from anywhere on its overlay network. Note that a mediator may be firewall limited, but in this case firewall traversal is not possible using this mediator. This is appropriate for an enterprise P2P overlay network that prohibits firewall traversal. A second requirement is that a mediator must be available within the context of the needs of the peer-nodes it supports. For example, a neighborhood mediator may only be up from 6pm until midnight during the week and 24 hours on Saturday and Sunday, while enterprise mediators will surely be highly available 24x7.
Now, let’s take a closer look at the problem of two peers discovering one another. If neither peer is NAT/firewall bound with respect to the other, then discovery is possible even if they have dynamic IP addresses via DHCP or are on the same NAT stub network. They may be on the same local network, and then multicast can be used to announce a peer’s presence and send the documents previously described. Again, this is short-lived because both NAT and DHCP acquired addresses can be leased and reused. Once two peers are not on the same local network or NAT stub network, the story changes. If these same peers are not on the same local network and are using DHCP services, then an out-of-band communication, e. g., email, a telephone call, or an ad-hoc name/address registry like an LDAP server, can be used to exchange network addresses so that communication is established and the documents sent to one another. This is cumbersome but it works. Finally, if either of the peers is NAT bound, whether or not the NAT uses DHCP, then the external IP socket is hidden from the peer’s software since port mapping is used, and there is no reliable way that this peer can give the information necessary for an externally located peer to contact it. Simply stated: because of NAT port mapping, this is impossible in general. Clearly, we have a real problem with discovery. So, how do our mediators solve the problem of discovery?
Mediators are always visible to the peers they host, and to the other mediators that are necessary to support the P2P overlay network. That is to say (see chapter 4 for the protocol and algorithm details):
1. They must have fixed IP addresses,
2. Mediators must be external to all NAT stub networks they support,
3. If the P2P overlay network supports multiple physical layers whose layer 2 transports are incompatible, e. g., Bluetooth and IP, then the mediator must support all such transports. In this case note that the overlay network as we have defined it is independent of these incompatibilities,
4. If one wishes to traverse firewalls, then the mediator must be outside of the firewall, and some protocol must be permitted to traverse the firewall. Examples are HTTP and SOCKS Version 5,
5. The mediators must maintain a map of all known mediators, called the mediator map. This map is maintained using the mediator discovery protocol.
Peers must have either preconfigured information or some other means to acquire the information needed to initially contact a mediator. For IP-based mediators/peers the contact information will be an IP address/port pair. This may be built into the software when it is booted, the software may have the address of an external website where mediators are registered, or, for a neighborhood P2P overlay network, it can be acquired out-of-band using, for example, email, a telephone call, or a casual conversation. Let’s assume a mediator-hosted peer has the required contact information. What does the peer do to make itself known? It registers its peerIdentity, virtualPort, and connected community documents on that mediator. Since mediators know about other mediators by means of the mediator-to-mediator communication protocol, this information is distributed among them. A mediator may have a limit to the number of peers it can host, and may redirect a peer to a different mediator using the mediator redirect protocol. In Figure 3-9, each mediator has a complete map which includes all four mediators.
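To make registration and redirection concrete, here is a minimal Python sketch, assuming each mediator has a fixed hosting capacity and an in-memory list standing in for its mediator map. The `Mediator` class, its `register` method, and the capacity rule are hypothetical illustrations; the actual registration and redirect protocols are specified in chapter 4.

```python
from dataclasses import dataclass, field


@dataclass
class Mediator:
    """Hypothetical mediator that hosts peers up to a fixed capacity."""
    name: str
    capacity: int
    # peer name -> {document type: document} registered by that peer
    hosted: dict = field(default_factory=dict)
    # other mediators learned through mediator-to-mediator communication
    mediator_map: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.hosted) < self.capacity

    def register(self, peer: str, documents: dict):
        """Accept the peer's documents, or name another mediator to try.

        Returns ("accepted", own name) or ("redirect", other name).
        """
        if peer in self.hosted or self.has_room():
            self.hosted[peer] = dict(documents)
            return ("accepted", self.name)
        for other in self.mediator_map:
            if other.has_room():
                return ("redirect", other.name)
        raise RuntimeError("no known mediator has spare capacity")


m0 = Mediator("m0", capacity=1)
m1 = Mediator("m1", capacity=10)
m0.mediator_map.append(m1)
docs = {"peerIdentity": "...", "virtualPort": "...", "connectedCommunity": "..."}
print(m0.register("bill", docs))  # ('accepted', 'm0')
print(m0.register("rita", docs))  # ('redirect', 'm1')
```

A redirected peer would then repeat the same registration against the mediator it was given.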

[Figure: peers hosted by mediators m0, m1, m2, and m3, each of which holds a mediator map listing all four mediators]

Figure 3-9. Simple Mediator Map

As the number of peers on a P2P overlay network grows, discovery becomes more and more difficult because the amount of data that is distributed among the mediators is enormous, and even with the best search algorithms the task is still very difficult. One can imagine using a distributed hash table (DHT), and hashing references to all documents across the mediator map. In this case, a peer searching for a string contacts its mediator with the query; the query’s search string is hashed, and the query is sent to the mediator which holds that string in its hash table. If we are clever, we append enough information to the hashed data so that the mediator holding it can forward the query to the peer from which the original document hash originated. This glosses over some hash algorithm details and routing issues, but nonetheless it is a reasonable approach. This is illustrated in Figure 3-10, where two document descriptions are sent to mediator M3 along with the routing information. Then M3 hashes the peerIdentity and virtualPort document descriptions to M1 and M2, respectively.
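The publishing half of this scheme can be sketched in Python. This is a minimal illustration, assuming the search key of a document description is a plain string and that the "routing information" is simply the pair (originating mediator, originating peer); the SHA-1 modulo placement is the simple rule given later in this section, and the class and function names are hypothetical.

```python
import hashlib


def mediator_index(key: str, num_mediators: int) -> int:
    """Place a search key on a mediator: SHA-1(key) modulo the mediator count."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_mediators


class DHTMediator:
    """Hypothetical mediator holding one share of the distributed hash table."""

    def __init__(self, name: str):
        self.name = name
        # search key -> list of (document description, route back to its origin)
        self.table = {}

    def store(self, key: str, description: str, route: tuple):
        self.table.setdefault(key, []).append((description, route))


def publish(mediators, origin_mediator: str, peer: str, descriptions: dict) -> dict:
    """Hash each document description onto a mediator together with its route.

    The appended route lets the storing mediator forward a later query
    toward the peer that published the document.
    """
    placed = {}
    for key, description in descriptions.items():
        target = mediators[mediator_index(key, len(mediators))]
        target.store(key, description, route=(origin_mediator, peer))
        placed[key] = target.name
    return placed


mediators = [DHTMediator(f"m{i}") for i in range(4)]
placed = publish(mediators, "m3", "p2",
                 {"p2/peerIdentity": "<peerIdentity .../>",
                  "p2/virtualPort": "<virtualPort .../>"})
print(placed)  # which mediator received each description
```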

[Figure: a peer sends its document list (peerIdentity, virtualPort) to mediator m3, which hashes the peerIdentity description to m1 and the virtualPort description to m2]
Figure 3-10. Hashing Scheme
In Figure 3-11, assume peer P1 queries Mediator M1 for P2’s virtualPort Document. This query
includes the routing information necessary for the responding peer to reach P1. M1 hashes the query
and finds that M3 has the required information, then sends the query to M3. M3 has the routing infor-
mation to forward the query to P2 via M2. Finally, since the query contains the route back to P1, P2
sends the response to P1 through M1 if M1 is reachable directly from P2. Otherwise, P2 has to use its
mediator, M2, to communicate with M1 and P1.
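The query walk-through above can be traced with a small Python sketch. It assumes, hypothetically, that each mediator's table maps a search key to the (mediator, peer) pair that published it, reuses a SHA-1 modulo placement rule to find the holding mediator, and models reachability as a single boolean; real forwarding would carry this routing information in ONP headers as described in chapter 4.

```python
import hashlib


def mediator_index(key: str, n: int) -> int:
    """SHA-1(key) modulo the number of mediators (hypothetical placement rule)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big") % n


def compress(path):
    """Drop consecutive duplicate hops (e.g. when the holder is the asking mediator)."""
    out = []
    for hop in path:
        if not out or out[-1] != hop:
            out.append(hop)
    return out


def route_query(key, asking_peer, asking_mediator, mediator_names, tables,
                directly_reachable=True):
    """Return (query path, response path) for a lookup of `key`.

    tables: mediator name -> {key: (origin mediator, origin peer)}.
    The query carries (asking_mediator, asking_peer) as its route back.
    """
    holder = mediator_names[mediator_index(key, len(mediator_names))]
    origin_mediator, origin_peer = tables[holder][key]
    query_path = compress([asking_peer, asking_mediator, holder,
                           origin_mediator, origin_peer])
    if directly_reachable:
        # The answering peer reaches the asker's mediator itself.
        response_path = [origin_peer, asking_mediator, asking_peer]
    else:
        # Otherwise it goes through its own mediator first.
        response_path = compress([origin_peer, origin_mediator,
                                  asking_mediator, asking_peer])
    return query_path, response_path


names = ["m0", "m1", "m2", "m3"]
key = "p2/virtualPort"
tables = {name: {} for name in names}
tables[names[mediator_index(key, 4)]][key] = ("m2", "p2")
print(route_query(key, "p1", "m1", names, tables))
print(route_query(key, "p1", "m1", names, tables, directly_reachable=False))
```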
[Figure: mediators m0 to m3 and peers p1 and p2, showing the query and response paths]

Figure 3-11. Query and Response

Now, suppose a mediator crashes. In this case, a great deal needs to be done to stabilize the remaining mediators’ hash tables. Let’s assume that a mediator hosting several peers discovers that a mediator to which it has hashed data has crashed. What does this imply? There are multiple possibilities; recovery is difficult and can use a great deal of bandwidth and CPU time. Imagine we have a simple DHT algorithm (note that there are many possible DHT algorithms) where, given a (string, object) pair, we take the SHA-1 hash of the string modulo the number of mediators, and store the (string, object) pair, as well as the originating peer, on the resulting mediator:

Given mediators M0, M1, ..., MN: j = SHA-1(string) mod (N+1), where 0 ≤ j ≤ N,
and the data will be in mediator Mj’s hash table.

A mediator, Mk, crashes. First, all data hashed to that mediator must be rehashed. This implies that when data is hashed, a reference to the mediator to which it is hashed must be kept. In particular, if mediator Mj hashes data to mediator Mk, then Mk’s mediator map entry on Mj should maintain the reference for the sake of computational efficiency during crash recovery. Then Mj need not search all of its hashed data for Mk; rather, it goes directly to Mk’s mediator map entry. We can decide to keep the same modulus, N+1, and any data that was stored in the crashed mediator’s hash table would then be stored on its successor, M(k+1) mod (N+1). This is OK, and all mediators need to do the same. If we used N rather than N+1 as a modulus, then all of the hashed data on all of the mediators would have to be rehashed, since a new hash algorithm modulus is being used. One could resort to a brute-force search instead, but this is not good for performance. When a mediator discovers that another mediator is down, it must notify all other mediators to keep a consistent DHT. Because there is a race condition here, i. e., a mediator may be hashing data to a crashed mediator and not discover it is down until it tries to store the data, we would use a simple rule: in this kind of failure, the mediator waits until there is again a consistent mediator map, i. e., it backs off and waits for a notification, and if none arrives, it applies the rule that is used to maintain a consistent map. That might be a simple ping of each member of the map that has not been recently heard from. The more unstable mediators one has, the more complicated this maintenance problem becomes. Here we must emphasize that mediators are supposed to be stable systems, and it is important to try to minimize the impact of such disaster recovery.
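The bookkeeping described above, keeping the modulus N+1 and moving a crashed mediator's data to its successor, can be sketched in Python. This is a minimal single-mediator view under stated assumptions: mediators are identified by their indices 0 to N, crash notifications arrive as a call to a hypothetical `on_mediator_down` method, and the per-target key sets stand in for the references kept in each mediator map entry.

```python
import hashlib


def home_index(key: str, modulus: int) -> int:
    """j = SHA-1(key) mod (N+1); the modulus is never changed after a crash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % modulus


def live_target(key: str, modulus: int, down: set) -> int:
    """The key's home mediator, or its first live successor mod (N+1)."""
    j = home_index(key, modulus)
    for step in range(modulus):
        candidate = (j + step) % modulus
        if candidate not in down:
            return candidate
    raise RuntimeError("all mediators are down")


class HashingMediator:
    """Hypothetical mediator that remembers which keys it hashed to each mediator."""

    def __init__(self, index: int, modulus: int):
        self.index = index
        self.modulus = modulus
        self.down = set()
        # Mediator map entry per target index: the keys this mediator hashed there.
        self.hashed_to = {i: set() for i in range(modulus)}

    def hash_key(self, key: str) -> int:
        target = live_target(key, self.modulus, self.down)
        self.hashed_to[target].add(key)
        return target

    def on_mediator_down(self, k: int) -> dict:
        """Rehash only the keys recorded in Mk's map entry; return {key: new target}."""
        self.down.add(k)
        orphaned, self.hashed_to[k] = self.hashed_to[k], set()
        return {key: self.hash_key(key) for key in orphaned}


m0 = HashingMediator(index=0, modulus=4)
before = {f"doc{i}": m0.hash_key(f"doc{i}") for i in range(20)}
moved = m0.on_mediator_down(3)
print(moved)  # every key that lived on M3 now maps to M0, M3's successor mod 4
```

Because the modulus stays at N+1, keys whose home mediator is still alive keep their placement; only the crashed mediator's map entry is touched.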
In Figure 3-12, mediators M0, M1, and M2 have discovered that M3 has crashed, have updated their mediator maps, and have rehashed the data previously hashed to M3 onto M0, M3’s successor mod 4. Thus, in Figure 3-13, P2’s peerIdentity document is rehashed to M0. Finally, P1’s query is directed to M0 instead of M3.

[Figure: mediators m0 to m3 with their mediator maps after m3 has crashed]

Figure 3-12. Hashing Recovery Scheme

[Figure: mediators m0 to m3 and peers p1 and p2, showing the query and response paths after recovery]

Figure 3-13. Query and Response after Hashing Recovery

Recall that connected communities are pairwise communication-disjoint. Using this feature and the
mediator-to-mediator protocol we do the following:
1. Mediators maintain a mediator map for each CC they host, i. e., for those connected com-
munities in which the peerNodes they host are active (see chapter 4, section 4.2.3.1),
2. Mediators communicate the existence of each CC they host to all other mediators using the
mediator-to-mediator protocol. If another mediator hosts this CC, it adds the contacting
mediator to its CC mediator map, and notifies that mediator that it also hosts peers belong-
ing to the CC so that both have the same CC mediator map.
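The two rules above can be sketched in Python. The sketch assumes, hypothetically, that the mediator-to-mediator protocol is reduced to a direct method call and that each exchange merges the two mediators' CC mediator maps so that both end up with the same map. Class, method, and peer names are illustrative only; the communities and mediators mirror Figure 3-14.

```python
from collections import defaultdict


class CCMediator:
    """Hypothetical mediator keeping one mediator map per connected community (CC)."""

    def __init__(self, name: str):
        self.name = name
        self.hosted = defaultdict(set)   # CC id -> hosted peers active in that CC
        self.cc_maps = defaultdict(set)  # CC id -> mediators known to host that CC

    def host(self, peer: str, cc: str):
        """Rule 1: keep a CC mediator map for each CC our hosted peers are active in."""
        self.hosted[cc].add(peer)
        self.cc_maps[cc].add(self.name)

    def announce(self, all_mediators):
        """Rule 2: tell every other mediator which CCs we host."""
        for cc in list(self.hosted):
            for other in all_mediators:
                if other is not self and other.hosted.get(cc):
                    # The other mediator also hosts cc: merge maps so both agree.
                    merged = self.cc_maps[cc] | other.cc_maps[cc] | {self.name, other.name}
                    self.cc_maps[cc] = set(merged)
                    other.cc_maps[cc] = set(merged)


# The configuration of Figure 3-14 (peer names are hypothetical)
ms = {name: CCMediator(name) for name in ["m0", "m1", "m2", "m3", "m4"]}
for mediator, peer, cc in [("m0", "p1", "CC1"), ("m0", "p2", "CC2"), ("m1", "p3", "CC1"),
                           ("m2", "p4", "CC2"), ("m3", "p5", "CC2"), ("m4", "p6", "CC3")]:
    ms[mediator].host(peer, cc)
for m in ms.values():
    m.announce(ms.values())
print(sorted(ms["m0"].cc_maps["CC2"]))  # ['m0', 'm2', 'm3']
```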

It is not necessary for CC mediator maps to be complete, that is to say, to contain an entry for every mediator that hosts a peer that is a member of a given CC. In that case it will simply be impossible to find all members of a CC, and this is permissible in P2P networks, i. e., discovery need not be complete. But CC mediator maps must be consistent, i. e., every mediator in a CC mediator map hosts at least one peer that is a member of the CC. So, why do all of this? It simplifies disaster recovery, because when a mediator has crashed and this is discovered, recovery is limited to those CC mediator maps to which the crashed mediator belongs. This is cool! Figure 3-14 shows that mediator M0 supports two connected communities, CC1 and CC2, and M1 only supports CC1. Similarly, M2 and M3 only support CC2, and M4 is the sole supporter of CC3.
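Why consistency makes recovery cheap can be seen in a short Python sketch. It assumes, hypothetically, that CC mediator maps are plain sets of mediator names and that `hosted` records which CCs each mediator currently hosts; the data mirrors Figure 3-14, and only the maps that contain the crashed mediator are touched.

```python
def is_consistent(cc_maps: dict, hosted: dict) -> bool:
    """Every mediator listed for a CC must host at least one peer of that CC."""
    return all(hosted.get(m, {}).get(cc)
               for cc, members in cc_maps.items() for m in members)


def recover_from_crash(cc_maps: dict, crashed: str) -> list:
    """Remove a crashed mediator only from the CC maps it belongs to.

    Returns the (sorted) CCs that needed recovery work; the others are untouched.
    """
    affected = sorted(cc for cc, members in cc_maps.items() if crashed in members)
    for cc in affected:
        cc_maps[cc].discard(crashed)
    return affected


cc_maps = {"CC1": {"m0", "m1"}, "CC2": {"m0", "m2", "m3"}, "CC3": {"m4"}}
hosted = {"m0": {"CC1": {"p1"}, "CC2": {"p2"}}, "m1": {"CC1": {"p3"}},
          "m2": {"CC2": {"p4"}}, "m3": {"CC2": {"p5"}}, "m4": {"CC3": {"p6"}}}
assert is_consistent(cc_maps, hosted)
print(recover_from_crash(cc_maps, "m0"))  # ['CC1', 'CC2']
```

CC3's map is never examined beyond a membership test, which is the point of keeping per-CC maps.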

[Figure: CC mediator maps for CC1 (m0, m1), CC2 (m0, m2, m3), and CC3 (m4), with peers p1 and p2]

Figure 3-14. CC Mediator Map

Figure 3-15 shows that mediator M0 has crashed, its peers have discovered alternative mediators and
CC mediator maps have been updated to reflect this.

[Figure: the CC1, CC2, and CC3 mediator maps after m0 has crashed, with peers p1 and p2]

Figure 3-15. CC Mediator Hash Recovery Scheme

Finally, mediators can proxy storage and computation for their hosted peers which are device con-
strained. Given this capability and the above discussion, the eight issues previously mentioned can
each be resolved with the addition of mediators to the P2P overlay network.
3.5.3.2 Putting the Components Together: Mapping P2P Overlay Network to the Real Transport
We now have the components and documents that define the P2P overlay network, and mediators.
We’ve also given an overview of discovery. What is missing is how the overlay network is mapped to the underlying real network by means of the real transports in the peerIdentity document and mediators. Looking back at figures 3-4 and 3-5 we have the IP stack and the overlay network stack. The code implementing the overlay network runs at the application layer of the IP stack, so the overlay network stack is bound to the IP stack at its application layer. Here we formalize this binding.
P2P overlay network applications create virtual sockets. At this point it might be a good idea for the reader to review section 3.3. The following line is extracted from Table 3-1; as the discussion proceeds, a second line representing the real transport connection is added:

       Bill                                        Rita
       local                remote                 local                  remote
ONP    bill.338888881.CC1   rita.mobileAgent.CC1   rita.mobileAgent.CC1   bill.338888881.CC1

Here, the peers bill and rita are members of connected community “1”, and rita has an active listening mobileAgent virtual socket. The above table shows an open connection on the P2P overlay network, and we describe below exactly the steps necessary to establish this connection, which in turn requires a real connection on the chosen real transport. Every step from discovery to communication on the P2P overlay network requires a real transport mapping. This requires a case-by-case analysis:
1. Both peers are on the same local network,
2. Both peers are not on the same local network, and thus, a mediator is required at some
point, and perhaps throughout the communication.
In case 1 bill and rita would have discovered one another using IP multicast. They will also have the
following in their peerIdentity documents:
On peer node bill we will have

<comprotocols>
<real> tcp://152.70.8.108:3000 </real>
</comprotocols>

and on rita,

<comprotocols>
<real> tcp://152.70.8.109:3000 </real>
</comprotocols>

Having discovered one another on the same subnet, the software that implements ONP will establish a real TCP connection for communication between bill and rita. The table above, now showing both the ONP sockets and the IP sockets, appears as follows:

       Bill                                        Rita
       local                remote                 local                  remote
ONP    bill.338888881.CC1   rita.mobileAgent.CC1   rita.mobileAgent.CC1   bill.338888881.CC1
TCP    152.70.8.108.3000    152.70.8.109.3000      152.70.8.109.3000      152.70.8.108.3000

In this way, the TCP data exchanged between bill and rita is appropriately dispatched by the ONP software to the mobile agent applications using the channel that is open on the P2P overlay network. There is a minor variation in this case, where both peers have TCP/IP connectivity but cannot discover one another for any of many possible reasons, e. g., they are not on the same subnet. Here, after receiving each other’s peerIdentity documents from a mediator, the ONP software attempts an initial TCP/IP connection to the transport addresses in the peerIdentity documents. This will be successful, and all further communication will proceed as above, with an identical table.
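The two rows of the table above, one ONP and one TCP, amount to a lookup from an overlay channel to a real transport connection. Here is a minimal Python sketch, assuming a hypothetical `ConnectionTable` that keys rows by the (local, remote) virtual socket pair named in an incoming ONP header; the handler stands in for the mobile agent application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VirtualSocket:
    peer: str  # e.g. "bill"
    port: str  # virtual port ID, e.g. "338888881" or "mobileAgent"
    cc: str    # connected community, e.g. "CC1"


@dataclass(frozen=True)
class RealSocket:
    host: str
    port: int


class ConnectionTable:
    """Hypothetical ONP table binding overlay channels to real TCP endpoints."""

    def __init__(self):
        # (local vs, remote vs) -> (local real, remote real, application handler)
        self._rows = {}

    def bind(self, local_vs, remote_vs, local_real, remote_real,
             handler: Callable[[bytes], object]):
        self._rows[(local_vs, remote_vs)] = (local_real, remote_real, handler)

    def real_endpoints(self, local_vs, remote_vs):
        """Where to write outgoing data for this overlay channel."""
        local_real, remote_real, _ = self._rows[(local_vs, remote_vs)]
        return local_real, remote_real

    def dispatch(self, destination_vs, source_vs, data: bytes):
        """Hand bytes received over TCP to the application bound to the ONP channel."""
        _, _, handler = self._rows[(destination_vs, source_vs)]
        return handler(data)


# bill's side of the table above
bill_vs = VirtualSocket("bill", "338888881", "CC1")
rita_vs = VirtualSocket("rita", "mobileAgent", "CC1")
table = ConnectionTable()
table.bind(bill_vs, rita_vs, RealSocket("152.70.8.108", 3000),
           RealSocket("152.70.8.109", 3000), handler=lambda data: data.decode("utf-8"))
print(table.dispatch(bill_vs, rita_vs, b"agent state"))  # agent state
```

Keying on the virtual socket pair rather than on the TCP endpoints lets several overlay channels share one real connection; this is a design choice of the sketch, not something the text prescribes.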
To describe case 2 we assume that bill is on a NAT stub network, and rita is behind a firewall. Thus, a mediator is required for all communications. This begins with discovery and then applies to every communication between the two peers. bill is a member of CC1 and wishes to communicate with rita, who is also a member of CC1. We assume that both bill and rita have one another’s peerIdentity and virtualPort documents by the means described in the above discussion on mediators. As mentioned several times in this section, the details of document retrieval are thoroughly covered in chapter 4. bill’s ONP software already knows its peer is behind a NAT. How? When bill initially contacted its mediator, say M0, the mediator requested bill’s peerIdentity document and noted that the source IP address of the TCP/IP connection bill made is different from the IP address in the peerIdentity comprotocols field. In this case three things will happen:
1. The mediator creates an INBOX for bill. The INBOX will hold all messages from other peers destined for bill. Recall that the mediator cannot contact bill because bill is behind a NAT, and so bill must poll the mediator at a reasonable frequency to retrieve its messages. A mediator should let bill remain connected as long as system resources permit, and a fairness algorithm for disconnecting hosted peers must be implemented,
2. The mediator notifies bill that it is behind a NAT, and sends bill its mediator document, which contains the routing information necessary to reach this mediator. It may be that bill will communicate with rita via a different mediator, and need to append to all communications with rita the route back to bill, i. e., via M0,
3. bill will update its peerIdentity and virtualPort documents to reflect the routing information in the mediator document it received. That way, any further requests for either of these documents provide a highly probable route. Note that peers usually first ask for virtualPort documents. If they contain successful routes, then the peerIdentity document is not required. If a route fails, then the virtualPort document also contains the peer UUID, which can be used to recover the most recent peerIdentity document with a viable route.
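The mediator's NAT check and the INBOX it creates can be sketched in Python. This is a minimal illustration, assuming the real transports in the peerIdentity comprotocols field are written as URIs of the form tcp://host:port and that the mediator document is an opaque dictionary; the class and method names are hypothetical, and the fairness algorithm for disconnecting peers is not modeled.

```python
from urllib.parse import urlsplit


def advertised_hosts(comprotocols):
    """Host part of each <real> transport URI, e.g. 'tcp://10.0.1.3:3000' -> '10.0.1.3'."""
    return {urlsplit(uri).hostname for uri in comprotocols}


class NatAwareMediator:
    """Hypothetical mediator that detects NAT-bound peers and keeps INBOXes for them."""

    def __init__(self, mediator_document: dict):
        self.mediator_document = mediator_document
        self.inboxes = {}  # peer -> queued messages destined to that peer

    def on_connect(self, peer: str, source_ip: str, comprotocols) -> dict:
        """Compare the connection's source IP with the advertised transports."""
        if source_ip in advertised_hosts(comprotocols):
            return {"behind_nat": False}
        self.inboxes.setdefault(peer, [])  # step 1: create an INBOX for the peer
        # step 2: tell the peer it is behind a NAT and hand over our routing information
        return {"behind_nat": True, "mediator_document": self.mediator_document}

    def post(self, peer: str, message):
        self.inboxes[peer].append(message)

    def poll(self, peer: str) -> list:
        """Called by the NAT-bound peer; returns and clears its queued messages."""
        messages, self.inboxes[peer] = self.inboxes[peer], []
        return messages


m0 = NatAwareMediator({"name": "M0", "address": "129.14.7.25:3000"})
# bill advertises its NAT-assigned address, but the connection arrives from
# the NAT's external address (a documentation-range stand-in here).
notice = m0.on_connect("bill", "203.0.113.7", ["tcp://10.0.1.3:3000"])
print(notice["behind_nat"])  # True
```

Step 3, updating bill's peerIdentity and virtualPort documents with the returned routing information, happens on the peer and is not shown.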
Steps 1 and 2 above apply similarly to rita. Let’s assume, without loss of generality, that rita is using M1. bill has an INBOX on M0 and rita an INBOX on M1. bill and rita both have their mediators’ documents. bill sends a request to connect to rita’s mobileAgent port to rita’s INBOX on M1 using ACP/ONP. Recall that bill has already communicated with rita, received the requisite peerIdentity and virtualPort documents, and thus has also received M1’s routing information from rita. This request includes the routing information necessary for rita to respond to bill. rita is polling the INBOX on M1 and receives this message. rita then completes the handshake by acknowledging to bill, in a similar manner, that the request to connect has been received. Recall that ACP/ONP is a reliable channel protocol, and these messages are guaranteed to be delivered barring a catastrophic partitioning of the real network. Now we have the following connection tables on bill and rita describing the mappings between the overlay network and the real network:

       Bill                                        Rita
       local                remote                 local                  remote
ONP    bill.338888881.CC1   rita.mobileAgent.CC1   rita.mobileAgent.CC1   bill.338888881.CC1
TCP    10.0.1.3.3000        129.14.7.25.3000       152.70.8.109.3000      152.70.96.11.8080

In the above table the overlay network connectivity is always the same. This is what makes P2P a simplifying technology for applications. On the other hand, bill is behind a NAT, has received its IP address from the NAT using DHCP, and the NAT has a hidden but different external IP address. bill’s remote address is the IP address of M0, 129.14.7.25.3000. rita is behind a firewall and is using an HTTP proxy to contact M1; 152.70.96.11.8080 is the address of this proxy.
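The INBOX-based handshake between bill and rita can be traced with a short Python sketch. It is a simplification under stated assumptions: mediators are in-memory objects reachable by name, messages are dictionaries with hypothetical field names, and ACP's reliability (acknowledgment and retransmission) is not modeled, so every message is assumed to arrive.

```python
from collections import defaultdict


class InboxMediator:
    """Hypothetical mediator holding INBOXes for NAT- or firewall-bound peers."""

    def __init__(self, name: str):
        self.name = name
        self.inboxes = defaultdict(list)

    def deliver(self, peer: str, message: dict):
        self.inboxes[peer].append(message)

    def poll(self, peer: str) -> list:
        messages, self.inboxes[peer] = self.inboxes[peer], []
        return messages


def handshake(m0: InboxMediator, m1: InboxMediator) -> list:
    """bill (INBOX on M0) opens a channel to rita's mobileAgent port (INBOX on M1)."""
    by_name = {m0.name: m0, m1.name: m1}
    # bill posts the connect request to rita's INBOX on M1, with the route back.
    m1.deliver("rita", {"type": "connect", "port": "mobileAgent",
                        "from": "bill", "route_back": m0.name})
    # rita polls her INBOX and acknowledges through the route in the request.
    for msg in m1.poll("rita"):
        if msg["type"] == "connect":
            by_name[msg["route_back"]].deliver(
                msg["from"], {"type": "ack", "port": msg["port"], "from": "rita"})
    # bill polls his INBOX on M0 and finds the acknowledgment.
    return [msg for msg in m0.poll("bill") if msg["type"] == "ack"]


print(handshake(InboxMediator("M0"), InboxMediator("M1")))
# [{'type': 'ack', 'port': 'mobileAgent', 'from': 'rita'}]
```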
This completes the discussion of the functional mapping that takes place so that two peers can com-
municate on the P2P overlay network. Notice that all of the components and documents we have
described up to here are required. Also, the description of the new fields containing routing informa-
tion that are added to the peerIdentity and virtualPort documents will be completed in the next chapter
where we describe the protocols that moderate the behavior of peers and mediators as well as the medi-
ator document.

Chapter 4
Basic Behavior of Peers on a P2P System

We now have a good understanding of the definition of a P2P overlay network, its peer nodes, and how this network maps onto the underlying real transports. On the other hand, we have not yet really provided the details necessary to ensure that peer nodes can dependably communicate with one another, where “dependability” must be taken in the context of the particular P2P network’s place on the P2P spectrum as discussed in chapter 1, section 3. There is no P2P engineering way around the inherent ad-hoc behavior of some P2P overlay networks. In any case, these details are called protocols, or the rules that peer nodes must follow so that they can predictably and meaningfully interact. P2P overlay network protocols have both syntax and semantics. The syntax defines the form of the protocol data sent and/or received, and the semantics the required behavior and interpretation of this data. These protocols are programming language independent, and, as long as the protocols are well defined, correct implementations will interoperate. Network protocols are no different in spirit than the familiar rules of the road we must all know and obey to drive on the highways of our respective countries. And, although there are no international standards for such rules, there is a reasonable familiarity, a core behavior, that is common to them, so that a driver in unfamiliar territory can be well behaved and probably not receive a traffic fine. What we are saying here is that we are all familiar with the use of and need for network protocols even if we have never read the specifications, for example, the RFCs for IP and TCP. This chapter is all about P2P overlay network protocols. Where the P2P overlay network provides the arteries and veins for overlay network transport, the protocols permit them to be filled with their life’s blood and regulate its flow.

4.1 The P2P Overlay Network Protocols


Recall the Overlay Network Stack from chapter 3, section 3. Below the application level is the transport level, and there we have two protocols: the Universal Message Protocol (UMP) and the Application Communication Protocol (ACP). At the bottom is the Overlay Network Protocol (ONP). The ONP specifies both the syntax of our message format and the semantics associated with each field this format defines. It is the IP equivalent on the overlay network.

4.1.1 The Overlay Network Protocol


We assume that the real transports bound to the overlay network will use protocols with the strength of IP.v4 or IP.v6 to manage the packet traffic on the underlying infrastructure, and thus, real transport issues are of no concern in this section. Rather, the ONP has as its goal the successful delivery of a message between two peers on the overlay network. This delivery requires a destination overlay network address. And, just as for IP, we must also supply the source overlay network address, because communication is ultimately between two virtual sockets, and the information defining the two sockets must be included in the message. There are many reasons for always including the source address even if the communication does not require a response. For example, one needs to know the source of a message to discourage denial of service attacks, to maintain audit trails, to do billing, and to authenticate the source of a message for security reasons. Recall that a peerIdentity may contain cryptographically based information that permits source authentication by a receiver. Moreover, to simplify both routing to the destination and the destination peer node’s task of responding to the source, optional routing information can be added. Please note that we are not specifying an implementation, and thus not giving real values to any of the fields. A message implementation can have many forms, e. g., XML, binary, (name, value) pairs, etc. But, for reasons of performance, we do suggest that the message be in a binary format. While the current fashion of expressing everything, including the “kitchen sink,” in XML often has its merits, there is a performance hit one must take to analyze an XML document. Given that ONP messages contain information that may be used, modified, and possibly extended on a message’s route to the destination peerNode, it is imperative that the ONP fields be readily accessible without invoking an XML parser. On the other hand, the data payload is certainly a candidate for XML, or any other format the application writer may prefer.
The ONP message consists of the following fields:
ONP Header:
1. Version - ONP Version number. This assures a controlled evolution of the protocol.
2. Length - This is the total length in bytes of the data following the ONP header.
3. Lifetime - How long a message can live. There are multiple possible implementations for
this field. For example, it can be the maximum number of mediators this message can visit
before being dropped. Each mediator decrements the value, and when it reaches zero, the
message is discarded. Another possibility is the maximum time spent at mediators in tran-
sit. It could be initially the maximum value in seconds, and be decreased at each mediator
just before being forwarded to the next one. When the value reaches zero, it is treated
exactly as in the above case. When the lifetime expires a non-delivery message can be
returned to the source peer node.
4. Source Address - PeerIdentity
5. Destination Address - PeerIdentity
6. Connected Community Identity - Communication is restricted to a single connected com-
munity
Optional header extensions:
1. Multicast Group UUID - This is for multicast UMP messages only. In this case, the initial
destination address is set to this value if the multicast message is sent to a mediator for
propagation. See section 4.1.4 for the details.
2. Destination routing information [1] - An ordered list, M1, M2, ..., Ms, of the PeerIdentities of the mediators that define the most recent route for the destination path to the destination peer node. The destination path order is M1, M2, ..., Ms.
3. Source routing information - An ordered list, M1, M2,..., Mt, of the PeerIdentities of the
mediators that define the most recent route for the return path to the source peer node.
Given the possible volatility of some P2P overlay networks, the information in the routing extensions
may be incorrect. It is a best guess given previous communication, and routing information established
by the use of the mediator protocols discussed in section 4.2. This information can reduce the network
bandwidth because it is reusable, and the receipt of messages keeps it reasonably current.
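The hop-count interpretation of the Lifetime field, combined with the optional destination routing list, can be sketched in Python. This assumes, hypothetically, that Lifetime counts the mediators a message may still visit, that routing lists hold mediator names rather than full PeerIdentities, and that a dropped message is simply reported as `None` (a real mediator could instead return a non-delivery message to the source).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OnpMessage:
    source: str
    destination: str
    connected_community: str
    lifetime: int  # remaining number of mediators this message may visit
    destination_route: List[str] = field(default_factory=list)  # M1, ..., Ms


def forward_at_mediator(msg: OnpMessage, mediator: str) -> Optional[str]:
    """Decrement Lifetime at this mediator and return the next hop, or None to drop."""
    if msg.lifetime <= 0:
        return None  # lifetime expired: discard (or send non-delivery to msg.source)
    msg.lifetime -= 1
    route = msg.destination_route
    if mediator in route and route.index(mediator) + 1 < len(route):
        return route[route.index(mediator) + 1]
    return msg.destination  # the last mediator on the route delivers to the peer


msg = OnpMessage("p1", "p2", "CC1", lifetime=3, destination_route=["M1", "M2", "M3"])
hops = [forward_at_mediator(msg, m) for m in ["M1", "M2", "M3"]]
print(hops, msg.lifetime)  # ['M2', 'M3', 'p2'] 0
```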
Routing on P2P overlay networks can take many forms. The above routing information extensions are
optional for this very reason. P2P overlay networks with a small number of mediators might have pre-
configured routes, and try to determine routes just before sending the message, while very large P2P
overlay networks may use more dynamic, mediator assisted routing mechanisms. Routing is discussed
in section 4.2. Finally, given the future possibility of billions of peer nodes and mediators, a flat address space will not be manageable. The routing information may be based on a hierarchical tree of mediators with a structure that is similar to that of IP.v6 aggregators (see chapter 3, section 3.2.2.1).
After the ONP headers we have the data. It is formatted as follows:
1. Data Protocol - ACP or UMP
2. Data length - Number of bytes of data

1. In section 4.5 we add more precision to this definition by imposing a hierarchical structure on the P2P Overlay Network.

Figure 4-1 summarizes the ONP header fields.

Version | Length | Lifetime
Source Address
Destination Address
Connected Community Identity
Multicast Group UUID
Destination Routing Information
Source Routing Information
Data Protocol | Data Length

Figure 4-1. ONP Header


Depending on the data protocol, what follows these headers is different. This is discussed in the next
section.
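Since the ONP fields should be readable without an XML parse, a fixed binary layout is natural. The following Python sketch packs and unpacks the fixed part of the header and the data preamble with the standard `struct` module. The field widths, the use of 16-byte UUIDs for the source, destination, and connected community identities, and the protocol codes are all assumptions made for illustration; the optional header extensions are omitted.

```python
import struct
import uuid

# Assumed layout, network byte order:
# version (1 byte), length (4), lifetime (2), source, destination, CC (16 each)
HEADER = struct.Struct("!BIH16s16s16s")
# Data preamble: data protocol (1 byte), data length (4)
PREAMBLE = struct.Struct("!BI")
PROTOCOL_CODES = {"ACP": 1, "UMP": 2}
PROTOCOL_NAMES = {code: name for name, code in PROTOCOL_CODES.items()}


def encode(version, lifetime, source, destination, cc, protocol, payload: bytes) -> bytes:
    preamble = PREAMBLE.pack(PROTOCOL_CODES[protocol], len(payload))
    length = len(preamble) + len(payload)  # bytes following the ONP header
    header = HEADER.pack(version, length, lifetime,
                         source.bytes, destination.bytes, cc.bytes)
    return header + preamble + payload


def decode(message: bytes) -> dict:
    version, length, lifetime, src, dst, cc = HEADER.unpack_from(message, 0)
    code, data_length = PREAMBLE.unpack_from(message, HEADER.size)
    start = HEADER.size + PREAMBLE.size
    return {"version": version, "length": length, "lifetime": lifetime,
            "source": uuid.UUID(bytes=src), "destination": uuid.UUID(bytes=dst),
            "cc": uuid.UUID(bytes=cc), "protocol": PROTOCOL_NAMES[code],
            "payload": message[start:start + data_length]}


def read_lifetime(message: bytes) -> int:
    """A mediator can read Lifetime at a fixed offset (after version and length)."""
    return struct.unpack_from("!H", message, 5)[0]


src, dst, cc = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
msg = encode(1, 8, src, dst, cc, "UMP", b"status?")
print(len(msg), decode(msg)["protocol"], read_lifetime(msg))  # 67 UMP 8
```

The fixed offsets are what let a mediator inspect or decrement Lifetime without decoding the rest of the message.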

4.1.2 Universal Message Protocol


The Universal Message Protocol (UMP) is our UDP analogue. Like UDP it is unreliable, and connec-
tionless. One usually uses UMP for sending small amounts of data, or for simple queries. One might
implement an interactive game, or a “ping-like” protocol using UMP. The UMP format is as follows:
1. Source virtual port ID
2. Destination virtual port ID
3. Data payload

Source virtual port ID

Destination virtual port ID

Data Payload

Figure 4-2. UMP Format
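A UMP body can be laid out just as simply. In this Python sketch the virtual port IDs are assumed to be UTF-8 strings, each prefixed with a two-byte length; nothing in the text fixes that encoding. The payload needs no length of its own because the ONP Data length field already bounds the UMP data as a whole.

```python
import struct


def pack_ump(source_port: str, destination_port: str, payload: bytes = b"") -> bytes:
    """Source virtual port ID, destination virtual port ID, then the payload."""
    body = b""
    for port_id in (source_port, destination_port):
        raw = port_id.encode("utf-8")
        body += struct.pack("!H", len(raw)) + raw
    return body + payload


def unpack_ump(body: bytes):
    """Inverse of pack_ump: returns (source port, destination port, payload)."""
    ports, offset = [], 0
    for _ in range(2):
        (n,) = struct.unpack_from("!H", body, offset)
        offset += 2
        ports.append(body[offset:offset + n].decode("utf-8"))
        offset += n
    return ports[0], ports[1], body[offset:]


request = pack_ump("338888881", "peerStatus")  # an empty status request
print(unpack_ump(request))  # ('338888881', 'peerStatus', b'')
```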

Let’s assume we have a UMP status protocol. A peer sends a status request and receives a response. It
is a simple UMP message exchange. Let the following be true (see chapter 3, section 3.3.2 for a defini-
tion of virtual socket and the PeerIdentity Document):
1. The local PeerIdentity document has been created and published
2. The remote virtual socket, peerStatus, has been discovered
Then, we send and receive the UMP messages using the 4PL code:

// Create our own virtual socket: our PeerIdentity.randomVirtualPort
VirtualSocket loc_vs = new VirtualSocket();

// Bind the local and remote virtualSockets into a local channel
// identifier
VirtualChannel umpChan = new VirtualChannel(loc_vs, peerStatus, UMP);

// Send the status request message. Note that sending to the peerStatus
// virtualSocket is all that is required. It listens for peerStatus
// commands.
ump_send(umpChan, null);

// Receive a status message on a local virtualSocket.
// The status information is in the data portion of the UMPMessage.
UMPMessage m = ump_receive(umpChan);

For a second example, suppose we have a Go game application. Because the moves are simple position descriptions on a 19 by 19 board, sending all 361 positions at two bytes per position requires only 722 bytes. Consequently, UMP is appropriate for playing the game. We require a listening virtualSocket to initiate games, and then a channel for playing a game between two individuals. The following 4PL describes how to set up a game, and sketches how it is played:

Document goPort = new Document(VIRTUALPORT, pidDoc, "goListener", unicast);

// Now we create the Virtual Socket
VirtualSocket goListen = new VirtualSocket(goPort);

// We also publish the VirtualPort document
publish(goPort);

// Now we listen for incoming "let's play go" messages.
// The listen code only detects input on the goListen virtual socket.
// It is protocol independent, albeit UMP messages will be
// received on this channel.
VirtualChannel playGo = listen(goListen, UMP);

// Here we remove the waiting message from an implicit input queue on the
// channel with a UMP receive.
UMPMessage newGame = ump_receive(playGo);

// We next accept the challenge to play a new game.
// This requires the creation of a UMP channel.
// Extract the source socket given the type, field, and UMP message.
VirtualSocket source_vs = extractSocket(UMP, SOURCE, newGame);

// Create a local virtual socket with a random virtualPortID
VirtualSocket loc_vs = new VirtualSocket();

// Next, we again need a virtual channel that locally binds
// the local and remote sockets
VirtualChannel gameChan = new VirtualChannel(loc_vs, source_vs, UMP);

// We now have our game loop. The receipt of the first move
// is an acceptance to play Go. Underlying the code is
// a UI where the player makes a move. We can only hint
// at that code.
// Note that moves must be sequenced, and acknowledged.
GoBoard gb = showGoBoard();
integer oldSeqNumber = 0;
loop ()
BEGIN "play";
// get next move from user interface
GoMove myMove = acceptMove(gb);
GoMove opponentMove = null;

// Send the next move as data
ump_send(gameChan, myMove);

// Wait for move and on a timer resend the previous move
loop ()
BEGIN "wait for reply";
// ump_receive returns null if there is a time out
opponentMove = ump_receive(gameChan, timeout);
IF (opponentMove EQUAL null OR
opponentMove.seqNumber NOT EQUAL oldSeqNumber + 1)
THEN BEGIN "resend";
ump_send(gameChan, myMove);
continue;
END "resend" ELSE break;
END "wait for reply";

// keep us in sequence
oldSeqNumber = opponentMove.seqNumber;

// Have move display it


displayMove(opponentMove);
END “play”;

close(gameChan);
close(umpChan);

4-6
delete(loc_vs);
delete(goListen);

delete(goPort);
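The 722-byte arithmetic and the “wait for reply” resend loop can be sketched in Python. This is not 4PL: the byte layout (one byte for the row, one for the column), the GoMove record, the max_resends bound, and the send and receive callables standing in for ump_send and ump_receive are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

BOARD_SIZE = 19

def encode_positions(positions: list[tuple[int, int]]) -> bytes:
    """Encode board positions at two bytes each: row byte, then column byte."""
    out = bytearray()
    for row, col in positions:
        if not (0 <= row < BOARD_SIZE and 0 <= col < BOARD_SIZE):
            raise ValueError(f"position off the board: {(row, col)}")
        out += bytes((row, col))
    return bytes(out)

@dataclass
class GoMove:
    seq_number: int
    row: int
    col: int

def exchange_move(send: Callable[[GoMove], None],
                  receive: Callable[[float], Optional[GoMove]],
                  my_move: GoMove, old_seq: int,
                  timeout: float = 2.0, max_resends: int = 10) -> GoMove:
    """Send my_move, then wait for the opponent's move numbered old_seq + 1.

    As in the 4PL "wait for reply" loop, a timeout (receive returns None)
    or an out-of-sequence move causes my_move to be resent.
    """
    send(my_move)
    for attempt in range(max_resends + 1):
        reply = receive(timeout)
        if reply is not None and reply.seq_number == old_seq + 1:
            return reply
        if attempt < max_resends:
            send(my_move)  # lost or stale reply: resend our previous move
    raise TimeoutError("no in-sequence reply from the opponent")

# Every point on the board at two bytes per position: 19 * 19 * 2 = 722 bytes.
full_board = encode_positions([(r, c) for r in range(BOARD_SIZE)
                               for c in range(BOARD_SIZE)])
print(len(full_board))  # 722
```

Bounding the resends is a departure from the 4PL, which retries forever; an interactive game would more likely ask the player whether to keep waiting.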

From the above examples it is clear that we intend UMP for simple overlay network communication. Game playing requires some reliability, for example, retransmitting the last move, but this can be implemented so as not to impact the real network traffic negatively. On the other hand, if one is moving large amounts of data, as a streaming video application would, UMP is inappropriate. The Application Communication Protocol (ACP), described just below, provides that kind of functionality, and is an extremely important network protocol that must be included as part of any P2P system. Once ACP is written, it is debugged once and for all and available to all services and applications.

4.1.3 Application Communication Protocol


The Application Communication Protocol (ACP) is our TCP analogue. We assume that a reliable communication protocol underlies our P2P overlay network, and we are not going to reinvent TCP/IP. However, because mediators are required to mitigate the problems of NAT, firewalls, and differing physical layer protocols, ONP messages can be dropped en route to their ultimate destination. TCP/IP, for example, can only guarantee the delivery of a message between two peerNodes if they have TCP/IP connectivity. If one of these nodes is a mediator, it will have to buffer the message, and just as with real routers, it can run out of buffer space and drop messages. Also, for applications such as video streaming or transport layer security, order is important: messages must be received in the order in which they are sent, and mediators, like routers, cannot guarantee the order of delivery. Thus, we must have our own reliable communication protocol on the P2P overlay network, and this is what ACP accomplishes.
A reliable communication protocol like ACP has several basic requirements, some of which we discussed just above.
1. We require a way to initiate sending messages on a channel. We call this starting a channel session. Similarly, we must be able to close a session. Consequently, we need a unique channel session ID.
2. Given that a channel session can be started and must also be gracefully closed, either the sender both starts and closes a session, or the session is closed by the recipient. In the latter case, we say that the session is aborted.
3. Messages must arrive in the same order in which they were sent. Therefore, we need a sequence numbering scheme so that each message has a unique sequence number associated with it. These numbers must be monotonically increasing.
4. Given sequenced messages, we need a way for the recipient to acknowledge the receipt of a message or of several messages. To minimize bandwidth we use selective acknowledgement (SACK), as is done in TCP. Here, the receiver acknowledges all of the messages it has received, even if some may be missing. Also, if an appropriate amount of time expires without the receipt of messages, then a duplicate SACK is sent to tell the sender that the receiver is still alive and is not receiving data. As discussed below, this is a sign that the P2P overlay network is congested (see footnote 2).
These basic requirements enable communication between two peers during normal operation. Figure 4-3 shows ACP communication without any interference.

  Sender -> Recipient: Start channel session
  Recipient -> Sender: Response
  Sender -> Recipient: Send data
  Recipient -> Sender: SACK
  Sender -> Recipient: Close channel session
Figure 4-3. Basic Communication Behavior
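The receiver side of requirements 3 and 4 can be sketched in Python as follows. The class AcpReceiver and its method are hypothetical: the sketch holds out-of-order messages until the hole below them is filled, passes contiguous messages up to the application in order, and returns the sorted list of every sequence number received so far as the SACK, so holes stay visible to the sender. A real implementation would bound the size of that list.

```python
class AcpReceiver:
    """Illustrative receiver: in-order delivery plus selective acknowledgement."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_to_deliver = first_seq
        self.held: dict[int, bytes] = {}  # out-of-order messages wait here
        self.acked: set[int] = set()
        self.delivered: list[bytes] = []  # what the application has seen

    def on_data(self, seq: int, data: bytes) -> list[int]:
        """Accept one data message and return the SACK list to send back."""
        if seq >= self.next_to_deliver:   # ignore copies of delivered data
            self.held.setdefault(seq, data)
        self.acked.add(seq)
        # Pass every contiguous message up to the application, in order.
        while self.next_to_deliver in self.held:
            self.delivered.append(self.held.pop(self.next_to_deliver))
            self.next_to_deliver += 1
        return sorted(self.acked)
```

Receiving messages 0, 2, and then 1 yields SACKs [0], [0, 2], and [0, 1, 2], and message 2 is passed up only after the hole at 1 is filled.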


4.1.3.1 Aborting an ACP Session
In the case where some of the messages are missing, we say there is a hole in the sequence numbers. On the other hand, until the channel session is closed, it may be that the recipient has received and acknowledged a complete sequence of messages while the sender is still attempting to send more messages that are not getting through, i.e., the P2P overlay network is congested. Thus, on a reasonable timer (reasonable is discussed below), the recipient sends a duplicate SACK, which tells the sender both to back off and that the recipient is still alive. It may be that each peer is using a different mediator route, and the recipient’s route is viable while the sender’s is not, or a mediator’s buffer space is full and messages are being dropped. This is expected on a P2P overlay network because misbehavior is normal. Things may slow down, and in fact, the recipient may abort the channel session after a session time-out has expired.
In Figure 4-4 an ACP session starts normally, but during the session sent data is no longer being received. Let’s assume that two different routes are being used, and the sender-to-recipient route is no longer functional. The recipient sends a duplicate SACK while the sender continues to send, and the sender retransmits unacknowledged data after receiving the duplicate SACK. Finally, the recipient aborts the session because a session time-out expired. In Figure 4-4, M_n is message number n.
2. The rule that the reception of a duplicate SACK implies congestion applies only to the non-wireless Internet physical layer. Wireless devices and their associated link layers are noisy, at times with signal-to-noise ratios that cause data to be lost. In this case the rules one applies are often the opposite, since the problem is related to loss and not congestion. We discuss this in chapter VI.

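  Sender -> Recipient: Start channel session
  Recipient -> Sender: Response
  Sender -> Recipient: Send data (M_1, M_2)
  Recipient -> Sender: SACK (M_1, M_2)
  :
  Sender -> Recipient: Send data (M_n, M_n+1, ..., M_n+j)
  Recipient -> Sender: SACK (M_n, M_n+1, ..., M_n+j-3)
  Sender -> Recipient: Send data (M_n+j-2, M_n+j-1, M_n+j)
  :
  Recipient -> Sender: SACK (M_n, M_n+1, ..., M_n+j-3)
  Sender -> Recipient: Send data (M_n+j-2, M_n+j-1, M_n+j)
  :
  Recipient -> Sender: Abort
Figure 4-4. Recipient aborts session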
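The recipient’s side of Figure 4-4 reduces to a small decision made whenever its timer fires. In the Python sketch below, the function name and the sack_interval parameter (the “reasonable timer” between duplicate SACKs) are assumptions; the 1800-second session time-out in the usage lines is the 30 minute ad-hoc value recommended later in this section.

```python
def recipient_timer_action(now: float, last_data_at: float, last_sack_at: float,
                           sack_interval: float, session_timeout: float) -> str:
    """Decide what an ACP recipient does when its timer fires (times in seconds)."""
    if now - last_data_at >= session_timeout:
        return "ABORT"           # nothing arrived for a whole session time-out
    if now - max(last_data_at, last_sack_at) >= sack_interval:
        return "DUPLICATE_SACK"  # tell the sender we are alive but starved
    return "WAIT"

# Data last arrived at t = 0 s, and a SACK was also sent then.
print(recipient_timer_action(3, 0, 0, sack_interval=5, session_timeout=1800))  # WAIT
print(recipient_timer_action(6, 0, 0, sack_interval=5, session_timeout=1800))  # DUPLICATE_SACK
print(recipient_timer_action(1800, 0, 1795, sack_interval=5, session_timeout=1800))  # ABORT
```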


Also, if a channel session has been closed and the recipient receives messages on the closed session, it will discard them. They are delayed retransmissions. Closed is closed. We do not need to perturb the network with unnecessary traffic.
4.1.3.2 Maximum Message Space on the P2P Overlay Network
Because we are sending messages on an overlay network, we need to consider two things: first, the impact of message storage on the destination peer, and second, the message buffering capacity of mediators if the two peers do not have direct communication. Each peer has an input buffering capacity in bytes. During session initiation the receiving peer sends its Peer Maximum Message Space (PMMS) to the peer initiating the channel session. The PMMS is the maximum number of bytes that can be sent. It can be one message of PMMS bytes in length or multiple messages whose total size is less than or equal to the PMMS. The PMMS is a per-channel maximum; it takes into account that multiple channels between peer1 and peer2 may have active sessions at the same time.

In the second case, a mediator is like a store-and-forward E-mail service. Peer1 sends a message to peer2 via one or more mediators. The final mediator hosts the destination peer, and must hold the messages for all active channels until the destination peer polls for them or a time-out expires (the destination peer may, for example, have crashed). To respect its storage capacity, the mediator uses the peer-mediator communication protocol to send a maximum allowable message buffering size to the peers it hosts. This Mediator Maximum Message Space (MMMS) is the same for all mediators: it is either a predefined, global constant for the P2P Overlay Network mediators, or it is negotiated at mediator boot time before any peerNode connections are permitted. This is discussed in section 4.2. In large P2P Overlay Networks, the MMMS is likely to be large, like a mailbox. Still, it is functionally similar to the Maximum Transmission Unit (MTU) used for IP. The mediators do not support message fragmentation, although applications may implement it.
The Maximum Message Space (MMS) is the minimum of MMMS and PMMS when a mediator is required. If the MMMS is less than the PMMS, then it is not wise to have more than one active channel between two peers using a mediator for communication. If no mediator is used, then the MMS is identical to the PMMS. This value is communicated to the application initiating the transmission prior to sending its first message. Any attempt to send a message that exceeds the MMS must be rejected by the sending peer’s ACP software with an appropriate error message. Why? Such a message would by necessity be rejected by the mediator or the destination peerNode, and we do not want to waste bandwidth sending it there only to have it rejected.
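The MMS rule and the local rejection check are small enough to state directly. The Python below is a sketch: the function names are hypothetical, mmms=None stands for “no mediator on the path”, and the example values are illustrative.

```python
from typing import Optional

def max_message_space(pmms: int, mmms: Optional[int] = None) -> int:
    """MMS = min(MMMS, PMMS) when a mediator is used; otherwise MMS = PMMS."""
    return pmms if mmms is None else min(mmms, pmms)

def check_message_size(size: int, mms: int) -> None:
    """Reject an oversized message locally, before it consumes any bandwidth."""
    if size > mms:
        raise ValueError(f"message of {size} bytes exceeds the MMS of {mms} bytes")

mms = max_message_space(pmms=500_000, mmms=10_000_000)
print(mms)  # 500000
check_message_size(100_000, mms)  # accepted silently
```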
The MMMS is an attempt to minimize the number of messages dropped by mediators. The idea is that a mediator guarantees storage for the peers it hosts, while also taking into consideration the transient buffer space it must allocate for routing. If the latter is exhausted, then a message may indeed be dropped rather than routed. We realize that multiple peers may in fact be communicating with a single peerNode through its mediator. In this case, it is possible that the hosted peerNode’s space can be exceeded. This is treated just like a mailbox that has reached its allocated quota: messages are dropped, and mediator-generated signals are sent to the senders. ACP will force these transmissions to slow down until more space is available or they are aborted.
Figure 4-5 gives an example in which Peer_1 sends messages to Peer_4, and the messages are stored in Peer_4’s data storage space on its mediator. Peer_4 then pulls the messages from its storage space.

Figure 4-5. PMMS and MMMS


To minimize the impact of transmitted messages on the P2P Overlay Network, we also need to limit the number of messages that can be sent at a time. The transmission window is the maximum number of bytes that can be sent without an acknowledgement. Its maximum value is the MMS. But if a mediator is used, then, since the MMS reflects the size of a shared buffer that will be periodically polled by the hosted peerNode, it is sensible for applications to send large messages in multiple pieces. This is very general. If, for example, MMMS = 10 megabytes and PMMS = 500K bytes, then sending large messages in 100K byte chunks is a reasonable rule of thumb, since at least 5 messages can then be sent without an acknowledgement. What must be kept in mind is that if one is using a security protocol like Transport Layer Security (TLS), then in reality all messages are broken into records of approximately 16K bytes, either as ACP messages when TLS is implemented as part of the ACP protocol, or as TLS records carried over TCP if a TCP/IP based implementation is used. So, nothing is gained by sending huge messages over TLS. ACP will mediate the overall behavior of message traffic. Good network behavior is rewarded with good throughput; bad behavior gets the opposite treatment.
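The chunking rule of thumb is simple arithmetic, which the sketch below makes concrete. The helper names are hypothetical, and we read “K” as 1024 bytes, although the ratio is the same with decimal units: with a 500K byte window and 100K byte chunks, five chunks can be in flight before an acknowledgement is needed.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split an application message into pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunks_per_window(window: int, chunk_size: int) -> int:
    """How many full chunks fit in the window without an acknowledgement."""
    return window // chunk_size

K = 1024
pieces = split_into_chunks(bytes(1000 * K), 100 * K)
print(len(pieces))                          # 10
print(chunks_per_window(500 * K, 100 * K))  # 5
```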
4.1.3.3 Acknowledgement and Retransmission of ACP Messages
As mentioned above, an application cannot send a message whose byte size exceeds the MMS. The transmission window is decreased as the sender fills it with messages and places these sent messages on the retransmission queue to await acknowledgment from the recipient, as shown in Figure 4-6. Once the transmission window is filled, that is, when the size of the next message to be sent exceeds the transmission window, no new messages may be sent.

  App. sends message
  Msg Size > MMS? Y: do NOT send; signal the application
  N: Msg Size < TW? Y: send ACP message, place it on the RQ, decrease TW
  N (*): do not send
  (RQ = Retransmission Queue, TW = Transmission Window)
Figure 4-6. Sending a Message
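A minimal Python sketch of the send-side bookkeeping in Figure 4-6 follows, together with the “Increase TW” step of Figure 4-7. Assumptions: the class and method names are hypothetical, the window starts at the MMS, and a message that does not fit the remaining window is simply refused for now (“BLOCKED”) rather than queued, which is only one possible treatment of the (*) branch.

```python
class AcpSender:
    """Send-side transmission window (TW) and retransmission queue (RQ)."""

    def __init__(self, mms: int) -> None:
        self.mms = mms
        self.tw = mms                   # assumption: the window starts at the MMS
        self.next_seq = 0
        self.rq: dict[int, bytes] = {}  # seq -> unacknowledged message

    def send(self, data: bytes) -> str:
        if len(data) > self.mms:
            return "REJECTED"           # never sent; signal the application
        if len(data) > self.tw:
            return "BLOCKED"            # window full: wait for a SACK
        seq = self.next_seq
        self.next_seq += 1
        self.rq[seq] = data             # keep a copy until it is acknowledged
        self.tw -= len(data)
        return f"SENT {seq}"

    def on_sack(self, acked: list[int]) -> None:
        for seq in acked:
            data = self.rq.pop(seq, None)
            if data is not None:        # ignore numbers already removed
                self.tw += len(data)    # acknowledged bytes reopen the window

s = AcpSender(mms=300)
print(s.send(b"x" * 400))  # REJECTED
print(s.send(b"x" * 200))  # SENT 0
print(s.send(b"x" * 200))  # BLOCKED
s.on_sack([0])
print(s.send(b"x" * 200))  # SENT 1
```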
Retransmission can be triggered by three conditions:
1. The receipt of a SACK. a) This removes the acknowledged messages from the retransmis-
sion queue. b) If the retransmission queue is non-empty, one then retransmits unacknowl-
edged messages under the constraints discussed below.
2. The receipt of a duplicate SACK. This acknowledges messages that are no longer in the retransmission queue. Apply (1b) above.
3. An idle timer expires, and the sender has not received a SACK. Again, apply (1b) above.

The number of messages in the retransmission queue that can be retransmitted is controlled by the size of the transmission window. Retransmitting messages does not reduce the transmission window, since either they were already accounted for when they were initially sent, or, in case (*) of Figure 4-6, the retransmission window discussed below is used to ensure that the transmission window is not exceeded. Still, under certain circumstances one does not retransmit all the messages that the transmission window permits to be sent.
The P2P overlay network may be congested, and a peer does not wish to exacerbate this situation. How do we know there is congestion? Again, the typical signal is the receipt of a duplicate SACK. Recall that this tells us two things: first, the receiving end is still alive, and second, the sender’s messages are not getting through. To manage congestion, the sender has a retransmission window, which is initially set to the size of the transmission window. When consecutive duplicate SACKs arrive, the sender halves the size of the retransmission window each time until it finally reaches 0, as shown in Figure 4-8. The receipt of a SACK that acknowledges messages doubles the size of the retransmission window, although it never exceeds the transmission window size (Figure 4-7). What to do when every message on the retransmission queue exceeds the retransmission window size is discussed in the next paragraph.

  receive SACK
  remove an ACKed message from the Retransmission Queue; increase TW
  more ACKed messages? Y: repeat the previous step; N: continue
  if 2*RTW < TW, double RTW
  go to Retransmission Procedure
Figure 4-7. Reception of Normal SACK

  receive Duplicate SACK
  if RTW size > 0, reduce RTW size by half
  go to Retransmission Procedure
Figure 4-8. Retransmission with Duplicate SACK
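The two retransmission window rules can be sketched as follows. This is an illustration under stated assumptions: the class name is hypothetical, windows are counted in bytes, TW is held fixed, and we follow the prose rule that doubling never exceeds TW. Because a window halved down to 0 can never grow again by doubling alone, the sketch restarts it at 1 byte on the next SACK; that floor is our assumption, not part of the protocol description.

```python
class RetransmissionWindow:
    """Retransmission window (RTW) bounded by the transmission window (TW)."""

    def __init__(self, tw: int) -> None:
        self.tw = tw
        self.rtw = tw         # initially the size of the transmission window

    def on_duplicate_sack(self) -> None:
        if self.rtw > 0:
            self.rtw //= 2    # halve on each consecutive duplicate SACK

    def on_sack(self) -> None:
        # Double on a SACK that acknowledges messages, capped at TW.
        # max(..., 1) is an assumption so that a window of 0 can recover.
        self.rtw = min(max(2 * self.rtw, 1), self.tw)

w = RetransmissionWindow(tw=64_000)
for _ in range(3):
    w.on_duplicate_sack()
print(w.rtw)  # 8000
w.on_sack()
print(w.rtw)  # 16000
```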
After a sufficiently long period of time during which the recipient has not sent a SACK, the sender has three options. First, the sender may close the session. Second, it may retransmit a small message. We do not want to retransmit a megabyte message, which would seriously impact the performance and capacity of the P2P Overlay Network, but if an unacknowledged message is less than 32K bytes, it can be retransmitted. In this case, we examine the retransmission queue for the message with the minimum sequence number that meets the 32K criterion. We want to fill holes in the recipient’s input queue so that the messages can be passed up to the requesting application; thus, sending any message is better than sending none. Figure 4-9 shows this option. Third, if all messages in the retransmission queue exceed 32K or the retransmission window, then a “Peer UP” Signal (PUPS) is sent on the channel. These signals are sequenced, and only the most recent PUPS must be kept on the mediator hosting the destination peer. They are out-of-band, and when a hosted peerNode requests data, PUPS have priority over data. This is a peer-to-peer signal. Either a SACK or an ABORT is a correct response.

  All messages in RQ > 32K? Y: send PUP Signal
  N: All messages in RQ > RTW? Y: send 1st message in RQ < 32K
  N: find 1st msg < RTW; retransmit msg
  RQ empty? N: repeat; Y: done
Figure 4-9. Retransmission Procedure
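The decision in Figure 4-9 can be expressed as one function. In this Python sketch the name retransmission_plan and the (sequence number, size) queue representation are hypothetical, and “32K” is taken as 32 x 1024 bytes. Where the prose and the flowchart differ, we follow the flowchart: a PUPS is sent only when no message is under 32K. We also read the “find 1st msg < RTW” loop as spending the retransmission window in sequence-number order, which favors filling the lowest holes first.

```python
SMALL_MESSAGE = 32 * 1024  # the 32K retransmission threshold

def retransmission_plan(rq: list[tuple[int, int]], rtw: int) -> tuple[str, list[int]]:
    """Decide what to retransmit from the retransmission queue.

    rq holds (sequence number, size in bytes) pairs in sequence order.
    Returns ("NONE", []), ("PUPS", []), or ("RETRANSMIT", [seq, ...]).
    """
    if not rq:
        return ("NONE", [])
    small = [seq for seq, size in rq if size < SMALL_MESSAGE]
    if not small:
        return ("PUPS", [])                # every message exceeds 32K
    if all(size > rtw for _, size in rq):
        return ("RETRANSMIT", [small[0]])  # lowest-numbered message under 32K
    budget, chosen = rtw, []
    for seq, size in rq:
        if size <= budget:                 # fits what is left of the RTW
            chosen.append(seq)
            budget -= size
    return ("RETRANSMIT", chosen)

print(retransmission_plan([(1, 40_000), (2, 10_000)], rtw=5_000))  # ('RETRANSMIT', [2])
```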
In order to better control the frequency of retransmission, two traditional variables are used: the Round Trip Time (RTT) and the Retransmission Time-out (RTO). We say traditional because these variables did not first appear in the original TCP RFCs. Rather, they preceded this specification: one of the authors wrote an implementation of the PARC Universal Packet Byte Stream Protocol (PUP/BSP), specified in 1978, in which both RTT and RTO were defined. A round trip time is the number of milliseconds from the time a packet is sent until it is acknowledged. The running average of the 10 most recent round trip times is the RTT algorithm we recommend for the P2P Overlay Network, and the suggested initial RTT is 1000 milliseconds. The RTT can be calculated in many ways, and we prefer to keep it simple because algorithms governing protocols like TCP underlie ACP. Leave the complexity where it is required.
The RTO is calculated from the RTT and is always greater than the RTT. When the RTO timer expires, one can retransmit messages respecting the transmission window size as discussed above. The RTO grows monotonically as a function of the RTT if the recipient does not acknowledge messages. For example, a simple rule is that the initial RTO = 2 x RTT, and it is doubled until a SACK is received, up to a maximum of 10 minutes. When a SACK is received for an outstanding message, the RTO is reset to its initial value, as shown in Figure 4-10. On a purely ad-hoc P2P Overlay Network we recommend a 30 minute abort channel session time-out. If this network is in an enterprise, well administered, and all the mediators have sufficient capacity, then a channel session time-out closer to the TCP time-out can be considered. We still prefer to be more forgiving and suggest, as mentioned above, 10 minutes.

  Start: RTO = 2*RTT, Received SACK = false
  Wait RTO mSecs
  Received SACK? Y: back to Start
  N: RTO = 2*RTO; RTO > Abort timeout? Y: abort session; N: retransmit and wait again
Figure 4-10. Retransmission and RTO
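The recommended RTT and RTO rules translate into a few lines of Python. This sketch is illustrative: the class name is hypothetical and times are in milliseconds. Because the RTO is capped at 10 minutes, it could never exceed a 30 minute abort time-out on its own, so the sketch compares the abort time-out with the total time waited since the last SACK; that reading of Figure 4-10 is our assumption.

```python
from collections import deque

class RetransmitTimer:
    INITIAL_RTT_MS = 1000.0
    MAX_RTO_MS = 10 * 60 * 1000.0  # RTO ceiling: 10 minutes
    ABORT_MS = 30 * 60 * 1000.0    # ad-hoc network session time-out

    def __init__(self) -> None:
        self.samples: deque[float] = deque(maxlen=10)  # 10 most recent RTTs
        self.rto = 2 * self.rtt
        self.idle_ms = 0.0         # time waited since the last SACK

    @property
    def rtt(self) -> float:
        """Running average of the last 10 round trip times (1000 ms initially)."""
        if not self.samples:
            return self.INITIAL_RTT_MS
        return sum(self.samples) / len(self.samples)

    def on_sack(self, rtt_sample_ms: float) -> None:
        self.samples.append(rtt_sample_ms)
        self.rto = 2 * self.rtt    # reset the RTO to its initial value
        self.idle_ms = 0.0

    def on_timeout(self) -> bool:
        """RTO expired without a SACK; return False once the session should abort."""
        self.idle_ms += self.rto
        self.rto = min(2 * self.rto, self.MAX_RTO_MS)
        return self.idle_ms < self.ABORT_MS
```

With no SACKs at all, the waits run 2, 4, 8, ... seconds up to the 10 minute cap, and the eleventh expiry pushes the total wait past 30 minutes.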
4.1.3.5 Closing an ACP Session
When a session has gone to a successful completion, the initiating application closes the channel, and this invokes the ACP close procedure. The closing peerNode sends an ACP close message to the destination virtualSocket. It then waits for a close acknowledgement, using the RTT as a timeout. If this response is not received within the timeout window, then the ACP close is resent and the session is closed as if the response had been received.

  Send Close
  Wait RTT mSec for Close ACK
  ACK received? Y: done; N: send Close again
Figure 4-11. Closing a Session
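The close procedure of Figure 4-11 is sketched below in Python. The callables send_close and wait_for_close_ack are hypothetical stand-ins for the ACP transport. The session counts as closed on return either way; the result only reports whether the close acknowledgement actually arrived.

```python
from typing import Callable

def close_session(send_close: Callable[[], None],
                  wait_for_close_ack: Callable[[float], bool],
                  rtt_ms: float) -> bool:
    """Send Close, wait one RTT for the Close ACK, and resend once if needed."""
    send_close()
    if wait_for_close_ack(rtt_ms):
        return True
    send_close()  # resend once, then treat the session as closed
    return False
```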

4.1.3.6 ACP Messages


Given the above requirements and behavior, we can now define the ACP messages and their header fields:
The first ACP message initializes an ACP channel session, and has the following headers:
1. ACP message Type - Start ACP session
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Sending peer unique port identifier
4. Destination Virtual Port ID - Receiving peer VirtualPort UUID

The second ACP message is a response to the Start ACP session message:
1. ACP message Type - Acknowledge start ACP session
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Receiving peer VirtualPort UUID
4. Destination Virtual Port ID - Sending peer unique port identifier
5. PMMS - Receiving peer maximum message Space

The third ACP message is to send data, and has the following headers:
1. ACP message Type - ACP Data Message
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Sending peer unique port identifier
4. Destination Virtual Port ID - Receiving peer VirtualPort UUID
5. Sequence number - monotonically increasing sequence number starting from i. Given a
sequence number j, j >= i, the successor of j is j + 1.
6. Data size in bytes
7. Message Data

The fourth ACP message is the selective acknowledgement:


1. ACP message Type - Acknowledge ACP Data Message
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Receiving peer VirtualPort UUID
4. Destination Virtual Port ID - Sending peer unique port identifier
5. Acknowledged sequence numbers - monotonically increasing, sorted list of the sequence
numbers of received data messages. Given two sequence numbers in the list, i, and j, if i < j,
then i precedes j in the list.
The fifth ACP message is close ACP session:
1. ACP message Type - Close ACP session
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Sending peer unique port identifier
4. Destination Virtual Port ID - Receiving peer VirtualPort UUID

The sixth ACP message is the response to the close ACP session message:


1. ACP message Type - Acknowledge close ACP session
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Receiving peer VirtualPort UUID
4. Destination Virtual Port ID - Sending peer unique port identifier

The seventh ACP message is an abort session:


1. ACP message Type - Abort ACP session
2. ACP session identifier - Sending peer generated unique identifier
3. Source Virtual Port ID - Receiving peer VirtualPort UUID
4. Destination Virtual Port ID - Sending peer unique port identifier
5. Error message - Optional receiver supplied text error message

The eighth is an ACP Signal:


1. ACP Signal Type - Peer UP Signal
2. ACP session identifier - Sending peer generated unique identifier
3. Signal sequence number - monotonically increasing non-negative integer
4. Source Virtual Port ID - Sending peer unique port identifier
5. Destination Virtual Port ID - Receiving peer VirtualPort UUID
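One way to model these eight headers is a single record type with optional per-message fields, sketched in Python below. The numeric type codes, the field names, and the use of UUIDs for the session and port identifiers are assumptions made for illustration. The helper reply captures the pattern that every response keeps the session identifier and swaps the source and destination virtual ports.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class AcpType(Enum):
    START = 1           # Start ACP session
    START_ACK = 2       # Acknowledge start ACP session (carries PMMS)
    DATA = 3            # ACP Data Message
    SACK = 4            # Acknowledge ACP Data Message
    CLOSE = 5           # Close ACP session
    CLOSE_ACK = 6       # Acknowledge close ACP session
    ABORT = 7           # Abort ACP session
    PEER_UP_SIGNAL = 8  # Peer UP Signal (PUPS)

@dataclass
class AcpMessage:
    kind: AcpType
    session_id: uuid.UUID
    source_port: uuid.UUID
    destination_port: uuid.UUID
    pmms: Optional[int] = None             # START_ACK only
    sequence_number: Optional[int] = None  # DATA, or the PUPS signal number
    data: bytes = b""                      # DATA only; data size is len(data)
    acked: list[int] = field(default_factory=list)  # SACK only
    error_message: Optional[str] = None    # ABORT only, optional

    def __post_init__(self) -> None:
        if self.kind is AcpType.SACK and self.acked != sorted(set(self.acked)):
            raise ValueError("SACK sequence numbers must be strictly increasing")

def reply(msg: AcpMessage, kind: AcpType, **fields) -> AcpMessage:
    """Build a response on the same session with the ports swapped."""
    return AcpMessage(kind, msg.session_id, msg.destination_port,
                      msg.source_port, **fields)
```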
As with UMP, we use 4PL to establish ACP communication between two peers. To demonstrate this, we make the following assumption:
1. The local peerIdentity document, pidDoc, has been created and published

Based on this assumption, an application such as file transfer can use the following 4PL functions for ACP communication.

Document filePort = new Document(VIRTUALPORT, pidDoc, "fileListener", unicast);
// Now we create the Virtual Socket listener
VirtualSocket fileListen = new VirtualSocket(filePort);

// We also publish the VirtualPort document so other peerNodes can connect
// to our peer ftp daemon
publish(filePort);

// Now we listen for incoming requests to send files
VirtualChannel listenChannel = listen(fileListen, ACP);

// Listen returns, signaling that an ACP message has arrived.
// We need to locally create a transfer channel:
// fileListen, source VirtualSocket
VirtualChannel transferFile = acp_accept(listenChannel);

// We assume a protocol is present, and the first message
// contains the number of total bytes to be received and
// some data.
// Here we receive the message from an implicit queue on the channel
ACPMessage fileMsg = acp_receive(transferFile);

// Read file size
integer fileSize = fileMsg.fileSize;

loop()
BEGIN "receiving file";
    integer nDataBytes = fileMsg.dataLength;
    byte[] data = fileMsg.getData();

    // assume data is saved to a file

    // Account for data received
    fileSize = fileSize - nDataBytes;

    // See if we have received all data
    IF (fileSize EQUAL 0) THEN break;
    ELSE
        // read the next message
        fileMsg = acp_receive(transferFile);

    IF (fileMsg EQUAL null) THEN
    BEGIN "Error";
        doErrMessage();
        break;
    END "Error";
END "receiving file";

// Close the channels created above, then release the socket and document
close(transferFile);
close(listenChannel);

delete(fileListen);
delete(filePort);
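For comparison, here is the same receive loop as a Python sketch that runs without a network. The FileMessage record and the receive callable (a stand-in for acp_receive that returns None on error) are hypothetical, as is the assumed application protocol in which only the first message carries the total file size. Unlike the 4PL, the sketch stops when the remaining count reaches zero or below, so an oversized final message cannot leave it waiting for data that will never come.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FileMessage:
    data: bytes
    file_size: Optional[int] = None  # only the first message carries it (assumed)

def receive_file(receive: Callable[[], Optional[FileMessage]]) -> bytes:
    """Collect messages until the announced file size has been received."""
    msg = receive()
    if msg is None or msg.file_size is None:
        raise IOError("first message must announce the file size")
    remaining, chunks = msg.file_size, []
    while True:
        chunks.append(msg.data)  # a real receiver would write this to disk
        remaining -= len(msg.data)
        if remaining <= 0:
            return b"".join(chunks)
        msg = receive()
        if msg is None:          # the transport reported an error
            raise IOError(f"transfer failed with {remaining} bytes outstanding")

msgs = iter([FileMessage(b"hello ", file_size=11), FileMessage(b"world")])
print(receive_file(lambda: next(msgs)))  # b'hello world'
```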

4.1.4 Multicast on the P2P Overlay Network


So far we have discussed unicast pipes and the three communication protocols in the context of establishing contact between peers and permitting them to communicate over channels. The channels in these instances are bidirectional. In this section we introduce the notion of multicast on the P2P overlay network. Why? At times the need arises for unidirectional, unreliable, UMP-based communication among a group of peerNodes in a connected community. Applications like chat rooms that use ACP channels require N² connections without mediators. If a mediator is required and unicast is used, then 2N² connections are necessary. The latter communication cost can be quite high, and in some chat rooms a degree of unreliability is acceptable. Also, if one wishes to send a message to all active members of a connected community, and the message is not required to be reliable, then a means to accomplish this kind of communication is necessary. This does not eliminate the use of ACP; rather, the general multicast protocol we will soon discuss is unreliable. In the next section we also point out how a reliable multicast can be implemented using the mediator multicast protocol.
Multicast requires three things. First, multicast on the P2P virtual overlay network is not true multicast in the sense that the latter is hardware based. That is to say, a system enables a multicast address on its network interface, and then all multicast packets with this destination multicast address detected by the interface are passed to the system network software. The use of a host multicast group as an Internet multicast standard first appears in RFC1112. IPv6 allocates 112 bits for multicast groups [RFC2373] and accomplishes multicast by this mechanism. Both of these techniques use hardware address based multicast on the ultimate destination local network. The P2P virtual overlay network is software defined, so we need a means to imitate multicast without hardware. To provide this functionality, the ONP headers have a multicast group UUID. If the same UUID appears in the multicastGroup field of a virtualPort document of type multicast, then the mechanisms described below permit the ONP message to be delivered to that virtualPort. Membership in a multicast group is application dependent within a given connected community, and a single member can belong to more than one multicast group at a time. We assume an application wishing to use multicast will generate the required multicast UUID. It is also possible to have a collection of application-wide, well-known multicast group UUIDs. Second, it follows that each peerNode creates a UMP virtualSocket that listens for input, is marked as type multicast, and contains the multicast group UUID in its virtualPort document, which is then published locally if multicast is enabled. Third, using the Mediator Multicast Protocol, the virtualPort must be registered with the peerNode’s hosting mediator. The mediator, also using the Mediator Multicast Protocol, publishes its own association with the multicast group UUID to all mediators in the CC mediator map for the connected community in question. Each mediator hosting a peerNode that is listening for multicast within a multicast group maintains a list associating the peerNode’s communication information with the multicast group UUID. The actual delivery of multicast messages also uses the Mediator Multicast Protocol. This P2P Overlay Network protocol is required because long range multicast is not generally supported on the Internet for either IPv4 or IPv6, and therefore cannot be counted upon for multicast communication beyond the local subnet. These protocols and details are discussed in section 4.2.
Additionally, on physical layers that do support multicast, as mentioned in the preceding paragraph, the multicast virtual port documents are published locally, thus eliminating the need for mediated multicast. Similarly, if two peerNodes have direct network connectivity, even if their publish/subscribe mechanism may be mediated, they also have no need for mediated multicast. In both of these cases mediated multicast must never be used: the cost of communication is N², and one does not want to burden the mediator with unnecessary message traffic. In fact, in these latter two cases, if the peerNodes share the same mediator, a single peerNode member of the multicast group can agree to represent the other members, and in this manner proxy multicast messages between the mediator and the other peerNodes to which it has direct connectivity. This implies that the proxied peerNodes use the proxy to send multicast messages to the mediator, and vice versa. Such a peerNode is called a CC mediator-multicast proxy. It is important to note that multicast is possible on the overlay network between any two peers that have a direct real network connection, and the multicast group organization still works; it simply is not mediated multicast. The publish/query procedures discussed in 4.2 permit these peerNodes to directly exchange the peerIdentity and virtualPort documents necessary to establish direct multicast group communication.
However, given these alternative publication mechanisms, some peerNodes may receive duplicate messages. Setting aside possible routing errors, there are three solutions to this problem:
1. Permit duplicates and let the application manage the problem with sequence numbers or something similar. Note that well-written applications that use UMP multicast must always consider the possibility of duplicate messages.
2. Only use mediated multicast, to prevent the duplication that arises from physical layer multicast. We discourage this approach.
3. The mediator has at its disposal sufficient information in the ONP headers and peerIdentity documents to decide if two peerNodes are on the same subnet and if multicast is enabled. If this is true, then the mediator will not deliver messages to one peerNode from the other peerNode.
If two peerNodes have direct network communication and also use mediators to multicast to those peerNodes that do not, then we cannot prevent them from receiving duplicate messages. In the end, solution (1) is the simplest to implement.
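Solution (1) leaves duplicate suppression to the application. The Python sketch below assumes that each multicast message carries a sender identifier and a per-sender sequence number (neither is a header field defined above) and remembers recently seen (sender, sequence) pairs in a bounded, insertion-ordered table. The class name and the capacity are hypothetical. A duplicate that arrives after its entry has been evicted is accepted again, which is the usual trade-off of a bounded filter.

```python
from collections import OrderedDict

class DuplicateFilter:
    """Drop repeated multicast messages keyed by (sender, sequence number)."""

    def __init__(self, capacity: int = 4096) -> None:
        self.capacity = capacity
        self.recent: OrderedDict[tuple[str, int], None] = OrderedDict()

    def accept(self, sender: str, seq: int) -> bool:
        key = (sender, seq)
        if key in self.recent:
            return False                      # duplicate: drop it
        self.recent[key] = None
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)   # forget the oldest entry
        return True
```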
Now, let's assume that some members of a CC are running an application that uses multicast. Without loss of generality, let's say it is used for inter-CC status messages. Any such peerNode can then, first, connect to those peerNodes in the multicast group with which it has direct network connectivity and multicast to them on the P2P overlay network, and second, send the message to its hosting mediator. That mediator stores the message locally in the buffers of those hosted peerNodes that have registered for the service as members of the multicast group. Next, it forwards the multicast message to all mediators in the CC mediator map that have shown interest in this multicast group, that is to say, those mediators on which a hosted peerNode has published a virtualPort document for that multicast group. This is a general description of how multicast works on the structure shown in Figure 4-12. Note that Peer_1 is a mediator-multicast proxy for Peer_3 and Peer_4. The details of mediated multicast are discussed in the next section.
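The store-then-forward behavior of the hosting mediator can be sketched as follows. This is a hypothetical in-memory model: the class, its field names, and the one-hop `forwarded` flag (used so a relayed copy is not relayed again) are assumptions for illustration; the text does not specify how mediators avoid re-forwarding.

```python
from dataclasses import dataclass, field


@dataclass
class Mediator:
    """Hypothetical model of one mediator's multicast state within a single CC."""
    name: str
    group_members: dict = field(default_factory=dict)         # group UUID -> hosted member peerNodes
    buffers: dict = field(default_factory=dict)               # hosted peerNode -> buffered messages
    interested_mediators: dict = field(default_factory=dict)  # group UUID -> interested mediators

    def multicast(self, group, sender, message, forwarded=False):
        # 1. Store locally for every registered hosted member except the sender.
        for peer in self.group_members.get(group, set()):
            if peer != sender:
                self.buffers.setdefault(peer, []).append(message)
        # 2. Forward once to each mediator in the CC mediator map that has a
        #    published virtualPort document for this group (assumed one hop).
        if not forwarded:
            for other in self.interested_mediators.get(group, []):
                other.multicast(group, sender, message, forwarded=True)


m1, m2 = Mediator("Mediator_1"), Mediator("Mediator_2")
g = "multicast-group-uuid"
m1.group_members[g] = {"Peer_1", "Peer_3"}
m2.group_members[g] = {"Peer_2"}
m1.interested_mediators[g] = [m2]
m2.interested_mediators[g] = [m1]
m1.multicast(g, "Peer_1", "status: ok")
print(m1.buffers)  # {'Peer_3': ['status: ok']}
print(m2.buffers)  # {'Peer_2': ['status: ok']}
```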

Figure 4-12. Multicast on P2P Overlay Network. (Diagram: one CC with Mediator_1 and Mediator_2, each recording P1 : UUID and P2 : UUID through register and publish steps; Peer_1, Peer_2, Peer_3 and Peer_4 each hold the multicast group UUID. UUID denotes the MulticastGroup UUID in the VirtualPort document.)

4.2 P2P Mediator Protocols


Because of the real network limitations we have previously discussed, mediators are at the heart of a P2P system. In the general case that exists on the Internet today, peerNodes do not have direct connectivity, and they require at least a third party, or out-of-band communication, to discover one another. A mediator fulfills this requirement on the P2P Overlay Network. Furthermore, if both peerNodes are NAT limited, a third party is required for them to communicate, at least for the average user who has no idea how to hack a NAT. If only one of two peerNodes is NAT bound, then the NAT-bound peerNode can contact the other, but not conversely. It is important to keep our goal in mind: a straightforward, standardized way to facilitate end-to-end communication with either participant initiating the contact. A mediator again provides this functionality as a willing helper-in-the-middle. We do note, however, that peerNodes may in fact have direct network connectivity, and in this case, along with the P2P Mediator Protocols, we will require PeerNode-to-PeerNode (PNToPN) protocols. These will use many of the same mechanisms discussed below, but will always permit peerNodes to take advantage of the gains from direct connectivity. Administrators of P2P Overlay Networks may not want to relinquish the implicit control that mediators give them, so we must be careful to allow administrative control over the decision to use the PNToPN protocols. These are discussed in section 4.3.
Finally, let us imagine that IPv6 has been fully deployed, that NAT's no longer exist, that the end-to-end communication requirement is fulfilled, and that we have the expected billions of systems on the Internet. Why do we need mediators? How can one exchange information between systems that spontaneously arrive on and leave the Internet? First of all, some degree of organization is required, even if it is just several friends communicating their IPv6 addresses to one another (remember that these addresses are 16 bytes long), and each of these persons creating a private hostname table for the convenience of name-to-address translation. This works fine but does not scale: there is no reliable way to update this hostname table; presence detection is possible but will lack the analysis required to make sure bandwidth utilization is moderated; and ad-hoc behavior is difficult, i.e., a new system wishing to join these friends' systems has no standard way of doing so. Ultimately this approach leads to chaos, with hundreds of millions of isolated collections of systems unable to intercommunicate.
Well, one can look at the ISP solution as a possibility. First of all, one must then pay for the service, and the question arises, "Can we do as well or even better for free?" Second, we have created a dependency that is unnecessary for P2P. If you and your neighbor wish to communicate with one another without a third party charging you to do so, we fall back to the situation described above. Virtually all Internet users do not want to administer their connectivity and keep local host tables, or create a shared LDAP service and worry about that system. Somewhere in the middle of all of this the idea of P2P arrives: software that creates, for the most part, self-administered, connected communities where individuals have the freedom to do as they wish. Clearly, ISP's, or the ISP's of the future, will have a role to play here. They can in fact provide, maintain and administer the systems that one uses for mediators, and many people will be willing to pay a small charge for this service. It will be a commodity where there is profit in numbers. On the other hand, it is not required, and this is the key. In all cases, we need mediators, those friendly systems-in-the-middle whose protocols are smart enough to keep it all running. That is what this section addresses. It is a good idea to review section 3.5.3.1 on the limitations imposed by the physical layer before continuing.

4.2.1 Mediator Document


Because our architecture must be consistent, and communication must use the ONP stack, a mediator, like all other peerNodes, requires an identity and an associated mediator document. Here security considerations are relevant. When a peerNode first connects to a mediator, there is an assumed trust that the mediator is in fact the intended system it wishes to use. This is similar to answering the question, "Is the web server from which I am buying an automobile really the one I believe it is?" To address this issue we again mention CBID's as a desirable identity solution. The mediator UUID can be a CBID, which, at least in the weakest sense, requires the contacted mediator to verify that it owns the X509.v3 certificate from which the CBID is generated. This scheme is open to attack, and these attacks can be either minimized or prevented, as discussed in chapter 5.
The mediator also needs a name and a description to be human-user friendly. Since we wish to minimize the complexity of the communication software, a mediator requires a collection of virtualPorts to support the protocols it uses. Next, the mediator's real transports must be described. All connections to mediators are managed at the real transport level, and all messages with a mediator as one end of the connection use the UMP. Therefore, if an application sends an ONP message to another application by means of a mediator, then that ONP message will be encapsulated in another ONP message that is sent directly to the mediator in question. Also, because we will be required to manage our P2P Overlay Network, a well-known management connected community is defined. This CC has a UUID that is known to all mediators at boot time, and that is also advertised to peerNodes by means of this document. Finally, the mediator document contains the mediator map's MMMS (see section 4.1.3.2).
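To make the CBID idea concrete, here is a sketch under a stated assumption: the CBID is taken to be the first 128 bits of a SHA-1 hash over the DER-encoded X509.v3 certificate, rendered as a UUID. The actual derivation is covered in chapter 5, and the function names are hypothetical. The check shown is only the weakest one, that the presented certificate hashes to the claimed mediator UUID; proving possession of the private key still requires a signed, session-based challenge.

```python
import hashlib
import uuid


def cbid_from_certificate(cert_der: bytes) -> uuid.UUID:
    """Assumed derivation: SHA-1 over the certificate bytes, truncated to 16 bytes."""
    digest = hashlib.sha1(cert_der).digest()  # 20 bytes
    return uuid.UUID(bytes=digest[:16])


def certificate_matches_cbid(cert_der: bytes, claimed: uuid.UUID) -> bool:
    """Weakest check: the presented certificate hashes to the claimed mediator UUID.

    This does not prove the mediator holds the matching private key.
    """
    return cbid_from_certificate(cert_der) == claimed


cert = b"placeholder DER certificate bytes"
mediator_uuid = cbid_from_certificate(cert)
print(certificate_matches_cbid(cert, mediator_uuid))  # True
```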

Version Length Lifetime
Source Address
Destination Mediator Address
Connected Community Identity
Multicast Group UUID
Destination Routing Information
Source Routing Information
UMP Data Length
Source Virtual Port
Mediator Router Virtual Port

ONP Message

Figure 4-13. Mediator / PeerNode Encapsulation
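The encapsulation in Figure 4-13 can be modeled as a message whose payload is another message. This is a structural sketch only, not a wire format: the field types, defaults, and the choice to copy version, lifetime and CC identity from the inner header are assumptions, and the function and port names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class OnpHeader:
    """Header fields named after Figure 4-13; types and defaults are assumptions."""
    version: int
    lifetime: int
    source_address: str
    destination_address: str
    cc_identity: str
    multicast_group: Optional[str] = None
    destination_route: List[str] = field(default_factory=list)
    source_route: List[str] = field(default_factory=list)


@dataclass
class OnpMessage:
    header: OnpHeader
    source_virtual_port: str
    destination_virtual_port: str
    payload: Union[bytes, "OnpMessage"]  # application data, or an inner ONP message


def encapsulate(inner: OnpMessage, sender: str, sender_port: str,
                mediator: str, mediator_router_port: str) -> OnpMessage:
    """Wrap an application ONP message in an outer message sent directly to a mediator."""
    outer = OnpHeader(
        version=inner.header.version,
        lifetime=inner.header.lifetime,
        source_address=sender,
        destination_address=mediator,      # the outer hop ends at the mediator
        cc_identity=inner.header.cc_identity,
    )
    return OnpMessage(outer, sender_port, mediator_router_port, inner)


inner = OnpMessage(OnpHeader(1, 60, "peer-A", "peer-B", "cc-uuid"),
                   "app-port-A", "app-port-B", b"hello")
outer = encapsulate(inner, "peer-A", "src-port", "mediator-1", "router-port")
print(outer.header.destination_address, outer.payload is inner)  # mediator-1 True
```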


Here we also have a bootstrap problem. In general, the transport address or addresses, and the virtualPort used for first contact to at least one mediator, must be known by the peerNode. There are many methods to bootstrap this information, for example, an LDAP directory entry, DNS, a web server, email, another peerNode on the same subnet, etc. The method will be part of the P2P software itself. For now, let us assume that such a “hello” transport address and virtualPort are known. We call this latter port the greeting virtualPort. This combination of information is named the HelloTransportInfo.
Note that these values are also in the mediator document for verification. As discussed in chapter 5, this document may be signed. The signature, along with the CBID and a session-based challenge, will assure that neither an impostor nor a “man-in-the-middle” attack is in progress. An example of such a pair might be:
tcp://129.44.33.8:9510, hello-virtualPort UUID.
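A HelloTransportInfo pair can be parsed with the standard library and later checked against the values in the received mediator document. This is a minimal sketch; the class and function names, and the idea of rebuilding the URI as scheme://host:port for comparison, are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class HelloTransportInfo:
    """Bootstrap pair: a mediator's real transport address plus its greeting virtualPort."""
    scheme: str
    host: str
    port: int
    greeting_port: uuid.UUID


def parse_hello(transport_uri: str, greeting_port: str) -> HelloTransportInfo:
    parts = urlparse(transport_uri)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"transport URI needs a host and a port: {transport_uri!r}")
    return HelloTransportInfo(parts.scheme, parts.hostname, parts.port, uuid.UUID(greeting_port))


def matches_document(info: HelloTransportInfo, doc_real_uris, doc_greeting_port: uuid.UUID) -> bool:
    """Verify the bootstrap values against those listed in the mediator document."""
    uri = f"{info.scheme}://{info.host}:{info.port}"
    return uri in doc_real_uris and info.greeting_port == doc_greeting_port


info = parse_hello("tcp://129.44.33.8:9510", "12345678-1234-5678-1234-567812345678")
print(info.host, info.port)  # 129.44.33.8 9510
```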
Given this introduction we define the Mediator document as follows:
Document type = MEDIATOR
Content tags and field descriptions:

<mediatorName> Legal XML character string [XML] </mediatorName>
<mediatorUUID> uuid-Legal UUID in hexadecimal ascii string </mediatorUUID>
<description>
<text> Legal XML character string [XML] </text>
<URN> Legal Universal Resource Name </URN>
</description>
<comprotocols>
<real> real protocol URI </real>
</comprotocols>
<virtualPorts>
<greeting> uuid-Legal UUID in hexadecimal ascii string </greeting>
<PNToMed> uuid-Legal UUID in hexadecimal ascii string </PNToMed>
<MedToMed> uuid-Legal UUID in hexadecimal ascii string </MedToMed>
<router> uuid-Legal UUID in hexadecimal ascii string </router>
<MedInfo> uuid-Legal UUID in hexadecimal ascii string </MedInfo>
<multicast> uuid-Legal UUID in hexadecimal ascii string </multicast>
</virtualPorts>
<managementCCUUID>
uuid-Legal UUID in hexadecimal ascii string
</managementCCUUID>
<MMMS> Integer in ASCII format </MMMS>
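A mediator document with the content tags above can be built with `xml.etree.ElementTree`. This is a sketch under assumptions: the text lists content tags but no enclosing element, so the `mediator` root is hypothetical, the optional `<URN>` element is omitted, and "uuid-" followed by the hexadecimal UUID is one reading of "uuid-Legal UUID in hexadecimal ascii string".

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Dict, List

PORT_TAGS = ("greeting", "PNToMed", "MedToMed", "router", "MedInfo", "multicast")


def uuid_text(u: uuid.UUID) -> str:
    # Assumed rendering of "uuid-Legal UUID in hexadecimal ascii string".
    return "uuid-" + u.hex


def mediator_document(name: str, mediator_id: uuid.UUID, text: str,
                      real_uris: List[str], ports: Dict[str, uuid.UUID],
                      management_cc: uuid.UUID, mmms: int) -> str:
    root = ET.Element("mediator")  # hypothetical root for Document type = MEDIATOR
    ET.SubElement(root, "mediatorName").text = name
    ET.SubElement(root, "mediatorUUID").text = uuid_text(mediator_id)
    ET.SubElement(ET.SubElement(root, "description"), "text").text = text
    protocols = ET.SubElement(root, "comprotocols")
    for uri in real_uris:  # a multi-homed mediator lists every real transport
        ET.SubElement(protocols, "real").text = uri
    vports = ET.SubElement(root, "virtualPorts")
    for tag in PORT_TAGS:
        ET.SubElement(vports, tag).text = uuid_text(ports[tag])
    ET.SubElement(root, "managementCCUUID").text = uuid_text(management_cc)
    ET.SubElement(root, "MMMS").text = str(mmms)
    return ET.tostring(root, encoding="unicode")


ports = {tag: uuid.uuid4() for tag in PORT_TAGS}
doc = mediator_document("Mediator_1", uuid.uuid4(), "Campus mediator",
                        ["tcp://129.44.33.8:9510"], ports, uuid.uuid4(), 65536)
```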

The mediatorName, mediatorUUID and description are as in the peerIdentity document. Just as for the peerIdentity document, we require all communication protocols to be included. Mediators may be multi-homed with more than one IP address, or use a transfer protocol like HTTP. It is also possible that two different physical layers are in use and that they have different network transports.
Next we have the mediator’s virtualPorts. Each is associated with a protocol that the mediator sup-
ports. A brief description of each protocol follows:
1. Greeting - This is used to establish the initial connection between a peerNode and its medi-
ator, or between two mediators. This protocol is required before any of the following proto-
cols will be accepted. As mentioned above, it is included in this document for verification.
The mediator greeting protocol, section 4.2.2, uses this virtualPort.
2. PNToMed - In section 4.2.3 we describe the PeerNode-to-Mediator communication proto-
col that uses this virtualPort.
3. MedToMed - The Mediator-to-Mediator Communication Protocol. See section 4.2.4.
4. Router - The Mediator Routing protocol, section 4.2.5, between mediators, or mediators
and peerNodes uses this virtualPort.
5. MedInfo - The Mediator Information Query Protocol, between mediators or between mediators and peerNodes, is for passing status information such as load and the number of connections between these nodes. Something like SNMP [SNMP Ref] could be implemented here. See section 4.2.6 for the details.
6. Multicast - This virtualPort handles all aspects of multicast. See section 4.2.7.
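Because each protocol has its own virtualPort, a mediator can dispatch incoming UMP messages on the destination virtualPort alone. The sketch below is hypothetical (class name, handler signature and "refused" strings are assumptions); it enforces the rule from item 1 that no other protocol is accepted until the sender has completed the greeting.

```python
import uuid
from typing import Callable, Dict, Optional, Set

Handler = Callable[[str, bytes], str]


class MediatorPorts:
    """Hypothetical dispatcher keyed by the destination virtualPort of a UMP message."""

    def __init__(self) -> None:
        self.handlers: Dict[uuid.UUID, Handler] = {}
        self.greeting_port: Optional[uuid.UUID] = None
        self.greeted: Set[str] = set()  # peerIdentities that finished the greeting

    def register(self, port: uuid.UUID, handler: Handler, is_greeting: bool = False) -> None:
        self.handlers[port] = handler
        if is_greeting:
            self.greeting_port = port

    def dispatch(self, sender: str, port: uuid.UUID, data: bytes) -> str:
        if port not in self.handlers:
            return "refused: unknown virtualPort"
        if port != self.greeting_port and sender not in self.greeted:
            return "refused: greeting required"
        return self.handlers[port](sender, data)


ports = MediatorPorts()
greeting, pn_to_med = uuid.uuid4(), uuid.uuid4()


def on_greeting(sender: str, data: bytes) -> str:
    ports.greeted.add(sender)
    return "welcome"


ports.register(greeting, on_greeting, is_greeting=True)
ports.register(pn_to_med, lambda sender, data: "OK Notify/Active")
print(ports.dispatch("peer-A", pn_to_med, b"Notify / Active"))  # refused: greeting required
print(ports.dispatch("peer-A", greeting, b"greetings"))         # welcome
print(ports.dispatch("peer-A", pn_to_med, b"Notify / Active"))  # OK Notify/Active
```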

Figure 4-14. Mediator VirtualPort Connectivity. (Diagram: the mediator's Greeting, PNToMed, Router, MedToMed, MedInfo and Multicast virtualPorts all sit on a single real transport port used by Peer_1, Peer_2, Peer_3 and Peer_4.)


It is important to note that once these virtualPorts are known through the greeting protocol, no further discovery is required. Unlike application virtualPorts, the mediator virtualPorts can be contacted directly using the underlying transport information, without acquiring an associated virtualPort document.
Also, all of these protocols are subject to looming denial-of-service attacks. To confront these attacks at the front door, each protocol has, as one of its responses, a challenge that has a global effect on the attacking peerNode's behavior. In some P2P configurations a peerNode can certainly attack multiple peerNodes or mediators. To this end, an effort will be made to broadcast a list of such attackers to known peerNodes and mediators. Detection is complicated because attackers can hide their source identity. We have ways to combat this behavior, and they are covered in chapter 5. The mediator greeting protocol is described in the next section.

4.2.2 Mediator Greeting Protocol


The “greeting” protocol is initiated by a peerNode or mediator when it first contacts any mediator. A peerNode will be hosted by a single mediator, but in all probability will require more than one mediator, for example, to appropriately route a message. And certainly, in all but very small P2P overlay network infrastructures, mediators will contact multiple mediators. Recall that initially all peerNodes are in the public connected community, and all mediators maintain mediator maps for this same connected community. The ONP / UMP messages used in this protocol will have the public connected community's UUID in its appropriate place following the source and destination virtualPorts.
The greeting protocol exchanges two documents. On a peerNode's first contact with a mediator, the peerNode sends a greeting message that contains the text “greetings” followed by its peerIdentity document. The responses are as follows:

1. The responding mediator sends a message containing “welcome” followed by its mediator document. The receipt of the document acknowledges that the peerIdentity document has been received, that the mediator has placed the peerNode in its public connected community, and that the peerNode's M-INBOX has MMMS bytes reserved for the receipt of messages.
2. The responding mediator sends a message containing “redirect” followed by a list of alternative mediators to try. Here, the mediator is saying that it is overloaded.
3. The responding mediator sends a message containing “challenge” accompanied by a client puzzle to be solved. In this case, the next greeting command must contain the solution and either the peerNode or mediator document, as appropriate. These puzzles typically send an N-bit hash of J bytes, for example, a 160-bit SHA-1 hash, along with J-K of those bytes. The challenge is to find the K missing bytes that satisfy the hash. Sometimes multiple sets of the above are sent to increase the difficulty [RSAPUZZLE].
4. The responding mediator sends a message containing “refused.” In this case, the sender has been refused service.
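The client puzzle in response 3 can be sketched with SHA-1 as in the example. One assumption is made here: the K unknown bytes are taken to be the suffix of the J-byte string, which the text does not specify. Solving costs the client up to 256^K hash evaluations, while checking a solution costs the mediator a single hash; sending several such puzzles multiplies the client's work accordingly.

```python
import hashlib
import itertools
import os


def make_puzzle(j: int = 32, k: int = 2):
    """Mediator side: reveal the SHA-1 digest and the first j-k bytes of a random j-byte string."""
    secret = os.urandom(j)
    return hashlib.sha1(secret).digest(), secret[: j - k], k


def solve_puzzle(digest: bytes, prefix: bytes, k: int) -> bytes:
    """Client side: brute-force the k missing bytes (at most 256**k SHA-1 evaluations)."""
    for tail in itertools.product(range(256), repeat=k):
        if hashlib.sha1(prefix + bytes(tail)).digest() == digest:
            return bytes(tail)
    raise ValueError("no solution")


def check_solution(digest: bytes, prefix: bytes, solution: bytes) -> bool:
    """Mediator side: one hash verifies the client's work."""
    return hashlib.sha1(prefix + solution).digest() == digest


digest, prefix, k = make_puzzle()
answer = solve_puzzle(digest, prefix, k)
print(check_solution(digest, prefix, answer))  # True
```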

Command: Greeting text parameter

text := “greeting”
parameter := peerNode document | mediator document | puzzle solution

Response: type ReturnValue

type := “welcome” | “redirect” | “challenge” | “refused”

If type == “welcome” OR type == “redirect”, then

ReturnValue := {N} L(I), I = 1,...,N

N := 1 if type is “welcome”
L(I) := Mediator document

If type == “challenge”, then

ReturnValue := Puzzle to solve

If type == “refused”, then

ReturnValue := Optional explanation.
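The four response types suggest a simple peerNode-side driver for the exchange. This is a hypothetical sketch: `send` stands in for an ONP/UMP round trip to a mediator's greeting virtualPort, `solve` for a client-puzzle solver, and a mediator document is reduced to a plain string naming the mediator. The attempt limit is an assumed safeguard against redirect loops.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class GreetingResponse:
    type: str                                           # "welcome" | "redirect" | "challenge" | "refused"
    documents: List[Any] = field(default_factory=list)  # L(1..N); N == 1 for "welcome"
    puzzle: Any = None                                  # set when type == "challenge"
    explanation: str = ""                               # optional, when type == "refused"


def greet(candidates: List[Any],
          send: Callable[[Any, str, Any], GreetingResponse],
          solve: Callable[[Any], Any],
          my_document: Any,
          max_attempts: int = 8) -> Optional[Any]:
    """Return the welcoming mediator's document, or None if every attempt fails."""
    queue = list(candidates)
    attempts = 0
    while queue and attempts < max_attempts:
        mediator = queue.pop(0)
        attempts += 1
        reply = send(mediator, "greeting", my_document)
        if reply.type == "challenge":
            # The next greeting carries the solution together with our document.
            reply = send(mediator, "greeting", (solve(reply.puzzle), my_document))
        if reply.type == "welcome":
            return reply.documents[0]
        if reply.type == "redirect":
            queue.extend(reply.documents)  # alternative mediators to try
        # "refused", or a second challenge: move on to the next candidate
    return None


# Stand-in transport: Mediator_1 redirects, Mediator_2 challenges, then welcomes.
def fake_send(mediator, text, parameter):
    if mediator == "Mediator_1":
        return GreetingResponse("redirect", ["Mediator_2"])
    if isinstance(parameter, tuple):  # solution plus document
        return GreetingResponse("welcome", ["Mediator_2"])
    return GreetingResponse("challenge", puzzle="p")


print(greet(["Mediator_1"], fake_send, lambda p: "solution", "peerIdentity document"))  # Mediator_2
```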

For the initial contact between mediators, there is an exchange of mediator documents using the same “greetings” and “welcome” preambles. The mediators use the public connected community in the CC ONP header field. Also, mediators exchange lists of known mediators to help establish the public CC mediator map and the keep-alive average round-trip time. Each element of this list has four members: (Mediator UUID, Mediator MedToMed Port, Mediator real transport addresses, MaximumCCMediatorGet). Finally, the Mediator-Election-ID is included. This is the identity of the next election to be held. See section 4.2.4 on the MedToMed protocol for more details.

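Figure 4-15. Greeting Protocol. (Diagram: on the left, a PeerNode or Mediator sends a Greeting Msg and the Mediator answers with a Welcome Msg; on the right, the Mediator answers the Greeting Msg with a Challenge, the PeerNode or Mediator returns the Solution, and the Mediator then sends a Welcome Msg.)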

Version Length Lifetime


Source Address
Destination Mediator Address
Connected Community Identity
Multicast Group UUID
Destination Routing Information
Source Routing Information
UMP Data Length
Source Virtual Port
Mediator Greeting Virtual Port

"greetings"
Document or
Document and Solution

Figure 4-16. Greeting Message

The above scenario is very ad hoc in nature. We have not mentioned security issues. For example, has the peerNode really connected to the mediator it believes it has, and is the peerNode one of those the mediator supports? It is possible both that the peerNode is falling victim to an impostor attack and that the mediator has a list of peers to which its support is limited. While the details of resolving these issues are in chapter 5, we can say that there are multiple ways to assure that the above communication is secure, and that the strength of this security varies as a function of the requirements of the connected communities involved. At each phase of a protocol's behavior, security questions will arise that must be addressed.
Once the greeting protocol has completed, both the peerNode and mediator can begin to do what is
necessary to support the multitude of possible P2P applications. The peerNode-to-Mediator communi-
cation protocol enables application level communication within private connected communities and is
discussed next.

4.2.3 PeerNode-to-Mediator (PNToMed) Communication Protocol


As mentioned in chapter 3, section 3.4, a mediator maintains a CC mediator map for the CC's of the peerNodes it hosts, and all of its hosted peerNodes are automatically members of the public CC. The public CC is used to access public documents that the hosted peerNodes wish to share with all peerNodes. The non-public CC's may or may not have access control policies; this is CC dependent. One function of the PNToMed protocol is to provide the mechanisms for publishing, subscribing to, and finding documents in CC's. There are many ways to implement publish, subscribe and search algorithms, and these are discussed in sections 3 and 5 of this chapter.
An example of a document accessible in the public CC is a CC document. One needs a bootstrap procedure to discover CC's, and thus be able to join them, and this “bootstrapping” can be done in the public CC. We realize that in some instances CC discovery may avoid using the public CC, and instead be done through CC mediators that are restricted to the CC's that they support, by placing the CC documents in the P2P code, by using an LDAP directory to get these documents, etc. Still, we need a way to discover CC's without out-of-band techniques, and we believe that, for the most part, CC's will be discovered by means of the public CC and mediators. The mechanism is in place for that purpose.
Next, a peerNode uses this protocol to receive messages that are queued for it on the mediator. These messages all contain the CC UUID, and it is the mediator's responsibility to deliver them correctly. If the destination peerNode is not in the mediator-CC map associated with this CC UUID, then the message is not delivered, that is to say, it is not stored in the peerNode's INBOX.
To these ends the PNToMed protocol has the following commands:
1. Notify - A peerNode notifies the mediator that it is active, or inactive as a member of a CC,
or is no longer interested in being hosted as a member of that CC, that is, remove. The CC
UUID is the connected community identity in the ONP message header.
2. GetMessage - A peerNode pulls a message from its INBOX on the mediator.
3. Publish - A peerNode publishes meta-data to the mediator in a CC. Note that publish has many possible implementations, but in all cases meta-data describing documents is published rather than the documents themselves. We do not say what representation this meta-data takes; for example, a (name, value) pair may be used for indexing and be posted on the mediator.
4. Query - A peerNode searches for a document. Just as above, we cannot say where the document is; rather, the mediator assists in finding it. This request has a counterpart, which is the query response.
5. Content Request - A peerNode has sufficient meta-data information from a query for appli-
cation data to request that content from an owning peerNode. It uses this command to
invoke the process to retrieve the content. This request has a counterpart which is the con-
tent response.
6. Mobile Agent - A peerNode can send a mobile agent to be run on a Mediator using this
command. The use of this command is discussed in detail in chapter 7, section 7.3.

4.2.3.1 The Notify Command


We assume that a peerNode using the CC-Protocol has become a member of a CC with the CC UUID, UUID0. As a member of this CC, it can either become an active member, that is to say, one that wishes to be both discovered and communicate in the CC, or become inactive. It sends the Notify command to its hosting mediator at the PNToMed virtualPort, with Notify and the CC status it wishes to set as the UMP parameters in the data payload. In response the mediator sends an OK, DENIED, or CHALLENGE. Each response carries an appropriate reason, which is implementation dependent. We use a text description:
1. OK “Notify/Active,” OK “Notify/Inactive,” or OK “Notify/Remove.”
2. DENIED “CC Quota Exceeded - Try Notify/Remove”
3. CHALLENGE “Solve the following puzzle,” which is accompanied by a client puzzle to solve.
An example of the Notify command is in Figure 4-17.

Version Length Lifetime
Source Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
PNToMed Virtual Port

"Notify / Active"

Figure 4-17. PeerNode notifies mediator it is active in CC UUID0

When the mediator receives an active CC status from a peerNode and is not yet a member of the CC mediator map for this CC, it adds itself to this map and notifies all mediators in the mediator map (see chapter 3, section 3.5.3) that it is now a member of that CC mediator map, using the MedToMed protocol discussed in section 4.2.4. We only note here that this notification is done in such a way as to minimize its impact on the P2P Overlay Network. During this notification process a private copy of the CC mediator map is built.
With respect to the peerNode Notify/Active command, the mediator does several things to initialize the
CC context:
1. It adds the peerNode to a list of hosted peerNodes for this CC. When this list becomes
empty for any of the reasons below, or by the means of the Notify/Remove command, the
mediator removes itself from the associated CC mediator map, and suitably notifies the
other mediators in the mediator map. When all such mediators have been notified, it flushes
all data associated with that CC. This procedure is discussed in detail in section 4.2.4.
2. For each peerNode a list of the CC’s to which it belongs should be kept. This simplifies
publication, subscription and query mechanisms.
3. It keeps an expiration time for the binding between a peerNode and a CC. This time is
increased as long as the peerNode shows an active interest in this CC. If the interest is idle
for the expiration time, then the peerNode is removed from the list in 1, and its quota
counter in (4) is decremented.

4. For each peerNode it keeps a quota counter of the number of CC’s to which it belongs. The
mediator certainly has the right to have a quota, and as noted in the “DENIED” response,
can refuse further Notify “Active” commands.
5. It maintains the time between successive notify commands. If the frequency of these com-
mands appears to be a DoS attack, the Mediator responds accordingly as is discussed in
chapter 5.
6. A set of relationships is created to ensure the correct delivery of CC messages for this now active CC. For example, associated with each CC can be a list of pointers into the M-INBOX for the CC messages this peerNode has received, or the M-INBOX can be the union of multiple, disjoint M-INBOXes, one for each CC. From experience, we believe the latter approach is preferable because it makes the Notify/Remove command easier to implement, since messages stored for a particular CC may have to be expunged.
7. Structures are created for the publication of, and subscription to, content within the context of the CC. In general, being a member of a CC will create a subscription to the CC's documents. The mediator does not manage CC access rights; these are CC dependent. For example, when a peerNode publishes a document in a CC, it may only be indexed across those mediators in the mediator map for that CC, and structures must be put in place to support this mechanism. One approach is for the list of edge peers in this CC to have a pointer to the mediator map for that CC.
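Items 1 through 4 amount to a small amount of per-mediator bookkeeping. The sketch below is hypothetical: the quota and binding lifetime are illustrative policy values, the quota counter of item 4 is represented by the size of each peerNode's CC set, and leaving the CC mediator map is only indicated by a comment.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


class HostedPeers:
    """Hypothetical mediator bookkeeping for Notify/Active (items 1 to 4 above)."""

    def __init__(self, cc_quota: int = 8, binding_ttl: float = 1800.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cc_quota = cc_quota        # assumed policy: max CC's per peerNode
        self.binding_ttl = binding_ttl  # assumed policy: seconds of idle interest allowed
        self.clock = clock
        self.members: Dict[str, Set[str]] = defaultdict(set)   # CC -> hosted peerNodes (item 1)
        self.peer_ccs: Dict[str, Set[str]] = defaultdict(set)  # peerNode -> CC's (items 2, 4)
        self.expires: Dict[Tuple[str, str], float] = {}        # (peerNode, CC) -> expiry (item 3)

    def notify_active(self, peer: str, cc: str) -> str:
        if cc not in self.peer_ccs[peer] and len(self.peer_ccs[peer]) >= self.cc_quota:
            return "DENIED CC Quota Exceeded - Try Notify/Remove"
        self.members[cc].add(peer)
        self.peer_ccs[peer].add(cc)
        self.touch(peer, cc)
        return "OK Notify/Active"

    def touch(self, peer: str, cc: str) -> None:
        """Extend the binding while the peerNode shows active interest in the CC."""
        self.expires[(peer, cc)] = self.clock() + self.binding_ttl

    def drop(self, peer: str, cc: str) -> None:
        """Unbind peer from cc; its quota count shrinks with peer_ccs."""
        self.members[cc].discard(peer)
        self.peer_ccs[peer].discard(cc)
        self.expires.pop((peer, cc), None)
        if not self.members[cc]:
            # Last hosted member gone: the mediator would now leave the CC
            # mediator map and notify the other mediators (section 4.2.4).
            del self.members[cc]

    def expire(self) -> None:
        now = self.clock()
        for peer, cc in [key for key, t in self.expires.items() if t <= now]:
            self.drop(peer, cc)


peers = HostedPeers(cc_quota=1)
print(peers.notify_active("P1", "CC1"))  # OK Notify/Active
print(peers.notify_active("P1", "CC2"))  # DENIED CC Quota Exceeded - Try Notify/Remove
```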
The Notify/Inactive command has three parameters, Flush, Delete, and Keep, which are explained below. This command leaves the above data structures in place and takes the following actions with respect to messages in this CC:
1. The arrival of this command blocks any further input to the M-INBOX; that is, the storing of any message that is in progress can finish, but before storing a message, the task that stores messages tests the CC active state for this peerNode. If that test is false, then no further messages can be stored in the M-INBOX.
2. Notify/Inactive/Flush - The mediator will send to the peerNode all of the pending messages in the M-INBOX.
3. Notify/Inactive/Delete - The mediator will delete all pending messages.
4. Notify/Inactive/Keep - The mediator keeps any messages in the M-INBOX until an expiration time is reached. The peerNode may send an additional parameter, which is the expiration time in minutes. The mediator may also have an expiration time, in which case the minimum of these two times is used.
The Notify/Remove command does the following on the mediator:
1. If there are pending messages in the M-INBOX and the Notify/Remove/Flush command is
received, then these messages are sent to the peerNode. The default action is to delete all
such messages. All messages arriving after this command has been received are ignored.
2. All data structures associated with this peerNode and the CC are expunged.
3. The peerNode’s CC quota counter is decremented.
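The message handling of Notify/Inactive and Notify/Remove can be sketched on a single M-INBOX(CC). This is a hypothetical model: the class and method names are assumptions, times are in seconds while the Keep parameters are in minutes as in the text, and the caller is assumed to expunge the object and decrement the quota counter after a remove.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CCInbox:
    """Hypothetical M-INBOX(CC) for one hosted peerNode in one CC."""
    active: bool = True
    messages: List[bytes] = field(default_factory=list)
    keep_until: Optional[float] = None  # absolute time, set by Notify/Inactive/Keep

    def store(self, message: bytes) -> bool:
        # The storing task tests the CC active state before every store.
        if not self.active:
            return False
        self.messages.append(message)
        return True

    def inactive(self, mode: str, now: float,
                 peer_minutes: Optional[float] = None,
                 mediator_minutes: Optional[float] = None) -> List[bytes]:
        """Notify/Inactive/{Flush,Delete,Keep}; returns the messages to send to the peerNode."""
        self.active = False
        if mode == "Flush":
            sent, self.messages = self.messages, []
            return sent
        if mode == "Delete":
            self.messages = []
            return []
        if mode == "Keep":
            limits = [m for m in (peer_minutes, mediator_minutes) if m is not None]
            if limits:
                # The minimum of the peerNode's and the mediator's expiration times.
                self.keep_until = now + 60 * min(limits)
            return []
        raise ValueError(f"unknown Notify/Inactive parameter: {mode}")

    def purge_expired(self, now: float) -> None:
        """Discard kept messages once the Keep expiration time has been reached."""
        if self.keep_until is not None and now >= self.keep_until:
            self.messages = []

    def remove(self, flush: bool = False) -> List[bytes]:
        """Notify/Remove: send pending messages only for Notify/Remove/Flush; delete by default."""
        self.active = False
        sent = self.messages if flush else []
        self.messages = []
        return sent


box = CCInbox()
box.store(b"status 1")
print(box.inactive("Flush", now=0.0))  # [b'status 1']
print(box.store(b"status 2"))          # False: the CC is now inactive
```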

Figure 4-18. Mediator Structures for Notify Command. (Diagram: three hosted peers, P1, P2 and P3, each with a CC quota counter, an expiration time (exptime) per CC, and an M-INBOX partitioned by CC, drawn from CC1, CC2 and CC3; each peer belongs to two of the three CC's.)

4.2.3.2 The GetMessage Command


Recall that the peerNode is restricted to receiving messages from within its active CC's. We define M-INBOX(CC) to be all of those messages in a given M-INBOX from peerNodes in that particular CC. Thus, in a GetMessage command message sent from a hosted peerNode, the (source peerIdentity, CC UUID) pair in the ONP header selects the M-INBOX(CC) to be used for that hosted peerNode. For this reason the GetMessage command has a subcommand, GetMessage/WhichCC, to ask for all of the CC's for which it has pending messages. In this case, the CC UUID in the ONP header will be all 0's. This form of the GetMessage command is defined as follows:

Command: GetMessage/WhichCC

Returns: List of pairs, (CC UUID, N), where N is the Number of Messages in
the respective Virtual M-INBOX, N >= 0.

Given a particular CC UUID, the messages are from one or more peerNodes in the CC. It may well be that the requesting peerNode would like to select messages from a single peerNode, since applications can have priorities of which the mediator is unaware. To this end, given such a CC UUID, we have the GetMessage/FromPeerIdentity command:

Command: GetMessage/“FromPeerIdentity” (Note this is a text string and not the peerIdentity as a UUID)

Returns: List of pairs, (peerIdentity, N), where N is the number of messages
pending from that peerIdentity, N >= 0, in the M-INBOX(CC) identified with
CC UUID in the ONP header.

Finally, to get messages from a given M-INBOX(CC) for which the peerNode is currently active, we use the GetMessage command defined as follows:

Command: GetMessage[/peerIdentity][N]

Returns: 1 or more messages in M-INBOX(CC), or none if M-INBOX(CC) is
empty. If the peerIdentity is present, then only those messages from that
peerIdentity are returned. If N is present, then at most N messages are
returned; otherwise the next message is returned.
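The three GetMessage forms can be sketched over an M-INBOX kept as one queue per CC, the disjoint layout preferred under Notify. This is a hypothetical in-memory model; the class and method names are assumptions, and a message is reduced to a (source peerIdentity, payload) pair.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, bytes]  # (source peerIdentity, payload)


class MInbox:
    """Hypothetical M-INBOX for one hosted peerNode, kept as one queue per CC UUID."""

    def __init__(self) -> None:
        self.by_cc: Dict[str, List[Message]] = {}

    def deliver(self, cc: str, source: str, payload: bytes) -> None:
        self.by_cc.setdefault(cc, []).append((source, payload))

    def which_cc(self) -> List[Tuple[str, int]]:
        """GetMessage/WhichCC: (CC UUID, N) pairs."""
        return [(cc, len(msgs)) for cc, msgs in self.by_cc.items()]

    def from_peer_identity(self, cc: str) -> List[Tuple[str, int]]:
        """GetMessage/"FromPeerIdentity": (peerIdentity, N) pairs within M-INBOX(CC)."""
        return list(Counter(src for src, _ in self.by_cc.get(cc, [])).items())

    def get_message(self, cc: str, peer: Optional[str] = None,
                    n: Optional[int] = None) -> List[Message]:
        """GetMessage[/peerIdentity][N]: remove and return matches in arrival order."""
        limit = 1 if n is None else n  # without N, only the next message
        taken: List[Message] = []
        kept: List[Message] = []
        for msg in self.by_cc.get(cc, []):
            if len(taken) < limit and (peer is None or msg[0] == peer):
                taken.append(msg)
            else:
                kept.append(msg)
        if cc in self.by_cc:
            self.by_cc[cc] = kept
        return taken


inbox = MInbox()
inbox.deliver("CC1", "P1", b"a")
inbox.deliver("CC1", "P2", b"b")
inbox.deliver("CC1", "P1", b"c")
inbox.deliver("CC2", "P3", b"d")
print(inbox.which_cc())  # [('CC1', 3), ('CC2', 1)]
```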

Version Length Lifetime


Source Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
PNToMed Virtual Port

"GetMessage / FromPeerIdentity"

Figure 4-19. GetMessage/“FromPeerIdentity” Command

Version Length Lifetime
Source Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
PNToMed Virtual Port

"GetMessage / PeerIdentity 5"

Figure 4-20. GetMessage/peerIdentity/N Command

4.2.3.3 The Publish Command


What we envision for publication is multi-faceted. Content exchange is extremely important; in many cases content must be protected, while in others there will be no access restrictions, and our methodology for publication must address each of these needs and everything in between. Also, a means is required to advertise content, or to attract peerNodes, i.e., users, to it. In chapter 3 we discussed connected communities (CC) and the role they play in achieving these goals. They provide a firewall for their members that is as protective or permeable as these members desire. We noted that in order to bootstrap this process we need a public connected community, PubCC, where access is limited to, first, CC documents, since one needs to know how to become a member of a CC, and second, meta-data for other kinds of content. This latter meta-data is how one can advertise CC's and draw users to join them. It may be URL's pointing to web pages with directions and motivation for joining. This is left to the CC creators and members to decide. On the other hand, for the reasons described above, a CC's content cannot be directly accessed in the PubCC. The PubCC can be viewed as either the base of a large bowl that contains all of the content, or the base nodes of a huge content tree.
By means of the publish command a peerNode makes its documents (see chapter 3 for document descriptions) and general content such as jpeg and mpeg files, text files, code in both binary and source form, etc., available to other peerNodes. Publish is explicitly tied to query. Thus, the publish command below has a somewhat general description whose details cannot be supplied until we give specific examples in section 4.5. We know from experience that it is better not to publish data directly to mediators, but rather to publish representations or meta-data, such as indices to be hashed. In such a context the data itself, when queried, will be supplied by peerNodes. Also, some data must be accessed exclusively from the peerNode on which it was created, and furthermore, some data may or may not migrate. For example, the virtualPort document must always be accessed from the peerNode that created it. The reason for this is that if a peerNode no longer wishes to permit connections to this virtualPort, it withdraws the document from its local store. If such a document were republished by the peerNodes that retrieved a copy, then synchronizing its revocation would become almost impossible, because it could be retrieved from any peerNode that published it. Finally, if the expiration date in the document is ignored, and peerNodes keep trying to connect to this non-existent virtualPort, they waste their cpu cycles: the connections are refused.
We cannot prevent multiple peerNodes from attempting to publish content whose author wishes to be the exclusive source. On the other hand, we can easily make it the responsibility of the peerNodes that receive such documents from non-originating sources to use these documents at their own risk. We are not getting into the issue of digital rights management here. Rather, in chapter 5 we point out some mechanisms that can be used to let the recipient of digital data recognize that it has come from the wrong source. Recall that in chapter 3, section 3.3.2.4.1, the virtualPort document has a <sourceExclusive> field. This contains data that may be used to prove that the source from which the document was retrieved is the originating source, which has the exclusive right to publish this document. If this document is published rather than some kind of index, then mediators can respect this field. If indexing is used, then mediators may not be able to detect a republished document. In this case the party primarily responsible for honoring the originator's desires is the republisher. But if the republisher decides to fraudulently republish the document or content, we provide mechanisms to detect this misbehavior at the recipient peerNode. These mechanisms can be either ad hoc or tightly controlled by trusted 3rd parties (T3P). Now to the publish command itself.
The publish command has a list, L(i), of elements describing the data to be published. Each of these
elements has at least four fields, F1, F2, F3, and F4. F1 is meta-data that describes the data. F2 is the
publication lifetime of the data in minutes. F3 is the {peerIdentity, virtual port name} = {virtualSocket}
of the originating peerNode. F3 permits a querying peerNode to contact the originating peerNode.
F4 is a flag indicating whether this is a system document or application meta-data. Without loss of
generality, we can say F4 is true for system documents, and false otherwise. The elements L(i) are
published on one of the mediators in the CC’s mediator map. Each element, L(i), is published with
the originating peerNode’s peerIdentity. Thus, any query for an L(i) will return it together with
the owner’s peerIdentity. Note that since the publication is within a particular CC, a mediator
storing any of the publication list element {L(i), virtualSocket} pairs must also store the source route
taken from the ONP message. The peerNode’s source route may be incorrect because of the dynamic
nature of the P2P Overlay Network. In this case, the mediator will correct the source route with its current
source route information. Why store the source route? This source routing information is a hint of how
to find the publishing peerNode whose virtualSocket is known from the {L(i), virtualSocket} pair.
Again, routing is covered in section 4.2.5.
F5 and other fields are left to the implementors’ imaginations. For example, F5 might be a credit card
number with expiration date if there is a cost associated with data publication. Meta-data may be
(name, value) pairs, URLs, etc. A peerNode must precede the publish command with a Notify/Active
command to become an active member of a CC. The CC UUID in the ONP message containing the
publish command must be the CC UUID for the currently active CC; otherwise an error message is
returned. Please note that we are not specifying the specific syntax of the list elements, L(i). Rather, we
are showing the structure of the command. For example, they may be in XML format.

Command: Publish {N} L(1), L(2), ..., L(N)

L(i) := List element, {F1, F2, F3, F4}(i)
F1 := Meta-data
F2 := Publication lifetime
F3 := VirtualSocket of originating peerNode
F4 := true | false, true if it’s a system document

Returns: OK [M]. If M is not present, then all N documents were successfully published. Otherwise, 1 <= M <= N.

Response: Error [Text explanation].

Version Length Lifetime
Source Address
Destination Mediator Address
UUID_0
Destination Routing Information
Source Routing Information
UMP Data Length
Source Virtual Port
PNToMed Virtual Port
"Publish {3} L(1) L(2) L(3)"

Figure 4-21. The Publish Command Message
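The structure of the Publish command can be sketched in Python. This is a minimal sketch under stated assumptions: the class and function names, the tuple used for a virtualSocket, and the error text are hypothetical, and the sketch renders only the command line shown in Figure 4-21, not a wire syntax, which the text deliberately leaves open. It also enforces the rule that the CC UUID in the message must match the peerNode’s currently active CC.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A virtualSocket is the pair {peerIdentity, virtual port name} (field F3).
VirtualSocket = Tuple[str, str]

@dataclass
class PublishElement:          # one list element L(i)
    meta_data: str             # F1: meta-data describing the data
    lifetime_min: int          # F2: publication lifetime in minutes
    origin: VirtualSocket      # F3: virtualSocket of the originating peerNode
    system_doc: bool           # F4: True for system documents

def build_publish(elements: List[PublishElement], msg_cc: str, active_cc: str) -> str:
    """Render 'Publish {N} L(1) ... L(N)', rejecting a CC that is not active."""
    if msg_cc != active_cc:
        # Hypothetical wording; the protocol only specifies "Error [Text explanation]".
        raise ValueError("Error: CC UUID is not the currently active CC")
    items = " ".join(f"L({i})" for i in range(1, len(elements) + 1))
    return f"Publish {{{len(elements)}}} {items}"
```

A peerNode that has not yet sent Notify/Active for the CC named in the message would hit the error branch.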


The mediator, M, to which the publish command is sent, hashes F1 using one of many well-known
hashing algorithms (see section 4.5). H(F1) yields the hash target mediator, on which is stored
the triple {L(i), F3, source routing information containing M’s peerIdentity}. How this is used is
discussed in the next section.
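The text leaves the choice of hash function to section 4.5. Purely as an illustration, the sketch below assumes SHA-1 over the UTF-8 bytes of F1, reduced modulo the size of the CC’s mediator map after sorting it. Sorting means every mediator holding the same map computes the same hash target mediator. The function name is hypothetical.

```python
import hashlib
from typing import List

def hash_target(meta_data: str, mediator_map: List[str]) -> str:
    """Choose the hash target mediator for F1 (illustrative SHA-1 modulo N)."""
    if not mediator_map:
        raise ValueError("the CC mediator map is empty")
    ordered = sorted(mediator_map)  # same order on every mediator
    digest = hashlib.sha1(meta_data.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest, "big") % len(ordered)]
```

Simple modulo placement reassigns most keys whenever the mediator map changes. Schemes such as consistent hashing are designed to limit that churn.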
It may be that the publishing peerNode no longer wishes to provide the content, and in this case it
needs a means to remove itself as a reference peerNode. To accomplish this we have the Publish/Delete
command. When the hash target mediator receives this command, it deletes the triple bound to the
publishing peerNode.

Command: Publish/Delete {N} L(1), L(2), ..., L(N)

L(i) := List element, {F1, F2 [,F3]}(i)
F1 := Meta-data
F2 := VirtualSocket of originating peerNode
F3 := credential certifying deletion

Returns: OK [M]. If M is not present, then all N documents were successfully deleted. Otherwise, 1 <= M <= N.

Response: Error [Text explanation].

The only subtlety is the parameter F3. In many cases republication of content is without restriction.
But it may be that the originator of the non-system content claimed “source-exclusive” in its virtualPort
document, as described above. In this case, upon receiving the publish/delete command with the
F3 parameter, the mediator can retrieve the originator’s virtualPort document from either its local
cache or the originator. Given some secure relationship between F3 and the virtualPort document, if
the credential is valid, the mediator will delete all of the references to this content and refuse any
further republication until the originator republishes the content. The mediator must keep a database
of sorts of all such retracted content to enforce the republication relationship this protocol enables.
Clearly, the database need not be local. This is an implementation detail.
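The Publish/Delete rules can be sketched as a per-CC store on the hash target mediator, together with the small database of retracted content. This is a hypothetical in-memory model, not the book’s implementation. Two checks are reduced to booleans supplied by the caller: verifying the F3 credential against the originator’s virtualPort document, and deciding whether a publisher is the originator.

```python
from typing import Dict, List, Set, Tuple

VirtualSocket = Tuple[str, str]              # (peerIdentity, virtual port name)
Reference = Tuple[VirtualSocket, List[str]]  # (virtualSocket, source route)

class HashTargetStore:
    """Hypothetical per-CC store kept by a hash target mediator."""

    def __init__(self) -> None:
        self.refs: Dict[str, List[Reference]] = {}  # F1 meta-data -> references
        self.retracted: Set[str] = set()            # retracted source-exclusive content

    def publish(self, f1: str, vsock: VirtualSocket, route: List[str],
                is_originator: bool) -> bool:
        """Store a reference; refuse republication of retracted content."""
        if f1 in self.retracted and not is_originator:
            return False
        self.retracted.discard(f1)  # the originator has republished it
        self.refs.setdefault(f1, []).append((vsock, route))
        return True

    def delete(self, f1: str, vsock: VirtualSocket, credential_ok: bool = False) -> None:
        """Publish/Delete for one list element L(i)."""
        if credential_ok:
            # Valid F3 credential: drop every reference and record the retraction.
            self.refs.pop(f1, None)
            self.retracted.add(f1)
        elif f1 in self.refs:
            # Without F3: drop only the triple bound to this publishing peerNode.
            self.refs[f1] = [r for r in self.refs[f1] if r[0] != vsock]
```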

4.2.3.4 The Query Command


The query command is the counterpart to the publish command. At the highest level, how do
peerNodes learn that published data exists, and how do they retrieve it? To enable this process, a
general query about the existence of data satisfying search constraints must be supported. This is
application dependent. Are we considering replacing the entire web content space with a P2P content
space? If one thinks about this for a minute or two, it is clear to the authors that overlaying the CC
concept onto any search space will simplify search. Why? Search is immediately restricted to that CC,
and the mediators that support a particular CC may be all that one has to contact to satisfy a search. In
particular, we are looking at a fully distributed search rather than a centralized search. One can imagine
the content on the web hashed across multiple collections of mediators supporting a very diverse set of
CCs. This is a good next step, and we certainly see that many small steps are required to make this very
large next step. The little steps are taken by the initial P2P applications. These applications will grow a
distributed data space over time, and we predict that P2P systems by their nature will be much more
efficient and give better end-user service, thus bringing about the next large step. There will be a join of
the current website methodology and the P2P system we envision. The former system can provide the
means to advertise, with meta-data, the URLs that can be accessed to discover CCs and their content.
When such a CC is found, the PubCC is used to acquire the CC document, and then the user can become
a member of the CC to both search for and access more content. To provide the ability to search for
information published in both the PubCC and other CCs, we have the query command.
Before we define the command, we need to say a little more about how to implement the restrictions on
publish and query in the PubCC. Since all published data is meta-data, there is not much one can
do to prevent a bad peerNode from publishing anything in the PubCC. While we cannot prevent such
publication without draconian measures, which are discussed again in chapter 5 on security, we can
restrict activity in the PubCC so as to rule out access to non-system content there. The mediators do
this by limiting all peerNode-to-peerNode traffic in the PubCC to the publish, query, and query response
commands. Any other messages are immediately discarded by the mediators receiving them. In particular,
recall that a peerNode uses the GetMessage command to retrieve messages from mediators. If a
peerNode is active in the PubCC, the GetMessage command can only return either a query or a
query response.
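The PubCC restriction amounts to two filters applied by mediators: one on incoming peerNode traffic and one on what GetMessage may return. In this sketch the command names are hypothetical string labels rather than a defined encoding, and messages are modeled as (command, payload) tuples.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (command name, payload), a hypothetical model

PUBCC_PEER_COMMANDS = {"Publish", "Query", "QueryResponse"}
PUBCC_DELIVERABLE = {"Query", "QueryResponse"}

def admit(cc_uuid: str, pubcc_uuid: str, command: str) -> bool:
    """Ingress check: in the PubCC, discard anything but publish/query traffic."""
    if cc_uuid != pubcc_uuid:
        return True  # other CCs are not restricted by this rule
    return command in PUBCC_PEER_COMMANDS

def get_message(inbox: List[Message], cc_uuid: str, pubcc_uuid: str) -> List[Message]:
    """GetMessage: in the PubCC only queries and query responses are returned."""
    if cc_uuid != pubcc_uuid:
        return list(inbox)
    return [m for m in inbox if m[0] in PUBCC_DELIVERABLE]
```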
While queries are initiated from a peerNode to its hosting mediator with the PNToMed protocol using
ONP / UMP messages, two kinds of responses are possible, depending on the type of data being
queried. First, when a peerNode is querying for system documents generated by peerNodes, that is,
documents published with F4 true, the response must come from the peerNode to which
the document belongs, and must contain either a PeerIdentity, VirtualPort, or Connected Community
document. A query for a system document may be triggered by applications, but the queries themselves
are generated by the underlying system software. The responses, that is, the system documents, are
stored in the underlying system and come from it independently of the application that generated
the publication. They are not application data. Second, when a peerNode is
querying for application data, the response is meta-data describing the content rather than the content
itself, and is sent to an application listening for a response on an application-defined virtualSocket.
In a large P2P system it is possible to have millions of sources for a single piece of content, and thus
millions of pieces of meta-data. This raises many issues with respect to capacity, search, and
performance, which are addressed in section 4.5. In any case, the meta-data is sufficient to initiate full
content retrieval, which is only possible in a CC that is not the PubCC, is peerNode-to-peerNode, and does not
involve the MedToMed protocol. It involves routing, and possibly the use of mediators’ M-INBOXes if the
peerNodes do not have direct connectivity.
With the above discussion in mind the query command has several parameter requirements:
1. Data type - System document or application content,
2. For data type = System document - Search qualifiers are triples, F1, F2, and F3. F1 depends
on the document being queried. F2 is the virtualSocket to which query responses can be
sent. F3 is the source route of the querying peerNode. It is set by the peerNode from its
most recent source route information from its hosting mediator. F1 must be as described
below:
1) For the peerIdentity document, any one or all of the following content fields:
a. <peerName>
b. <peerUUID>
c. <description>
2) For the virtualPort document any one or all of the following fields:
a. <vportName>
b. <vportUUID>
c. <vportType>
3) For the connected community document, any one or all of the following fields:
a. <ccName>
b. <ccUUID>
c. <description>
For system document searches, we note that the response will be the logical AND of the search
qualifiers. Also, wildcard queries are permitted for the name fields only, and all other search qualifiers
will be ignored. If no wildcard is used, then the search becomes an indexed lookup, for which no search
is required. We use the two terms interchangeably, but this distinction must be understood.

3. For data type = Application content - Search qualifiers are any reasonable string or combination
thereof. There is no restriction on either wildcard search or logical combinations of
the search qualifiers. This is clearly application dependent. We will discuss hashing in
section 4.5, but the actual implementations that are possible are beyond the scope of this
book.
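The matching rules for system document queries can be sketched as follows. The sketch reads the wildcard rule as “in a wildcard query, only the name-field qualifiers are used and all others are ignored”; that reading is an assumption. Python’s `fnmatch` glob syntax also stands in for a wildcard syntax the text does not define. The field names are the document tags listed above, written without angle brackets.

```python
from fnmatch import fnmatchcase
from typing import Dict

# Name fields of the peerIdentity, virtualPort and connected community documents.
NAME_FIELDS = {"peerName", "vportName", "ccName"}

def matches(doc: Dict[str, str], qualifiers: Dict[str, str], wildcard: bool) -> bool:
    """Evaluate the logical AND of the search qualifiers against one document."""
    for field, pattern in qualifiers.items():
        if wildcard:
            if field not in NAME_FIELDS:
                continue  # non-name qualifiers are ignored in a wildcard query
            if not fnmatchcase(doc.get(field, ""), pattern):
                return False
        elif doc.get(field) != pattern:
            return False  # exact comparison: an indexed lookup, not a search
    return True
```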

Command: Query, Type, Limit, {N} Q(i), i = 1, 2, ..., N

Type := System Document | Application Content
Limit := upper bound on the number of responses the peerNode will accept
N := number of search qualifiers
Q(i) := Search qualifier, {Wildcard, F1, F2, F3}(i)
Wildcard := True | False
F1 := Search string
F2 := Query requesting peerNode’s virtualSocket
F3 := Query requesting peerNode’s source route

Returns: Query Response, Type, {M} D(i), i = 1, 2, ..., M, M <= Limit.

Type := System Document | Application Meta-data
M := Number of data elements returned
D(i) := Query response elements, {F1, F2, F3}(i)
F1 := System Document | Application Meta-data
F2 := Owner’s virtualSocket
F3 := Owner’s source route

A further note about the returned application meta-data: each D(i) contains the actual meta-data, L(i),
that satisfied the search qualifiers and, most importantly, enough information to contact the peerNodes
that are the sources of the content. That is, with each D(i) there will be the following information:
1. The owner’s peerIdentity,
2. The owner’s virtual port name and virtual portID,
3. The owner’s source route information.

The above three items are sufficient for the querying peerNode to retrieve the content.
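A responder might assemble the query response within the requester’s Limit and render it in the form used in Figures 4-23 and 4-26. The sketch below shows one way to do that. The class and field names are hypothetical; they mirror the three items listed above plus the matching L(i).

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ResponseElement:        # one D(i)
    meta_data: str            # the L(i) that satisfied the search qualifiers
    owner_peer: str           # 1. owner's peerIdentity
    vport_name: str           # 2. owner's virtual port name ...
    vport_id: str             #    ... and virtual portID
    owner_route: List[str]    # 3. owner's source route

def query_response(kind: str, found: List[ResponseElement],
                   limit: int) -> Tuple[str, List[ResponseElement]]:
    """Keep at most Limit elements and render the response header line."""
    chosen = found[:max(limit, 0)]
    items = " ".join(f"D({i})" for i in range(1, len(chosen) + 1))
    header = f"Response type = {kind}, {{{len(chosen)}}} {items}".rstrip()
    return header, chosen
```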

Version Length Lifetime
Source Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
PNToMed Virtual Port
"Query type = System Document, Limit = 1, {1} Q(1)"

Figure 4-22. Query Command Message Example

Version Length Lifetime
Source Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
PNToMed Virtual Port
"Response type = System Document, {1} D(1)"

Figure 4-23. Query Response Message for System Document

Figures 4-24 and 4-25 illustrate these publication and query mechanisms.

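[Figure: P1 publishes a system document through its mediator M1 with the PNToMed Publish command; the hashed meta-data is stored on M2; P2 queries through its mediator M3 with the PNToMed Query command; the query response carrying the system data returns from P1 to P2. The numbered steps 1 to 8 are described below.]
Figure 4-24. Publishing and Querying for System Documents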

In Figure 4-24, in the context of a CC, CC1, we have the following:


1. Using the PNToMed protocol publish command, P1 publishes a single system document by
means of its mediator M1 and the document’s meta-description, L(1).
2. M1 hashes the meta-data using one of many possible algorithms, see section 4.5. The meta-
data, {L(1), virtualSocket(P1), source route containing M1’s peerIdentity} ends up on the
mediator M2 in a data store associated with CC1.
3. P2, which is active in CC1, queries M3 using the PNToMed protocol query command with the
parameters type = system document, wildcard = false, limit = 1, and the query command
data {Q(1), virtualSocket(P2), P2’s source route}.
4. Just as with the publish command, P2’s mediator, M3, corrects the source route if necessary,
thus resulting in the command {Q(1), virtualSocket(P2), source route containing M3’s
peerIdentity}. The Q(1) qualifiers are hashed, and this results in sending the query command
parameters and {Q(1), virtualSocket(P2), P2’s source route} to M2.
5. M2, respecting the query command parameters, looks up the system document meta-data
associated with Q(1) in its data store, recovers {L(1), virtualSocket(P1), P1’s source route},
and sends {{L(1), virtualSocket(P1), P1’s source route}, {Q(1), virtualSocket(P2), P2’s source
route}} to M1 using the MedToMed protocol query forwarding command. Since CC1
is always in the ONP message header, M1 creates an ONP / UMP query command message
for P1 that contains {Q(1), virtualSocket(P2), P2’s source route} as data, and places it in P1’s
M-INBOX(CC1).
6. P1 pulls this message from its M-INBOX(CC1) with the PNToMed protocol getMessage
command, recovers the requested document from its document data store, and forms an
ONP / UMP query response.
7. This response is encapsulated in another ONP / UMP message and sent to M3, where it is decapsulated and
placed in P2’s M-INBOX(CC1).
8. P2 retrieves this message and, with it, the requested system document. Here we assume that the routes work.
The routing protocol described in section 4.2.5 will clarify the details of route discovery.
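The eight steps above can be traced with a toy, single-process simulation. Everything here is hypothetical. Mediators and peerNodes are plain strings, each M-INBOX(CC1) is a Python list, MedToMed forwarding is a direct function call, and hash placement uses an illustrative SHA-1 modulo rule. Routing is assumed to work, as in step 8.

```python
import hashlib
from typing import Dict, List, Tuple

MEDIATORS = ["M1", "M2", "M3"]                  # CC1 mediator map (toy)
stores: Dict[str, Dict[str, Tuple[str, List[str]]]] = {m: {} for m in MEDIATORS}
inboxes: Dict[str, list] = {}                   # M-INBOX(CC1) per peerNode

def target(key: str) -> str:
    """Illustrative hash placement: SHA-1 modulo the sorted mediator map."""
    h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    return sorted(MEDIATORS)[h % len(MEDIATORS)]

def publish(peer: str, hosting: str, meta: str) -> None:
    """Steps 1-2: the hosting mediator hashes L(1) and stores it on the target."""
    stores[target(meta)][meta] = (peer, [hosting])

def query(peer: str, hosting: str, meta: str) -> Tuple[str, List[str]]:
    """Steps 3-5: hash Q(1), look it up, queue the query in the owner's M-INBOX."""
    owner, route = stores[target(meta)][meta]
    inboxes.setdefault(owner, []).append(("Query", meta, peer, [hosting]))
    return owner, route

def answer(owner: str, documents: Dict[str, str]) -> None:
    """Steps 6-7: the owner pulls the query and returns the document."""
    _, meta, requester, _route = inboxes[owner].pop(0)
    inboxes.setdefault(requester, []).append(("QueryResponse", documents[meta]))
```

Because M1 and M3 apply the same placement rule to the same meta-data, the query lands on whichever mediator holds the publication, wherever the querying peerNode is hosted.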

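[Figure: P1 and P2, hosted by M1 and M2, both publish the content meta-data L(1); it hashes to M4, which stores {L(1), VirtualSocket(P1), P1's source route} and {L(1), VirtualSocket(P2), P2's source route}. P3 sends a PNToMed Query through its mediator M3 and receives the Query Response. The numbered steps 1 to 6 are described in the text.]
Figure 4-25. Publishing and Querying for Application Content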

While queries for system documents must retrieve these documents from their originating peerNodes,
as shown in Figure 4-24, meta-data for non-system content can have any peerNode as its source,
since the content may be retrieved and republished. This implies that there may be multiple
copies of this content in the system. Therefore, queries for application data can have several possible
peerNodes as the ultimate source for this data, as shown in Figure 4-25. Notice that steps 1-4 are
identical to the publish/query steps for system documents.
Assume, first, that the same content is on both P1 and P2, and that the resulting publish commands led
both M1 and M2 to hash the meta-data L(1) to M4, and second, that the query limit = 4 in the query
command parameters. Then:
a) In step 5 of Figure 4-25, M4, respecting the query command parameters, creates the list of meta-data
matching the query qualifier, Q(1), and sends {{L(1), virtualSocket(P1), P1’s source route},
{L(1), virtualSocket(P2), P2’s source route}} to M3 using the MedToMed query forward command.
b) M3 creates an ONP / UMP query response message as in Figure 4-26 below and places it in P3’s
M-INBOX(CC1).
c) In step 6, P3, active in CC1, retrieves the message using the getMessage command.

Version Length Lifetime
Source Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
PNToMed Virtual Port
"Response type = Application meta-data, {2} D(1) D(2)"

Figure 4-26. Query Response for Application Content

Because application content is not critical to system behavior, and because good meta-data will
also contain creation dates, distributing this data across multiple sources in this manner increases
the probability of accessing application content despite the capricious behavior of peerNodes.
Since the query and query response for application content do not result in the retrieval of the
content, we need a command to explicitly request the content from any one of the content owners
specified in the query response. Recall that for direct peerNode-to-peerNode communication we use the
PNToPN protocol, which is discussed in section 4.3. When mediated content retrieval is required,
the PNToMed protocol is used. The content request command is described in the next section.
4.2.3.5 The Content Request Command
To set our context, if you have fallen asleep after having read the previous section, or if we weren’t
clear enough, recall that a peerNode has made a query for application content and received a query
response with adequate meta-data to retrieve the content in question. Some problems might arise, for
example, if the P2P Overlay Network has undergone changes that invalidate
the source routing information in the query response meta-data. This is covered in section 4.2.5. In the
example in Figure 4-25, the meta-data was for two possible sources of the content. How the peerNode
decides upon which of these sources it should choose will be discussed in section 4.5 on searching.
But, one can imagine that the meta-data might contain creation dates, and the peerNode will select the
most recently created content. Similarly, it might also have geographic proximity information, down-
load bandwidth, etc. An application can use any of these criteria to appropriately select a content
source.
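The following sketches one possible selection policy. It assumes each D(i) carries a creation date and an advertised download bandwidth. Neither field is defined by the protocol, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Candidate:          # one content source taken from a D(i)
    peer_identity: str
    created: date         # assumed creation-date field in the meta-data
    bandwidth_kbps: int   # assumed advertised download bandwidth

def choose_source(candidates: List[Candidate]) -> Candidate:
    """Prefer the most recently created content; break ties on bandwidth."""
    if not candidates:
        raise ValueError("the query response lists no content sources")
    return max(candidates, key=lambda c: (c.created, c.bandwidth_kbps))
```

Geographic proximity or any other application criterion could be added to the sort key in the same way.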
A meta-data query response has a list element, D(i), of content meta-data for each content source that
has been found. The D(i) contain at least the peerIdentity of the source of the content, a virtualPort name,
virtual Port ID, and the source routes used to reach that peerNode. This meta-data is sufficient to con-
tact a peerNode possessing the desired content, and a requesting peerNode requires a command to ask
for, and receive the content. Similarly, the peerNode receiving this request will require sufficient infor-
mation to establish a channel with the requesting peerNode. Thus, the content request command must
contain the meta-data describing the content to be retrieved, a virtualSocket that can be used to estab-
lish an ONP / ACP channel to send the content, and the source route of the requesting peerNode’s
mediator. Continuing with the example shown in Figure 4-25 above, Figure 4-27 shows the content
request / response command overview.

[Figure: continuing Figure 4-25, M4 holds {L(1), VirtualSocket(P1), P1's source route} and {L(1), VirtualSocket(P2), P2's source route}; P3 sends a Content Request through M3 and receives a Content Response from P2 by way of M2.]
Figure 4-27. Content Request and Response

We define the content request command and its associated content response to satisfy these require-
ments. Since we are using mediators as intermediaries for the communication between the requesting
and responding peerNodes, as usual, encapsulated ONP messages will be used here. Given a query
response element D(j) = {L(i), VirtualSocket(P), P’s source route}, we define the Content Request
command:

Command: Content Request, {N} R(i), i = 1, 2, ..., N

N := number of request elements
R(i) := request element, {F1, F2, F3, F4}(i)
F1 := UUID handle to identify content when sent
F2 := L(i) from the D(j) to identify content source
F3 := Requesting peerNode’s virtualSocket
F4 := Requesting peerNode’s source route

Returns: Content Response, {C1, C2}, Content

C1 := F1 from the ith request element (Content’s UUID handle)
C2 := meta-data describing the content and transfer mechanism
Content := the requested content
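The role of the F1 UUID handle can be sketched on the requesting side. The requester remembers each R(i) under its handle, and the C1 value echoed in the Content Response identifies which request the arriving content answers. The class and attribute names are hypothetical, and the ONP / ACP transport is omitted.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RequestElement:                     # R(i)
    meta: str                             # F2: L(i) from the chosen D(j)
    requester_vsocket: Tuple[str, str]    # F3: requesting peerNode's virtualSocket
    requester_route: List[str]            # F4: requesting peerNode's source route
    handle: str = field(default_factory=lambda: str(uuid.uuid4()))  # F1

class Requester:
    """Requesting-side bookkeeping for Content Request / Content Response."""

    def __init__(self) -> None:
        self.pending: Dict[str, RequestElement] = {}

    def request(self, elems: List[RequestElement]) -> str:
        for e in elems:
            self.pending[e.handle] = e
        items = " ".join(f"R({i})" for i in range(1, len(elems) + 1))
        return f"Content Request {{{len(elems)}}} {items}"

    def on_response(self, c1: str, c2: dict, content: bytes) -> Tuple[str, dict, bytes]:
        # C1 echoes F1, so the handle identifies the request this content answers.
        req = self.pending.pop(c1)
        return req.meta, c2, content
```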

In the content response, the C2 list member is implementation dependent. It can contain information
like byte length of the data, data transfer protocol, and data description. The latter might use MIME,
for example, image/jpeg, or even XML. In most cases, the content request / response command will use
the ONP / ACP protocol, since both the requesting and responding peerNodes have sufficient
information in the query response, and content request to establish an ONP / ACP channel between
them. If the ONP / ACP protocol is used, then the data transfer protocol might be ftp-like and push the
content to the requesting destination peerNode taking advantage of the reliability, and guarantee of
delivery features of ACP. The application defines the content transfer mechanism. The following is an
example of the C2 meta-data using XML:

<?xml version="1.0"?>
<!DOCTYPE C2>
<!-- Content response meta-data (C2); XML names cannot begin with a digit -->
<C2 xmlns="http://www.aw.com">
  <Length> 146981 </Length>
  <TransferProtocol> ONFTP/ACP </TransferProtocol>
  <ContentType> Application/Video-Game </ContentType>
  <FileName> VirtualGo.jar </FileName>
  <MACAlgorithm> SHA1withRC4 </MACAlgorithm>
  <MAC> 0xF37691836677913ACE79422335AE199176369AA0 </MAC>
</C2>

Here the video game “VirtualGo” is being sent to the requesting peerNode using the Overlay Network
File Transfer Protocol (ONFTP). If one sends the content length and filename, and specifies the
transfer protocol as ONFTP, the receiving peerNode need only create the file VirtualGo.jar and read
data from the channel until 146,981 data bytes have been received. Note that we have included a
Message Authentication Code (MAC) algorithm and the MAC itself. The MAC is the SHA-1 hash of the
content’s data, encrypted with a shared secret key. The MAC permits the receiver to verify the
integrity of the content, that is, that it has not been modified since the MAC was computed. Even if one
is using a protocol like SSL.V3 or TLS.V1, the content may have been modified before it was transferred,
or after it was received. While this is super paranoid, it is still a reasonable approach to double
check the integrity of the contents. Content often sits in a data store for a long time, where it may be
modified by an intruder, and this should be detected before the data is sent. If not, then one can use MACs
as indicated above. This is thoroughly described in chapter 5, where we discuss security.
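The receiver’s side of this example can be sketched as: read exactly Length bytes, then check integrity. This is an illustration under several assumptions. It uses Python’s standard HMAC-SHA1 as a stand-in for the SHA1withRC4 construction named in the C2 example. It expects the C2 elements to be in the `http://www.aw.com` namespace. It models the ONP / ACP channel as any file-like object with a `read` method.

```python
import hashlib
import hmac
import io
import xml.etree.ElementTree as ET

NS = {"c2": "http://www.aw.com"}  # namespace assumed for the C2 elements

def parse_c2(xml_text: str) -> dict:
    """Pull the fields the receiver needs out of the C2 meta-data."""
    root = ET.fromstring(xml_text)
    def text(tag: str) -> str:
        return root.find(f"c2:{tag}", NS).text.strip()
    return {"length": int(text("Length")), "file": text("FileName"), "mac": text("MAC")}

def receive(channel: io.BufferedIOBase, c2: dict, key: bytes) -> bytes:
    """Read exactly Length bytes from the channel, then verify the MAC."""
    remaining, chunks = c2["length"], []
    while remaining > 0:
        chunk = channel.read(min(remaining, 4096))
        if not chunk:
            raise EOFError("channel closed before Length bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    # Stand-in MAC: HMAC-SHA1 over the content, compared as lowercase hex.
    expected = hmac.new(key, data, hashlib.sha1).hexdigest()
    received = c2["mac"].lower()
    if received.startswith("0x"):
        received = received[2:]
    if not hmac.compare_digest(expected, received):
        raise ValueError("MAC mismatch: the content has been modified")
    return data
```

Reading by the advertised Length, rather than until the channel closes, lets the same channel carry further messages after the content.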
The following is an example ONP / ACP message encapsulated in an ONP / UMP PNToMed protocol
message that is directed to the PNToMed VirtualPort of M3, the first-hop mediator. Notice that routing
information, P2’s source route, has been included as a destination route in the ONP / ACP message so
that M3 knows the route to P2’s hosting mediator. Recall that this value is in the query response
received by P3; it is a field of the D(1) element. The source route gives M3 a routing hint for finding a
route to M2. How this is used is a function of the overlay network routing mechanism in place. In one
of our proposed routing mechanisms in section 4.2.5, there is a high probability that M3 will have
a direct route to M2, and the total route will comprise exactly two hops. The P2 VirtualPort is also a
field in D(1).

Encapsulating ONP / UMP message:
Version Length Lifetime
P3 PeerIdentity
M3 PeerIdentity
Connected Community Identity
UMP Data Length
P3 Virtual Port
M3 PNToMed Virtual Port

Encapsulated ONP / ACP message:
Version Length Lifetime
P3 PeerIdentity
P2 PeerIdentity
Connected Community Identity
P2 Destination Route
P3 Source Route
ACP Data Length
P3 Virtual Port
P2 Virtual Port
ACP Data Message
ACP Session Identifier
ACP Sequence Number
"Content Request {1} R(1)"

Figure 4-28. Content Request Command ONP / ACP Message


In Figure 4-28 the message itself is a Content Request command, and it assumes that the ONP / ACP
channel between the two peerNodes has been established as described in section 4.1.
A peerNode may also wish to run code on a mediator. We have taken the approach
that mobile agents are the best way to do so in a P2P Overlay Network. The Mobile Agent command is
discussed next, and it initiates the running of such code.
4.2.3.6 Mobile Agent Command
As will be pointed out in chapter 7, Java mobile agents are appropriate for P2P Overlay Networks in
many ways. This section defines the command that peerNodes use to launch such an agent and have it
run on one or more mediators as defined in its itinerary, and finally return to the launching peerNode
with the results of the traversed itinerary. The P2P Java Mobile Agent Protocol (P2P-JMAP) defines a
message that contains the necessary and sufficient information to process a mobile agent on either a
Mediator or a peerNode. The interested reader can refer to chapter 7 for the details.

Command: MobileAgent P2P-JMAP Message

P2P-JMAP Message := Message as defined in chapter 7, section 7.2

Returns: Accept | Reject

Accept := The mobile agent command has been accepted
Reject := The mobile agent command has been rejected

4.2.3.7 GetManagementData Command


One very useful application of Java Mobile Agents is collecting management data
from running mediators. In order for a peerNode to access this data, a command is required. To send
this command, the peerNode must be active in the Management CC, as advertised in the mediator
document in the mediator greeting command. The command is as follows:

Command: GetManagementData, DataType

DataType := text name of XML file containing the data

Returns: OK, XML Document | Failed, <error message>

OK := The DataType is valid
XML Document := Management data
Failed := Data not available, with optional text error message
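A mediator-side sketch of GetManagementData follows. It assumes the mediator tracks which CCs each peerNode is active in and keeps its management data as named XML strings. The function name, parameters, and error texts are hypothetical.

```python
from typing import Dict, Set, Tuple

def get_management_data(peer: str, data_type: str,
                        active_ccs: Dict[str, Set[str]], management_cc: str,
                        documents: Dict[str, str]) -> Tuple[str, str]:
    """Return ("OK", xml) or ("Failed", message) for GetManagementData."""
    if management_cc not in active_ccs.get(peer, set()):
        return "Failed", "peerNode is not active in the Management CC"
    if data_type not in documents:
        return "Failed", f"no management data named {data_type}"
    return "OK", documents[data_type]
```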

In order for mediator M3 to contact mediator M2 and route the message, and vice versa, a protocol is
required. This is the Mediator Routing Protocol (MRP), which is described in section 4.2.5. Mediators
do more than route information to one another. As we have seen in this section, publication and query
both require mediator-to-mediator communication. There are also other requirements, such as maintaining
the Public CC Mediator Map, which is a map of all mediators. These and other mediator-to-mediator
communication requirements are discussed in the next section.
4.2.4 Mediator-to-Mediator Communication Protocol
In the P2P system we are proposing, mediators play a critical role, while in other P2P systems the
protocols as applied may treat all peerNodes as equals, all playing equal roles when tasks like
routing, publication, and query/response or lookup are relegated to them. For example, Plaxton
[PLAXTON] mechanisms have been applied to lookup algorithms like Brocade [BROCADE], Tapestry
[TAPESTRY], and Pastry [PASTRY]. Gnutella [GNUTELLA] also defines an egalitarian P2P network, but
its chattiness does not play well with our Overlay Network proposal. When these mechanisms are
applied at the extreme ad hoc end of the P2P spectrum, all peerNodes are part of the lookup schemes
and are treated as equal partners. On the other hand, one can use these algorithms on a subset of more
capable peerNodes, as we suggest. Note that in section 4.5 we discuss these algorithms from this
perspective in detail. Thus, from our point of view, even if the system is completely ad hoc, sensibly
selecting mediators is essential. In the approaches just mentioned, any peerNode arriving on the
scene may take on at least a minimal subset of the functions we assign to our mediators. This
cannot be part of our system design because:
1. Device capabilities of peerNodes differ. Sensors, mobile phones, PDA’s, personal worksta-
tions, servers, satellites, game consoles, etc. may participate as peerNodes in a P2P Overlay
Network. It seems ridiculous to route traffic through a sensor or a mobile phone for exam-
ple given their hardware and power resources.
2. Bandwidth varies tremendously across the device space. In the case of mobile phones,
operators who control mobile networks where bandwidth is limited, even with 3rd generation
mobile networks, cannot permit heavy routing or query traffic to saturate the available
network capacity to the detriment of their voice and data customers’ user experience.
Mediator-directed P2P traffic has no place in this wireless network space. And it is absurd to
even discuss sensor grids in this context.
3. The system load anticipated on an active P2P Overlay Network even if distributed in a fair
manner using sophisticated routing/query algorithms cannot be supported by all devices.
In our approach mediators are special, and require their own communication protocol to efficiently
maintain, among other things, the CC mediator maps.
It is important to note here that the join of all Public CC mediator maps is a map of all known media-
tors for the given P2P Overlay Network topology. While stability is a hoped for property of mediators
in a solidly constructed P2P Overlay Network, even in this case it is possible for mediators to suffer
hardware failures that remove them from the topology, and mediators must be able to detect and recover
from such failures. We realize that in a revenue producing P2P Overlay Network high-availability,
which can be expensive, is possible, and in this case a mediator failure may go undetected as a second
system takes over the failed system’s role. In fact, with the speed of networks moving towards 100
gigabits per second, we will soon be able to realize high-availability with systems separated by long
distances. We see this as an important step towards constructing stable, reliable P2P Overlay Net-
works.

On the other hand, supporting small ad hoc P2P Overlay Networks is also important, and even in this
case, mediators should be carefully chosen to minimize network downtime and security intrusions.
The latter is the cost of doing business on the Internet and is something that must be a part of even the
least stable systems.
Also, as we mentioned several times in section 4.2.3, a mediator-to-mediator protocol is necessary
to supplement the first-hop PNToMed protocol that is used for the publication of, and querying for,
content, as well as for the notification of a peerNode becoming active in a CC. The latter may
trigger a CC Map update.
Next, we have the mediator startup problem. How do mediators discover one another? First of all, it is
possible to have a very powerful P2P Overlay Network with a single mediator. And recall from section
4.2.1, that hosted peerNodes will have boot time knowledge of this mediator. One might find this to be
a typical P2P Network in a small Enterprise where all peerNodes have direct network connectivity, for
example, TCP/IP, and the single mediator is used for discovery of system documents, content, etc. On
the other hand, if we imagine many mediators, and we want a deterministic solution, that is to say, that
all mediators can find one another given that the underlying real network topology has no partitions,
then preconfigured information, that may be created in real time, is a requirement given the following
constraint:
Mediators cannot discover one another on the underlying real network using that network’s
level-3 network or Internet protocols (see footnote 3).
Here, in the IP world, this means that routers do not propagate multicast beyond the local subnet, and
no two mediators are on the same subnet.
One can propose many ways to preconfigure mediator connectivity among which are the following:
1. Store mediator address information on a website, and put the associated URL(s) in the
bootstrap code
2. Use DHCP to acquire the mediator address information
3. Place known mediator names in the bootstrap code, and use DNS to find their network
address information
4. When mediators become active, each can store its network address information in a known
LDAP directory
5. Use email or other out-of-band means to acquire the requisite network address information

In our MedToMed protocol, mediators tell the mediators to which they are connected about the other
mediators they know. Thus, the Public CC Mediator Map can change dynamically in real time. As we will
see below, this map is the join of multiple maps, where each mediator is only required to have local
knowledge of its Public CC Neighborhood.
Before we get into the details of the MedToMed protocols, we introduce the notion of a hierarchical
mediator tree. When one imagines the possible vastness of a world-wide P2P Overlay Network, one
immediately begins to worry about issues of performance and resource limitations, and asks if there is
a way to localize processing, and only globalize when necessary. Postal and telephone services, as well
as DNS, are organized this way: one has local mail delivery and local phone calls, and an organization
that extends these services when required, so that local “traffic” does not impact “global traffic.”
Similarly, DNS provides domain names to help one navigate the world-wide Internet. Admittedly, our
inspiration for what we discuss below comes from IPv6 and these examples.

³ On the Internet, IP Multicast is an example of a level-3 IP protocol. For reasons of performance, and
because of possible attacks, IP Multicast is not Internet-wide.
To this end, we think a three-level mediator hierarchy is sufficient. In the IPv6 sense, level1 mediators
serve a single site. A site is loosely defined: it might be a neighborhood, an individual’s home network,
an enterprise, a small community, etc. Level1 mediators at a given site communicate with each other
and maintain site-wide CC Mediator Maps. Most importantly, the publish command is restricted to
level1, and all indexing is limited to a single site. A full P2P Overlay Network can be supported by
site-level mediators alone. Direct inter-site mediator-to-mediator communication is not possible, that is
to say, given site1 and site2, a mediator in site1 cannot connect to a mediator in site2 without an
intermediary. This may seem restrictive, but we can imagine a single site supporting several hundred
thousand peerNodes. A site can be a good-sized city or a large ISP, each with several level1 mediators,
or a single global enterprise with a private network. To link sites together we use level2, regional
mediators.
When inter-site communication is desired by at least two sites, each site must have a level2 regional
mediator. Joining two or more sites in this manner forms a P2P Overlay Network region. Content
publication is never pushed up the mediator tree, but CC maps are pushed to level2 and are region-wide.
This latter feature permits region-wide queries, i.e., inter-site queries across CC’s are allowed. The CC
is a hint to increase the probability of a successful search for content. All level2 mediators in a single
region can communicate with one another. Thus, if an enterprise or multiple cities wish to interconnect
their P2P Overlay Networks, they will use regional mediators. For example, an enterprise may span
several private networks and can place regional mediators outside of its firewalls to create an
enterprise-wide region. Again, inter-regional communication is not possible without an intermediary.
To enable inter-regional communication we use level3, root mediators. We call them root mediators
because they are the root nodes of our mediator tree. We considered using the term global mediator,
but this can cause some confusion since a single site may have global coverage. Still, in what follows,
the terms global and root mediator will be used interchangeably, and given this explanation this should
cause no confusion. One would imagine that root mediators will be extremely reliable and powerful
systems with the responsibility of maintaining the joins of the regional, level2 mediators’ CC-maps.
While these communities may number in the millions, each is uniquely described by a CC-UUID,
which in a real implementation will be sixteen to twenty bytes. One million CC-UUIDs at twenty bytes
each require about 20 megabytes of RAM, which is nothing these days.
Another nice feature of a hierarchical structure is that the P2P Overlay Network can be built one level
at a time! The analogy is evolutionary, that is, LAN -> WAN -> Internet.
The mediator hierarchical organization is shown in figure 4-29.

Figure 4-29. Mediator Hierarchical Organization

Now on to the protocols that support the above mediator hierarchy. Let’s assume that at each level the
public CC map is in place. Note that if regional mediators are in place, then as soon as a mediator
comes online and connects to its regional mediator, the latter can communicate both the site-level
mediators that are active, that is to say, connected, and the CC maps hosted by each of these
mediators. Thus, the level2 and level3 mediators can play a helpful administrative role in real-time
P2P Overlay Network maintenance. But each site also needs to maintain its active public CC map, and
since neither level2 nor level3 mediators are required, we need protocols for public CC map
maintenance that work in both cases. What we are really saying is that each level has the primary
responsibility for local public CC map updates, and that if higher-level mediators exist, then after the
initial connection to the higher level, public CC map information can be piggy-backed on those
mediator protocols that go up one level and for which a response is expected.

The first single-level command in the mediator-to-mediator protocol is the mediator keep-alive, or
simply, keep-alive command. We use this command to explain the format of the MedToMed commands.
4.2.4.1 The Mediator Keep-Alive Command
Given a collection of mediators, M1, M2,..., Mn, at a given level, the usual implementations of a
keep-alive procedure are IP ping-like and involve each of the Mi’s connecting to the others periodically
in a fan-out like way. Let us assume, first, that we are on an IP infrastructure; second, that a mediator
sends a keep-alive every ten minutes to a known list of mediators, where each list element has the
mediator’s IP address, MedToMed Port, and Greeting Port; and third, that the keep-alive command
contains this list. Therefore, this list reflects those mediators that were either active or came online
during the previous ten-minute interval plus the current ten-minute interval. Now, such a fan-out
keep-alive process yields a maximum of O(n²) connections for n mediators. It can be considerably less
than O(n²) because, when keep-alives are sent on a fixed interval (in this case every 10 minutes), a
participating mediator can trim its list of candidate keep-alive contacts with the information it receives
during its 10-minute wait before sending the next keep-alive. The other side of the coin is that new
mediators may become active, and trimming keep-alive lists in this way may in fact delay the discovery
of a new mediator. In any case, for n = 100, n² = 10,000 connections. If these are uniformly spread
over an interval of 10 minutes, then there will continuously be about 100 connections every 6 seconds,
or one connection every 0.06 seconds. This is unreasonable, and a better mechanism is required.
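The arithmetic behind this estimate can be checked in a few lines. This is a minimal sketch. The helper name `fanout_keepalive_load` is hypothetical, and the code uses the text's round bound of n² connections per interval rather than an exact count.

```python
def fanout_keepalive_load(n: int, interval_s: float) -> tuple[int, float]:
    """Fan-out keep-alive load for n mediators (hypothetical helper).

    Uses the text's round O(n^2) figure of n * n connections per interval,
    and assumes the connections are spread uniformly over the interval.
    """
    connections = n * n
    seconds_between_connections = interval_s / connections
    return connections, seconds_between_connections


if __name__ == "__main__":
    # 100 mediators, keep-alive every 10 minutes (600 seconds).
    conns, gap = fanout_keepalive_load(100, 600.0)
    per_6_seconds = 6.0 / gap  # connections arriving in any 6-second window
    print(conns, gap, per_6_seconds)
```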
We suggest using a keep-alive token ring by imagining n mediators, 1, 2,..., n, in a ring topology, that
is, the successor of M(j) is M(j+1), and the successor of M(n) is M(1). To begin with, given n mediators,
n >= 2, only 2n-3 connections are required. This is a significant savings over O(n²) (197 versus 10,000
for 100 mediators). If we have n mediators in a ring sorted by mediator UUID, every such ring needs a
keep-alive initiator. The initiator is always the mediator with the smallest UUID. The keep-alive interval
is a boot-time configuration constant like the MMMS. Setting aside denial-of-service keep-alive attacks,
which we do not address here, there are two possibilities. First, if there is only one mediator, it need not
send itself keep-alives. Second, if there is more than one mediator in the ring, then there is an initiator,
M(1). How is this determined? Let’s assume we have a stable token ring, and a mediator, M(j), comes
onboard and has knowledge of at least one other mediator. If it does not know another mediator, then
the system is misconfigured. M(j) connects to another mediator, M(k), and sends a greeting command.
Recall that the greeting command “welcome” response contains a list of known mediators and the
keep-alive average round trip time in seconds (KARTT). M(k) does the following:

When the mediator, M(k), receives a greeting command from a new mediator, M(j), M(k)
initiates a keep-alive command with M(j) as a member of the well-ordered list of existing
mediators.

In this way, M(j) must receive a keep-alive within 3 times the KARTT. We use the constant 3 here to
be generous: in networking it is better to err on the side of patience. If a keep-alive is received, then
M(j) is a participant in the token ring, and if M(j) is the “UUID smallest” member, then M(j) initiates
the next keep-alive. If M(j) does not receive a keep-alive, then, since M(j) has been inserted in a
well-ordered list, there are two possibilities:
1. Some mediator M(t) before M(j) crashed while holding the keep-alive token:
M(j) has a problem, but it knows of other mediators from the greeting response, and sends a
greeting to one of them to restart the process.
2. All mediators crashed. In this case, M(j) must patiently probe until at least one other mediator
that was known to M(k) restarts.
How do we make sure the token is always alive?
1. Given n mediators, and M(s), 1 <= s <= n, assume M(s) has the token. M(s) tries its successors
in order and must succeed in connecting with at most one of them, that is, it passes the token to
the first successor it reaches.
2. If all of M(s)’s successors are down, then M(s) waits at least one minute, and retries.
3. Let x be the keep-alive interval in seconds. If any mediator, M(t), does not receive the keep-alive
token within y seconds, where y = x + 1.5 × KARTT, then a non-zero random number, r, is
generated, and waitTime = r MOD y is calculated. M(t) waits for waitTime seconds, and then
initiates a keep-alive. This might cause a flurry, but it will be short-lived, and after this, all will
return to normal.
4. Assume M(t) receives the keep-alive token from M(r). Then if there exist mediators M(s),
r < s < t, M(t) removes these mediators from the mediator token-ring list. In this way the list
only reflects active mediators.
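The four token-ring rules can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation. The function names (`pass_token`, `token_timeout`, `wait_time`, `prune_skipped`) and the `reachable` callback are hypothetical. Mediator UUIDs are taken to be fixed-width strings, so sorting them gives the ring order. The "between r and t" test of rule 4 is read in ring order, so it wraps past M(n).

```python
import random
from typing import Callable, Optional


def pass_token(ring: list[str], holder: str,
               reachable: Callable[[str], bool]) -> Optional[str]:
    """Rules 1 and 2: try successors in ring order and hand the token to the
    first one that answers. None means every successor is down, so the
    holder waits at least one minute and retries."""
    members = sorted(ring)
    i = members.index(holder)
    for step in range(1, len(members)):
        candidate = members[(i + step) % len(members)]
        if reachable(candidate):
            return candidate
    return None


def token_timeout(interval_s: float, kartt_s: float) -> float:
    """Rule 3: y = x + 1.5 * KARTT, in seconds."""
    return interval_s + 1.5 * kartt_s


def wait_time(interval_s: float, kartt_s: float, rng: random.Random) -> float:
    """Rule 3: waitTime = r MOD y for a non-zero random r; result is in [0, y)."""
    y = token_timeout(interval_s, kartt_s)
    r = rng.randint(1, 2**31 - 1)
    return r % y


def prune_skipped(ring: list[str], sender: str, receiver: str) -> list[str]:
    """Rule 4: the receiver got the token from the sender, so every mediator
    strictly between them in ring order was skipped and is dropped."""
    members = sorted(ring)
    n = len(members)
    k = (members.index(sender) + 1) % n
    stop = members.index(receiver)
    skipped = set()
    while k != stop:
        skipped.add(members[k])
        k = (k + 1) % n
    return [m for m in members if m not in skipped]


if __name__ == "__main__":
    ring = ["067", "083", "163", "192", "284", "317", "409", "626"]
    nxt = pass_token(ring, "163", lambda m: m != "192")  # 192 is down
    print(nxt, prune_skipped(ring, "163", nxt))
```

In the usage example, mediator 192 is down. The holder 163 therefore passes the token to 284, and 284 removes 192 from its list.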
Figure 4-30 demonstrates the basic rule when a mediator with UUID 317 joins a token ring through
mediator 409: mediator 409 initiates a keep-alive command with mediator 317 as a member of the
ordered list of existing mediators.

[Figure: keep-alive token ring of mediators 067, 083, 163, 192, 284, 409, 626; mediator 317 joins through 409, which sends it a keep-alive]

Figure 4-30. Mediator Keep-alive Token Ring
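The join rule and the figure's example can be reproduced with a short Python sketch. The helper names `join_ring`, `initiator` and `successor` are hypothetical. The UUIDs are the figure's three-digit labels, kept as zero-padded strings so that string order matches numeric order.

```python
from bisect import insort


def join_ring(ring: list[str], new_uuid: str) -> list[str]:
    """Return the well-ordered mediator list that M(k) places in its
    keep-alive after a greeting from the new mediator new_uuid."""
    members = sorted(set(ring))
    if new_uuid not in members:
        insort(members, new_uuid)  # keep the list sorted by UUID
    return members


def initiator(ring: list[str]) -> str:
    """The keep-alive initiator is the mediator with the smallest UUID."""
    return min(ring)


def successor(ring: list[str], uuid: str) -> str:
    """The successor of M(j) is M(j+1); the successor of M(n) wraps to M(1)."""
    members = sorted(ring)
    return members[(members.index(uuid) + 1) % len(members)]


if __name__ == "__main__":
    ring = ["067", "083", "163", "192", "284", "409", "626"]
    joined = join_ring(ring, "317")  # 317 greets 409; 409 replies with a keep-alive
    print(",".join(joined))          # 067,083,163,192,284,317,409,626
    print(initiator(joined), successor(joined, "626"))  # 067 067
```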


With the above discussion in mind, we define the mediator keep-alive command as follows:

Command: Keep-Alive, Token holder, Active mediator list, L


Token holder:= Mediator UUID of the sender.
L:= Mediator UUIDs of each active mediator, M(i), i = 1, 2,...,N

Version | Length | Lifetime
Source Mediator Address
Destination Mediator Address
UUID_0
UMP Data Length
Source Virtual Port
MedToMed Virtual Port
"Keep-Alive 409 067,083,163,192,284,317,409,626"

Figure 4-31. ONP / UMP Mediator Keep-Alive Command Message

A side-effect of this command is the maintenance of the Public CC Mediator Maps on each mediator.
Mediators also need to maintain CC maps. This is done with the mediator-to-mediator CC map com-
mand discussed in the next section.
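The keep-alive payload in figure 4-31 can be built and parsed as sketched below. The sketch assumes a single space between the command, the token holder and the list, and commas inside the list, as the figure suggests. The ONP/UMP header fields are not modeled. `encode_keepalive` and `parse_keepalive` are hypothetical names.

```python
def encode_keepalive(token_holder: str, active: list[str]) -> str:
    """Build the payload text: command, token holder, then the UUID-sorted
    active mediator list L, comma-separated as in figure 4-31."""
    return f"Keep-Alive {token_holder} {','.join(sorted(active))}"


def parse_keepalive(payload: str) -> tuple[str, list[str]]:
    """Inverse of encode_keepalive; tolerates spaces after the commas."""
    parts = payload.split(maxsplit=2)
    if len(parts) != 3 or parts[0] != "Keep-Alive":
        raise ValueError(f"not a keep-alive payload: {payload!r}")
    _, holder, uuid_list = parts
    active = [u.strip() for u in uuid_list.split(",") if u.strip()]
    return holder, active


if __name__ == "__main__":
    holder, active = parse_keepalive(
        "Keep-Alive 409 067,083,163, 192, 284, 317, 409, 626")
    print(holder, active)
    print(encode_keepalive(holder, active))
```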
4.2.4.2 Mediator-to-Mediator CC Map Command
In section 3.5.3.1 we introduced the CC Mediator Map and discussed its consequences. If the reader
needs to recall these concepts, now is a good time to at least skim that section. Apart from the
keep-alive command, which permits mediators to keep track of one another, the maintenance of CC
Mediator Maps is central to the P2P system we are proposing. To be brief, a mediator’s CC Mediator
Map for a given CC is a list of all known mediators, itself included, that host edge peers who have
been active in that CC. Also, recall that content that is not a system document, as well as any
publication, queries, or retrievals of that content, is constrained to be CC specific. Therefore, for a
particular CC, having as up-to-date a CC Mediator Map as possible is extremely important. The
command described in this section accomplishes these real-time updates.
To keep things simple at first, let’s assume there is a stable collection of mediators at a given level,
and also that there are no CC maps, that is to say, no edge peer has yet become active in a CC. We are
starting from zero. We also assume we have a hashing algorithm that is used across all mediators.
Many such algorithms are possible; our only requirement is that the hash results in a consistent ring
topology. Examples of these hash algorithms are discussed in section 4.5. Without loss of generality,
let peerNode p1, which is hosted by mediator M1, become active in CC1. M1 hashes the CC UUID,
thus indexing a unique mediator, say M7. Using the CC Map Command, it sends the CC UUID along
with its Mediator UUID to M7. In this fashion, M7 keeps a list of all mediators that host edge
peerNodes having been active in CC1.
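One way to satisfy the "consistent ring topology" requirement is consistent hashing on a SHA-1 ring. The Python sketch below is an assumption for illustration only. Section 4.5 discusses the hash algorithms actually considered, and `ring_point` and `cc_home_mediator` are hypothetical names. Each mediator UUID and the CC UUID are placed on the ring. The CC is then indexed to the first mediator at or after its point, wrapping around. Every mediator that computes this over the same mediator list gets the same answer.

```python
import hashlib
from bisect import bisect_left


def ring_point(name: str) -> int:
    """Place a UUID on the ring as a 160-bit SHA-1 integer."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def cc_home_mediator(cc_uuid: str, mediators: list[str]) -> str:
    """Return the mediator that keeps the CC Mediator Map for cc_uuid:
    the first mediator point at or after the CC's point, wrapping around."""
    ring = sorted((ring_point(m), m) for m in mediators)
    points = [p for p, _ in ring]
    i = bisect_left(points, ring_point(cc_uuid)) % len(ring)
    return ring[i][1]


if __name__ == "__main__":
    mediators = [f"M{i}" for i in range(1, 11)]
    print(cc_home_mediator("CC1", mediators))  # the text's "M7" role
```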
In response to M1’s CC Map Command, M7 will return the complete CC1 Mediator Map including
M1 as a new member. Note that M7 need not host a single edge peerNode for CC1. If the CC1
Mediator Map retrieved from M7 contains members other than M1, none of them is yet aware of
M1’s presence. Also, mediators leave Mediator Maps after they no longer have active edge peerNodes
and a suitable time-out has expired. This time-out may be as long as a week. We have two approaches
for notifying the old members of this map that there are new members, or that members have been
deleted:
1. Periodic notification: In the general case, all mediators in the list must be notified of the
changes to the CC Mediator Map since the last update occurred. In our example, M7 on a
periodic timer notifies all but the most recently added member, M1, of the addition of M1
to the CC Mediator Map.
2. Immediate notification: With any change in the CC Mediator Map, M7 immediately notifies all
mediators in the map except the mediator that effected the change. In our example, M1 is not
notified.
It is extremely important to understand that all changes in the CC Mediator Maps resonate through
the level at which the mediator resides. At the site level this affects publication, queries and routing.
For higher levels only routing is affected. These changes cause momentary instabilities as the system
does what is necessary to recover:
1. Some routes will be invalidated if a mediator leaves the level, either gracefully or by a failure.
2. Since content has been published with a hash algorithm of some sort, changing the size of the
site-level CC Mediator Map changes some of the variables in the hash algorithm used for both
publication and query. In fact, during the period of instability, different mediators may be using
different variables in the algorithms. There are ways to deal with this, and they are discussed in
the appropriate sections referred to just below.
3. If a mediator leaves, then its hosted peerNodes need to rehost themselves by finding an
alternative mediator. Similarly, when a mediator is added, not only will new peerNodes attach
themselves to this mediator, but also, depending on load, some peerNodes may abandon their
current mediator for better service on the new mediator.
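Item 2 can be made concrete by comparing two placement schemes when one mediator joins a site of ten. The schemes are a naive "hash modulo map size" placement and the consistent-hash ring assumed earlier. Both schemes, and the names `mod_n_home`, `ring_home` and `moved_keys`, are hypothetical illustrations, not the algorithms of section 4.5. Under the modulo scheme most meta-data keys move to a different mediator. Under the ring scheme, only keys taken over by the new mediator move.

```python
import hashlib
from bisect import bisect_left


def h(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def mod_n_home(key: str, mediators: list[str]) -> str:
    """Naive placement: hash modulo the size of the mediator map."""
    ordered = sorted(mediators)
    return ordered[h(key) % len(ordered)]


def ring_home(key: str, mediators: list[str]) -> str:
    """Consistent-hash placement on a SHA-1 ring."""
    ring = sorted((h(m), m) for m in mediators)
    points = [p for p, _ in ring]
    return ring[bisect_left(points, h(key)) % len(ring)][1]


def moved_keys(keys, before, after, place):
    """Keys whose home mediator changes when the map goes from before to after."""
    return [k for k in keys if place(k, before) != place(k, after)]


if __name__ == "__main__":
    keys = [f"meta-{i}" for i in range(1000)]
    before = [f"M{i}" for i in range(1, 11)]
    after = before + ["M11"]  # one mediator joins the site
    print("mod-N moved:", len(moved_keys(keys, before, after, mod_n_home)))
    print("ring moved: ", len(moved_keys(keys, before, after, ring_home)))
```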
What is done to return the system to a state of normalcy will be discussed in detail in the appropriate
sections. For example, what is required for routing under these circumstances is covered in section
4.2.5. The primary goal of this section is to describe the commands associated with the changing state
of the CC Mediator Map. We have four such commands.
The first command adds a new mediator to a CC Mediator Map as described above.

Command: CCMap/Add CC UUID, Mediator UUID

Returns: CC Mediator Map, SHA-1 Hash


CC Mediator Map := {N} L(1), L(2),...,L(N)
L(i) := List element, {F1}(i)
F1 := Mediator UUID of ith Mediator in the CC Mediator Map
SHA-1 Hash := SHA-1 Hash of the CC Mediator Map

The second command deletes a mediator from the CC Mediator Map under the conditions we just
described.

Command: CCMap/Delete CC UUID, Mediator UUID

Returns: Success | Failure


Success := The designated mediator was removed from the CC Mediator Map associated
with the CC UUID

Failure := Either the CC Mediator Map does not exist on this mediator, or
the Mediator UUID is not in the selected CC Mediator Map.
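The Add and Delete commands can be sketched as handlers on the mediator that a CC UUID hashes to (M7 in the example). The sketch assumes a canonical form for hashing: sorted mediator UUIDs joined by commas. It also assumes that removing the last member drops the map entirely. Neither assumption comes from the text, and `CCMapHome` and its methods are hypothetical. `to_notify` picks the recipients for immediate notification.

```python
import hashlib


class CCMapHome:
    """Hypothetical state on the mediator a CC UUID hashes to (M7 in the text)."""

    def __init__(self) -> None:
        self.maps: dict[str, set[str]] = {}

    @staticmethod
    def map_hash(members) -> str:
        # Assumed canonical form: sorted mediator UUIDs joined by commas.
        return hashlib.sha1(",".join(sorted(members)).encode("utf-8")).hexdigest()

    def add(self, cc_uuid: str, mediator_uuid: str) -> tuple[list[str], str]:
        """CCMap/Add: returns the CC Mediator Map and its SHA-1 hash."""
        members = self.maps.setdefault(cc_uuid, set())
        members.add(mediator_uuid)
        ordered = sorted(members)
        return ordered, self.map_hash(ordered)

    def delete(self, cc_uuid: str, mediator_uuid: str) -> bool:
        """CCMap/Delete: True for Success, False for Failure."""
        members = self.maps.get(cc_uuid)
        if members is None or mediator_uuid not in members:
            return False
        members.remove(mediator_uuid)
        if not members:
            del self.maps[cc_uuid]  # assumption: an empty map is dropped
        return True

    def to_notify(self, cc_uuid: str, changed_by: str) -> list[str]:
        """Immediate notification: every member except the one that made the change."""
        return sorted(self.maps.get(cc_uuid, set()) - {changed_by})


if __name__ == "__main__":
    home = CCMapHome()
    home.add("CC1", "M3")
    print(home.add("CC1", "M1"), home.to_notify("CC1", "M1"))
```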

The third command is used to notify the members of CC Mediator Maps of the current local states of
these maps. These notifications may be done either periodically or immediately upon state changes.
However, periodic notification is important to maintain consistent CC Mediator Maps at each level.

Command: CCMap/Notify {N} L(1),L(2),...,L(N)


L(i) := {F1, F2}(i)
F1 := CC UUID
F2 := SHA-1 Hash of CC Mediator Map bound to CC UUID

Response: Success
Success := Notification was received

When a mediator receives a notification, the idea is to compare the SHA-1 hashes with its own copies
of the CC Mediator Maps, and if there is a difference, to request updates of the changed maps. To
accomplish this, the fourth and final CCMap command, CCMap/Get, is used. This command has a
maximum number of CC Mediator Maps that a mediator can “get.” Recall that the
MaximumCCMediatorGet constant is included in the Mediator Greeting Response. Its value is the
maximum number of mediator map copies that will be returned.

Command: CCMap/Get {N} L(1), L(2),...,L(N), 1 <= N <= MaximumCCMediatorGet


L(i) := CC UUID(i)

Returns: {N} L(1), L(2),...,L(N), 1 <= N <= MaximumCCMediatorGet


L(i) := List element, {F1, F2, F3}(i)
F1 := CC UUID
F2 := CC Mediator Map, {M} M(1), M(2),...,M(M)
M(i) := Mediator UUID
F3 := SHA-1 Hash
SHA-1 Hash := SHA-1 Hash of the CC Mediator Map
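A receiving mediator's use of Notify and Get can be sketched as follows. It compares the notified SHA-1 hashes with its local copies and batches the stale CC UUIDs into CCMap/Get requests of at most MaximumCCMediatorGet entries. It then installs only those returned maps whose contents match their hash. The canonical hashing form (sorted UUIDs joined by commas) and the names `plan_get_requests` and `apply_get_response` are assumptions for illustration.

```python
import hashlib


def map_hash(members: list[str]) -> str:
    """SHA-1 over an assumed canonical form: sorted UUIDs joined by commas."""
    return hashlib.sha1(",".join(sorted(members)).encode("utf-8")).hexdigest()


def plan_get_requests(local_maps: dict[str, list[str]],
                      notification: list[tuple[str, str]],
                      max_get: int) -> list[list[str]]:
    """Compare CCMap/Notify (CC UUID, SHA-1) pairs with local copies and
    batch the CC UUIDs whose hashes differ into CCMap/Get requests."""
    stale = [cc for cc, digest in notification
             if cc not in local_maps or map_hash(local_maps[cc]) != digest]
    return [stale[i:i + max_get] for i in range(0, len(stale), max_get)]


def apply_get_response(local_maps: dict[str, list[str]],
                       response: list[tuple[str, list[str], str]]) -> list[str]:
    """Install (CC UUID, map, SHA-1) triples from a CCMap/Get reply, skipping
    any whose map does not match its hash. Returns the rejected CC UUIDs."""
    rejected = []
    for cc, members, digest in response:
        if map_hash(members) == digest:
            local_maps[cc] = sorted(members)
        else:
            rejected.append(cc)
    return rejected
```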

These four commands are the complete set required to maintain CC Mediator Map consistency at a
given level in the mediator hierarchy. Inter-hierarchy, mediator-to-mediator communication is required
for routing and is discussed in section 4.2.5.
The final same-level mediator-to-mediator commands we describe support publication and query.
4.2.4.3 Mediator Publication and Query Forwarding Commands
In this section we assume familiarity with the ideas discussed in sections 4.2.3.3 and 4.2.3.4 on
publication and query as part of the PNToMed protocols. What must be handled here is how the
mediators communicate with one another to complete these protocols. We’ll first discuss publication.
It is important to repeat that documents are exclusively stored at a single site, and publication is never
pushed upward to a regional mediator. To this end, assume peerNode P1 is hosted by mediator M1, as
shown in figures 4-24 and 4-25. P1, using the PNToMed publish command, sends meta-data describing
a document to M1. M1 uses the system’s hash algorithm to determine the mediator on which this
meta-data is to be stored, say, M2, and requires a MedToMed protocol command to push the meta-data
to M2.

Command: Push/MetaData Meta-data


Meta-data := {L(1), source PeerNode virtualSocket, source routes}⁴

Returns: Success

The receiving mediator stores the meta-data, and is the unique source for this meta-data at the site
level. Note that multiple peerNodes can have copies of non-system content, and thus multiple meta-data
entries representing the same content can be stored at a mediator. The only difference between them
may be the source PeerNode VirtualSocket identifier.
The other side of publish is query. Again, queries for system documents are handled differently than
queries for general content. Recall, first, that queries for system documents are forwarded to the
mediator that hosts the source of the content, and second, that queries for other content result in the
meta-data for that content being returned to the querying peerNode. To accomplish this, one command
is required. This command may have two responses, one to a mediator and the other to the querying
peerNode.
The command defined below forwards the query to the mediator that has the meta-data. This latter
mediator is called the Next-Hop Mediator. This command is used for all content queries. It is sent from
the mediator hosting the edge peerNode that initiates the query to the mediator that has the query’s
indexed meta-data. Thus, it is called the Query “first-hop” command. We will repeat some of the
definitions from section 4.2.3.4 for clarification.

Command: Query/First-hop Type, Limit, Search Qualifier


Type := System document | Application content
Limit := Upper bound of the responses a peerNode is willing to accept
Search Qualifier := {Wildcard, F1, F2, F3}(i)
Wildcard := True | False
F1 := Search string
F2 := Query requesting peerNode’s virtualSocket
F3 := Query requesting peerNode’s source route

Returns: Type, Message


Type := System document | Application content
Message := Forwarded | Failed
Forwarded := Query forwarded to Next-hop Mediator
Failed := An error occurred while attempting to forward the query

⁴ The explicit contents and description of the meta-data are found in section 4.2.3.4.

The purpose of the above response is to tell the sending mediator that the next-hop mediator
successfully dispatched the command: it either looked up the meta-data matching the search criteria,
found a search match in the case of a wildcard search, or successfully forwarded a wildcard search. If
none of these is true, then the Query/First-hop command failed. If this occurs, then a query response
like the one defined below is sent by the hosting mediator to the querying peerNode by means of the
querying peerNode’s M-INBOX. Now we describe the second response.
Again, for both system content and application content the next-hop mediator is where the meta-data
describing the content has been hashed. But, if the query is for a system document, then recall that the
query is sent on to the owner of the document by the next-hop mediator, and the owning peerNode will
send a response to the peerNode making the query.
If the query is an application content lookup, and not a wildcard search, then the next-hop mediator
responds to the querying peerNode by way of its first-hop mediator with a query response. On the
other hand, if this query is a wildcard search, then the first-hop mediator initiates such a search across
the site-level mediators. How this search is done is implementation dependent; it can be extremely
expensive and must be used with caution.
In both cases, a query response is returned to the querying peerNode’s M-INBOX. As a final note,
while system document queries take place in the Public CC, it is worth repeating that all application
content queries are in the context of a non-public CC.

Response: Query Response | Failed


Query Response := Type, {M} D(i), i = 1, 2, ..., M, M <= Limit.
Type := System Document | Application Meta-data
M := Number of data elements returned
D(i) := Query response elements, {F1, F2, F3}(i)
F1 := System Document | Application Meta-data
F2 := Owner’s virtualSocket
F3 := Owner’s source route

Looking at the above, we see that for application content, the peerNode initiating the query is returned
a limited choice of peerNodes to contact in order to retrieve the content. Otherwise, a system docu-
ment is returned from the document owner.
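The next-hop mediator's part in Push/MetaData and in answering an application-content Query/First-hop can be sketched as below. The sketch makes several assumptions. Meta-data is reduced to a hypothetical `name` field plus the owner's virtualSocket and source route. Wildcard matching uses Python's `fnmatch`. The wildcard search is only local, whereas the text leaves the site-wide search to the implementation. `MetaData` and `NextHopIndex` are hypothetical names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class MetaData:
    name: str                     # hypothetical searchable field
    owner_socket: str             # source PeerNode virtualSocket
    owner_route: tuple[str, ...]  # source route


class NextHopIndex:
    """Hypothetical meta-data store on the mediator a search key hashes to."""

    def __init__(self) -> None:
        self.entries: list[MetaData] = []

    def push(self, md: MetaData) -> str:
        """Push/MetaData: copies of the same content from different owners are
        all kept; an identical entry is stored only once."""
        if md not in self.entries:
            self.entries.append(md)
        return "Success"

    def query(self, search: str, wildcard: bool, limit: int):
        """Build an application-content Query Response of at most `limit`
        elements, each (meta-data, owner's virtualSocket, owner's route)."""
        if wildcard:
            hits = [m for m in self.entries if fnmatchcase(m.name, search)]
        else:
            hits = [m for m in self.entries if m.name == search]
        elements = [(m, m.owner_socket, m.owner_route) for m in hits[:limit]]
        return "Application Meta-data", len(elements), elements
```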
The next sub-section describes both our final Mediator-to-Mediator command and another use of
the PNToMed command. The former provides a mechanism for mediators to alert one another of
changes in the P2P Overlay Network topology, and the latter a way to send urgent messages to hosted
peerNodes.
4.2.4.4 Mediator Alert Commands
While the behavior of a P2P Overlay Network may be purely ad-hoc, and the algorithms and protocols
we describe do their best to adapt to the dynamism of these networks, it is still necessary to provide a
means to smoothly shut down mediators. Also, as discussed in the next section on routing, mediators
need a means to notify one another of changes in the routing topology. To satisfy these requirements
we provide mediator alert commands. These commands may arrive out-of-band and must be
appropriately handled by the receiving mediators.
The first alert command is used to gracefully take a mediator offline, or to notify its hosted nodes
(mediators or peerNodes) that it is overloaded or going offline. Two explicit notifications are required:
1. If it is going offline, tell its next upper-level mediator to remove it from that level’s mediator
host bindings. The use of these bindings is covered in the next section.
2. Tell the immediate lower-level mediators or peerNodes that it is going offline, or is overloaded.
Each of the above causes different actions to be taken. In the second case, the orphaned nodes must
find an alternative mediator, and the mediator under duress or going offline will give them a list of
suggested alternative mediators if it knows of any. The command is below:

Command: Mediator Alert/Offline List_of_alternatives


List_of_alternatives := mediator documents.

When a site-level mediator is alerting a hosted peerNode, these alerts are set aside in the peerNode’s
alert queue. On the peerNode’s next getMessage command, the alert must precede any data-related
message. PeerNodes must be aware of this possibility and be ready to act on these alerts.
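The ordering rule for getMessage can be sketched as two queues per hosted peerNode, where alerts are always drained first. `PeerMailbox` and its methods are hypothetical names. For simplicity the sketch ignores the CC partitioning of the M-INBOX.

```python
from collections import deque
from typing import Any, Optional


class PeerMailbox:
    """Hypothetical per-peerNode mailbox on a site-level mediator."""

    def __init__(self) -> None:
        self.alerts: deque = deque()  # alert queue, e.g. Mediator Alert/Offline
        self.data: deque = deque()    # ordinary data-related messages

    def post_alert(self, alert: Any) -> None:
        self.alerts.append(alert)

    def post_data(self, message: Any) -> None:
        self.data.append(message)

    def get_message(self) -> Optional[Any]:
        """getMessage: any pending alert is returned before data messages."""
        if self.alerts:
            return self.alerts.popleft()
        if self.data:
            return self.data.popleft()
        return None
```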
Another action taken by a mediator that is going offline is to remove itself from the mediator
keep-alive token ring. Thus, a mediator planning to go offline waits for the arrival of the token, removes
itself from the list of mediators in the token, and then passes the token to the next mediator in the list.
It is then decoupled, and the other mediators recover from its removal as described in section 4.2.4.1.
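The graceful exit from the token ring can be sketched as one function. It is called when the token arrives at a mediator that is going offline. It removes the mediator from the token's list and returns the next mediator to receive the token. `leave_on_token` is a hypothetical name, and the UUIDs are assumed to sort in ring order.

```python
from typing import Optional


def leave_on_token(token_list: list[str], me: str) -> tuple[list[str], Optional[str]]:
    """On receiving the keep-alive token, a mediator going offline removes
    itself from the token's mediator list and names the next mediator to
    pass the token to (None if it was the only member)."""
    members = sorted(token_list)
    if me not in members:
        raise ValueError(f"{me} is not in the token ring")
    nxt = members[(members.index(me) + 1) % len(members)]
    remaining = [m for m in members if m != me]
    return remaining, (nxt if nxt != me else None)
```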
The second alert command is used to notify a peerNode or another mediator about source route
changes. How route changes are triggered will be described in the next section. Here, we can assume
that a mediator finds its new source route and notifies other nodes of it. A node responds to this alert by
using the new route in the source routing information field of the ONP message header when creating
new ONP messages. The following is the command format:

Command: Mediator Alert/RouteChange New_Route


New_Route := mediator documents.

The last alert command is defined for the worst case, when a route cannot be resolved by a mediator
and the message cannot be delivered. The peerNode that sent the message has no power to overcome
this; rather, it must wait until a route resolution is possible. It is the hosting mediator’s responsibility to
look for an alternative route, and again, this procedure will be discussed in the next section. Here, we
give the format of this alert command:

Command: Mediator Alert/Undeliverable Information


Information := Text explanation

In chapter 5, section 5.3.5.2, we introduce mechanisms for mediators at a given level to detect whether
any of their mediator peers are misbehaving. These mechanisms rely on voting among the mediators.
The next command permits mediators at the same level to vote.
4.2.4.5 Mediator Voting Command
Mediators require mechanisms to detect and isolate sources of misbehavior among themselves in
order to keep the site-level mediators running securely. One of the mechanisms we use is voting, and
for the details you are again referred to chapter 5, section 5.3.5.2. Here we present enough of the
voting procedure to define the voting command. To begin with, an election place has been selected. All
votes are cast by sending a ballot to the election place, which is identified by a mediator UUID. Each
mediator has been verifying the behavior of all of its peers in a way that is transparent to them. The
end result of this verification is each mediator’s choice of the bad mediator for the verification period.
The bad guy is also identified by its mediator UUID. When the voting period begins, each mediator
votes by placing the bad guy’s UUID in a ballot and sending it to the election place. Each election has
a unique numeric identifier that is monotonically increasing and starts with 1. This ID is initialized by
means of the Mediator Greeting command. With this in mind, we define the mediator voting command:

Command: MediatorVote Ballot


Ballot := Election-ID, Mediator-UUID, Signature-Algorithm, Digital Signature
Election-ID := Unique number identifying this election
Mediator-UUID := UUID of chosen bad mediator
Signature-Algorithm := Signature Algorithm used to sign the vote
Signature := Digital Signature

Response := Vote Received | Error


Error := Invalid Signature | Unknown Mediator-UUID

When the voting period is over, each voting mediator requests a list of all of the votes from the
election place. When the votes are received by the requesting mediator, it validates the signatures and
counts the votes. This differs from standard voting, where the counting is done at the election place and
the results are then seen by the voting population. Here the election place is a center for collecting
votes, and redistributing them on request. To retrieve the votes the MediatorBallotRequest command is
used:

Command: MediatorBallotRequest

Response: L(i), i = 1,...,N | Error


L(i) := Ballot for this vote

Note that a mediator sending this command has no idea of how many mediators voted, and the
election place may try to cheat. How this is handled is again discussed in chapter 5, section 5.3.5.
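The counting that each voting mediator performs after MediatorBallotRequest can be sketched in Python. Several parts are assumptions. The ballot carries a `voter_uuid` (the text identifies the voter only through the signature). Only the first valid ballot per voter counts. For the example only, an HMAC with per-voter keys stands in for the real digital-signature algorithm, because Python's standard library has no public-key signatures. `Ballot`, `tally`, `sign` and `verify_hmac` are hypothetical names.

```python
import hashlib
import hmac
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Ballot:
    election_id: int
    mediator_uuid: str  # UUID of the chosen bad mediator
    voter_uuid: str     # assumption: identity of the signer
    signature: bytes


def tally(ballots: Iterable[Ballot], election_id: int, known: set[str],
          verify: Callable[[Ballot], bool]) -> Counter:
    """Keep only this election, known mediator UUIDs, valid signatures,
    and the first valid ballot from each voter; count votes per UUID."""
    counts: Counter = Counter()
    voted: set[str] = set()
    for b in ballots:
        if b.election_id != election_id or b.mediator_uuid not in known:
            continue
        if b.voter_uuid in voted or not verify(b):
            continue
        voted.add(b.voter_uuid)
        counts[b.mediator_uuid] += 1
    return counts


# Stand-in signature scheme for this example only: HMAC-SHA256 per voter.
KEYS = {"M1": b"k1", "M2": b"k2", "M3": b"k3"}


def sign(voter: str, election_id: int, bad: str) -> bytes:
    msg = f"{election_id}|{bad}".encode("utf-8")
    return hmac.new(KEYS[voter], msg, hashlib.sha256).digest()


def verify_hmac(b: Ballot) -> bool:
    key = KEYS.get(b.voter_uuid)
    if key is None:
        return False
    msg = f"{b.election_id}|{b.mediator_uuid}".encode("utf-8")
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, b.signature)
```

Counting locally lets every mediator check the result for itself. It does not detect ballots that the election place withholds, which is the cheating problem deferred to chapter 5.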
In the next section we provide a mechanism for mediators to launch mobile agents amongst themselves.
4.2.4.6 Mediator Mobile Agent Command
The mediator Mobile Agent command permits mediators to run mobile agent services amongst one
another. Mobile agents can run entirely within the context of the mediator map, or they may have been
initially sent from an authorized peerNode and then, if possible, return to that same peerNode. As
mentioned in section 4.2.3.6, the P2P Java Mobile Agent Protocol (P2P-JMAP) defines the message
used for mobile agents. The details of mobile agents are thoroughly discussed in chapter 7.

Command: MediatorMobileAgent, P2P-JMAP Message


P2P-JMAP Message := Message as defined in chapter 7, section 7.2

Response: Accept | Reject


Accept := Mobile Agent command accepted for processing
Reject := Mobile Agent command will not be processed

In the next section we will describe how system documents and content are returned. There is an
underlying delivery mechanism to which we’ve alluded throughout this chapter: the Mediator Routing
Protocol, which ensures that messages arrive at an edge peerNode’s M-INBOX.

4.2.5 Mediator Routing Protocol


Recall figure 4-29 and the hierarchical organization of mediators and CC’s. At the base level we have
the site-level mediators, at the next level the regional-level mediators, and finally the global-level
mediators. A route is an address that contains a peerIdentity for each level in the hierarchy that the
message being routed must traverse. Every route has a lifetime that can be viewed as the maximum
number of mediators a message can traverse while being delivered. If our network is stable, then a
message visits at most six mediators on its route to the ultimate destination peerNode. Given the
context of our design, we expect the stability of mediators to increase as we climb the mediator
hierarchy, and thus instability is expected to reach its maximum at the site level. A lifetime of six
is therefore not sufficient; we believe a value between 8 and 10 is. If a message exceeds its lifetime,
then, as mentioned below, a message undeliverable alert is sent to the source peerNode, and the message
is discarded. The point is not to satisfy extreme situations where instability is the rule, but rather
to build a P2P Overlay Network that is usable. This does not prohibit simulations of these networks,
where testing extremes is important to validate the algorithms we propose. Initially, routes must be
discovered, and we need a discovery mechanism.
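The route lifetime can be pictured as a hop budget carried with each message. The following Python sketch is only an illustration under stated assumptions: the `OnpMessage` class, its `hops_remaining` field, and the `forward` helper are hypothetical names, and the budget of 10 is taken from the 8 to 10 range suggested above.

```python
from dataclasses import dataclass

# Assumed route lifetime: the text suggests between 8 and 10 mediators.
ROUTE_LIFETIME = 10


@dataclass
class OnpMessage:
    payload: bytes
    hops_remaining: int = ROUTE_LIFETIME  # mediators this message may still visit


def forward(msg: OnpMessage) -> bool:
    """Charge one mediator visit against the message's lifetime.

    Returns False once the lifetime is exhausted; that is the point at which a
    mediator would send a message undeliverable alert to the source peerNode
    and discard the message (the alert itself is not modeled here).
    """
    if msg.hops_remaining <= 0:
        return False
    msg.hops_remaining -= 1
    return True
```

On a stable network a message needs at most six successful `forward` calls; the remaining budget absorbs site-level instability.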
When a peerNode first contacts its hosting site-level mediator, it receives the mediator document, and
this document contains the Router VirtualPort. Before sending any messages to other peerNodes, a
peerNode must send a RoutingInfoRequest to its hosting mediator to request its current route, that is to
say, the route that any peerNode will use to send a message to this peerNode. In particular, a route
contains the global-, regional-, and site-level Mediator PeerIdentities along with the CC-UUID and
PeerIdentity of the requesting peerNode. The CC-UUID is required to correctly deliver the message to the
destination peerNode on its hosting mediator, since an M-INBOX is partitioned by CC’s.
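A minimal sketch of the route contents and of delivery into a CC-partitioned M-INBOX, assuming plain strings for PeerIdentities and UUIDs. `Route`, `SiteMediator`, and `deliver` are hypothetical names for illustration, not part of the protocol definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical representation of a peerNode's route."""
    global_mediator: str    # global-level Mediator PeerIdentity
    regional_mediator: str  # regional-level Mediator PeerIdentity
    site_mediator: str      # site-level Mediator PeerIdentity
    cc_uuid: str            # selects the M-INBOX partition
    peer_identity: str      # the peerNode itself


class SiteMediator:
    def __init__(self, peer_identity: str):
        self.peer_identity = peer_identity
        # M-INBOX partitioned by CC: inbox[peer][cc_uuid] -> queued messages
        self.inbox = defaultdict(lambda: defaultdict(list))

    def deliver(self, route: Route, message: str) -> None:
        if route.site_mediator != self.peer_identity:
            raise ValueError("route does not end at this mediator")
        # Without the CC-UUID the mediator could not pick the right partition.
        self.inbox[route.peer_identity][route.cc_uuid].append(message)
```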
Recall from section 4.2.3 that when any document’s meta-data is hashed, the source routing informa-
tion of the peerNode that owns the document is stored along with the meta-data on a site-level media-
tor at the owning peerNode’s site. Since a VirtualPort and PeerIdentity document are both required
before a peerNode can be contacted, any peerNode attempting such a contact will already have a rout-
ing hint. It is only a hint because the route may have changed since the document’s meta-data was
hashed. How such routes are resolved when they do change is discussed just below.
4.2.5.1 Requesting Routing Information
Notice the similarity of this scheme to IP.v6 routing as discussed in chapter 3, section 3.2. This is
intentional: the authors feel that any P2P routing protocol that is to succeed on the Internet must try
to interface with existing, well-thought-out IETF standards. Among other things, hierarchical routing
minimizes routing table size, route administration, and routing information updates across the
Internet and, in the same manner, across our possibly global P2P Overlay Network.
Another idea behind hierarchical routing is that the higher the routing level, the more stable the
mediators at that level. Site-level mediators are expected to be the least stable. When a site-level
mediator crashes, or goes offline in a controlled fashion, its hosted peerNodes must establish new
routes. We assume that an alternative mediator can be selected either from configuration information or
from information that the hosting mediator passes to its peerNodes when it gracefully goes offline. In
the latter case, peerNodes contact their mediators at regular intervals for current routing information
and alerts (see section 4.2.8). A mediator shutting down will do two things:
1. Refuse new peerNode connections, and offer alternative mediators to those peerNodes
attempting to connect.
2. Post route changes and alternative mediator choices as an alert that will piggy-back on any
message that a peerNode requests.

As part of a RoutingInfoRequest response, at every level, at least one alternative mediator’s contact
information will be included if such a mediator is known to the hosting mediator:
1. A peerNode RoutingInfoRequest to its site-level mediator will receive the global-, and
regional-level Mediator PeerIdentities used to route back to it along with alternative site-
level mediator information.
2. A site-level mediator RoutingInfoRequest to its regional-level mediator will receive the
global-level PeerIdentity used to route back to it along with alternative regional-level mediator
information.
3. A regional-level mediator RoutingInfoRequest to its global-level mediator will only receive
the alternative global-level mediator information.
These responses also cover the case of a mediator failure at any level. If multiple mediators at a
given level fail, then the hosted peerNode or mediator has no choice but to wait until an alternative
mediator is online again. Nothing more can be done in this case.
To this end, the first command of our routing protocols is the RoutingInfoRequest. Note that the
HelloTransportInfo found in this command’s response is defined in section 4.2.1.

Command: RoutingInfoRequest Request-level

Request-level := peerNode | site-level | regional-level

Response: RoutingInfoResponse
RoutingInfoResponse := peerNode | site-level | regional-level

peerNode := Route1, Alternative1(i), i = 1,...,N
site-level := Route2, Alternative2(i), i = 1,...,N
regional-level := Alternative3(i), i = 1,...,N

Route1 := global-Mediator peerIdentity, regional-Mediator peerIdentity
Route2 := global-Mediator peerIdentity

Alternative1(i) := HelloTransportInfo of an alternative site-level mediator
Alternative2(i) := HelloTransportInfo of an alternative regional-level mediator
Alternative3(i) := HelloTransportInfo of an alternative global-level mediator
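As an illustration, a hosting mediator might assemble the three response forms as follows. This is a sketch under assumptions: identities and HelloTransportInfo values are plain strings, and `RoutingInfoResponse` as a class and the `routing_info_response` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RoutingInfoResponse:
    route: Tuple[Optional[str], ...]  # PeerIdentities used to route back to the requester
    alternatives: List[str] = field(default_factory=list)  # HelloTransportInfo values


def routing_info_response(request_level: str,
                          global_id: Optional[str],
                          regional_id: Optional[str],
                          alternatives: List[str]) -> RoutingInfoResponse:
    """Build the RoutingInfoResponse for a given Request-level.

    "peerNode" is answered by a site-level mediator (Route1 + Alternative1),
    "site-level" by a regional-level mediator (Route2 + Alternative2), and
    "regional-level" by a global-level mediator (Alternative3 only).
    """
    if request_level == "peerNode":
        return RoutingInfoResponse((global_id, regional_id), list(alternatives))
    if request_level == "site-level":
        return RoutingInfoResponse((global_id,), list(alternatives))
    if request_level == "regional-level":
        return RoutingInfoResponse((), list(alternatives))
    raise ValueError(f"unknown Request-level: {request_level}")
```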

As with any routing information protocol, peerNodes make routing information requests on a timer.
We suggest a peerNode send a RoutingInfoRequest every thirty minutes when idle, or less frequently
when active. The latter is appropriate because alternative routing information alerts may be piggy-
backed on every message a peerNode requests from its hosting mediator using the PNToMed/getMes-
sage protocol. Thus, active peerNodes need not send RoutingInfoRequest until they are idle.
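The polling policy reduces to a small predicate. This sketch assumes times in seconds and treats active peerNodes as never due, relying on the piggy-backed alerts described above; the function name is hypothetical.

```python
IDLE_INTERVAL_S = 30 * 60  # thirty minutes, the interval suggested above


def routing_info_request_due(now_s: float, last_request_s: float, idle: bool) -> bool:
    """Return True when a peerNode should send a RoutingInfoRequest.

    Active peerNodes get route alerts piggy-backed on their getMessage traffic,
    so this sketch never schedules a request for them; the timer applies once
    the peerNode is idle.
    """
    return idle and (now_s - last_request_s) >= IDLE_INTERVAL_S
```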

4.2.5.2 Resolving Source Routes
A given peerNode’s source route is the route another peerNode on the P2P Overlay Network uses to
send messages to the given peerNode. It is the peerNode’s complete P2P Overlay Network address.
Unfortunately, unlike the IP.v6 layer, these peerNode addresses can be unstable. The only thing that is
fixed is the peerNode’s PeerIdentity.
Therefore, every peerNode using the RoutingInfoRequest has a dynamically determined source route.
It’s dynamic in the sense that changes in the routing hierarchy can cause a given source route to
become invalid. How do we deal with this situation?
A peerNode, P1, sends an ONP message to another peerNode. The site-level mediator, M1, hosting the
source peerNode, P1, immediately knows whether P1’s source route is correct with respect to its current
knowledge of the routing hierarchy. Yes, the hierarchy may have changed since M1 made its most recent
RoutingInfoRequest, but this is not important here. If the source route in the ONP header is out of
date, then M1 corrects the source routing information in the ONP message header, and at the same time
posts a route change alert in P1’s Public-CC INBOX. P1 will receive this alert the next time it does a
getMessage or sends a RoutingInfoRequest.
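The check performed by M1 can be sketched as follows, assuming M1 keeps its own view of each hosted peerNode’s current route. `SourceRoute` and `check_source_route` are hypothetical names, and the alert is modeled as a tuple appended to a list that stands in for P1’s Public-CC INBOX.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceRoute:
    global_mediator: str
    regional_mediator: str
    site_mediator: str
    cc_uuid: str
    peer_identity: str


def check_source_route(header_route: SourceRoute,
                       current_route: SourceRoute,
                       public_cc_inbox: List[Tuple[str, SourceRoute]]) -> SourceRoute:
    """Run by the site-level mediator M1 on an ONP message from hosted peer P1.

    current_route is M1's view of P1's route, based on M1's most recent
    RoutingInfoRequest. A stale header is replaced, and a route change alert
    is queued in P1's Public-CC INBOX for its next getMessage.
    """
    if header_route == current_route:
        return header_route
    public_cc_inbox.append(("ROUTE_CHANGE", current_route))
    return current_route
```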
We have two possibilities for routing this message at P1’s site-level mediator:
1. It is to be delivered at the same site-level. In this case, no further changes in the source
route are necessary because the only important source routing information is the hosting
mediator’s peerIdentity in the source route.
2. It is to be delivered up the hierarchy. Then we have the following:
a. M1’s regional-level mediator, R1, is up. The message is sent to R1.
b. R1 is down. M1 sends a message undeliverable alert to P1, noting that R1 is invalid, and
begins the process necessary to establish a new regional-level mediator using the regional-level
mediator documents received in the original mediator greeting command. Recall from section 4.2.2
that in the initial “welcome” response received when M1 contacted R1, M1 received a list of
alternative regional-level mediators. If there are no such alternatives, then M1 must periodically
probe for the resurrection of R1; otherwise, M1 sends a “greeting” to a regional-level mediator
candidate from the alternative list and proceeds as described in section 4.2.2.
Let’s now assume that R1, M1’s regional-level mediator, receives the message. This again leads to two
possibilities:
1. It is to be delivered at the same regional level. If the source global-level mediator is incorrect,
then a route change alert is placed in M1’s Public-CC INBOX to propagate the global-level mediator
change to M1 as quickly as possible. R1 then corrects the route and delivers the message.
2. It is to be delivered up the hierarchy, and G1 is R1’s global-level mediator. Again, we have
two possibilities:
a. G1 is online and R1 can contact G1. The message is sent to G1, and no further source route
changes are required.
b. G1 is down. R1 sends a message undeliverable alert to M1 to point out the source routing
error; M1 notes this and then sends an undeliverable alert to P1 as above. R1 then looks for
alternative global-level mediators in the same way that M1 did in the above discussion.
If a message reaches a global-level mediator, then the source route back to the originator of the
message is correct. There is nothing further to do.
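The up-the-hierarchy step is the same at the site and regional levels: forward to the parent if it is up, otherwise alert downward and fail over. A minimal sketch, in which `forward_up`, `is_up`, and `notify_undeliverable` are hypothetical hooks rather than defined protocol commands:

```python
from typing import Callable, Iterable, Optional, Tuple


def forward_up(parent: str,
               alternatives: Iterable[str],
               is_up: Callable[[str], bool],
               notify_undeliverable: Callable[[str], None]) -> Tuple[Optional[str], str]:
    """One 'deliver up the hierarchy' step (M1 toward R1, or R1 toward G1).

    alternatives: candidates from the original "welcome" response.
    Returns (next hop, or None if undeliverable; parent to use from now on).
    """
    if is_up(parent):
        return parent, parent               # case a: forward; route needs no change
    notify_undeliverable(parent)            # case b: alert downward, naming the dead parent
    for candidate in alternatives:
        if is_up(candidate):
            return None, candidate          # next: send "greeting" as in section 4.2.2
    return None, parent                     # no alternative: keep probing the old parent
```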
What does a recipient of a source route change alert do? The recipient simply corrects its local source
route for inclusion in any subsequent ONP that it sends. And what does the recipient of a message
undeliverable alert do? In this case, the destination route is incorrect. This is covered in the next
section.
The above procedures will resolve source routes as a message is being delivered and are shown in Fig-
ure 4-32.

(Diagram: global-level mediator G1; regional-level mediators R1 and R2; site-level mediators M1 and M2;
peerNode P1; same-level and upper-level delivery cases.)

Figure 4-32. Resolving Source Routes

4.2.5.3 Resolving Destination Routes


How do we discover and recover from destination routing failures? Before we discuss this, we need to
introduce the notion of mediator-level destination route hashing. The routing hierarchy has three
levels of hosting: the site-level mediators host peerNodes; the regional-level mediators host site-level
mediators; the global-level mediators host regional-level mediators. At each mediator level, coherence
is maintained by the Mediator Keep-Alive command. We need another form of “keep-alive.” This
keep-alive is used to find the current mediator host binding of either the peerNodes or the mediators
hosted by a given level. Such a binding is a pair, {hosted Node, host mediator}, where “hosted Node” is
a peerNode, a site-level Mediator, or a regional-level Mediator.
To explain this concept, we select a site, S0, with one or more site-level mediators, each hosting one
or more peerNodes. Furthermore, suppose an ONP message has arrived at S0 through the routing hierarchy.
Thus the route is of the following form:
Global-level mediator PeerIdentity = G1, Regional-level mediator PeerIdentity = R1,
Site-level Mediator PeerIdentity = M1, CC-UUID, destination PeerIdentity = P1.
Here, M1 is at the site S0, and may or may not be hosting P1 because these bindings are dynamic. All
we know is that the ONP message must be delivered at this site or it is undeliverable. If the ONP
message arrives at the destination site-level mediator, M1, and that mediator indeed hosts the
destination peerNode, P1, we are done. But what do we do to recover when either the site-level
mediator in the destination route is unreachable or the peerNode is no longer hosted by that mediator?
We introduce what we call mediator host binding to help us recover from these situations. To simplify
the explanation, we initially stay at the site-level mediators. When a peerNode connects to its
site-level mediator, the site-level mediator hashes the {peerIdentity, mediator peerIdentity} pair such
that this mediator-host-binding is stored on the mediator determined by the value of
Hash(peerIdentity). This binding, like all hashed information, has an expiration time-out. Now, if a
peerNode notes that its hosting mediator is no longer responding, then it formally abandons this
mediator and, if possible, reconnects to one of its alternative mediators. The Hash(peerIdentity)
procedure is then redone across the site-level mediators.
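A sketch of how mediator-host-bindings might be placed and expired. The hashing here (sort the mediator identities, take SHA-256 of the key modulo their count) is a simple stand-in chosen for illustration, not the token-ring scheme of section 4.2.3; the 600-second time-out and all names are assumptions.

```python
import hashlib
import time
from typing import Dict, List, Optional, Tuple


def responsible_mediator(key: str, mediators: List[str]) -> str:
    """Map Hash(key) onto one of the live mediators at this level.

    Sorting the identities and reducing SHA-256 modulo their count is an
    illustrative stand-in, not the token-ring hashing of section 4.2.3.
    """
    ring = sorted(mediators)
    digest = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return ring[digest % len(ring)]


class BindingStore:
    """Mediator-host-bindings {hosted node -> host mediator} with expiration."""

    def __init__(self, ttl_s: float = 600.0):  # assumed time-out
        self.ttl_s = ttl_s
        self.bindings: Dict[str, Tuple[str, float]] = {}  # hosted -> (host, expiry)

    def put(self, hosted: str, host: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.bindings[hosted] = (host, now + self.ttl_s)  # a rehash replaces the old binding

    def get(self, hosted: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self.bindings.get(hosted)
        if entry is None or entry[1] <= now:
            self.bindings.pop(hosted, None)  # drop expired bindings lazily
            return None
        return entry[0]
```

A site-level mediator M3 accepting P2 would then store the binding on `responsible_mediator("P2", site_mediators)`, where `site_mediators` is a hypothetical list of the live site-level mediator identities.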
Three things can cause a peerNode to redo its mediator host binding. First, the mediator hosting the
peerNode has crashed; second, the mediator is overloaded and redirects the peerNode to a less loaded
site-level Mediator, as described in section 4.2.4.4; and third, the peerNode is partitioned from its
mediator by a physical layer failure. Let ONP1 be an ONP message from a peerNode, P1, to a peerNode,
P2, that originated at P1’s site level. Furthermore, suppose that P2 has lost contact with the mediator
M2 in the destination route, and P2 is now hosted by M3. In the first case, where M2 has crashed, ONP1
will be undeliverable from a mediator at the same level, that is to say, from the mediator, M1, hosting
P1:
1. M3 has rehashed {P2’s PeerIdentity, M3’s PeerIdentity}. Hash(P2’s PeerIdentity) maps to a
unique mediator, M. M may or may not be the same mediator to which P2’s previous
mediator-host-binding hashed, since the mediator token ring is dynamically determined. If
it is the same, then the mediator-host-binding is replaced.
M1 cannot contact M2. M1 then computes Hash(P2’s PeerIdentity), looks up M3 in its
mediator map for the CC in question, and delivers the message. Thus, whether the mediator on
which Hash(P2’s PeerIdentity) is stored is the same or different, the lookup of P2’s mediator is
uniquely resolved.
2. The destination mediator, M2, is still up, and M1 delivers the message to M2. But M2 is no
longer hosting P2. M2 computes Hash(P2’s PeerIdentity) and recovers P2’s mediator-host-binding.
Here we have two cases:
a. P2’s mediator-host-binding is {P2’s PeerIdentity, M3’s PeerIdentity}. In this case M2
forwards the message to M3.
b. P2’s mediator-host-binding is {P2’s PeerIdentity, M2’s PeerIdentity}, i.e., M2 looked up
the old binding, which resolved to itself and is incorrect. As part of the hashing algorithm,
M2 then sends a delete {P2’s PeerIdentity, M2’s PeerIdentity} to the source of the invalid
value and then does another lookup. The hash algorithm is guaranteed to return {P2’s
PeerIdentity, M3’s PeerIdentity} if it has been hashed. M2 forwards the message to M3.
Figure 4-33 shows the above two cases.

(Diagram, two panels. Case 1: mediator is down. Case 2: new route to P2. Nodes P1, M1, M2, M3,
and P2, with mediator-host-bindings {P2, M2} and {P2, M3}.)
Figure 4-33. Resolving Destination Routes
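The decisions made for the cases above can be condensed into one function. This is a sketch under assumptions: `resolve_destination` is a hypothetical name, and its four callables are hooks standing in for liveness checks, hosting tables, and the hashed binding lookup.

```python
from typing import Callable, Optional


def resolve_destination(p2: str,
                        m2: str,
                        is_up: Callable[[str], bool],
                        hosts: Callable[[str, str], bool],
                        lookup_binding: Callable[[str], Optional[str]],
                        delete_binding: Callable[[str, str], None]) -> Optional[str]:
    """Return the site-level mediator that should receive an ONP for peerNode p2.

    m2 is the mediator named in the destination route. lookup_binding(p)
    queries the mediator selected by Hash(p); delete_binding(p, m) removes a
    stale {p, m} binding. All four callables are assumed hooks.
    """
    if not is_up(m2):              # case 1: M2 crashed, M1 resolves the binding itself
        return lookup_binding(p2)
    if hosts(m2, p2):              # route still valid
        return m2
    host = lookup_binding(p2)      # case 2: M2 is up but no longer hosts P2
    if host == m2:                 # case 2b: stale binding resolved to M2 itself
        delete_binding(p2, m2)
        host = lookup_binding(p2)
    return host                    # case 2a, or 2b after the second lookup
```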
The next possibility is that the message is being delivered to site S0 from its regional-level
mediator, R1. If M2 is up, then the message is delivered to M2, and this reduces to case 2 if M2 is no
longer hosting P2. If M2 is down, case 1 above applies as long as some mediator at site S0 is up: R1
forwards the message to any such mediator, say M0, and M0 proceeds exactly as M1 does.
Similarly, both the regional-level and global-level mediators maintain mediator-host-bindings. In the
first case, when a site-level mediator connects to its regional-level mediator, the {Site-level Mediator
PeerIdentity, Regional-level Mediator PeerIdentity} pair is hashed across the regional-level mediators.
In the second case, the {Regional-level Mediator PeerIdentity, Global-level Mediator PeerIdentity} pair
is hashed across the global-level mediators. Delivery of an ONP message then behaves identically at
these levels, with each level viewed from the perspective of the site-level discussion above.

As the dynamic behavior of the P2P Overlay Network imposes new destination routes, the above
mechanisms provide a way for the nodes on this network to self-adapt. Independently of these
changes, once a message arrives at its destination site, it usually has at most one hop to reach the
hosting mediator of the destination peerNode. If the network is stable, it then arrives at the hosted
peerNode. Still, the dynamic behavior, rehashing, and so on place a load on the mediators that we must
try to reduce. The following procedures help reduce this burden:
1. When a peerNode receives an ONP message from another peerNode, the following two
things must be done:
a. The sender’s source route has been made current as it traversed the P2P Overlay Net-
work. Noting that this source route is how to get back to the sender of the message, the
receiving peerNode must always update its destination route to the sending peerNode from
the source route in the ONP header.
b. The destination route in the ONP message is the receiving peerNode’s most current
source route. This too must be updated. When a peerNode is idle it uses periodic Routing
Information Requests to update its source route.
2. When a peerNode connects to its hosting mediator, it republishes the indices of meta-data
describing the documents it wishes to make available to other peerNodes in the set of con-
nected communities in which it is active. The older indices have timeouts, and will be
flushed. This is clearly subject to DoS attacks that are discussed in chapter 5.
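Procedure 1 amounts to two assignments on receipt of an ONP message. A sketch, assuming routes are tuples and the peerNode keeps a simple routing table; `on_onp_received` and its parameters are hypothetical names.

```python
from typing import Dict, Tuple

Route = Tuple[str, str, str, str, str]  # (global, regional, site mediator, CC-UUID, PeerIdentity)


def on_onp_received(routes: Dict[str, Route], my_state: Dict[str, Route],
                    sender_id: str, header_source: Route, header_destination: Route) -> None:
    """PeerNode-side bookkeeping when an ONP message arrives (procedure 1)."""
    # 1a. The sender's source route was made current in transit: it is the
    #     best known destination route back to the sender.
    routes[sender_id] = header_source
    # 1b. The destination route that reached us is our own most current source route.
    my_state["source_route"] = header_destination
```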
This finishes our section on routing. We next look at procedures mediators use to do load balancing.

4.2.6 Mediator Information Query Protocol


Although mediators maintain constant contact through the keep-alive token ring, a mediator sometimes
needs to query other mediators for information concerning their current system load, resources, and
stability, to help with mediator load balancing in conjunction with the mediator redirect response.
Since mediators may duplicate their published meta-data on one or more other mediators for redundancy
in case of system failures, it is also reasonable to test the integrity of this data to ensure that its
storage is reliable; we discuss this in chapter 5. A mediator may suddenly find itself in an acute
state of resource exhaustion and have to reject greeting requests continually by sending mediator
redirects. A mediator in such a situation wants to select the best possible candidates for redirection
of greeting commands.
Since the load statistics of neighboring mediators are not always current, a reasonable approach for
candidate selection is the following:
1. Look at the load statistics collected from the keep-alive token ring data, and select one or
more candidates from this list based on a “least loaded” algorithm.
2. Query these candidates to see which among them, if any, is still suitable for inclusion in the
redirect response.

Recall from our discussion of the keep-alive token ring that we collect load-related data on three
things: the number of hosted peerNodes or next-lower-level mediators, message traffic, and stability.
The first two statistics use UNIX-like uptime values, for example, the number of hosted peerNodes
averaged over the last minute, 5 minutes, and 15 minutes. There are many ways to interpret this data.
One possible algorithm is to keep averages of these averages over time and, at critical moments, look
for the minimum values and the trend. The former can be obtained from an increasing numerical sort of
each of the three columns, and the latter from second derivatives calculated from the 3-point curves,
recalling that a negative second derivative means that the rate of change is decreasing [MATH].
Stability must always be taken into account. A system with the minimum loads but also the minimum
uptime is not a good candidate. To this end, the best candidates are selected and queried again to
see whether their current load statistics reflect the analysis. If they do, they become alternative
mediators that are appropriate for the mediator redirect command.
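One way to turn the keep-alive statistics into a candidate list is sketched below. It deliberately swaps the second-derivative trend test for a simpler rule commonly used when reading Unix load averages (a 1-minute value below the 15-minute value indicates falling load). The 60-minute uptime floor, the candidate count, and all names are assumptions; the selected candidates would still be queried again with LoadStat before being offered in a redirect.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LoadStat:
    host_load: Tuple[float, float, float]  # hosted nodes over the last 1, 5, 15 minutes
    traffic: Tuple[float, float, float]    # messages/second over the same windows
    uptime_min: int                        # total consecutive minutes of uptime


def select_redirect_candidates(stats: Dict[str, LoadStat],
                               min_uptime_min: int = 60, k: int = 3) -> List[str]:
    """Rank mediators as redirect candidates from keep-alive LoadStat data.

    Mediators below the uptime floor are skipped however lightly loaded they
    are. The rest are ordered by: falling host load first (1-minute value below
    the 15-minute value), then the lowest mean of the three host-load values,
    then the lowest mean traffic.
    """
    def score(s: LoadStat):
        falling = s.host_load[0] < s.host_load[2]
        return (not falling, sum(s.host_load) / 3, sum(s.traffic) / 3)

    eligible = [m for m, s in stats.items() if s.uptime_min >= min_uptime_min]
    return sorted(eligible, key=lambda m: score(stats[m]))[:k]
```

With the data in the test below, mediator B is the least loaded but is excluded because it has only five minutes of uptime, which mirrors the stability rule above.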
With the above discussion in mind, we have the mediator information protocol’s LoadStat command:

Command: LoadStat

Response: Host Load, Message Traffic, Stability

Host Load := N1 N2 N3
N1 := Number of hosted peerNodes or mediators over the last minute
N2 := Number of hosted peerNodes or mediators over the last five minutes
N3 := Number of hosted peerNodes or mediators over the last fifteen minutes

Message Traffic := N1 N2 N3
N1 := Number of messages/second over the last minute
N2 := Number of messages/second over the last five minutes
N3 := Number of messages/second over the last fifteen minutes

Stability := Total consecutive minutes of uptime
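A responder could derive the N1, N2, N3 triples from per-minute samples. This sketch uses plain arithmetic means over the most recent 1, 5, and 15 samples. Unix load averages are exponentially damped instead, so this is a simplifying assumption, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def windowed_averages(per_minute: Sequence[float],
                      windows: Tuple[int, ...] = (1, 5, 15)) -> Tuple[float, ...]:
    """Return (N1, N2, N3) as plain means over the latest 1, 5 and 15 samples.

    per_minute holds one sample per minute, oldest first. When fewer samples
    exist than a window needs, all available samples are averaged.
    """
    if not per_minute:
        return tuple(0.0 for _ in windows)
    return tuple(sum(per_minute[-w:]) / len(per_minute[-w:]) for w in windows)
```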

The next command asks the mediator on which the complete meta-data has been stored to hash this
meta-data and return the hashed value. Below, L(1) is a document’s meta-data (see section 4.2.3.3).
It is important to note that all such hashes may be Message Authentication Codes, where the hash is
computed with a shared secret as discussed in chapter 5.

Command: IntegrityCheck Meta-data

Meta-data := {L(1), source PeerNode virtualSocket, source routes}

Response: Hashed Value of the above Meta-data
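A sketch of computing and comparing the IntegrityCheck value with Python’s standard `hashlib` and `hmac` modules. The canonical JSON serialization of the meta-data is an assumption, since both mediators must serialize identically for the comparison to mean anything. HMAC-SHA256 stands in for whatever MAC chapter 5 settles on, and the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Optional


def integrity_digest(l1: Any, source_virtual_socket: str, source_routes: Any,
                     shared_secret: Optional[bytes] = None) -> str:
    """Hash the IntegrityCheck meta-data {L(1), source virtualSocket, source routes}.

    Canonical JSON (sorted keys, no spaces) is an assumed encoding. With a
    shared secret the result is HMAC-SHA256; otherwise plain SHA-256.
    """
    blob = json.dumps([l1, source_virtual_socket, source_routes],
                      sort_keys=True, separators=(",", ":")).encode()
    if shared_secret is None:
        return hashlib.sha256(blob).hexdigest()
    return hmac.new(shared_secret, blob, hashlib.sha256).hexdigest()


def integrity_matches(local_digest: str, remote_digest: str) -> bool:
    # Constant-time comparison of the sender's and responder's digests.
    return hmac.compare_digest(local_digest, remote_digest)
```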

When the sender receives the responder’s hashed value, it compares it to its own. If the values do not
match, this raises some issues:
1. Does the responding mediator have hardware problems?
2. Is the responding mediator lying, that is to say, is it untrustworthy?
3. Does the receiving mediator have hardware problems?
These issues are not easy to resolve. Certainly, an administrator, if one exists, should be notified by
the usual means: email, logfile messages, etc. The mediator could retry the command to see whether an
identical negative response is received. A tricky recipient might return a correct answer this time,
so one must keep histories of this kind of data. This leads us directly to a trust-by-reputation model
that can be used to prohibit untrustworthy mediators from participating in the P2P Overlay Network.
This is discussed more completely in chapter 5. Note that in the trust-by-reputation model a
suspicious mediator can be dropped only by a vote in which each vote is signed with a private key and
all mediators have the appropriate public keys stored locally [LOCKSS]. If the public keys are
obtained in a secure way, votes cannot be forged: an attacker cannot vote others out of the mediator
group even if it can initiate such a vote.
The next command on this port is used to implicitly poll mediators at each level for the quality of
their responses to standard MedToMed queries: routing and keyed content lookup. The details covering
the use of this command are found in chapter 5, section 3. The command is simple. A mediator proxies,
through a second mediator, a lookup for information that it has itself hashed. The recipient of the
lookup cannot distinguish it from a normal lookup. We note that the proxy is selected at random, and
the command is issued at randomly chosen intervals.

Command: ProxyPoll Validation_type Data

Validation_type := Route | Content
if Validation_type == Route, then
Data := UUID of hosted sub-level peerNode | Mediator
else
Data := Content hash key

Response:
if Validation_type == Route, then
response := UUID of same level mediator that is the destination route
else
response := L(I), I = 1,...,N
L(I) := UUID of mediator that hosts the sub-level mediator that has
access to the content | UUID of peerNode that is source for the
content.

Let’s review what happens when a route is up-leveled. This discussion also applies to hashed content
because the underlying mechanisms are identical. Similarly, the behavior is the same at each level, and
without loss of generality we describe the regional-level use of the above command. An important thing
to notice is that the ultimate mediator that responds to the proxied poll cannot differentiate the
query to which it is responding from a normal query. The proxying mediator may or may not behave
correctly. However, the test described below can isolate this misbehavior, since it is applied with
multiple randomly selected proxies. Given the assumption that most mediators want to cooperate, if
almost all of the proxied queries for a fixed route return valid results, and the same bad guy appears
among the invalid responses, then that mediator is noted and handled as in chapter 5, section 3.
Each regional-level mediator supports some number of site-level mediators, and is thus a downward
route for each of these nodes. In the following discussion the subscripted R and S symbols denote
UUID-based mediator identities. When a site-level mediator, Sj, first contacts a regional-level
mediator, say R1, with the greeting command, the regional mediator adds the site-level mediator to its
list of supported mediators, creating its {hosted node, host} bindings (see section 4.2.5.3), {S1, R1},
{S2, R1}, ..., {SN, R1}. It then hashes each such binding across the regional level with Sj as the
key; each stored binding resolves as a route to Sj through R1.
During proxy polling validation, R1 randomly selects a proxy, Rs, and sends it a ProxyPoll Route
command with the UUID of a hosted site-level mediator, Sk, as the data. Rs computes the hash of the key
Sk, H(Sk) = Rt, and sends a destination route request to Rt. Rt responds with the value Rx, and Rs
returns Rx to R1. If x equals 1, all is well. Otherwise, both Rs and Rt are suspect. The important
thing is that Rt cannot differentiate this proxy polling from normal routing requests. How these
results are judged is again discussed in chapter 5, section 3.

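(Diagram: R1, holding bindings {Sj, R1} and {Sk, R1} for its site-level mediators Sj and Sk, sends a
ProxyPoll to Rs; Rs computes H(Sk) = Rt and sends a destination route request to Rt, which answers Rx.)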

Figure 4-34. Proxy Polling
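The ProxyPoll route check can be simulated in a few lines. This sketch assumes an honest proxy, uses a sorted-ring SHA-256 stand-in for H(), and models Rt’s answer with a hypothetical `stored_binding` hook; `h` and `proxy_poll_route` are likewise hypothetical names. It returns the suspects R1 would record.

```python
import hashlib
import random
from typing import Callable, List, Set


def h(key: str, mediators: List[str]) -> str:
    # Illustrative stand-in for the regional-level hash H(): sorted ring, SHA-256 mod size.
    ring = sorted(mediators)
    return ring[int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(ring)]


def proxy_poll_route(r1: str, sk: str, regional: List[str],
                     stored_binding: Callable[[str, str], str],
                     rng: random.Random) -> Set[str]:
    """R1 checks, through a random proxy Rs, that the route to its hosted Sk resolves to R1.

    stored_binding(rt, sk) is an assumed hook returning the host Rt reports for Sk.
    Returns the set of suspects; an empty set means all is well.
    """
    rs = rng.choice([r for r in regional if r != r1])  # random proxy, never R1 itself
    rt = h(sk, regional)                                # Rs computes H(Sk) = Rt
    rx = stored_binding(rt, sk)                         # Rt answers Rx; Rs relays it to R1
    return set() if rx == r1 else {rs, rt}
```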


A simple approach for uncovering misbehaving mediators is for each mediator to keep a list of suspect
or bad-guy mediators that seem to misbehave. Other mediators will request this list to help make
mediator-level, global judgments concerning which mediators really are bad. For this purpose we have
the Bad_guy_list command:

Command: Bad_guy_list
Response: L(I), I = 1,...,N
L(I) := UUID of mediator that is considered to be bad

A simple histogram over the responses from the mediators to which this command is sent can be kept
and used to judge the final outcome. Each time a mediator is judged to be bad, its associated bad-guy
count, which is initially 0, is increased by 1.
Well-behaved mediators will never be on the bad list unless someone is lying. Again, we assume that
almost all mediators are cooperative. Lying will be detected by a community vote. The community
voting commands are described above in section 4.2.4.5.
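The histogram is a simple counter over the returned lists. In this sketch the function names are hypothetical, and the majority threshold in `likely_bad` is an assumption for illustration only; as the text notes, the final decision is made by a community vote.

```python
from collections import Counter
from typing import Iterable, List


def bad_guy_histogram(responses: Iterable[List[str]]) -> Counter:
    """Count how many Bad_guy_list responses name each mediator UUID."""
    counts: Counter = Counter()
    for bad_list in responses:
        counts.update(set(bad_list))  # a repeated UUID in one response counts once
    return counts


def likely_bad(responses: List[List[str]], fraction: float = 0.5) -> List[str]:
    """UUIDs named by more than `fraction` of the respondents (assumed threshold)."""
    counts = bad_guy_histogram(responses)
    return sorted(m for m, c in counts.items() if c > fraction * len(responses))
```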
4.2.6.1 Validation of Site-level, Mediated PeerNode System Documents
There is one aspect of P2P data validation that neither integrity checking nor the above polling
schemes address: how can we validate the query/response protocols for system documents? Here, the
ultimate source of such a document is always its creator. Thus, we need a mechanism to ensure that
the content requested is the content received. We assume that a site-level mediator hosts at least
one peerNode that is trusted. How this trust is established is covered in chapter 5, section 3.
Again, one assumes that most peerNodes will cooperate. Almost all are well behaved and would pretty
much like to send the bad guys to jail for life. They’d do anything to help corner one of them. That is
the attitude we are taking here.
To help isolate any such culprit, a mediator generates N phantom peerNodes. They exist only from the
perspective of the mediator and cannot be distinguished from real peerNodes by other mediators. They
can belong to any CC, and each has a phantom M-INBOX, the Ghost-INBOX. The mediators generate queries
from these phantoms onto their own site. These queries are for documents belonging to one or more
trusted peerNodes. Note that the mediator knows on which mediators the keys for these documents have
been hashed, since it hashed them and keeps a record of these actions.
Recall how such a query proceeds; it is a good idea to look at figure 4-24 before proceeding.
Periodically, our mediator generates the above queries from the phantoms. The responses from the
trusted peerNodes are expected to arrive in the Ghost-INBOXes within the expected time interval,
assuming that average query round-trip times are kept on each mediator. There are two possible
behavioral failures behind an invalid response:
1. The mediator on which the document meta-data is hashed is bad.
2. The mediator that supplies the route back to the querying mediator is bad. Recall that the
mediator in (1) looks up this route to forward the query to the trusted peerNode in question.
In case (1) the querying mediator tries several queries for the same meta-data across its collection
of phantoms. Since different destination mediators are then used for the routing lookups, and the
suppliers of the routes are unaware that a test is going on, a bad mediator in case (1) will be
detected over time. Most mediators are cooperative.
For case (2) the mediator tries different system document meta-data with the same phantom as the
source of the query. Here, then, we vary the mediator in case (1) and fix the source of the
destination route. Again, if the results are bad over time, the router is suspect. If the router is
well behaved, this prevents the bad guy in case (1) from singling out routers and attempting to make
them appear bad.
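The two test patterns can be read off a table of outcomes. A minimal sketch, assuming each phantom query is logged as a (meta-data key, phantom, ok) triple; the 0.5 failure-rate threshold and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def attribute_failures(results: Iterable[Tuple[str, str, bool]],
                       threshold: float = 0.5) -> Dict[str, List[str]]:
    """Split phantom-query failures between meta-data holders and routers.

    results: (meta_key, phantom, ok) per query. Failures that follow one
    meta_key across phantoms implicate case (1), the mediator holding that
    meta-data; failures that follow one phantom across meta_keys implicate
    case (2), the route supplier. Single observations are ignored.
    """
    by_meta: Dict[str, List[bool]] = defaultdict(list)
    by_phantom: Dict[str, List[bool]] = defaultdict(list)
    for meta_key, phantom, ok in results:
        by_meta[meta_key].append(ok)
        by_phantom[phantom].append(ok)

    def failing(groups: Dict[str, List[bool]]) -> List[str]:
        return sorted(k for k, oks in groups.items()
                      if len(oks) > 1 and oks.count(False) / len(oks) > threshold)

    return {"meta_data_holder": failing(by_meta), "router": failing(by_phantom)}
```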

Note that misbehavior is recorded for an ultimate decision that is made by a vote among trusted
mediators, as discussed in chapter 5, section 3.
In our next section we look at how our P2P Overlay Network can simulate real transport multicast to
simplify communication where content is directed at multiple destination peerNodes.

4.2.7 Mediator Multicast Protocol (MMP)


In section 4.1.4 we gave an overview of multicast behavior on the P2P Overlay Network. If the reader
is not familiar with that section, now is a good time to review it. Among other things, concepts like
the Multicast Group and its UUID are defined there. In all cases, P2P Overlay Network multicast is an
application requirement, and applications drive multicast communication. We might have an
application like a PChat program that supports a P2P chat room. PChat would create a PChat VirtualPort
document and register it with its hosting mediator using the MMP Register command. The registration
is within a CC. Thus, five values must be registered to permit the peerNode to be a unique
communication endpoint in the P2P Multicast Group:
1. PeerIdentity
2. The local Multicast VirtualPort
3. Head-End-Caching ON | OFF
4. Multicast Group UUID
5. CC UUID

When the mediator receives a registration, if necessary, it creates a multicast group, and adds an entry
for the registering peerNode. Next, the mediator needs to propagate the existence of this group to all
mediators that also host peers that are registered in this same group. To accomplish this the MMP
Propagate/Add command is used. The behavior is analogous to the Mediator-to-Mediator CC Map
Notify command described in section 4.2.4.
To be brief, the {Multicast Group UUID, {PeerIdentity, peerNode Multicast VirtualPort}} pair is hashed as Hash(Multicast Group UUID). This yields a unique mediator in the site mediator map, say M1, which is called the Multicast StarNode. Note that this latter mediator need not be a member of the CC. Then the {name, value} pair is sent to M1. First, M1 responds to the mediator that sent the Propagate command with a list of all such pairs that it has already stored. Second, it sends the above {name, value} pair to all members of the multicast group using the MMP Update/Add command. Third, it updates the next higher level in the hierarchy in a similar way.
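The text leaves Hash() unspecified. The sketch below shows how a site mediator could pick the Multicast StarNode. It assumes SHA-1 and a Chord-style successor rule, which is the approach examined in section 4.5. The class name StarNodeSelector is illustrative, and so is the practice of hashing mediator identifiers onto the same circular space. Neither is part of the MMP definition.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper: maps a Multicast Group UUID onto one mediator of the site mediator map. */
final class StarNodeSelector {

    // Assumption: Hash() is SHA-1, read as an unsigned 160-bit integer.
    static BigInteger sha1(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return new BigInteger(1, md.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /**
     * Returns the mediator whose hashed identifier is the first at or after
     * Hash(groupUuid) on the circular identifier space (Chord-style successor).
     */
    static String select(String groupUuid, List<String> siteMediators) {
        if (siteMediators.isEmpty()) {
            throw new IllegalArgumentException("site mediator map is empty");
        }
        BigInteger key = sha1(groupUuid);
        String successor = null;      // smallest position >= key
        BigInteger successorPos = null;
        String lowest = null;         // smallest position overall, used on wrap-around
        BigInteger lowestPos = null;
        for (String m : siteMediators) {
            BigInteger pos = sha1(m);
            if (pos.compareTo(key) >= 0 && (successorPos == null || pos.compareTo(successorPos) < 0)) {
                successor = m;
                successorPos = pos;
            }
            if (lowestPos == null || pos.compareTo(lowestPos) < 0) {
                lowest = m;
                lowestPos = pos;
            }
        }
        return successor != null ? successor : lowest;
    }
}
```

Every mediator that evaluates select() against the same site mediator map gets the same answer, so the mediators need no coordination to agree on the StarNode. Calling select() again with a failed StarNode removed from the list yields its replacement. This is the rehash step described for StarNode loss at the end of this section.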

[Figure: peerNodes Register with site-level mediators M2 and M3, which Propagate to the Multicast StarNode M1; M1 sends an Update to the regional mediator R1.]
Figure 4-35. Multicast registration, propagation and update


When the Update command is up-leveled, the next-level-up mediator must have a hook into the immediate lower level so that it can multicast messages downward. Therefore, the up-level Update/Add command needs to happen only once, when the multicast group is created. In this example, M1 sends the {Multicast Group UUID, M1 Mediator PeerIdentity} pair up one level. In this way, the existence of a multicast group will be known across the entire P2P hierarchy, and communication between any two peerNode members is possible. However, this does not imply that multicast messages will be propagated everywhere. No Godzilla-grams are possible. CCs provide the necessary control here.
The other side of registration is unregistration. To support this we have the MMP Unregister command. Its behavior is exactly the same. There are dual “Propagate/Remove” and “Update/Remove” commands that unregister a mediator when it no longer hosts a peerNode in the multicast group in the CC. These commands are also up-leveled. When there are no more members of a multicast group, the next higher level is updated as above.
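The registration flow above can be sketched as the table a Multicast StarNode keeps at its level. This is a hypothetical Java illustration. The class StarNodeGroupTable and its upLevel log are invented names, and real messages are replaced by strings. The sketch shows two points made in the text. The up-level Update/Add is sent only when a group first appears at this level. The up-level Update/Remove is sent only when the group's last member mediator leaves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical per-level table of a Multicast StarNode:
 * Multicast Group UUID -> mediators at this level hosting registered peerNodes.
 */
final class StarNodeGroupTable {
    private final Map<String, Set<String>> groups = new HashMap<>();
    /** Up-level commands this StarNode would send, recorded as strings for illustration. */
    final List<String> upLevel = new ArrayList<>();

    /** Propagate/Add: stores the mediator and returns the mediators already stored. */
    List<String> propagateAdd(String groupUuid, String mediatorId) {
        Set<String> members = groups.get(groupUuid);
        if (members == null) {
            members = new LinkedHashSet<>();
            groups.put(groupUuid, members);
            upLevel.add("Update/Add " + groupUuid);  // group is new at this level: up-level once
        }
        List<String> alreadyStored = new ArrayList<>(members);
        alreadyStored.remove(mediatorId);            // do not echo the sender back to itself
        members.add(mediatorId);
        // Notifying the existing member mediators of the newcomer is omitted here.
        return alreadyStored;
    }

    /** Propagate/Remove: returns false ("Failed") if the mediator is not a member. */
    boolean propagateRemove(String groupUuid, String mediatorId) {
        Set<String> members = groups.get(groupUuid);
        if (members == null || !members.remove(mediatorId)) {
            return false;
        }
        if (members.isEmpty()) {
            groups.remove(groupUuid);
            upLevel.add("Update/Remove " + groupUuid); // last member gone: tell the next level up
        }
        return true;
    }
}
```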
The goal is to be able to multicast messages within the multicast group in a given CC. Imagine our PChat application with an active chat room and multicast group. Suppose P1 sends a multicast message to its mediator, M1, using the MMP. Here, we are using the ONP/UMP. M1 first looks to see if any of the other peerNodes it is hosting are also members of the multicast group. If so, then it places a message in each such peerNode's M-INBOX(CC). Next, it propagates the message to all known mediators at its site level that are also members of this multicast group. Each mediator recipient delivers the message to all of the peerNodes it hosts that are also members of the multicast group. The latter mediator does not propagate the message any further unless head-end-caching is indicated for the multicast group in the registration command. If the head-end-caching state is ON, then the message will also be propagated to the StarNode at each level.
As a final step, M1 up-levels the message to its hosting regional mediator using the MMP. The regional mediator will propagate the message to any other regional mediator that is also a member of the multicast group. Similarly, the initial regional mediator recipient up-levels the message to its global mediator, which again propagates the message in the same fashion. It is important to note that only those mediators at each level that support down-level members of the CC and the multicast group will deliver the messages downward. In this case, they are delivered to the Multicast StarNode that stores and propagates the multicast group membership information at its level; that is to say, there is a Multicast StarNode at each level of the mediator hierarchy. The Multicast StarNode has the duty to multicast the message across its level to the known mediator members of the multicast group. Again, if head-end-caching is ON, it also caches the message.
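The delivery rules for a MulticastSend can be summarized as a small decision procedure. The following sketch makes simplifying assumptions. The class MulticastForwarding, the action strings, and the parameters that stand in for the mediator's group state are all hypothetical. Only the view of a site-level mediator is modeled, whether it is the originating mediator or a same-level recipient.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of where a site-level mediator sends a MulticastSend message. */
final class MulticastForwarding {

    enum Source { HOSTED_PEER, SAME_LEVEL_MEDIATOR }

    /**
     * @param source         where the message arrived from
     * @param sender         sending peerNode, or null if the message came from a mediator
     * @param localMembers   peerNodes hosted here and registered in the group
     * @param peerMediators  other mediators at this level that are members of the group
     * @param starNode       this level's Multicast StarNode
     * @param headEndCaching the group's Head-End-Caching state
     * @param upLevel        hosting mediator at the next level, or null at the top
     */
    static List<String> plan(Source source, String sender, List<String> localMembers,
                             List<String> peerMediators, String starNode,
                             boolean headEndCaching, String upLevel) {
        List<String> actions = new ArrayList<>();
        // 1. Deliver to the M-INBOX(CC) of every hosted member except the sender.
        for (String p : localMembers) {
            if (!p.equals(sender)) {
                actions.add("inbox:" + p);
            }
        }
        // A same-level recipient delivers locally and propagates no further.
        if (source == Source.SAME_LEVEL_MEDIATOR) {
            return actions;
        }
        // 2. Propagate across this level to the other member mediators.
        for (String m : peerMediators) {
            actions.add("propagate:" + m);
        }
        // 3. With head-end-caching ON, make sure the StarNode also receives a copy.
        if (headEndCaching && !peerMediators.contains(starNode)) {
            actions.add("cache:" + starNode);
        }
        // 4. Up-level to the hosting mediator at the next level.
        if (upLevel != null) {
            actions.add("uplevel:" + upLevel);
        }
        return actions;
    }
}
```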
With the above discussion in mind we can now explicitly define the MMP commands in the same order
in which they were introduced in the above discussion:

Command: Register F1 F2 F3 F4
F1:= {Multicast Group UUID, {PeerIdentity, peerNode Multicast virtualPort}}
F2:= Head-End-Caching ON | OFF
F3:= Suggested cache lifetime if Head-End-Caching ON
F4:= Registration idle lifetime in seconds

Response: OK | Failed

There are many reasons for failure, among which are:


1. The peerNode is not an active member of the CC, i. e., the PNToMed Notify command has
not been sent,
2. The peerNode is already registered,
3. The mediator resources cannot support the registration. Mediators may in fact have upper
limits on the number of peerNodes that can be concurrently members of a Multicast Group.
The cache lifetime is a hint about how long a StarNode should keep a multicast message. This hint is only a suggestion, and StarNodes can certainly have their own cache maintenance policies. A sensible policy is to use longer cache lifetimes at the site level, because these mediators are closest to the peerNodes that require the multicast data.
The idle lifetime is there to permit the mediators to garbage collect idle Multicast Group registrations.
If an application is idle for a sufficiently long time so that the registration has expired, then any attempt
to send a multicast message will be denied until a re-registration takes place.
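A hosting mediator's handling of Register might look like the following sketch. It covers the three failure reasons above and the idle-lifetime garbage collection. MulticastRegistry is a hypothetical name. Time is passed in explicitly as seconds to keep the example deterministic. The per-group limit stands in for whatever resource policy a mediator actually applies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical hosting-mediator bookkeeping for the MMP Register command:
 * the three failure reasons and idle-lifetime garbage collection.
 */
final class MulticastRegistry {
    private final Set<String> activeCcMembers = new HashSet<>();   // peers that sent PNToMed Notify
    private final Map<String, Long> lastActivity = new HashMap<>(); // "group/peer" -> seconds
    private final Map<String, Long> idleLifetime = new HashMap<>(); // "group/peer" -> F4
    private final int maxPerGroup;                                   // resource limit (reason 3)

    MulticastRegistry(int maxPerGroup) { this.maxPerGroup = maxPerGroup; }

    void notifyCcMember(String peerId) { activeCcMembers.add(peerId); }

    private static String key(String groupUuid, String peerId) { return groupUuid + "/" + peerId; }

    /** Register: returns "OK" or "Failed". */
    String register(String groupUuid, String peerId, long idleSeconds, long now) {
        expire(now);
        if (!activeCcMembers.contains(peerId)) return "Failed";        // reason 1
        String k = key(groupUuid, peerId);
        if (lastActivity.containsKey(k)) return "Failed";               // reason 2
        String prefix = groupUuid + "/";
        long inGroup = lastActivity.keySet().stream().filter(x -> x.startsWith(prefix)).count();
        if (inGroup >= maxPerGroup) return "Failed";                     // reason 3
        lastActivity.put(k, now);
        idleLifetime.put(k, idleSeconds);
        return "OK";
    }

    /** Sending is allowed only while the registration is alive; each send resets the idle timer. */
    boolean maySend(String groupUuid, String peerId, long now) {
        expire(now);
        String k = key(groupUuid, peerId);
        if (!lastActivity.containsKey(k)) return false;                 // re-registration required
        lastActivity.put(k, now);
        return true;
    }

    /** Drops registrations that have been idle longer than their idle lifetime. */
    private void expire(long now) {
        lastActivity.entrySet().removeIf(e -> now - e.getValue() > idleLifetime.get(e.getKey()));
        idleLifetime.keySet().retainAll(lastActivity.keySet());
    }
}
```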
Next, the hosting mediator propagates the registration to the Multicast StarNode:

Command: Propagate/Add F1 F2 F3
F1:= {Multicast Group UUID, Mediator PeerIdentity}
F2:= Head-End-Caching ON | OFF
F3:= Registration idle lifetime in seconds

Response: OK {N} L(i) i = 1,..., N | Failed


L(i) = Mediator PeerIdentity

The Mediator PeerIdentity is sufficient because the receiving mediator must already be aware of the mediators at the same level, or will shortly be made aware of them by the Mediator Keep-Alive token ring. The head-end-caching state is present so that the receiving mediators can respond to cache requests from their hosted peerNodes.
The command will fail if the receiving mediator has reached a resource limit like (3) above. Similarly,
a mediator can request that the Multicast StarNode remove it from the multicast group. This will
always happen when a mediator is no longer hosting peerNodes registered in that group:

Command: Propagate/Remove F1
F1:= {Multicast Group UUID, Mediator PeerIdentity}

Response: OK | Failed

The only reason for failure is if the sending mediator is not a member of the list of mediators support-
ing this multicast group at this level.
Finally, the Multicast StarNode is responsible for up-leveling the registration to its next-level mediator. This latter mediator is called the Multicast-First-Contact mediator:

Command: Update/Add F1 F2 F3 F4
F1:= {Multicast Group UUID, Mediator StarNode PeerIdentity}
F2:= Head-End-Caching ON | OFF
F3:= Suggested cache lifetime if Head-End-Caching ON
F4:= The idle lifetime in seconds of this information

Response: OK | Failed

As with the previous commands, failures will be for redundant registration or resource limitations.
And, the Update/Add command has its dual for unregistration:

Command: Update/Remove F1
F1:= {Multicast Group UUID, Mediator StarNode PeerIdentity}

Response: OK | Failed

The above commands take care of propagating the existence of a multicast group and establishing the
pathways through the P2P Overlay Network for CC-based multicast communication. Next we define
the commands for sending multicast messages. Note that sending a multicast message from a peerNode to a mediator, or from a mediator to a same-level mediator, is functionally the same, i. e., these messages are ultimately placed in the MM-INBOXes of the peerNodes registered at each such recipient mediator.

The originating mediator of an MMP message sends this message both across its level to other mediators supporting the multicast group and up to the next-level mediator. The latter mediator therefore needs a way to notify the originating site that its Multicast StarNode is not responding to downward multicast propagation. To this end, a special message is added to the OK response of the send command in this case.

Command: MulticastSend S U M
S:= Optional sequence number that is required if head-end-caching is ON
U:= Optional UUID of originating peerNode if head-end-caching is ON
M:= UMP/ONP Encapsulated message

Response: OK [F1] | Failed


F1:= Your down-level StarNode mediator is not responding

Here there are several reasons for a failed send:


1. The sending peerNode is not a registered member of the multicast group. Here the failed
response comes from the peerNode’s hosting mediator.
2. The receiving mediator does not support the multicast group.
3. The receiving mediator’s resource limitations have been exceeded.
4. The sending peerNode is the only registered member of the multicast group. This is true if there is a single mediator in the multicast group, and that mediator has only the sending peerNode as a registered member of the multicast group.
The above send is for ONP/UMP messages which for the most part are unreliable. Certainly, they may
be sequenced if redundant messages cause a problem for the applications involved. If one looks more
closely at the 3rd reason for failure, it may be that implementations of the MMP will try to compensate in some way for the inability to deliver a message along the way. Usually, retransmission and reliability are managed by the sending and receiving endpoints. With ONP/ACP this is done by the P2P
system software itself much like TCP/IP. For multicast this becomes the application’s responsibility,
and separate application protocols can be written to manage P2P voice or video streaming that might
use the MMP.
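Reliability is left to the endpoints. A receiving application therefore needs to know which sequence numbers, the S field of MulticastSend, it has missed for a given originating peerNode, the U field. The receiver-side sketch below assumes that sequence numbers increase by one per message. SequenceGapTracker is a hypothetical helper. Its output would feed the MulticastCacheRequest command defined next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical receiver-side tracker for one originating peerNode (the U field):
 * records sequence numbers (the S field) and reports the gaps to request.
 */
final class SequenceGapTracker {
    private final TreeSet<Long> received = new TreeSet<>();

    /** Records a sequence number; returns false for a duplicate, which can be dropped. */
    boolean accept(long seq) {
        return received.add(seq);
    }

    /** Sequence numbers missing between the lowest and the highest seen so far. */
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (received.isEmpty()) {
            return gaps;
        }
        long expected = received.first();
        for (long s : received) {          // TreeSet iterates in ascending order
            while (expected < s) {
                gaps.add(expected++);
            }
            expected = s + 1;
        }
        return gaps;
    }
}
```

The tracker can only detect gaps between the lowest and highest numbers seen so far. Messages lost before the first or after the last received message are not visible to it.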
Still, we feel it is necessary to introduce a head-end-caching scheme, as we have done, to minimize the network impact of retransmitting missing multicast messages. To accomplish this, we have the following command, which permits a peerNode to request missing multicast messages from its hosting mediator. This command is used by both the PNToMed and MedToMed protocols:

Command: MulticastCacheRequest S U
S:= Sequence numbers that are required
U:= UUID of originating peerNode

Response: OK [F1] | Failed

F1:= StarNode mediator is not responding

The hosting site-level mediator that receives this request from a peerNode will ask the site-level StarNode for the requested cached messages using the same command. If the StarNode does not have all the cached messages, then it up-levels the request to its regional-level mediator, adjusting the sequence numbers to remove the cache hits it fulfils. At the same time, if it has cache hits, it immediately sends these messages to the requesting site-level mediator using the MulticastSend command. Similar actions occur at the regional and global levels. Note that MulticastCacheRequests are never propagated down level. If the global mediator does not have a cache hit, it discards the request.
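The up-leveling of a MulticastCacheRequest can be simulated by walking the caches from the site-level StarNode up to the global level. The sketch below is hypothetical. CacheRequestWalk is an invented name, and each cache is reduced to the set of sequence numbers it holds for one originating peerNode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical walk of one MulticastCacheRequest up the mediator hierarchy.
 * Each level sends back its cache hits and up-levels only the numbers still missing.
 */
final class CacheRequestWalk {
    /**
     * @param requested      sequence numbers the peerNode asked for
     * @param cachesBottomUp cached sequence numbers at the site StarNode, regional, global levels
     * @return for each level visited, the sequence numbers that level fulfils
     */
    static List<Set<Long>> resolve(Set<Long> requested, List<Set<Long>> cachesBottomUp) {
        List<Set<Long>> hitsPerLevel = new ArrayList<>();
        Set<Long> remaining = new TreeSet<>(requested);
        for (Set<Long> cache : cachesBottomUp) {
            Set<Long> hits = new TreeSet<>(remaining);
            hits.retainAll(cache);          // cache hits at this level, sent back via MulticastSend
            remaining.removeAll(hits);      // adjust the request before up-leveling it
            hitsPerLevel.add(hits);
            if (remaining.isEmpty()) {
                break;                      // fully satisfied: nothing to up-level
            }
        }
        // Whatever remains after the global level is discarded; requests never go down level.
        return hitsPerLevel;
    }
}
```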
One thing that is important to note about our architecture is that CCs, coupled with hierarchical routing and multicast group registration, create virtual unicast-like channels for the delivery of multicast messages. One can view each site-level mediator as an abstraction of a subnet. Multicast messages are never delivered until they reach the subnet to which registered, hosted peerNodes are connected. Thus, we can have two site-level mediators separated by regional and global mediators, and there will be a single virtual channel connecting these two site-level endpoint mediators. The slight variation on the subnet theme is that at each such mediator endpoint, the downward delivery to the site level goes through the StarNode Mediator for the multicast group ID. This is seen in the following figure, where M3 is the StarNode Mediator:

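[Figure: global mediator G1; regional mediators R1 and R2; site-level mediators M1, M3 (the StarNode Mediator) and M2; peerNodes P1 and P2; the Multicast Virtual Channel joining the site-level endpoint mediators.]
Figure 4-36. Mediator Multicast Virtual Channel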
We close this section with a discussion of single points of failure. Such failures can occur at each level in the mediator hierarchy. We have already discussed, in our section on routing, how one recovers from same-level and up-level mediator failures. What is of primary concern here is the loss of a StarNode Mediator, since it is at the center of multicast activity:
1. It maintains the integrity of the same level multicast group maps.
2. It is the means by which multicast messages are propagated downward.

How can a StarNode Mediator go out of service? There are multiple possibilities, the best of which is going away gracefully or in a planned fashion. This is covered in section 4.2.4.4. We must add two extra steps:
1. At the site-level where the Multicast Groups are created, the mediators that host peers that
created Multicast Groups must rehash the Multicast Group UUID. This, in effect, creates a
new StarNode Mediator at the site level, and this propagates upward with the Update/Add
command.
2. At the upper levels, the remaining Multicast-First-Contact mediators must rehash as in (1)
above.
The loss of a StarNode Mediator can also be discovered by means of the Keep-Alive Token Ring.
The recovery mechanism is identical to the above. Finally, a mediator multicasting to the next higher
level receives the “F1” field in the response to the MulticastSend command. This is handled exactly
like the Keep-Alive Token Ring failure.
This completes our discussion of the Mediator Multicast Protocol. We next look at PeerNode-to-Peer-
Node protocols.

4.3 P2P PeerNode-to-PeerNode Protocols


In the introductory paragraph to section 4.2 we noted that at times peerNodes may want to take advan-
tage of the fact that they have direct network connectivity. There are a couple of perspectives on this. It
may be that administratively this is not permitted. For example, if copyrighted data may be exchanged
for personal use, then the administrators of the network may do three things to assure copyright protec-
tion. First, private connectivity on the P2P Overlay Network must be disabled. Why? If, for example,
TLS is enabled, then any two peerNodes may fake the meta-data so that the underlying content seems
innocent enough, and subsequently send copyrighted content to one another on an encrypted connection. If the data itself has no built-in protection against reuse by means of Digital Rights Management, then there is not much that can be done from the point of view of the network to prohibit this.
Second, all peerNode communication must be mediated. This permits mediators to play the role of
copyright police, and to look closely at the data that is being sent to make sure it is being done so
legally. Note that here TLS is disabled, all sent data can be logged for later reexamination, and the source and destination PeerIdentities are known. This brings us to the third and final point: all PeerIdentities must be regulated and assigned under administrative control. Otherwise, two peerNodes can create new PeerIdentities, transfer content, and then abandon these identities for other new ones. This can be done in such a way that one will never know who the communicating peerNodes were. PeerNodes can operate in stealth mode.
On the other hand, leaving the world of the paranoid, there are many kinds of applications that can be done more effectively with a P2P Overlay Network and that do not involve copyrighted data. For example, ad-hoc collaboration for developing software, design documents, articles, etc. can easily be done on P2P networks. The information may be proprietary, and its dispensation is controlled by the application itself. Instant messaging and chat rooms are more natural and more easily managed on a P2P Overlay Network. In the same sense, one can discover the availability of participants for a meeting, invite the participants, and hold the meeting using P2P in a way that is private and again more efficient than the usual centralized, client/server model. As a final example, content-sharing networks of scientific data like journals and papers can be created, with access to this data controlled within an organization. Thus, we have a P2P-Digital-Library that has no single point of failure and is self-administrating.
Clearly, it makes good sense for peerNodes to take advantage of the fact that they have direct network
connectivity whenever it is permitted. This lessens the load on mediators, and maximizes the commu-
nication bandwidth between peers. To enable direct communication we need a way of establishing
direct contact after the two peerNodes have discovered one another via the mediators. We remind the
reader that peerNodes never locally hash meta-data, and that all meta-data, and thus its associated content, must be discovered via the previously defined publish/query protocols.
As we have currently defined things, peerNodes never require the real transport information in the PeerIdentity document. All communication is mediated. The VirtualPort and PeerIdentity documents are required to form an ONP message that fulfils the end-to-end communication requirements. Even so, the real transport information in the PeerIdentity document has not been used up to this point. Recall that the PeerIdentity document contains the secure transport identity. The VirtualPort document defines the communication type bound to the VirtualPort: unicast, unicast secure, multicast or multicast secure.
How do two peerNodes discover they can communicate in a non-mediated fashion? As soon as they
have enough information to communicate, they know the complete route to try. Is this sufficient? No,
even if the peerNodes have identical site-level mediators, they may be separated by NAT or a firewall.
Therefore, the only choice is to attempt to communicate using a real transport, and this transport is taken from
the peerIdentity document. This document is acquired using the mediated publish/query mechanism.
So, to establish a non-mediated connection we have a special PNToPN protocol. To make this request
a peerNode requires both the VirtualPort and PeerIdentity documents. This is only used to transfer real
content. And remember, this is still an ONP message on the P2P Overlay Network. This network has
two communication endpoints. The command is as follows:

Command: RequestDirectConnection

Response: OK | Refused

What happens? It is possible that the real transport cannot be used. In this case, the real transport con-
nection attempt will fail, and thus, mediated communication is used. On the other hand, if the real
transport connection succeeds, then the receiving peerNode’s Overlay Network Level 2 code sets ONP
direct communication state to active. Thus, a connection is established binding the two virtual sockets
so that all communication on this connection is direct. When the OK response is received, the sending
peerNode’s Overlay Network Level 2 code acts identically. Note that this is invisible to the applica-
tions involved.
If a Refused response is received, then even if direct communication is possible, mediated communica-
tion must be used.
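The decision behind RequestDirectConnection can be sketched with the standard java.net.Socket API. The sketch rests on several assumptions. The host and port stand for the real transport address taken from the PeerIdentity document. The BooleanSupplier stands for the remote peerNode's OK or Refused answer. DirectConnect is a hypothetical name. A real Overlay Network Level 2 implementation would keep the socket and bind it to the two virtual sockets. This sketch only probes reachability and then closes the socket.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical decision behind RequestDirectConnection: try the real transport taken
 * from the PeerIdentity document and fall back to mediated ONP communication.
 * The probe socket is closed on return; a real implementation would keep it open.
 */
final class DirectConnect {
    enum Mode { DIRECT, MEDIATED }

    /**
     * @param peerAccepts stands for the remote peerNode's answer: true = OK, false = Refused
     */
    static Mode choose(String host, int port, int timeoutMillis, BooleanSupplier peerAccepts) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            // Reachable over the real transport; Refused still forces mediated communication.
            return peerAccepts.getAsBoolean() ? Mode.DIRECT : Mode.MEDIATED;
        } catch (IOException e) {
            // NAT, a firewall, or an unreachable address: keep using the mediators.
            return Mode.MEDIATED;
        }
    }
}
```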
This finishes the discussion of P2P Overlay Network Communication Protocols. We now move on to
the protocols that govern connected community behavior.

4.4 P2P Connected Community Protocols


Before reading this section we suggest the reader at least browse section 3.4.2 on the CC document.
Also, as this section proceeds, we will mention various attacks against which our protocols must be
shielded if the goal is to create a secure P2P Overlay Network. In all cases, resolutions to matters of
security are discussed in chapter 5. Otherwise, the CC protocols are centered around membership,
where membership may be entirely ad-hoc and unrestricted, or may require authentication of some kind.
In order to become a member of a CC, the CC must exist. Existence is easy to satisfy. It means that it is
necessary for a peerNode to create a CC document and publish it using the publish command in the
PNToMed protocol suite. Recall from section 4.2.3.3 that the publication of such a system document
actually publishes enough information to contact the document’s creator and thus the creator of the
CC. PeerNodes then discover the existence of a CC by one of many possible ways. Certainly wildcard
searches in the Public CC for CC documents are feasible but not really desirable because of the impact
on the P2P Overlay Network performance. Some networks may disable wildcard search. Otherwise,
there are many out-of-band methods to find a CC's description, name and CC-UUID. This can be followed by a straightforward lookup of the CC Document in the Public CC.
Once this document is acquired, a peerNode can send ONP messages with the CC-UUID in the header
because the creator’s source route is obtained along with the acquisition of the document. Recall that
all system documents must be retrieved from their owner. While this seems rather severe for CC docu-
ments, it does permit us to secure access to the CC’s content. How? Imagine that one of the policies in
the CC document involves authentication in order to receive the CC’s content, and that the credentials
necessary to authenticate must originate from the CC owner. This does not prevent another peerNode
from creating a clone of a CC, but it will prevent this clone from distributing any of the original CC’s
content unless the content is stolen.

In any case, with the possession of the CC document, we can then freely query for content within this
CC given our infrastructure. Querying gives one a pathway to a peerNode source of the content but
may not yield access rights to the content. First, we may have to authenticate ourselves to retrieve the content from one of its many possible sources. This requires an authentication mechanism. The CC document's policies can include access to an authentication service used by a particular CC. It is true that unless our documents are secured in some fashion, for example by being signed by a trusted 3rd party, they can be forged, and the authentication service may be bogus. Yes,
at each step an attack is possible.
A peerNode introduces itself into a CC as a new member using the CC membership command. It is by
the means of this command that a peerNode will use any authentication mechanism demanded by the
policies in the CC document. Becoming a member of a CC is analogous to joining a club. There are
membership requirements, special club membership photo identity cards may be passed out, and the
card may have to be presented to use any of the club’s services. With the above in mind and noting that
the local peerNode has the CC document and the creator’s source route in hand, we now define our
CCMembership command:

Command: CCMembership/NEW F1
F1 := VirtualPort Document to contact the peerNode requesting membership

Returns: OK [Stamp]/ REFUSED

The above appears to be trivial, and in one case it is. If the membership policy in the CC document is
ANONYMOUS, then the only requirement to access content in the CC is the presence of the CC-
UUID in the ONP header. In this case the local P2P system notes this in its running software, and
returns OK. The command is essentially a NOP. Furthermore, if the running software attempts to
access content in a CC that supports ANONYMOUS access, and access is refused, this inconsistency
must be reported to the application because something is broken and must be fixed.
On the other hand, we have policies to guide the execution of the CCMembership command, and they
use the command's F1 parameter. We assume the channel to the owner of the CC can be opened; if it cannot, REFUSED is returned as a local default. The policies in the CC document have the following values and descriptions:
1. REGISTERED - Here all that is required is that the requesting peerNode contact the VirtualPort using the ONP/ACP protocol. This gives the owner enough information to register
the peerNode in any way it desires. The owner has the latter peerNode’s PeerIdentity docu-
ment. The owner may return an optional registration stamp as a validation of registration. If
this is the case, then all requests for content to CC members must be accompanied by this
stamp. The stamp does not involve strong security and is not for authentication. And, yes,
access to content may in fact be possible without registration or the stamp depending on the
peerNode member’s implementation of this policy. The policy is one of “good faith.” The
owner is interested in keeping track of those peerNodes that access the CC's content, and in doing so with minimal effort. And well-written, “good faith” software will not permit a peerNode to access content belonging to such a CC unless the registration has been locally marked as OK. “Good faith” automatically brings to mind content exchange based on trust by reputation. We have a discussion of trust based on reputation in chapter 5.
2. RESTRICTED - The two peerNodes then have enough information to open a communication channel between themselves to enable whatever mechanism the CC owner desires. What happens at this point is out-of-band and application dependent, but clearly a credential must be given to the requesting peerNode to enable further access to the CC's content. Command closure is again system/application dependent. The only requirement is that whatever mechanism is used leaves behind OK or REFUSED as its final statement. For example, a Java class can be downloaded, loaded by a local class loader and run. The class finally returns a String result, which can be tested in the following statement:
if ("OK".equals(result)) { /* membership granted */ } else { /* REFUSED */ }
The above is clearly symbolic, but points out what is necessary to finish the command. It is important
to point out some possible member policies which are CC dependent to guide implementors of this
protocol.
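The policy handling of CCMembership/NEW can be sketched as a dispatch on the membership policy in the CC document. Several names here are assumptions made for illustration only. These are the class CCMembershipNew, the Supplier arguments that stand for the owner's reply and the out-of-band RESTRICTED mechanism, and the "OK <stamp>" reply format.

```java
import java.util.function.Supplier;

/** Hypothetical dispatch of CCMembership/NEW on the membership policy in the CC document. */
final class CCMembershipNew {
    enum Policy { ANONYMOUS, REGISTERED, RESTRICTED }

    /**
     * @param channelOpened    whether the channel to the CC owner could be opened
     * @param ownerReply       REGISTERED: the owner's reply, assumed to be "OK" or "OK <stamp>"
     * @param restrictedResult RESTRICTED: runs the owner's out-of-band mechanism, whose
     *                         final statement is expected to be "OK" or "REFUSED"
     */
    static String handle(Policy policy, boolean channelOpened,
                         Supplier<String> ownerReply, Supplier<String> restrictedResult) {
        switch (policy) {
            case ANONYMOUS:
                return "OK";                           // essentially a NOP: CC-UUID in the ONP header suffices
            case REGISTERED:
                if (!channelOpened) return "REFUSED";  // local default
                return ownerReply.get();               // an optional stamp is passed through
            case RESTRICTED:
                if (!channelOpened) return "REFUSED";  // local default
                return "OK".equals(restrictedResult.get()) ? "OK" : "REFUSED";
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```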
When membership is granted, a credential is given by the granting peerNode. This credential may have a
limited lifetime as well as privileges associated with it. And, clearly, the credential must be presented
to other members along with any request for content. If for example, the credential has expired, then
the request for content will be refused and the membership must be renewed. As renewal applies to
RESTRICTED membership, the following command is used in this context:

Command: CCMembership/RENEW F1
F1 := VirtualPort Document to contact the peerNode requesting membership

Returns: OK / REFUSED

The responses are exactly the same as in (2) above. Also, if privileges are associated with the creden-
tial they can be increased or diminished as a function of the behavior of the member. This is a way of
controlling bad behavior and rewarding good behavior, and as such, is a useful tool for a CC. There are
many possibilities for the use of privileges and they depend on the P2P Overlay Network application.
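A RESTRICTED membership credential with a lifetime and adjustable privileges might be modeled as below. MembershipCredential and its integer privilege levels are hypothetical. The text leaves the credential format and the meaning of privileges to the CC.

```java
/**
 * Hypothetical RESTRICTED-membership credential with a lifetime that forces
 * CCMembership/RENEW and a privilege level adjusted by member behavior.
 */
final class MembershipCredential {
    private long expiresAt;          // expiry time, in seconds
    private int privilege;           // 0 = no privileges
    private final int maxPrivilege;

    MembershipCredential(long expiresAt, int privilege, int maxPrivilege) {
        this.expiresAt = expiresAt;
        this.privilege = privilege;
        this.maxPrivilege = maxPrivilege;
    }

    /** Content requests are honored only with an unexpired credential of sufficient privilege. */
    boolean allows(int requiredPrivilege, long now) {
        return now < expiresAt && privilege >= requiredPrivilege;
    }

    /** Applies the new expiry granted by a successful CCMembership/RENEW. */
    void renew(long newExpiresAt) {
        expiresAt = Math.max(expiresAt, newExpiresAt);
    }

    /** Rewards (positive delta) or penalizes (negative delta), clamped to [0, maxPrivilege]. */
    void adjust(int delta) {
        privilege = Math.max(0, Math.min(maxPrivilege, privilege + delta));
    }

    int privilege() {
        return privilege;
    }
}
```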
It is important to note that, from the authors' perspective, mediators are credential and privilege neutral. These are CC issues to be handled by the members. Given this point of view, one can still imagine situations where mediator performance might be sacrificed in order to add more control to the mediators' functionality. This is an implementation issue.
We have not addressed the problem of membership revocation. Rather, we have presented an easier solution for RESTRICTED membership: an expiration date in the membership credential, which forces the member to use the CCMembership/RENEW command. Why not address revocation? Revocation is extremely difficult without centralization. Imagine a CC using the RESTRICTED policy with thousands of members, where one membership must be revoked. Without some form of centralization, this is a nearly overwhelming task, because every member must learn that the bad peerNode's credential has been revoked. If one requires contacting a central authority, for example the CC creator, prior to granting content access, then sure, it can be done, but this is a heavy performance hit. One can also distribute revocation across multiple CC members to simplify the task, but this is still a burden. The decision to put in place a mechanism for the immediate revocation of a peerNode's membership could be based on the value of the content. Here we are hinting at copyrighted content that has commercial value. This is not the only case. A CC might have personal content that has an intrinsic value to the CC, for example high quality images, music, and software. While there is always legal recourse for these problems, one would like to be able to stop the abuse of membership privileges immediately. One might also wish to ban a member that is violating membership rules and causing emotional duress in the CC. Traditionally, the latter can be dealt with on an individual basis. Good CC software should provide a feature similar to caller ID. Here the bad peerNode is identified by the software, and the user simply turns off access for this individual or application.
This section completes the design aspect of the P2P Overlay Network. The infrastructure is now well defined and implementable. Up to this point, this chapter has covered transport message formats, communication protocols, a mediator hierarchy, mediator functionality and supporting protocols, content publication and queries, routing, peerNode-to-peerNode communication, multicast on the overlay network, and CC protocols.
In this chapter we have discussed the use of hashing algorithms without giving specific examples. In
the next section we select some typical algorithms that are in use today on P2P networks, and apply
them to our own infrastructure.

4.5 P2P Hashing Algorithms in the Context of Our P2P Overlay Network


In the previous sections, we proposed a generic P2P overlay network topology that enables a peerNode to efficiently reach another peerNode by means of mediators. Also, to reach a reasonable degree of scalability, we introduced connected communities and a hierarchical mediator organization. As mentioned before, the document hashing algorithm plays a critical role in the routing and lookup engine, and we will review and compare several existing hashing algorithms in this section. The motivation is not to rank these algorithms; rather, any of them can serve as a building block for the P2P overlay network proposed in this book.
In the early stages of P2P network deployment, the discovery of a piece of content relied on either centralized services, such as Napster, which used 160 servers⁵, or flooding-based mechanisms, such as Gnutella. The centralized scheme is vulnerable to single points of failure. The flooding-based scheme sends O(N/2) messages per lookup on average, and O(N) in the worst case, where N is the number of peerNodes in a P2P Overlay Network. As more research groups became involved in this field, more sophisticated approaches were introduced. Many of these approaches support a Distributed Hash Table (DHT) to store the hashed values (hash keys) of documents somewhere in the network. Given a hint from the key, a peerNode can find the document at a reasonably low cost. These algorithms not only benefit scalability but are also fault-tolerant, and they give a P2P Overlay Network a self-organizing property. These benefits have been demonstrated by a number of projects such as Plaxton, Chord[CHORD], Pastry, Tapestry, CAN[CAN], etc.

5. Was Napster really P2P? The authors think not.
Basically, among the various projects, the hash key generation can be summarized as below:
hash key = SHA-1(Data);
SHA-1 yields 160 bits (20 bytes) of data, and at times not all of these bits are used. In this case we will write
$\text{hash key} = \text{SHA-1}_j(\text{Data})$, where $\text{SHA-1}_j(\text{Data})$ denotes the $j$ most significant bits of SHA-1(Data).
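The truncation SHA-1_j(Data) can be computed with the JDK's MessageDigest and BigInteger classes. The sketch reads the 20-byte digest as an unsigned integer and shifts it right by 160 - j bits, which keeps exactly the j most significant bits. The class name TruncatedSha1 is illustrative.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** SHA-1_j(Data): the j most significant bits of SHA-1(Data), as an unsigned integer. */
final class TruncatedSha1 {
    static BigInteger sha1j(byte[] data, int j) {
        if (j < 1 || j > 160) {
            throw new IllegalArgumentException("j must be in [1, 160]");
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data); // 20 bytes
            BigInteger full = new BigInteger(1, digest);                    // unsigned 160-bit value
            return full.shiftRight(160 - j);                                 // keep the top j bits
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

With j = 128 this yields the 16-byte identifiers used for mediator peerIdentities in section 4.5.1.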
Then, various projects have various ways to "assign" the hash key to a certain peerNode in the P2P Overlay Network. In general, when a data lookup happens, the lookup will be sent to a
limited number of peerNodes, usually a maximum of O(log N) for N peerNodes, to find the ultimate
peerNode that knows about the content.
In our infrastructure we are only concerned with mediators. This minimizes the search space for both
content lookup and routing. In our first example we will use the Chord algorithms to hash the meta-
data of published content across the space of site-level mediators to which the peerNode possessing the
content belongs. Please reread section 4.2.3 to review publish and query.

4.5.1 Prefix-Based - Plaxton, Pastry, Tapestry, Chord


The earliest prefix-based hashing algorithm might be Plaxton's. It was not designed for the P2P
environment, but its basic idea has been widely used. Its intent is to locate distributed objects with
“fast access ... where access requests are satisfied by a copy close to the requesting node.” Thus, if a
peerNode with UUID 12345 receives a lookup request with key 12678, it forwards the lookup to a
peerNode whose UUID matches the key’s first 3 digits, such as 12690. The peerNode 12690 then finds
the next peerNode whose first 4 digits match the key, such as 12673. Finally, the peerNode 12678 is
found. Pastry, Tapestry and Chord are extensions of the ideas in the original Plaxton paper.
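The forwarding rule in the example above can be sketched in Python as a greedy prefix match. This is an illustrative simplification, not Plaxton's full algorithm:

- the routing tables are hypothetical;
- each hop picks the known peerNode that shares the longest prefix with the key;
- as in the text's example, we assume a peerNode whose UUID equals the key exists.

```python
from typing import Dict, List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, known: List[str]) -> Optional[str]:
    """A known peerNode sharing a strictly longer prefix with key, preferring the longest."""
    best, best_len = None, shared_prefix_len(current, key)
    for node in known:
        p = shared_prefix_len(node, key)
        if p > best_len:
            best, best_len = node, p
    return best


def route(start: str, key: str, tables: Dict[str, List[str]]) -> List[str]:
    """Follow next hops until reaching the peerNode whose UUID equals key, or a dead end."""
    path = [start]
    while path[-1] != key:
        hop = next_hop(path[-1], key, tables.get(path[-1], []))
        if hop is None:
            break  # no known peerNode gets us closer to the key
        path.append(hop)
    return path


# Hypothetical routing tables reproducing the example in the text.
tables = {"12345": ["12690", "54321"], "12690": ["12673", "12111"], "12673": ["12678"]}
print(route("12345", "12678", tables))  # ['12345', '12690', '12673', '12678']
```

Each hop strictly lengthens the matched prefix, so a route terminates after at most as many hops as the key has digits.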
As discussed in section 3.2.2.2, each of our N site-level mediators has a peerIdentity that is calculated
using an algorithm that guarantees uniqueness. To exhibit the power of these algorithms, we apply
Chord to our mediator infrastructure.
In the Chord research, each node is assigned an m-bit identifier using SHA-1(IP Address). The
important thing is to pick m large enough that no hashing collisions occur across the site-level
mediators. SHA-1 yields 160-bit hashes. We therefore use a SHA-1 based pseudo-random number
generator to create 128 random bytes, i. e., the peerIdentity = $\mathrm{SHA\text{-}1}_{128}(\text{128 random bytes})$.
Thus, each mediator has a 16-byte, unique peerIdentity. These peerIdentities, read as 128-bit unsigned
integers, are ordered in a monotonically increasing sequence, $M_0 < M_1 < \cdots < M_{N-1}$.
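Below is a minimal Python sketch of this identity scheme. It assumes `secrets.token_bytes` stands in for the SHA-1 based pseudo-random number generator, which the text does not specify further. The 128 most significant bits of the SHA-1 digest are its first 16 bytes. The helper names are hypothetical.

```python
import hashlib
import secrets


def new_peer_identity() -> bytes:
    """peerIdentity = SHA-1_128(128 random bytes): the top 16 bytes of the digest."""
    seed = secrets.token_bytes(128)          # stand-in for the SHA-1 based PRNG
    return hashlib.sha1(seed).digest()[:16]  # 128 most significant bits


def ordered_mediators(identities):
    """Read each identity as an unsigned 128-bit integer and sort: M_0 < ... < M_{N-1}."""
    return sorted(int.from_bytes(pid, "big") for pid in identities)


ring = ordered_mediators(new_peer_identity() for _ in range(8))
print([f"{m:032x}" for m in ring])
```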
Now suppose we have content with its associated meta-data, F1, and we wish to hash the meta-data to
yield a Chord key. This key identifies the mediator on which we will store the {key, {L(1),
virtualSocket, source route information}} binding. The Chord key k is defined as follows:
$k = \mathrm{SHA\text{-}1}(F_1) \bmod 2^{128}$.
The binding is stored on the first mediator whose peerIdentity is at least k:
- if $M_{j-1} < k \le M_j$ for some j, 1 ≤ j ≤ N−1, it is stored on $M_j$ (so a key equal to $M_j$ stays on $M_j$);
- if $k \le M_0$ or $k > M_{N-1}$, the ring wraps around and it is stored on $M_0$.

In this manner, the N mediators form a ring like our keep-alive token ring. Given a stable network, a
content query takes exactly 1 lookup per mediator level.
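The storage rule can be sketched as a binary search over the sorted peerIdentities. In this Python sketch the mediator ring is a sorted list of integers, and `chord_key` and `storing_mediator` are hypothetical helper names. `bisect_left` returns the first position whose MPID is at least k, and running off the end wraps around to $M_0$.

```python
import bisect
import hashlib


def chord_key(f1: bytes) -> int:
    """k = SHA-1(F1) mod 2**128."""
    return int.from_bytes(hashlib.sha1(f1).digest(), "big") % (1 << 128)


def storing_mediator(k: int, ring: list) -> int:
    """MPID of the first mediator with MPID >= k; past M_{N-1} wrap to M_0.

    ring is the sorted list M_0 < M_1 < ... < M_{N-1}.
    """
    j = bisect.bisect_left(ring, k)  # first index with ring[j] >= k
    return ring[j % len(ring)]       # j == N means k > M_{N-1}: wrap around


toy_ring = [10, 20, 30]  # toy MPIDs for illustration
print([storing_mediator(k, toy_ring) for k in (20, 21, 5, 31)])  # [20, 30, 10, 10]
```

With real 128-bit peerIdentities, the ring would be the sorted list of integers, and `chord_key(F1)` would supply k.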
An important feature of Chord is the distribution of keys across the space of mediators. This is called
consistent hashing, and it provides a load-balancing scheme. Chord bounds the maximum number of
keys per node by
$\mathit{MAXKeys} = (1 + \varepsilon)\,K/N$,
where K is the number of keys and N the number of nodes. The upper bound on ε is O(log N).
Thus, for 64 mediators and 1,000,000 keys, K/N = 15,625, and taking ε = 8 gives MAXKeys =
9 × 15,625 = 140,625.
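The following Python sketch makes the arithmetic explicit and compares the bound with an actual placement. It evaluates the bound and counts keys per mediator for randomly placed 128-bit identifiers. The value ε = 8 is simply the one that reproduces the 140,625 figure. The simulation uses no virtual nodes, and it is an illustration under our own assumptions, not a measurement from the text.

```python
import bisect
import random


def max_keys_bound(K: int, N: int, eps: float) -> float:
    """Chord-style bound on keys per node: (1 + eps) * K / N."""
    return (1 + eps) * K / N


def simulate_load(K: int, N: int, seed: int = 1) -> list:
    """Place N mediators and K keys at random on a 2**128 ring and count keys per
    mediator under the successor rule (first MPID >= key, wrapping to M_0)."""
    rng = random.Random(seed)
    ring = sorted(rng.getrandbits(128) for _ in range(N))
    load = [0] * N
    for _ in range(K):
        k = rng.getrandbits(128)
        load[bisect.bisect_left(ring, k) % N] += 1
    return load


print(max_keys_bound(1_000_000, 64, 8))  # 140625.0
load = simulate_load(100_000, 64)
print(max(load), 100_000 / 64)           # busiest mediator versus the ideal K/N
```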
The number of required lookups differs from Chord's because the Chord ring contains all peerNodes
rather than just the mediators. In Chord, if one has N peerNodes, then a lookup takes at most O(log N)
hops, and every peerNode has to store information about O(log N) other nodes. It turns out that both
approaches keep tables of the same size. For example, given 32,000 peerNodes, Chord requires 15
entries per table per peerNode. Our site-level tables have one entry for each mediator, and, as
mentioned below, they would max out at about 15 mediators of average processing power. Again, we
must emphasize that the mediator lookup cost is 1 in our approach. Section 4.2.4 also assumes a
Chord-like ring.
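The 15 entries quoted for 32,000 peerNodes are ceil(log2 32,000) = 15, since 2^14 = 16,384 < 32,000 ≤ 32,768 = 2^15. The Python sketch below shows why by building a standard Chord finger table, in which finger i points at the successor of n + 2^i. Of the m finger slots, only a number on the order of log2 N name distinct nodes. The node count, seed and helper names are illustrative assumptions.

```python
import bisect
import math
import random


def successor(x: int, ring: list) -> int:
    """First identifier on the sorted ring that is >= x, wrapping to ring[0]."""
    j = bisect.bisect_left(ring, x)
    return ring[j % len(ring)]


def finger_table(n: int, ring: list, m: int) -> list:
    """Chord fingers of node n: finger[i] = successor((n + 2**i) mod 2**m), i = 0..m-1."""
    return [successor((n + (1 << i)) % (1 << m), ring) for i in range(m)]


print(math.ceil(math.log2(32_000)))  # 15 routing entries for 32,000 peerNodes

rng = random.Random(7)
m = 128
ring = sorted(rng.getrandbits(m) for _ in range(32_000))
fingers = finger_table(ring[0], ring, m)
print(len(fingers), len(set(fingers)))  # 128 slots, far fewer distinct nodes
```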
Again, in a real enterprise, we may indeed have only 3 or 4 mediators supporting several thousand
peerNodes. The cost is monetary: purchasing expensive, high-end systems to support viable,
enterprise-strength P2P Overlay Networks. But even for small overlays of several hundred peerNodes,
for example, a community or neighborhood P2P Overlay Network, 2 or 3 low-end systems as mediators
will be sufficient to support the community. We are not looking at millions of peerNodes on a single
P2P Overlay Network where each peerNode is responsible for storing keys and doing routing. Why?
We can easily imagine cellular telephones and PDAs as peerNodes, and for many reasons they cannot
be used in this fashion. They are battery powered and have minimal resources (CPU, memory), and
their bandwidth is low and costly because it is wireless and shared.
Before finishing this section we must address what we do when a mediator goes offline, either in a
planned fashion or by crashing. First, our keep-alive token permits the mediators to maintain a list of
active mediators, along with their peerIdentities, no matter how the ring loses a mediator. Content is a
different matter: after a catastrophic failure with no redundancy, the hashed keys stored on the failed
mediator are lost.

Looking at this last issue first, without redundancy a mechanism must be put in place to force other
mediators to rehash all meta-data whose keys have been lost. If this approach is preferred, then a list of
{key, meta-data, mediator peerIdentity} bindings must be kept on each originating mediator, that is,
the mediator that hashed the meta-data for the peerNodes it is hosting. We keep the key in this list so
we do not have to continually rehash, and we also have a link to the values that are stored along with
the key. When the keep-alive token arrives at such a mediator, it recognizes that a mediator has gone
offline. It then rehashes the affected meta-data, thus redistributing the keys to reflect the current
mediator topology, and updates its {key, meta-data, mediator peerIdentity} bindings. How is this
different from a mediator, say $M_k$, notifying other mediators that it is going offline?
1. The token entry for this mediator is tagged to show that it is going offline.
2. Each mediator receiving the token notes (1) and waits for another token cycle.
3. $M_k$ in the meantime copies its keys to its successor, $M_{k+1}$.
4. $M_k$ waits for the token, marks its entry as shutdown, forwards the token, and shuts down.
5. Thus, upon receipt of the token, every mediator updates the local {key, meta-data, mediator
peerIdentity} bindings that name $M_k$ so that they name $M_{k+1}$, and the hash key distribution
remains consistent.
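The two departure cases can be contrasted with a toy Python model of one site's mediator ring. Everything here is a hypothetical sketch, and MPIDs are small integers:

- `crash` loses the bindings outright;
- `originator_rehash` re-stores them from the originating mediator's saved {key, value, target} list;
- `leave` and `retarget` follow steps 3 to 5 above.

```python
import bisect


class SiteRing:
    """Toy model of one site's mediator ring: MPID -> {key: value}."""

    def __init__(self, mpids):
        self.ring = sorted(mpids)                # M_0 < M_1 < ... < M_{N-1}
        self.store = {m: {} for m in self.ring}  # bindings held by each mediator

    def successor(self, k):
        """First mediator whose MPID is >= k, wrapping around to M_0."""
        j = bisect.bisect_left(self.ring, k)
        return self.ring[j % len(self.ring)]

    def put(self, key, value):
        """Store a binding on its Chord target and return that target's MPID."""
        target = self.successor(key)
        self.store[target][key] = value
        return target

    def crash(self, mk):
        """Catastrophic failure: M_k and every binding it stored disappear."""
        self.ring.remove(mk)
        del self.store[mk]

    def leave(self, mk):
        """Planned shutdown: hand M_k's bindings to M_{k+1}, then drop M_k."""
        if len(self.ring) < 2:
            raise ValueError("the last mediator has no successor to take its keys")
        j = self.ring.index(mk)
        nxt = self.ring[(j + 1) % len(self.ring)]
        self.store[nxt].update(self.store.pop(mk))  # step 3: copy keys to M_{k+1}
        self.ring.remove(mk)                        # step 4: M_k shuts down
        return nxt


def originator_rehash(ring, bindings):
    """After a crash: bindings maps key -> [value, target MPID]. The saved key is
    reused (no SHA-1 recomputation); only bindings whose target vanished move."""
    live = set(ring.ring)
    for key, entry in bindings.items():
        if entry[1] not in live:
            entry[1] = ring.put(key, entry[0])


def retarget(bindings, old, new):
    """Step 5: bindings that named the departed M_k now name M_{k+1}."""
    for entry in bindings.values():
        if entry[1] == old:
            entry[1] = new
```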
One more form of rehashing takes place. The peerNodes that are hosted by $M_k$ must rehash their
content after reconnecting to a new mediator. This updates the source route in the hashed {L(1),
virtualSocket, source route} triple that is bound to the content’s key, as mentioned above.
Finally, what do we do when a new mediator arrives and joins the ring? First, assume the keep-alive
token has made a full cycle. Before the cycle, adjacent mediators $M_{j-1}$ and $M_j$ had the mediator
peerIdentity (MPID) ordering $\mathit{MPID}_{j-1} < \mathit{MPID}_j$. We assume a new mediator, $M_{new}$, arrives with
$\mathit{MPID}_{new}$ such that $\mathit{MPID}_{j-1} < \mathit{MPID}_{new} < \mathit{MPID}_j$. After the full cycle, all mediators are aware of
$M_{new}$’s presence, in particular $M_j$, which must copy some of its key bindings to $M_{new}$. Which
ones? Precisely those keys k for which $\mathit{MPID}_{j-1} < k \le \mathit{MPID}_{new}$, i. e., those that would have hashed to
$M_{new}$. The originating mediators must also update their {key, meta-data, hash target MPID}
bindings for those same keys. In this fashion we recover from site-level ring member drops and joins.
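A join can be sketched with the same kind of toy model, again hypothetical and with small integer MPIDs. The new MPID is inserted, and its successor hands over exactly those keys whose successor is now the new mediator. This is the interval $\mathit{MPID}_{j-1} < k \le \mathit{MPID}_{new}$ described above, and the wrap-around at the top of the ring needs no special case.

```python
import bisect


class SiteRing:
    """Toy model of one site's mediator ring: MPID -> {key: value}."""

    def __init__(self, mpids):
        self.ring = sorted(mpids)
        self.store = {m: {} for m in self.ring}

    def successor(self, k):
        """First mediator whose MPID is >= k, wrapping around to M_0."""
        j = bisect.bisect_left(self.ring, k)
        return self.ring[j % len(self.ring)]

    def put(self, key, value):
        target = self.successor(key)
        self.store[target][key] = value
        return target

    def join(self, new_mpid):
        """Insert M_new and move to it exactly the keys that now hash to it.

        Those keys all sit on M_new's successor M_j and satisfy
        MPID_{j-1} < k <= MPID_new (modulo wrap-around). The moved keys are
        returned so originating mediators can update their bindings.
        """
        bisect.insort(self.ring, new_mpid)
        self.store[new_mpid] = {}
        j = (self.ring.index(new_mpid) + 1) % len(self.ring)
        old_holder = self.store[self.ring[j]]
        moved = sorted(k for k in old_holder if self.successor(k) == new_mpid)
        for k in moved:
            self.store[new_mpid][k] = old_holder.pop(k)
        return moved
```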
A final issue we need to address is how to look up content across the mediator hierarchy. This is
explained in the next section.

4.5.2 Up-level content hashing


In the previous section we discussed site-level hashing only. This raises the question of efficient
content queries across the entire mediator hierarchy. To accomplish this task we require a scheme for
infrastructure-wide hashing. To this end we do the following:
1. At the site level, when a peerNode first hashes its content, its hosting mediator, apart from
hashing the meta-data as described above, also sends the triple {key, hash target site-MPID,
CC} to its regional mediator using the MedToMed protocol.
2. Upon receipt of this data the regional mediator hashes this triple across the region level in
the same way as is done at the site level.
3. Similarly, the regional mediator, using the MedToMed protocol, sends the triple {key, hash
target regional-MPID, CC} to its global mediator.
4. Finally, the receiving global mediator hashes this information across the global mediator
view in a fashion similar to what is done at the regional and site levels.
Let’s note here that up-level mediator-to-mediator communication, as well as regional and global
hashing, uses a variation of the MedToMed publish command in which only the data being published
is changed.
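The up-leveling of a single key can be sketched as three Chord inserts, one per level. In this Python sketch the `Level` class, the toy MPIDs and the CC name are illustrative assumptions. Each level reuses the same key and stores the hash target from the level below together with the CC, as in steps 1 to 4.

```python
import bisect
from dataclasses import dataclass, field


def successor(k: int, ring: list) -> int:
    """Chord target at one level: first MPID >= k, wrapping to the smallest."""
    j = bisect.bisect_left(ring, k)
    return ring[j % len(ring)]


@dataclass
class Level:
    """One level of the hierarchy: its sorted MPIDs and what each one stores."""
    name: str
    ring: list
    tables: dict = field(default_factory=dict)  # MPID -> {key: stored value}

    def store(self, key: int, value) -> int:
        target = successor(key, self.ring)
        self.tables.setdefault(target, {})[key] = value
        return target


def publish_up(key: int, cc: str, site_value, site: Level, region: Level, glob: Level):
    """Hash one key at the site, regional and global levels (one insert per level)."""
    site_target = site.store(key, site_value)             # site-level Chord insert
    region_target = region.store(key, (site_target, cc))  # {key, site-MPID, CC}
    global_target = glob.store(key, (region_target, cc))  # {key, regional-MPID, CC}
    return site_target, region_target, global_target
```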
Given the above distribution of hash keys, how does a peerNode use lookup to find its target content?
This is achieved by the following procedure:
1. If the lookup fails at the peerNode’s site level (see section 4.2.4.3), then it is passed up to the
regional mediator level by a variation of the MedToMed query forwarding command.
2. Recall that all queries for non-system documents take place in a specific CC. If this is a query
for non-system content and the CC is not supported by another regional mediator in the
receiving regional mediator’s CC-Map, then we go to step (4) below.
3. The receiving regional mediator does a lookup at the regional level using the key for which
the lookup failed at the site level. If this lookup succeeds, then the hash-target regional
mediator passes the query downward toward the site where the content is located. The query
is sent to the regional mediator that originally hashed the key, that is, the one hosting the site
whose site-level mediator up-leveled the content hash element described above. It is possible
that at every level in the hierarchy there are multiple choices for the targeted content. We
select one of the possible targets, and we do not propagate the search downward along the
others. The lookup will either be satisfied or fail, and in either case an appropriate response is
sent to the querying peerNode. If the regional lookup fails, then we go to the next step.
4. The receiving regional mediator sends the query upward to its global mediator.
5. The global mediator proceeds with the lookup following exactly the procedures in steps (2)
and (3). Failure at the global level terminates the lookup in process, whether the cause is the
lack of CC support in the global CC-Map or a failed key lookup.
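The escalation logic of steps 1 to 5 can be sketched as follows. This is a simplified, hypothetical Python model:

- each level's dictionary stands in for the single Chord lookup at that level;
- the CC-Maps are plain sets;
- the downward forwarding to the site that holds the content is not modelled.

The function only reports where the binding was found and which levels were searched.

```python
def hierarchical_lookup(key, cc, site_index, regional_index, regional_ccs,
                        global_index, global_ccs, is_system=False):
    """Escalate a failed site-level lookup through the regional and global levels.

    Each *_index maps key -> stored binding; *_ccs is the set of CCs in that
    level's CC-Map. Returns (binding or None, levels at which a lookup ran).
    """
    levels = ["site"]
    if key in site_index:
        return site_index[key], levels
    # Step 2: a non-system query skips the region if no regional mediator supports the CC.
    if is_system or cc in regional_ccs:
        levels.append("regional")
        if key in regional_index:  # step 3: hit, forward downward from here
            return regional_index[key], levels
    # Steps 4-5: ask the global mediator, which applies the same CC test.
    if not is_system and cc not in global_ccs:
        return None, levels
    levels.append("global")
    return global_index.get(key), levels


# A key found only at the regional level for a CC the region supports.
print(hierarchical_lookup(7, "cc1", {}, {7: "R2 -> M2"}, {"cc1"}, {}, {"cc1"}))
```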
If we implemented the Chord hashing mechanism described in the previous section across the
hierarchy, then the worst-case cost for a lookup would be 6. This cost is incurred whenever a query is
sent upward to the global level. Similarly, the maximum cost of key insert and key delete is 3, one per
level. The delete uses the publish delete command. We should also mention again that all published
meta-data has a publication lifetime in minutes to eliminate stale data. If the data is not republished,
then it is expunged when this lifetime expires.
The above query procedure is illustrated in Figure 4-37.

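[Figure: peerNodes P1 and P3 at the bottom; site-level mediators M1, M2, M3; regional mediators R1,
R2, R3; global mediators G1, G2, G3 at the top. {key, M2, CC} is hashed at the regional level and
{key, R2, CC} at the global level, with lookups passing between the levels.]
Figure 4-37. Hierarchy-wide Content Lookup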


This concludes our discussion of hashing. We again note that there are many other algorithms one can
apply besides Chord, such as CAN, as we mentioned in the introduction to section 4.5. CAN also has a
maximum lookup cost of O(log N) if the dimension of the CAN space is (log N)/2 [SURVEY]. CAN
spaces are different from Plaxton’s prefix-based algorithms. For a complete discussion see [CAN].
Gnutella has a lookup cost of O(N). Although it is important as an early P2P system, it is not one the
authors would choose to impose on their mediator hierarchy.
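The O(log N) claim for CAN can be checked with a line of arithmetic. The CAN paper gives an average routing path length of (d/4)·N^(1/d) hops for N nodes in a d-dimensional space. Choosing d = (log2 N)/2 makes N^(1/d) = 2^2 = 4, so the path length becomes exactly d = (log2 N)/2. The short Python sketch below evaluates the formula; the function names are ours.

```python
import math


def can_path_length(n_nodes: float, d: float) -> float:
    """Average CAN routing path length, (d/4) * n**(1/d) hops, for n nodes in d dimensions."""
    return (d / 4) * n_nodes ** (1 / d)


def log_cost_dimension(n_nodes: float) -> float:
    """Dimension d = (log2 n) / 2, which makes n**(1/d) equal to 4."""
    return math.log2(n_nodes) / 2


for n in (2**10, 2**16, 2**20):
    d = log_cost_dimension(n)
    print(n, d, round(can_path_length(n, d), 6))  # path length equals d = (log2 n) / 2
```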

4.6 More 4PL Examples


In order to bootstrap our system, all peerNodes and mediators must send mediator greeting commands
to one another (see section 4.2.1) in UMP/ONP messages. If the mediator greeting command is
accepted by the destination mediator, then the response contains the mediator document. This
document provides all that is necessary to establish further real and P2P Overlay Network
communication with the mediator. Our first 4PL example sends a greeting command and receives a
response. This is admittedly a simplification of the real code that must be written, but it does show the
precise functionality.
The greeting command is the first command that a peerNode sends. As a consequence, the peerNode
requires some preconfigured information to make first contact. This is briefly discussed in section
4.2.1. Since we are sending a UMP/ONP message, we must fill the header with known destination
mediator peerNode information. We also have to know the mediator’s real transport address that
manages incoming P2P Overlay Network messages. Thus, we assume that each peerNode possesses a
prioritized list of configuration information for each known mediator. An example might be:
greeting://uuid-mediatorPeerIdentity.uuid-greetingVirtualPort
tcp://<IP address>.<TCP Port>
Then, at boot time the peerNode iterates over this list, attempting to send a greeting command to each
mediator. The iteration terminates as soon as a “Welcome” response is received. A challenge may be
required to complete the command. The following 4PL code first shows the mediator opening its
greeting virtual port, and then a peerNode sending a command to this virtual port:

// This is a mediator greeting command daemon.
//
// We respond to a greeting command by sending a mediator document,
// mediatorDoc.
// If we receive multiple, successive greeting commands from the
// same peerNode, we slow it down by sending a puzzle to which
// it must return a solution before receiving a mediator document.

// 1. Create our listening virtual port document since it is required to
//    create a virtual port.
//    We assume we have already created our peerIdentity document, pidDoc.
VirtualPortDocument greetingPort = newDocument(VIRTUALPORT, pidDoc,
                                               "greetingListener", unicast);

// 2. Given our document, we can now create a listening virtual socket
VirtualSocket greetingListener = newSocket(greetingPort);

// 3. Listen for incoming greeting commands
VirtualChannel listenChannel = listen(greetingListener, UMP);

LOOP
BEGIN "Greeting Listener";

    UMPMessage greetingMsg = ump_receive(listenChannel);

    // We have a greeting command
    PeerIdentity pID = getPeerIdentity(greetingMsg);

    // See if this peerNode is on our hacker list.
    // If so, the current greeting must contain a puzzle solution.
    IF (onHackerList(pID)) THEN BEGIN
        Boolean rightAnswer = correctSolution(greetingMsg, pID);
        IF (NOT rightAnswer) THEN BEGIN
            // Tell the OS to refuse all further connection attempts
            setRefuseFurtherConnections(pID);
            CONTINUE;
        END;
        // Solved the puzzle. Remove it from the hacker list.
        // A correct solution implies that in the frequency test below
        // we will have:
        //   frequency LESSTHAN maximumFrequency
        removeFromHackerList(pID);
    END;

    // Check our history list to see how often this peerNode greets us
    int frequency = getMessagesPerMinuteFrom(pID);

    Boolean hacker = FALSE;
    GreetingCommandResponse response = NULL;

    IF (frequency GREATERTHAN maximumFrequency) THEN BEGIN
        // The puzzle difficulty is directly proportional to the
        // square of the frequency.
        response = makePuzzle(frequency);
        placeOnHackerList(pID, response);
        hacker = TRUE;
    END;

    // Send our response to the source.
    // First we make the data to be included in the UMP message.
    IF (NOT hacker) THEN
        response = makeGreetingResponse(mediatorDoc);

    // a. Extract the source from the message
    VirtualSocket greeting_source = extractSocket(UMP, SOURCE, greetingMsg);

    // b. Create a channel with the source as the destination VirtualSocket.
    VirtualChannel responder = newChannel(greetingListener, greeting_source,
                                          UMP);

    // Send our greeting response
    ump_send(responder, response);

    // Close the channel
    ump_close(responder);

END "Greeting Listener";
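The 4PL daemon leaves the puzzle and the frequency test abstract. The following Python sketch fills them in under explicit assumptions of our own:

- a hashcash-style puzzle: find a counter such that SHA-256(challenge || counter) has a given number of leading zero bits;
- a one-minute sliding window for `getMessagesPerMinuteFrom`;
- a puzzle whose expected work, 2^bits hashes, is roughly proportional to the square of the frequency.

`MAXIMUM_FREQUENCY`, `WORK_FACTOR` and the class and method names are hypothetical.

```python
import hashlib
import math
import os
from collections import defaultdict, deque

MAXIMUM_FREQUENCY = 6  # hypothetical: greetings per minute tolerated before a puzzle
WORK_FACTOR = 16       # hypothetical: expected hashes per (greetings per minute)**2


def zero_bits(challenge: bytes, counter: int) -> int:
    """Leading zero bits of SHA-256(challenge || counter)."""
    digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
    return 256 - int.from_bytes(digest, "big").bit_length()


def solve_puzzle(challenge: bytes, bits: int) -> int:
    """PeerNode side: brute-force a counter that meets the difficulty."""
    counter = 0
    while zero_bits(challenge, counter) < bits:
        counter += 1
    return counter


class GreetingDaemon:
    """Per-peerNode rate limiting for greeting commands (hypothetical design)."""

    def __init__(self, mediator_doc: str):
        self.mediator_doc = mediator_doc
        self.history = defaultdict(deque)  # pID -> greeting times in seconds
        self.hacker_list = {}              # pID -> (challenge, bits)

    def handle(self, pid: str, now: float, solution=None):
        """Return ("welcome", doc), ("puzzle", (challenge, bits)) or ("refused", None)."""
        if pid in self.hacker_list:
            challenge, bits = self.hacker_list[pid]
            if solution is None or zero_bits(challenge, solution) < bits:
                return ("refused", None)  # setRefuseFurtherConnections(pID)
            del self.hacker_list[pid]     # removeFromHackerList(pID)

        times = self.history[pid]         # getMessagesPerMinuteFrom(pID)
        times.append(now)
        while now - times[0] >= 60.0:
            times.popleft()
        frequency = len(times)

        if frequency > MAXIMUM_FREQUENCY:
            # Expected work 2**bits is about WORK_FACTOR * frequency**2 hashes.
            bits = math.ceil(math.log2(WORK_FACTOR * frequency ** 2))
            challenge = os.urandom(16)
            self.hacker_list[pid] = (challenge, bits)  # placeOnHackerList(pID, ...)
            return ("puzzle", (challenge, bits))
        return ("welcome", self.mediator_doc)
```

Because the expected work grows with the square of the greeting rate, a peerNode that greets slightly too often pays little, while a flooding peerNode pays heavily. The mediator's own cost to verify a solution remains a single hash.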

Now that we’ve looked at the 4PL description of the mediator side of the code, let us consider the
peerNode 4PL greeting command code. Again, assume the peerIdentity document is accessible as the
global variable named pidDoc. The code is written as a procedure call because, if there is a failure, the
system will need to respond to this state. In most cases it will wait a while and try again, finally
involving the user to handle the situation, for example: “The P2P Overlay Network is down. Please try
again later.”
An intelligent system will try to minimize user interaction. Perhaps it will offer users an interesting
game to play, stream live music, or show a movie if such features exist: “Sorry, your network is
down. Would you like to watch one of these films? Etc ...”

// Procedure sendGreetingCommand:
// Returns a MediatorDocument on success,
// otherwise, returns NULL
//
MediatorDocument sendGreetingCommand()
BEGIN "send greeting command"
    // Create a random local socket used to send a greeting command
    VirtualSocket sendSocket = newSocket();

    // Counter for known mediators
    int entryNumber = 0;

    // Returned mediator document
    MediatorDocument medDoc = NULL;

    // We need a puzzle solution if we are challenged.
    // Initially, we have no puzzle to solve, so we create an
    // empty puzzle solution which effectively does not include
    // a solution in the greeting command.
    PuzzleSolution puzzSol = newPuzzleSolution(NULL);

    LOOP
    BEGIN "Send Greeting LOOP"

        // We get the next mediator configuration information entry
        VirtualSocket nextMediatorSoc =
            fetchNextConfiguredMediatorSocket(entryNumber);

        // We may have exhausted our known mediators
        IF (nextMediatorSoc EQUAL NULL) THEN BREAK;

        // Create a channel to send the greeting command
        VirtualChannel sendChannel = newChannel(sendSocket, nextMediatorSoc,
                                                UMP);

        // Get the send command data
        GreetingCommand cmd = newGreetingCommand(puzzSol);

        // Send the greeting
        ump_send(sendChannel, cmd);

        // Wait for the greeting response, then close the channel;
        // a new channel is created on each pass through the loop
        GreetingResponse response = ump_receive(sendChannel);
        ump_close(sendChannel);

        // Received a response?
        IF (response NOTEQUAL NULL) THEN BEGIN "Response"
            // We have three possible responses:
            // 1. Welcome?
            ResponseType type = getGreetingResponseType(response);
            IF (type EQUAL "Welcome") THEN BEGIN "Get Document"
                medDoc = getGreetingDocument(response);
                BREAK;
            END "Get Document";

            // 2. Redirect? This mediator is busy and knows
            //    a better choice that will accept us.
            IF (type EQUAL "Redirect") THEN
            BEGIN "Get Redirected Mediator's Document"
                MediatorDocument tmpDoc = getGreetingDocument(response);
                // We insert this mediator's socket
                // as the next entry in our iterated list
                insertMediatorSocketAfter(tmpDoc, entryNumber);
            END "Get Redirected Mediator's Document"
            ELSE BEGIN "Received a challenge"
                // 3. Challenge: some mediators will challenge all greetings
                Puzzle puzz = getPuzzle(response);

                // This can take some time
                puzzSol = solvePuzzle(puzz);

                // We retry and leave the entryNumber the same
                CONTINUE;
            END "Received a challenge";
        END "Response";

        // Either no response or a redirect was received.
        // A puzzle solution is only valid for the mediator that issued it.
        puzzSol = newPuzzleSolution(NULL);
        entryNumber = entryNumber + 1;

    END "Send Greeting LOOP";

    // If medDoc is not NULL we are done;
    // otherwise, let our caller deal with the problem
    RETURN medDoc;

END "send greeting command";
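For experimentation, the peerNode loop can be sketched in Python against a stubbed transport. The `send` and `solve` callables, the response tuples and the `max_challenges` cap are hypothetical. The cap is our addition, because a loop that always retries after a challenge would spin forever against a mediator that keeps challenging. The solution is reset when moving on, since a solution only means something to the mediator that issued the puzzle.

```python
from typing import Callable, List, Optional, Tuple

# A response is ("Welcome", doc), ("Redirect", mediator) or ("Challenge", puzzle);
# None models "no response" (timeout or unreachable mediator).
Response = Optional[Tuple[str, object]]


def send_greeting_command(configured: List[str],
                          send: Callable[[str, object], Response],
                          solve: Callable[[object], object],
                          max_challenges: int = 3) -> Optional[object]:
    """Return the first mediator document obtained, or None if every mediator fails."""
    mediators = list(configured)  # copy: redirects are inserted into this list
    entry = 0
    solution = None               # empty puzzle solution
    challenges = 0
    while entry < len(mediators):
        response = send(mediators[entry], solution)
        if response is not None:
            kind, payload = response
            if kind == "Welcome":
                return payload
            if kind == "Redirect":
                mediators.insert(entry + 1, payload)  # try the suggested mediator next
            elif kind == "Challenge" and challenges < max_challenges:
                challenges += 1
                solution = solve(payload)             # may take some time
                continue                              # retry the same mediator
        # No response, a redirect, or too many challenges: move on.
        entry += 1
        solution = None  # a solution is only valid for the mediator that asked
        challenges = 0
    return None
```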

Our next example queries for a peerNode’s VirtualPort document. Please at least browse section
4.2.3.4 if you are no longer familiar with this use of the PNToMed protocol. The initial step creates
a Query Command and sends this command to the peerNode’s mediator. Recall that since this is a query
for a system document, the query is done in the Public CC, and the response comes from the peerNode
that published the document.

// SystemDocument lookupSystemDocument:
// Returns the SystemDocument or NULL
SystemDocument lookupSystemDocument(String documentType, String searchString)
BEGIN "lookup"
    // Create the query command.
    // First we set the constants
    String type = "System Document";
    int limit = 1;
    int N = 1;
    Boolean wildCard = FALSE;
    String F1 = documentType + searchString;

    // We require a virtual socket to send the request.
    // Note: the channel created below on this virtual socket
    // also receives the response
    VirtualSocket sendSocket = newSocket();

    // We also need our source route (how to get back to us)
    SourceRoute routeHome = getSourceRoute();

    // Now we construct the command. There are lots of parameters and
    // these should be accumulated into a structure or class in a real
    // programming language
    QueryCommand query = newQueryCommand(type,
                                         limit,
                                         N,
                                         wildCard,
                                         F1,
                                         sendSocket,
                                         routeHome);

    // We need to acquire the PNToMed mediator's listening socket
    VirtualSocket PNToMedListener =
        getMediatorVirtualSocket(PNTOMEDIATORPROTOCOL);

    // Next, we create a channel for sending and receiving.
    // We assume that we can connect to our mediator.
    VirtualChannel queryChannel = newChannel(sendSocket, PNToMedListener,
                                             UMP);

    // Send the query command
    ump_send(queryChannel, query);

    // Now wait for the response
    QueryResponse response = ump_receive(queryChannel);

    // We no longer need the channel, whatever the outcome
    ump_close(queryChannel);

    // A NULL response means either we did not contact the mediator,
    // or the remote peer that created the document was not contacted
    IF (response EQUAL NULL) THEN RETURN NULL;

    // OK, at least there was a response.
    // See if any data is returned. It is possible that nothing
    // was found, that is, the mediator lookup failed,
    // the next hop mediator did not respond, or
    // the next hop mediator could not contact the document owner
    IF (getQueryResponseCount(response) EQUAL 0) THEN RETURN NULL;

    // Have the document, extract and return it.
    SystemDocument sysDoc = getSystemDocumentFromQuery(response);
    RETURN sysDoc;

END "lookup"

Finally, a slightly more complicated procedure is to query for and receive content that is not a system
document. This is identical to the above procedure call until the response is received. A query for
content returns the information necessary to contact one or many possible content sources to receive
the content itself. We iterate through the list until we make contact, and then we proceed to retrieve
the data. The list of sources can contain information such as geographic proximity, average
throughput, etc. as part of the returned content meta-data. We do not use these hints to choose among
sources in the 4PL code, since such choices are implementation dependent.

// Continuing from the above example:

// Now wait for the response, then close the query channel
QueryResponse response = ump_receive(queryChannel);
ump_close(queryChannel);

// A NULL response means either we did not contact the mediator,
// or the mediator where the meta-data is stored was not contacted
IF (response EQUAL NULL) THEN RETURN NULL;

// We have meta-data that contains a list of peerNodes
// that are content sources. Extract this meta-data.
// We assume the response is well defined, i. e., matches the protocol
// definition in section 4.2.3.4
ApplicationMetaData appMetaData = getApplicationMetaData(response);

int i = 0;
int N = getMetaDataCount(response);

// We need a virtual socket to connect
VirtualSocket localSocket = newSocket();

LOOP
BEGIN "retrieve data"
    IF (i EQUAL N) THEN
        // We tried elements 0 .. N-1 and did not get the content
        BREAK;
    ELSE
    BEGIN "extract element"
        // Extract data that permits connecting with the content source
        ApplicationMetaDataElement e = appMetaData[i];
        VirtualSocket remote = getVirtualSocket(e);
        SourceRoute routeToSource = getSourceRoute(e);

        // Next, tell our system that we have a route to the peerNode
        // to which we wish to connect, i. e., open a channel
        systemUpdateRoute(routeToSource);

        // Now we open a channel to the source. Recall that for ACP
        // this creates a connection
        VirtualChannel contentChan = newChannel(localSocket, remote, ACP);

        // See if the open succeeded; if not, there is nothing to close
        IF (contentChan EQUAL NULL) THEN
        BEGIN "try next source"
            i = i + 1;
            CONTINUE;
        END "try next source";

        // We have a connection. Try and retrieve the content.
        // We need to build a content request
        SourceRoute routeBack = getSourceRoute();
        ContentRequest conRequest = newContentRequest(e, localSocket,
                                                      routeBack);
        // Send the request to the content source
        acp_send(contentChan, conRequest);

        // Get our response
        ContentResponse conResponse = acp_receive(contentChan);

        // See if we have a response
        IF (conResponse EQUAL NULL) THEN
        BEGIN "close and continue"
            acp_close(contentChan);
            i = i + 1;
            CONTINUE;
        END "close and continue";

        // Having the response means the remote peerNode is
        // willing to send us the content. Recall the response
        // contains the transfer mechanism, the content type, and
        // the content size in bytes.
        // We generalize and receive the content as a byte stream,
        // assuming we have a handler for each data transfer type.
        ByteStream content = newByteStream(getContentSize(conResponse));
        IF (receiveContent(getTransferType(conResponse), content) EQUAL NULL)
        THEN BEGIN "close and continue"
            acp_close(contentChan);
            i = i + 1;
            CONTINUE;
        END "close and continue";

        // We have the content so we are done
        acp_close(contentChan);
        BREAK;
    END "extract element";
END "retrieve data";
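The same try-each-source pattern can be sketched in Python with the transport abstracted into two hypothetical callables, `open_channel` and `request`. Both raise a hypothetical `SourceUnavailable` on failure. The optional `rank` function shows where an implementation could order sources by meta-data hints such as proximity or throughput. The `try`/`finally` guarantees that every opened channel is closed, whichever way the attempt ends.

```python
from typing import Callable, Iterable, Optional


class SourceUnavailable(Exception):
    """Hypothetical transport error: no channel, no response, or a failed transfer."""


def retrieve_content(elements: Iterable[dict],
                     open_channel: Callable[[dict], object],
                     request: Callable[[object, dict], bytes],
                     rank: Optional[Callable[[dict], float]] = None) -> Optional[bytes]:
    """Try each content source in turn; return the first content received, else None."""
    candidates = sorted(elements, key=rank) if rank else list(elements)
    for element in candidates:
        try:
            channel = open_channel(element)   # like newChannel(..., ACP)
        except SourceUnavailable:
            continue                          # nothing was opened, nothing to close
        try:
            return request(channel, element)  # send ContentRequest, receive content
        except SourceUnavailable:
            continue                          # try the next source
        finally:
            close = getattr(channel, "close", None)
            if callable(close):
                close()                       # like acp_close, on every path
    return None
```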

With the above 4PL example we are finished with chapter 4, our discussion of ONP protocols. We have
covered a lot of ground in this chapter. We defined our overlay network transport layer; the mediator
topology and protocols; the peerNode-to-Mediator protocols; the peerNode-to-peerNode protocols;
and the connected community protocols. We also demonstrated real-world hashing algorithm examples.
The mediator protocols are a complete collection that supports a hierarchical infrastructure to yield
optimal performance and minimal content lookup cost. They address publish and query; the keep-alive
token ring; the CC Mediator Map; hierarchical routing; mediator information query for load
balancing; and multicast across the mediator hierarchy.
We believe that the hierarchical infrastructure is necessary for a world-wide P2P Overlay Network. It
localizes activity and contention to the site-level mediators except when content is desired off-site.
In this latter case, the hierarchy keeps the off-site access cost to a minimum. Clearly, for us,
mediators are at the heart of any P2P network of reasonable complexity. Small systems are certainly
suitable for the role of mediators if the P2P network is small, say on the scale of a neighborhood-wide
P2P network. But we also want P2P to move into the enterprise, where reliability is a requirement that
must be met for success.
The mediator hierarchy we have suggested is very close to the IPv6 structure. As we have said before,
this is intentional: we feel that for P2P to be a success it must ultimately, and the sooner the better, be
based upon IETF Standards. There is an Internet Research Task Force Research Group (IRTF RG) on
P2P [IRTF RG]. This is encouraging as a first step towards the standards we desire.

Chapter 5 Security in a P2P System

When we begin to think about security and P2P Overlay Networks, and in particular ad-hoc P2P
networks with no real centralization, we must take a leap from the accepted, in-place, on-the-Internet
security practices into the unknown. There are potentially billions of peer nodes, some related and
some not, all possibly vulnerable to attack in a multitude of ways:
- impersonation attacks, and thus identity theft, by unauthorized or falsely authorized parties;
- invasion of privacy and all that carries with it;
- loss of data integrity;
- repudiation of previous transactions: “Hey, no way, I did not say that!”

We imagine the equivalent of anti-matter, a complete negation of the fundamental principles of
security: the anti-secure net. Some of us have a strong interest in the secure net. We want P2P to be
not only an accepted but a preferred way of doing business in the Enterprise, and we want to protect
the personal privacy of innocent users of P2P software. We require a toolbox with sockets, and a
socket wrench capable of applying the torque appropriate to each scenario we wish to secure. We will
first describe the basic underpinnings of Internet security, but our ultimate goal in this chapter is a
description of such a toolbox.
5.1 Internet Security
The origins of the Internet date back to about 1962. What we know as the Internet arguably had its
beginnings close to 1983. The RFCs for TCP/IP, UDP/IP, and ICMP/IP were published in 1981, but
TCP/IP was not adopted as the ARPANET standard until 1983. At that time the ARPANET split into
the ARPANET and MILNET, which partitioned the network into non-military and military usage,
respectively. Security was not an initial concern for the post-1983 ARPANET that became the
Internet. General use and growth of the Internet since then have shown the Internet’s vulnerability to
attack. As a consequence, the IETF has taken a very strong stand on security, as stated in RFC 3365.
Note that the “Danvers Doctrine” mentioned just below is from the 32nd IETF meeting in Danvers,
Massachusetts in 1995. From RFC 3365 we have the following quote:
“The overwhelming consensus was that the IETF should standardize on the use of the best
security available, regardless of national policies. This consensus is often referred to as the
“Danvers Doctrine.” Over time we have extended the interpretation of the Danvers Doctrine to
imply that all IETF protocols should operate securely. How can one argue against this?”
Since then, engineers, researchers, and members of standards bodies, along with company policy
makers, have paid more and more attention to security issues. During the same time there has been an
explosion of cryptographic algorithms and their implementations in both software and hardware, or
what we call “guard-where.” The IETF has demanded that all IETF protocols include security as part
of their implementations. This cross-protocol requirement affects all of us when we use the web.

This does not imply that today’s Internet is completely secure. It is not. Protocols and their
implementations have bugs. For example, SSL.v3 and TLS were severely compromised in February of
2003 [REF BAD ORACLE AND CHINESE REMAINDER THEOREM]. Microsoft’s operating systems are
attacked daily by viruses, putting individual users’ systems and privacy at risk. This is not to say that
the effort so far is in vain. It is not. Rather, security is very difficult to accomplish, and our Internet
transactions are almost always secure when we use HTTPS along with its X.509 v3 certificate
management. Still, we must mention that clever attackers can spoof this system because of oversights in
browsers and the vulnerability of some Internet sites [REF]. Vipul Gupta of Sun Labs described the
following seemingly innocuous attack to us:
“This note describes an attack that lets a malicious individual “steal” the authentication token
belonging to a legitimate user when the latter uses a public browser (such as those at airports
and conference floors). Once the authentication token has been stolen, the attacker can mas-
querade as the legitimate user. The attack does not require any changes to the underlying hard-
ware, operating system or browser software on the public web terminal. That is, it is not your
typical “trojan horse” attack. Rather, it exploits the fact that browsers used on public terminals
do not provide any mechanism to separate the trust beliefs of different users. More specifically,
if A tells the browser to trust a certificate X signed by some entity Y (where Y is not one of the
root CAs built into the browser), that certificate is automatically (and silently) trusted when
presented by an HTTPS server visited by a subsequent user B. The attack works as follows:

5-2
First, the attacker uses the public web terminal to access his own HTTPS-enabled web site,
which contains a self-signed bogus root certificate for www.bigbank.com (say). Most browsers
will simply ask if that certificate should be trusted for SSL use. The attacker presses OK. This
causes the browser to cache the bogus certificate and treat it as a trusted certificate for
server-side authentication. [Some browsers require that the certificate signer be added to their list of
trusted CAs before the certificate is trusted. The attacker can accomplish this easily by generating
a self-signed certificate and setting up a URL which presents it with the MIME type
application/x-x509-ca-cert.]
Next, the attacker waits for an unsuspecting user to attempt to log in to his account at
www.bigbank.com. When this happens, the attacker redirects the user’s traffic to his own
server. This can be accomplished in a variety of ways. One possible approach is to monitor the
user’s DNS request for resolving www.bigbank.com to an IP address and send the IP address
of the intruder’s server in response. DNS, by default, is unauthenticated and secure DNS is not
widely deployed. Known weaknesses in ARP and ICMP may also be used to accomplish traffic
redirection. However, the simplest way is to reconfigure the browser’s built-in proxying
mechanism so that all HTTPS traffic for www.bigbank.com is directed to the attacker’s own
server. This redirection works even if the legitimate user types in an IP address rather than a
hostname for www.bigbank.com. Exploiting the HTTPS proxy mechanism simplifies the
attack a lot by eliminating timing issues that could be tricky; for example, to spoof DNS or ARP
replies, the attacker needs to time them carefully. The attacker no longer needs any information
about the routers or DNS servers used by the host running the browser.
Since the client browser thinks it is talking to www.bigbank.com, and the SSL session uses the
bogus certificate for www.bigbank.com (issued by the attacker) which the browser has already
been configured to trust, SSL’s server-side authentication mechanism is effectively bypassed.
Once the intruder has inserted himself in the communication path, he can simultaneously start
a new login session to the real server. The login screen is passed on to the victim, who will supply
the appropriate authentication token. The response is carried over an SSL-protected channel
to the attacker, who can use the captured token to masquerade as the legitimate user. At this
point, the legitimate user can simply be disconnected with some innocuous error message,
such as “server too busy.” It is just as easy for the attacker to continue relaying information
back and forth between the user and the real server to lull the user into thinking that
everything is still ok.
Note that stealing a user’s identity is significantly worse than merely monitoring keystrokes or
saving what gets downloaded into a public terminal, since the intruder can more flexibly
choose what he/she wishes to see. The severity of this attack is compounded by the fact that
most banks use fixed passwords rather than one-time passwords for user authentication, so the
attacker can continue to masquerade as the legitimate user until the user changes his/her
password. It is quite easy to configure existing browsers as described and quite difficult to observe
this attack. As a precautionary measure, users should delete any previously cached certificates
in the browser before attempting to initiate an HTTPS connection. However, if the attacker
installs his self-signed certificate in the browser’s list of trusted root CA certificates, then there
is no practical defense against this attack. Until all public browsers have been “hardened” to
resist such (mis)configuration, the safest bet is to only use a browser one can trust, e.g. a
browser on one’s trusted personal computer. Keep in mind, however, that a personal computer
that has ever been left unattended could have had its list of root CAs altered.”
Given the above examples, it is clear that one must proceed with caution when doing secure
transactions on the Internet, and in particular, it is unwise to use public kiosks unless one is very
browser-savvy. Now, how do we proceed to secure the P2P Internet, which is by its very nature more
difficult to control? Since the Internet security techniques are the best standards we have for the
moment, and the underlying cryptographic algorithms are secure, we can adapt these techniques and
protocols to the P2P Overlay Network as a starting point. The most important thing for us is to
honestly point out along the way the vulnerabilities of the methods we discuss.
P2P is new and important, and we believe that one day it will be the dominant way in which the Internet
is used. Therefore, we as authors, software architects, marketers, engineers, and managers have a
responsibility, and this responsibility is perfectly stated by Lenny Foner of the MIT Media Lab:
“Those who design systems which handle personal information therefore have a special duty:
They must not design systems which unnecessarily require, induce, persuade, or coerce
individuals into giving up personal privacy in order to avail themselves of the benefits of the
system being designed.”
To this end, we will first review some of the basic security techniques without burdening the reader
with the details of the security algorithms, systems, and standards; these will be well referenced for the
curious. Rather than dwelling on those details, we will show how to apply these procedures to create the
building blocks for a secure P2P Overlay Network.

5.1.1 Principles of Security


There are multiple facets to security, and all of them must be addressed in order to secure the Internet.
For us, a secure Internet goes beyond the principles we are about to discuss. Not only is securing an
individual’s use of the Internet a goal; the systems that are used must also be protected from many
kinds of attacks. Surely, denial of service (DoS) attacks fall within the scope of security, since they
amount to the illegal use of private property. DoS attacks on the Internet are analogous to someone
jamming radio or TV broadcasts. Security guards and the police protect private property and “secure the
premises from unwanted intrusion,” and the same must be done with Internet servers and services. They
too must be “secured” from unwanted intrusion. So, not only will we address the principles of security
and their application to the Internet, and thus to P2P Overlay Network security, but we will also discuss,
among other things, DoS attacks and what one might be able to do to prevent or minimize them.
Although a perfectly secure design is nearly impossible, achieving a high degree of protection is not a
daydream. Designers must carefully consider the following basic principles of security in the software
they intend to implement, since software that is designed from the start to be secure is much easier to
strengthen or debug than software to which security must be retrofitted:
1. Confidentiality - Data that is either locally stored, in transit between two devices, or
remotely stored can be kept private. For example, eavesdropping on conversations is
impossible.
2. Authentication - The person with whom you are communicating, or the web-site, or device
to which you are connected is provably who or what it claims to be. Spoofing is impossible.
3. Integrity - Data that is either locally stored, in transit between two devices, or remotely
stored cannot be modified without this modification being detected. One cannot order
tennis balls from a web-site and receive wine.
4. Non-repudiation - One cannot later take back a transaction that has been previously
completed. “No, I did not bet on the fifth game of the World Series with you!” when in fact I did.
Data is made confidential by using encryption. There are many well-known symmetric cryptographic
algorithms in use today, for example, RC4, 3DES, AES, and Camellia [REFS]. One begins with a
shared secret, or key, and the plaintext. The plaintext passes through one of the above algorithms,
which uses the key to encrypt it. The resulting ciphertext is then decrypted using the same key and
algorithm, which recovers the plaintext.

Plaintext -> Symmetric Cipher (shared secret key) -> Ciphertext -> Symmetric Cipher (shared secret key) -> Plaintext

Figure 5-1. Symmetric Encryption / Decryption
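As an illustration of Figure 5-1, the following Python sketch implements RC4, one of the ciphers named above, chosen only because it fits in a few lines. RC4 has since been shown to have serious weaknesses and is now prohibited in TLS (RFC 7465), so treat this strictly as a teaching aid; the function names are our own.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for the given key (illustration only)."""
    # Key-scheduling algorithm (KSA): permute 0..255 under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): emit one byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the keystream derived from the key."""
    keystream = rc4_keystream(key)
    return bytes(b ^ next(keystream) for b in data)


shared_key = b"Key"                          # the shared secret of Figure 5-1
ciphertext = rc4(shared_key, b"Plaintext")   # one side encrypts
recovered = rc4(shared_key, ciphertext)      # the other side decrypts with the same key
print(ciphertext.hex(), recovered)
```

Because encryption and decryption are the same keystream XOR, one function serves both ends, which is the symmetry the figure depicts.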


Authentication is familiar to most of us since we have all logged in to systems, i.e., authenticated
ourselves with a secret password. The intent of the password is to grant the user specific privileges
given this proof of the user’s identity. Similarly, those among us who use passwords understand their
vulnerability to attack, the dictionary attack for example, and thus that they need to be sufficiently long
and contain non-alphabetic characters to prevent them from being easily guessed. Some enterprises
demand that their employees’ passwords have a lifetime and be changed when this lifetime
expires. There are much stronger, cryptographically based identities that can be used, and we discuss
these a little later.
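The password rules just described (a minimum length, non-alphabetic characters, and a lifetime after which the password must be changed) can be expressed as a small check. This Python sketch is hypothetical: the 12-character minimum and the 90-day lifetime are our own illustrative values, not figures from any standard or enterprise policy.

```python
from datetime import date, timedelta

# Hypothetical policy values chosen for illustration only.
MIN_LENGTH = 12
LIFETIME = timedelta(days=90)


def password_problems(password: str, set_on: date, today: date) -> list:
    """Return the policy rules a password violates; an empty list means it passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    if all(c.isalpha() for c in password):
        problems.append("needs a non-alphabetic character")
    if today - set_on > LIFETIME:
        problems.append("expired")
    return problems


print(password_problems("sunshine", date(2003, 1, 1), date(2003, 2, 1)))
# ['too short', 'needs a non-alphabetic character']
```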
Data integrity is accomplished by hashing the data. A hash algorithm is a one-way function applied to
a data source which results in a fixed number of bytes; for example, SHA-1 always hashes the source to
20 bytes. Since there are infinitely many possible inputs and only finitely many outputs, different inputs
must sometimes hash to the same value, but good hash algorithms make it computationally infeasible to
find two such inputs; they are said to be collision resistant. A minor modification of the source usually
causes a large change in the “hash value.” The typical Internet hash algorithms are SHA-1 and MD5
[REF]. Bruce Schneier describes a one-way function this way: “Breaking a plate is a good example of a
one way function. It is easy to smash a plate into 1000 tiny pieces. However, it is not easy to put all of
those tiny pieces back together into a plate.” [REF] If Sam has a file, and if Joan claims she also has the
same file, then Joan’s proof of possession is to send the correct hash value to Sam.
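Python’s standard hashlib module can illustrate both the fixed-length output and the proof-of-possession exchange just described. The file contents and function names below are hypothetical. (SHA-1 is used to match the text; practical SHA-1 collisions have since been demonstrated, and SHA-256 is the usual choice today.)

```python
import hashlib


def sha1_digest(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of the data."""
    return hashlib.sha1(data).digest()


def proves_possession(claimed_digest: bytes, sams_copy: bytes) -> bool:
    """Sam checks Joan's claim by hashing his own copy and comparing digests."""
    return claimed_digest == sha1_digest(sams_copy)


original = b"Second Generation P2P Engineering, chapter 5"
modified = b"Second Generation P2P Engineering, chapter 6"

print(len(sha1_digest(original)))    # 20: the digest length never varies
print(sha1_digest(original).hex())   # a one-character change in the input...
print(sha1_digest(modified).hex())   # ...yields an unrelated-looking digest
print(proves_possession(sha1_digest(original), original))   # True
```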
Finally, non-repudiation is accomplished by means of digital signatures. Here, for the most part,
public key algorithms like RSA and DSA [REFS] are used. Sam makes an Internet bet with Joan on a
tennis match. The loser will buy the winner dinner at Bistro Elan, an excellent Palo Alto restaurant. To
sign the bet, Sam encrypts the time, date of the match, and players’ names with his private key and
sends the result to Joan. When the match is over, Joan, realizing that she has won, decrypts the digital
bet with Sam’s public key and presents the bet to Sam. Sam cannot deny having made and lost the bet.
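The bet can be sketched in Python with textbook RSA. In practice a signature algorithm signs a one-way hash of the message rather than the message itself, and that is what this sketch does. Everything here is a toy under stated assumptions: the primes are the well-known Mersenne primes 2**89 - 1 and 2**107 - 1 (so anyone can factor the modulus), and real RSA signatures add padding such as that defined by PKCS#1, which we omit. The bet text and function names are hypothetical.

```python
import hashlib

# Toy key pair built from two publicly known Mersenne primes: NO security at all.
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                            # Sam's public modulus
E = 65537                            # Sam's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # Sam's private exponent (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the RSA message space."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sam signs the hash of the message with his private exponent D."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone, Joan included, checks the signature with Sam's public key (N, E)."""
    return pow(signature, E, N) == digest_as_int(message)


bet = b"Tennis match bet: the loser buys the winner dinner at Bistro Elan"
signature = sign(bet)
print(verify(bet, signature))                                # True
print(verify(bet.replace(b"loser", b"winner"), signature))   # False: altered bet
```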
Integrity needs to be strengthened if the data is vulnerable to attack. For example, Joan can keep a file
on her system along with its hash value. Someone may modify the file and recompute the hash, and this
tampering may go unnoticed if the perpetrator can replace Joan’s hash value with his own. How
does one get around this? Message Authentication Codes, or MACs, are often used. Here, Joan will
encrypt the hash value with a symmetric algorithm and a secret key. Now it is infeasible to replace
the hash value with a valid one without knowing the secret key. A MAC is a simple algorithm that
combines a secret key with a symmetric cipher, a one-way hash, or both.
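Python’s standard hmac module provides HMAC, a widely used MAC construction. Note that HMAC keys a one-way hash directly rather than encrypting a plain hash with a symmetric cipher as described above; we swap it in here because it is the standard construction available in the library. The key and file contents are hypothetical.

```python
import hashlib
import hmac
import secrets

joans_key = secrets.token_bytes(32)   # secret known only to Joan


def file_mac(key: bytes, contents: bytes) -> bytes:
    """Compute HMAC-SHA1 over the file contents."""
    return hmac.new(key, contents, hashlib.sha1).digest()


def is_untampered(key: bytes, contents: bytes, stored_mac: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(file_mac(key, contents), stored_mac)


original = b"Joan's file"
stored = file_mac(joans_key, original)

# An attacker can modify the file and recompute a plain SHA-1 hash,
# but without joans_key he cannot produce a MAC that verifies.
tampered = b"Joan's file, modified"
forged = hashlib.sha1(tampered).digest()

print(is_untampered(joans_key, original, stored))   # True
print(is_untampered(joans_key, tampered, forged))   # False
```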
How do we secure data on the Internet? The IETF has a set of ever-evolving protocols which use
combinations and variations of the above techniques to ensure Internet security. Remember, the Internet
protocols are concerned with securing the transport of data, that is to say, the data that is in transit
between two devices, and the implementations of these protocols must include the ability to
authenticate both endpoints and to guarantee both the data’s integrity and confidentiality. There are many
such protocols, and the most commonly used is HTTPS, the Hypertext Transfer Protocol over SSL.v3 or
TLS, along with X509.v3 certificates as profiled for the Internet Public Key Infrastructure. Any of us who
has done a financial transaction on the Internet or looked at private data on a web-site has used HTTPS
and X509.v3. It is important to note that neither SSL.v3, nor TLS, nor X509.v3 is technically bound in
any way to HTTP. These can be used to secure many Internet protocols. A few examples are the Internet
Message Access Protocol (IMAP), the Simple Mail Transfer Protocol (SMTP), and the Lightweight
Directory Access Protocol (LDAP). We discuss many of these issues and ideas in the next section on
Internet Security Standards to lay the foundation for the building blocks for securing the P2P Overlay
Network.

5.1.2 Internet Security Standards


A fundamental goal of the Internet Security Standards is to put in place protocols that all systems
can use to interoperate. The IETF does not invent cryptographic algorithms. Rather, it uses those that
have become accepted by the community of cryptographers as the state of the art in that field to
create secure protocols for the Internet. If two systems are to communicate on the Internet, then
protocols that define standard formats for exchanging cryptographic data are required.
To explain this, let’s look at an X509.V3 certificate, whose format is shown later in Figure 5-2. Each
field in such a certificate is encoded using Abstract Syntax Notation One (ASN.1) and the
Distinguished Encoding Rules (DER) [X.208 (see RFC 2459 for reference)]. For the signature fields,
PKCS#1 [RFC2313] is additionally used. These are complicated, but nonetheless well-defined and
widely deployed protocols. And, happily, there exists a great deal of open source code that implements
X509.V3 certificate creation and parsing [Bouncy Castle... etc], which makes a security engineer’s job
a lot easier. Let’s pop up the stack a little bit and see how these certificates are used. Then we will
explain some of the underlying cryptographic algorithms that are also employed.
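To give a feel for the encoding, here is a minimal Python sketch of DER’s tag-length-value (TLV) layout, covering only the INTEGER and SEQUENCE types and the explicitly tagged version field that begins an X509.V3 certificate’s signed body (where version v3 is stored as the integer 2). It handles only non-negative integers and is no substitute for a real ASN.1 library such as the open source ones mentioned above; the function names are our own.

```python
def der_length(n: int) -> bytes:
    """Short form below 128; otherwise 0x80 | byte count, then big-endian length bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, content: bytes) -> bytes:
    """Tag byte, encoded length, then the content itself."""
    return bytes([tag]) + der_length(len(content)) + content


def der_integer(value: int) -> bytes:
    """INTEGER (tag 0x02) as minimal two's-complement bytes; non-negative values only."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    # bit_length() // 8 + 1 leaves room for a leading 0x00 when the top bit is set.
    return der_tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def der_sequence(*elements: bytes) -> bytes:
    """SEQUENCE (tag 0x30) holding the concatenated encodings of its elements."""
    return der_tlv(0x30, b"".join(elements))


def x509_version_field(version_number: int) -> bytes:
    """[0] EXPLICIT context tag (0xA0) around the INTEGER; v3 is encoded as 2."""
    return der_tlv(0xA0, der_integer(version_number - 1))


print(x509_version_field(3).hex())                             # a003020102
print(der_sequence(der_integer(1), der_integer(128)).hex())    # 300702010102020080
```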
Central to everything we describe about certificates are the ideas of identity and
trust. Suppose two entities, Joan and Sam, wish to communicate securely with one another on the
Internet. A necessary step is authentication, and authentication requires a unique identity. There are
cases where mutual authentication is required and others where only one party must be authenticated.
For many Internet purchases only the server, or selling party, is authenticated. In secure chat rooms or
banking transactions, both parties are authenticated. X509.V3 certificates can be used as identities to
authenticate parties. If Sam is a bank, and Joan wants to authenticate Sam, then Joan requests Sam’s
X509.V3 certificate. Having Sam’s certificate is not sufficient since this is a public document. What is
necessary is a way to prove two things: first, that the contents of the certificate are valid, and second,
that Sam indeed sent the certificate to Joan on the network.
Traditionally, validating the contents of Sam’s certificate is done using a trusted third party. That is to
say, a third party grants Sam a certificate and signs the certificate. When Joan receives the certificate,
she has a way to verify the signature of the trusted third party and thus the contents therein. To
accomplish this scenario we need a way for the trusted third party to sign the certificate and for Joan to
verify this signature. Here, a public-private cryptographic key pair [REF] and a digital signature
algorithm are employed.
What is magic about public key cryptography? When a public-private key pair is generated, it has the
wonderful property that if the public key is used to encrypt, then the private key can be used to decrypt,
and vice versa. Therefore, if Joan possesses a public key and some information encrypted with the
corresponding private key, then Joan can decrypt that information. In a digital signature algorithm,
a one-way hash is made of the certificate, and this hash is signed, or encrypted, with a private key. To
verify the signature, the corresponding public key must be used. Public keys, as the name says, are
public information. RSA and Diffie-Hellman key pairs are computed using very large prime numbers
and modular exponentiation. RSA is secure because the best known way to break it is to factor the
product of two very large prime numbers, and Diffie-Hellman is secure because the best known way to
break it is to compute discrete logarithms modulo a very large prime; for sufficiently large numbers both
problems are computationally infeasible. For example, 1024 bit RSA is considered cryptographically
strong because it uses two large probable primes whose product, the modulus, is 1024 bits long. The
underlying details are explained in [REF - Applied Cryptography].
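Diffie-Hellman key agreement is easy to sketch with Python’s built-in three-argument pow. This toy uses the Mersenne prime 2**127 - 1 as the modulus and 3 as the base, both our own illustrative choices; real deployments use much larger, standardized groups and authenticate the exchanged public values, since an unauthenticated exchange is open to a man-in-the-middle.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime; far too small for real use
G = 3            # illustrative base


def make_keypair():
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2    # random exponent in [2, P - 2]
    public = pow(G, private, P)               # safe to send over the network
    return private, public


joan_private, joan_public = make_keypair()
sam_private, sam_public = make_keypair()

# Each side combines its own private exponent with the other side's public value.
joan_shared = pow(sam_public, joan_private, P)
sam_shared = pow(joan_public, sam_private, P)
print(joan_shared == sam_shared)   # True: both equal G**(a*b) mod P
```

An eavesdropper sees only the two public values; recovering either private exponent from them is the discrete logarithm problem mentioned above.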
To keep things simple, let’s make two assumptions. First, Joan possesses what is called a root
certificate, issued by the trusted third party with that same trusted third party as its subject. This means
the root certificate contains the trusted third party’s (the subject’s) public key, and Joan trusts the issuer,
which in this case happens to be identical to the subject. We say she has a certificate that has been
self-signed by the trusted third party. Second, Sam’s certificate has been signed by the trusted third
party with its private key. Thus, upon receiving Sam’s certificate, Joan extracts the issuer field, notes
that she has the issuer’s root certificate, and extracts the public key from that root certificate. Then,
using the digital signature algorithm described in Sam’s certificate, she verifies the signature on Sam’s
certificate. If the signature is indeed the signature of the trusted third party, then the
contents of Sam’s certificate are valid. As mentioned above, this is not sufficient to authenticate that
the sender, the party at the other end of the wire, is Sam. How is this done?
Sam’s authenticity can be proved by showing that Sam has the private key of the public-private key
pair. This is Sam’s secret, much as a password is a secret. If Sam gives away his private key, or his
system is hacked and the private key is stolen, then all bets are off. There are ways to revoke Sam’s
certificate if this is the case and Sam is aware of the intrusion. This is similar to someone stealing a
credit card, but a little more dangerous, since credit card theft is almost always reported immediately,
while a hacked system may go unnoticed for a long time. A Certificate Revocation List (CRL) is used by
the software: as soon as Sam becomes aware of the intrusion, he is issued another certificate, and the
old certificate’s serial number is added to the CRL.
How does Sam prove he has the private key? This depends on which protocol you are using, but the
idea is always the same: Joan generates a random sequence of bytes and sends it to Sam. Sam
encrypts the random bytes with his private key and sends them back to Joan. Joan decrypts the
ciphertext using the public key from Sam’s verified X509.V3 certificate. If the result is the random
sequence of bytes she sent, then this is proof that Sam possesses the private key. But authentication is
not enough if a connection that is secure in the sense of the “security principles” is needed to send
private data. This brings us to TLS. TLS uses X509.V3 certificates for both public key exchange and
authentication. What extra precautions does TLS take to ensure that the communication between Joan
and Sam is secure, that it cannot be attacked?
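The challenge-response exchange can be sketched in the same spirit. Here Sam’s “encryption with his private key” is textbook RSA applied directly to Joan’s random challenge, as described above; real protocols sign a hash of the challenge with padding instead, and the toy key (built from the public Mersenne primes 2**61 - 1 and 2**89 - 1) is for illustration only. The function names are hypothetical.

```python
import secrets

# Sam's toy key pair; the primes are public knowledge, so this is insecure.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                  # Sam's public key, as found in his verified certificate
D = pow(E, -1, (P - 1) * (Q - 1))    # Sam's private key (Python 3.8+)


def joan_makes_challenge() -> bytes:
    """128 random bits; as an integer this is always smaller than N."""
    return secrets.token_bytes(16)


def sam_responds(challenge: bytes) -> int:
    """Sam transforms the challenge with his private key."""
    return pow(int.from_bytes(challenge, "big"), D, N)


def joan_checks(challenge: bytes, response: int) -> bool:
    """Joan undoes the transformation with Sam's public key and compares."""
    return pow(response, E, N) == int.from_bytes(challenge, "big")


challenge = joan_makes_challenge()
print(joan_checks(challenge, sam_responds(challenge)))   # True
print(joan_checks(challenge, secrets.randbelow(N)))      # an impostor's guess: False
```

Because each challenge is freshly random, a recorded response is useless to an eavesdropper on a later connection.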
We are not going to cover all of the details; this is not a treatise on TLS. Rather, we discuss just
enough to convey the careful thought that goes into constructing secure algorithms, and why it is unwise
to invent a new protocol without a standards body review by security experts other than oneself. Even
the best experts cannot anticipate all possible cryptographic attacks on their protocols, as witnessed in
the discussion in section 5.1 just above. For those who are interested in all of the details of the workings
of SSL and TLS, we refer you to the excellent book by Eric Rescorla on this subject, “SSL and TLS:
Designing and Building Secure Systems.”
The TLS protocol has two phases: the opening handshake and the secure exchange of data. The
handshake can run in several modes, and we assume that both the certificate signature and the
certificate’s validity period are verified.
Joan, the client, initiates the TLS handshake by sending a client hello message to Sam, the server.
This message contains, among many other things, the TLS version Joan is using, a session identifier, a
random number generated by Joan, and a list of the cipher suites that Joan supports. A typical cipher
suite is a combination of algorithms, for example, a 1024 bit RSA public key, RC4 symmetric
encryption, and 160 bit SHA-1 as the one-way hash.
In response to the client hello, the server sends a server hello response. This response includes the
TLS version to be used (the highest version that both Joan and the server support), a server-generated
random number, and a cipher suite selected from the client-supplied list. In our example, the server
follows its hello message with its X509.V3 certificate.
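The server’s choices in the hello exchange can be sketched as a small Python function. The policy shown (the highest protocol version both sides support, and the first suite in Joan’s list that the server also supports) is one reasonable choice among several; servers may prefer their own ordering instead. The suite names are real TLS cipher suite identifiers, but the function and the server’s supported lists are hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]   # wire encoding: (3, 0) = SSL 3.0, (3, 1) = TLS 1.0

SERVER_MAX_VERSION: Version = (3, 1)
SERVER_SUITES = ["TLS_RSA_WITH_3DES_EDE_CBC_SHA", "TLS_RSA_WITH_RC4_128_SHA"]


def server_hello(client_version: Version,
                 client_suites: List[str]) -> Optional[Tuple[Version, str]]:
    """Pick the protocol version and cipher suite, or None if no suite is shared."""
    version = min(client_version, SERVER_MAX_VERSION)   # tuples compare lexicographically
    for suite in client_suites:                          # honor Joan's order of preference
        if suite in SERVER_SUITES:
            return version, suite
    return None


joan_offer = ["TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_WITH_RC4_128_SHA"]
print(server_hello((3, 1), joan_offer))   # ((3, 1), 'TLS_RSA_WITH_RC4_128_SHA')
```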
Next, the client computes a premaster secret, encrypts the premaster secret with the server’s public
key, and sends this to the server. Both the client and server then use the premaster secret and the
exchanged random numbers to generate the master secret. Note that for the server to successfully
compute the master secret, it must have the private key to decrypt the premaster secret. This is
critical: the finished messages that close the handshake permit both the client and server to verify
that they have calculated the same master secret and exchanged the same handshake information. If
the handshake successfully completes, then the server is authenticated, since the handshake fails if the
server does not possess the private key. A slight variation on this is that the server can also request the
client’s certificate, and in this case both parties are authenticated during the TLS handshake.
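Both sides derive the master secret with TLS’s pseudo-random function (PRF). This Python sketch uses the PRF that RFC 5246 defines for TLS 1.2, which is built from HMAC-SHA256; TLS 1.0, the version current when this chapter was written, instead combined MD5 and SHA-1, so the sketch is a stand-in rather than a reproduction of the 1.0 computation. Decrypting the premaster secret with the server’s private key is not shown; we simply give both sides the same 48 bytes.

```python
import hashlib
import hmac
import secrets


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 from RFC 5246: chained HMACs expanded to the requested length."""
    output = b""
    a = seed                                                     # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()         # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def master_secret(premaster: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """PRF(premaster, "master secret", client_random + server_random), first 48 bytes."""
    return p_sha256(premaster, b"master secret" + client_random + server_random, 48)


client_random = secrets.token_bytes(32)               # sent in the client hello
server_random = secrets.token_bytes(32)               # sent in the server hello
premaster = bytes([3, 3]) + secrets.token_bytes(46)   # client version + 46 random bytes

# Joan computes this directly; Sam can do so only after decrypting the
# premaster secret with his private key.
joan_master = master_secret(premaster, client_random, server_random)
sam_master = master_secret(premaster, client_random, server_random)
print(len(joan_master), joan_master == sam_master)    # 48 True
```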
Once the handshake is completed, TLS data records can be exchanged. Here the application data is,
among other things, sequenced, encrypted, and MAC’d. Wow! And this discussion has ignored 99% of
the details of the TLS protocol.
While RFC 2459 defines a standard format that specifies how to issue a digital certificate to an
individual or organization, there is also a standard protocol for requesting a certificate from what is
called a Certificate Authority (CA). This protocol is PKCS#10 [REF], which defines the format of a
Certificate Signing Request that is sent to a CA. Assume an individual X wants to apply for a certificate
from a CA. First, a public-private key pair is generated. Then, as with X509.V3 certificates, PKCS#10
requires ASN.1 DER encoding. In this format, the subject distinguished name and the public key are
signed with the private key, and the request, together with an identifier of the signature algorithm, is
sent to the CA. It is the CA’s job to validate the signature with the included public key and to check X’s
identity. The identity check has multiple levels, corresponding to different classes of certificates from
casual personal use to government use. After the identity check, the CA issues and signs the certificate,
so that any modification can be detected.
Figure 5-2 shows the common elements of an X509 V3 certificate. These are the version of X509 used
to generate the certificate, the serial number, the issuer’s distinguished name, the period of validity
specified by a start date and a final date, the subject’s distinguished name, the public key, the signature
algorithm, and the signature itself.

Format
  Version (v3)
  Serial number
  Signature algorithm id
  Issuer
  Validity period
  Subject
  Subject public key info
  Issuer unique identifier (optional)
  Subject unique identifier (optional)
  Extensions
  Signature (signed by Issuer)

An Example
  Version: 3
  IssuerDN: O=www.jxta.org, L=SF, C=US, CN=CA, OU=open source
  Start Date: Thu Aug 21 12:48:46 PDT 2003
  Final Date: Wed Aug 21 12:48:46 PDT 2013
  SubjectDN: O=www.jxta.org, L=SF, C=US, CN=fish, OU=Jxta Platform
  Public Key: RSA Public Key
  Signature Algorithm: SHA1WithRSAEncryption
  Signature: signature

Figure 5-2. X509 V3 Certificate


On the Internet there are many recognized certificate authorities; Verisign, Entrust.net, and
GlobalSign are three examples. P2P can be more flexible in its choice of CAs, and we discuss this in
the next section.
As mentioned above, the certificate validation process requires the CA’s root certificate, and in this
manner the CA is the common trusted third party. If the certificates are organized as a hierarchical tree
structure, as shown in Figure 5-3, the root certificate is located by “climbing” the tree from a certificate
up through its issuers. Suppose Joan needs to verify Sam’s certificate, CA4, and Joan trusts the root
certificate CA0 of the tree. We have a chain of certificates where:
CA1 issues CA4, and Sam is the subject of CA4; CA0 issues CA1; CA0 issues CA2; and
CA2 issues CA6, and Joan is the subject of CA6.
Given the above chain of signatures, CA0 is the root certificate that permits Joan to climb from CA4 to
CA1 to CA0, checking each signature along the way, and thus to verify Sam’s certificate, CA4.

                CA0
             /       \
          CA1         CA2
         /   \       /   \
       CA3   CA4   CA5   CA6
              |           |
             Sam         Joan
Figure 5-3. Hierarchical Certificate Tree
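The climb from Sam’s certificate up to the trusted root can be sketched in Python. To keep the sketch self-contained, a certificate is reduced to a subject, an issuer, a serial number, and a hypothetical signed_by field that stands in for real signature verification (in practice each signature is checked with the issuer’s public key, as described earlier). A revocation list keyed by issuer plays the role of a CRL. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    serial: int
    signed_by: str   # stand-in for a verifiable signature: who really signed it


def validate(leaf: Cert, by_subject: Dict[str, Cert], trusted_roots: Set[str],
             revoked: Dict[str, Set[int]]) -> List[str]:
    """Climb from leaf to a trusted root; return the subjects on the path or raise ValueError."""
    chain = [leaf]
    current = leaf
    while current.subject not in trusted_roots:
        issuer = by_subject.get(current.issuer)
        if issuer is None or issuer in chain:       # missing issuer or a loop
            raise ValueError(f"cannot build a chain above {current.subject}")
        chain.append(issuer)
        current = issuer
    # Check every link: signed by its parent and not revoked by that parent.
    for child, parent in zip(chain, chain[1:]):
        if child.signed_by != parent.subject:
            raise ValueError(f"bad signature on {child.subject}")
        if child.serial in revoked.get(parent.subject, set()):
            raise ValueError(f"{child.subject} has been revoked")
    root = chain[-1]
    if root.signed_by != root.subject:              # a root must be self-signed
        raise ValueError("root is not self-signed")
    return [c.subject for c in chain]


ca0 = Cert("CA0", "CA0", 0, signed_by="CA0")          # self-signed root
ca1 = Cert("CA1", "CA0", 1, signed_by="CA0")
ca2 = Cert("CA2", "CA0", 2, signed_by="CA0")
sam_cert = Cert("Sam", "CA1", 4, signed_by="CA1")     # CA4 in Figure 5-3
joan_cert = Cert("Joan", "CA2", 6, signed_by="CA2")   # CA6 in Figure 5-3
by_subject = {c.subject: c for c in (ca0, ca1, ca2, sam_cert, joan_cert)}

print(validate(sam_cert, by_subject, {"CA0"}, {}))    # ['Sam', 'CA1', 'CA0']
```

Adding {"CA1": {4}} as the revocation list, or replacing Sam’s certificate with one whose signed_by names an untrusted party, makes the same call raise ValueError.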
The above elements and processes are the bases for Public Key Infrastructures (PKIs). These PKIs are
centralized in the sense that CAs are administered, centralized points of access for both acquiring and
revoking X509.V3 certificates. A PKI’s certificates and revocation lists can be stored in an enterprise’s
corporate LDAP directory.
For most Internet TLS or SSL.V3 sessions, the PKI is usually quite simple. In our example, if Joan is
using Sam’s server, then Joan will have CA1’s root certificate on her system, and Joan trusts this root
certificate. The latter certificate actually arrives embedded in her web browser’s binary, without any
explanation. CAs pay those who create Web browser applications to place their root certificates in the
binary. If Joan is the average Internet user, then Joan does not really trust CA1. In fact, she is aware
neither of CA1’s existence nor of the fact that Sam’s certificate is going to arrive at her system to be
verified using CA1’s root certificate. The Internet really runs on blind trust, because how the security
operates is totally hidden from users. When Joan gets into her car, she knows there are air bags, and
that in the case of an accident, they inflate. We need analogies like this one for security. If Joan gives her
credit card number to Sam.com to purchase children’s books, she doesn’t even know what the little
“lock” means, or what it means when HTTP suddenly changes to HTTPS. This is something we must
address when we open up systems to Internet commerce using, for example, the standard Web browser
or P2P. Security now lives in the engineers’ domain, and we must begin to express the vulnerabilities of
users’ systems in terms users understand so they can make intelligent choices about what and what not
to trust.
A positive side effect of the viruses that have been rampant on the Internet, attacking and literally
destroying users’ Windows systems, is that users now have a heightened awareness of the capabilities
of malicious individuals on the Internet. We must exploit this, rather than pretend that the next patch is
the final solution. What a joke! Suppose we tell a user that the system being used for Internet
e-commerce is vulnerable to identity theft if a “hacker” gets the user’s credit card number. This will make
users nervous until we mention that the probability of breaking into a secure conversation that uses the
Internet standards is about the same as the probability of an asteroid hitting their house in the next 10
seconds! On the other hand, recall the kiosk attack in the introduction to this chapter, and note that DNS
spoofing is possible. Joan does a DNS lookup of www.childrensbooks.com and receives the IP address
of Joe Evil’s system, and she begins a secure TLS session with Joe Evil’s system. Joe has a certificate
signed by one of the well-known CAs, or maybe Joe has a self-signed certificate and sends his root
certificate to Joan, and that little notification pops up on Joan’s system asking whether she trusts
“World Wide Secure’s” root certificate. She always says yes because she has no idea what this means.
Joan should always click the NO button when this happens. And there we are: Joe Evil gets Joan’s
credit card number and at the same time opens an HTTPS connection to www.childrensbooks.com and
bridges the transaction. Joan is unaware of Joe Evil until she receives her next month’s credit card bill
with a purchase of one case of Domaine de la Romanée-Conti wine! Joe Evil loves great bourgognes,
and they are an expensive habit. OK. Enough said.
Hiding the fact that the Internet is vulnerable from users in order to keep the economic machinery
running is an error. It is easier to explain the problems of eavesdropping and identity theft to users,
and to explain what they should look for whenever they make an Internet purchase. Joe Evil will have
trouble hiding his URL from the browser unless the browser has a bug, so Joan should always look at
the final URL through which the transaction is being made. ISPs have a particular responsibility to
prevent DNS spoofing on their networks, and they must continually look for such activity. This is no
different from the highway patrol looking for drunk or reckless drivers. Then note that when anyone buys
something with a credit card in most stores or restaurants, a store clerk, a waiter, etc., has a copy of the
credit card number, and in the United States this number can be used to make an illegal purchase, since
in general PINs are not required as they are in France. This is incredibly insecure. Joan lives with that
insecurity, and can also live with similar problems on the Internet.
Back to P2P and PKIs. On P2P networks there may be many variations on the PKI theme that are less
centralized and may require a social sense of trust. These variations are discussed in the next section.

5.2 Reputation Based Trust in P2P Systems


It is easy enough for each peer node to be its own certificate authority, create its own root and service
certificates, and distribute the root certificate out-of-band or, in some cases, in-band. Then, continuing
with the security toolbox analogy of different sockets for different scenarios, one uses transport layer
security to ensure two-way authentication and privacy. Then again, one cannot help thinking about
Philip Zimmermann, PGP, and “Webs-of-Trust.” This is surely another socket that can be used by small
communities of peers to assure that the public keys that they distribute can be trusted with some degree
of certainty based on the reputation of the signers.
If we imagine small groups of peers on a purely ad-hoc P2P network, for example, a family, then either
mom or dad might be the certificate authority and place their root certificate on each family member’s
system by infrared, eyeball-to-eyeball communication; and yes, if a certificate is signed by that CA, you
trust it or else. One more socket for our toolbox.
Finally, without actually using a recognized CA, one can apply even more torque to tighten the
security on a P2P network. Select one or more well-protected and trusted systems, and give them
certificate-granting authority. These systems are unlike standard CAs in the sense that they are peers in the
P2P Network, albeit special peers. Each peer using these CAs boots with the appropriate root
certificates and acquires a signed certificate from one of the CAs using a Certificate Signing Request.
Furthermore, to acquire a certificate the peer must be authorized, perhaps by using an LDAP directory
with a recognized, protected password. Here, the CA can also use a secure connection to a corporate
LDAP service to authorize requesting peers. In the end, each of the above scenarios, each socket in our
mythical toolbox, is not so mythical.

5.2.1 Trust as a Mirror into Real World Relationships


While the hierarchical, structured PKI builds trust relationships between users and CAs, as
described in the previous section, these trust relationships can also be formed using the Web-of-Trust
model, for example, Pretty Good Privacy (PGP) [REF]. In the PGP world, assume that Joan wants to
verify Sam’s public key and she doesn’t have direct contact with Sam. What Joan can do is look for
the people who have signed Sam’s key. For example, if Mary has signed Sam’s key, then Mary trusts
Sam. If Joan finds that she has previously signed Mary’s key, then a trust relationship can be
established between Joan and Sam through Mary, as shown below:
Joan <--------- Mary <---------- Sam
There may be other paths from Joan to Sam, and these paths can join or contain loops. This flexible, decentralized key management is the general concept behind the “web-of-trust.” Furthermore, to quantify trust relationships, trust has been given levels or values for calculation and evaluation purposes. Typical trust levels range from not trusted, through marginal trust, to complete trust. There are also approaches involving more sophisticated schemes, such as heuristic trust calculation using probabilistic laws [REF Germano Caronni’s paper], and trust without secrets (for example, without secret keys) using a P2P voting scheme [LOCKSS].
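The path search and the trust levels just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PGP’s implementation: the signature graph and names are hypothetical, and the validity thresholds mimic GnuPG’s classic defaults (one completely trusted signer, or three marginally trusted signers, make a key valid).

```python
from collections import deque

# Hypothetical signature graph: signatures[signer] = keys that signer has signed.
# It mirrors the example in the text: Joan signed Mary's key, Mary signed Sam's key.
signatures = {
    "Joan": {"Mary"},
    "Mary": {"Sam", "Carl"},
    "Carl": {"Mary"},  # a loop; the visited check below makes it harmless
}

def trust_path(graph, start, target):
    """Breadth-first search for the shortest chain of signatures from start to target."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for signed in graph.get(node, ()):
            if signed not in parent:  # skip keys already reached
                parent[signed] = node
                queue.append(signed)
    return None  # no chain of signatures connects the two keys

# Assumed GnuPG-style thresholds for turning trust levels into key validity.
COMPLETES_NEEDED = 1
MARGINALS_NEEDED = 3

def key_is_valid(signers, owner_trust):
    """A key is valid if enough of its signers carry enough trust."""
    complete = sum(1 for s in signers if owner_trust.get(s) == "complete")
    marginal = sum(1 for s in signers if owner_trust.get(s) == "marginal")
    return complete >= COMPLETES_NEEDED or marginal >= MARGINALS_NEEDED

print(trust_path(signatures, "Joan", "Sam"))  # ['Joan', 'Mary', 'Sam']
print(key_is_valid({"Mary"}, {"Mary": "complete"}))  # True
print(key_is_valid({"Mary", "Carl"}, {"Mary": "marginal", "Carl": "marginal"}))  # False
```

Breadth-first search returns the shortest chain, which matters because each extra hop in a web-of-trust dilutes the confidence one can place in the result.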
When a decentralized trust model is implemented on a P2P topology, trust between peers begins to mirror the real-world relationships with which users are familiar, and permits software engineers to craft interfaces to the underlying trust model that non-technical users can understand. Trust also becomes a social contract with social implications for the participants. One can have friends, enemies, jealousy (why does Joan trust Sam and not Carl?), cliques and all those human qualities that make our world such an interesting and unpredictable place. Each peer will develop a reputation among its peers, which can be the basis for P2P trust relationships.
In some trust or reputation models, such as those used in the Free Haven and Publius projects [REFS], the degree of trust is calculated from parameters such as the performance, honesty, and reliability of a given peer. If a peer cheats at playing cards, for example, the peer might be deprived of its ability to authenticate and join another card game. But for a group of people interested in cooking, this measurement is too biased towards personal risk rather than content, and will be of little use. Hence, for the cooking group, trust should be biased towards data relevance, that is, the quality of recipes. We don’t necessarily have a personal relationship with the chefs in the restaurants we like. It’s the food, or data, that counts. For us trust has multiple components or factors, and one factor should be based on the connected community’s interests, or content relevance. To capture a CC’s “interests,” trust should be evaluated on both the data and the peer, as described in Poblano [REF].
Poblano is a mechanism for computing a content provider’s reputation given the goals or interests of the connected community to which the provider belongs. Thus, if Sam joins Joan’s cooking connected community and quickly begins to offer content about tennis matches, his reputation will abruptly decline, and his popularity, or the frequency with which his content is accessed, will bottom out. Sam may as well resign from the connected community because he will be ignored. Thus, in Poblano Sam’s reputation is based first on the content he provides, and second on his personal reputation, much like in PGP. If Judy does not know Sam but knows Joan, and Joan recommends Sam, then, provided Judy trusts Joan’s recommendations, she will trust Sam, albeit possibly minimally, and access Sam’s content. Judy can of course make her own judgements, and may in fact change her opinion of Joan’s recommendations given Sam’s content. A third Poblano factor is the risk one takes in accessing Sam’s content. Here, risk means the reliability of Sam’s system: uptime, bandwidth, whether Sam’s content contains viruses, etc.
Poblano introduces the idea of a cooperation threshold based on a peer’s Poblano reputation. This threshold measures how cooperative a peerNode in a connected community is, given the peer’s reputation, its importance to the user, its risk, and the overall value of its content. It is important to note that the user’s input is a significant aspect of this threshold. Finally, Poblano defines a trust spectrum, recognizing, as we point out in the introduction to this section, that different strengths of trust are appropriate for different situations. A trust spectrum thus permits the use of situational trust, which is how real-world trust operates.
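To make the three factors and the situational threshold concrete, here is a hypothetical scoring sketch. It does not reproduce Poblano’s published formulas: the linear combination, the weights, the risk discount, and the per-situation thresholds are all assumptions chosen only to reflect the ordering in the text (content first, personal reputation second, risk third).

```python
from dataclasses import dataclass

@dataclass
class PeerView:
    """One user's view of a content provider in a connected community (values 0..1)."""
    content_relevance: float  # how well the content matches the CC's interests
    recommendation: float     # trust inherited from recommenders, e.g. Joan vouching for Sam
    user_importance: float    # the user's own input about this peer
    risk: float               # unreliability: downtime, low bandwidth, infected content, ...

# Hypothetical weights: content counts most, personal reputation second.
W_CONTENT, W_RECOMMEND, W_USER = 0.5, 0.3, 0.2

def reputation(p: PeerView) -> float:
    score = (W_CONTENT * p.content_relevance
             + W_RECOMMEND * p.recommendation
             + W_USER * p.user_importance)
    return score * (1.0 - p.risk)  # risky peers are discounted

# Situational trust: each situation sits at a different point on the spectrum (assumed values).
THRESHOLDS = {"chat": 0.2, "recipes": 0.5, "payments": 0.9}

def cooperates(p: PeerView, situation: str) -> bool:
    """Cooperate with the peer only if its reputation clears the situation's threshold."""
    return reputation(p) >= THRESHOLDS[situation]

sam_cooking = PeerView(content_relevance=0.9, recommendation=0.6, user_importance=0.5, risk=0.1)
sam_tennis = PeerView(content_relevance=0.05, recommendation=0.6, user_importance=0.5, risk=0.1)
print(cooperates(sam_cooking, "recipes"), cooperates(sam_tennis, "recipes"))
```

Posting tennis content to the cooking community drops Sam below the recipe threshold even though Joan’s recommendation is unchanged, while he may still clear the lower bar for casual chat.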
We conclude by noting that, in the end, P2P Networks serve the user, and as such the noble goal is to create a model for trust to which real, ordinary users can relate. This model must reflect trust as users experience it in the real world, and trust based on reputation points us in this direction. Research on this topic is under way in the P2P community. If one thinks about trust as a spectrum, then how does one engineer solutions in this context? And is a purely ad-hoc, reputation-based trust system in fact as trustworthy as a highly centralized, authority-based system? These are the questions we address in the next section.

5.2.2. A Reputation Based Trust Spectrum


In an interesting paper, “Identity Crisis: Anonymity vs. Reputation in P2P Systems,” [REF-Sergio et al], the authors compare reputation-based systems where one’s identity is ad-hoc to centralized reputation-based systems where one’s identity is tied to one’s real-world identity. In the former, one’s reputation is determined by one’s peers; in the latter, it is tied to registration with a centralized authority, where sometimes very personal information must be revealed to measure an individual’s trustworthiness. The authors conclude that both of these reputation-based systems work, and are preferable to reputation-less systems. And, like us, they agree that in reality there will be a trust spectrum: “Though we have concentrated on two distinct identity models, many practical solutions fall in a spectrum between them.”

Let’s assume that at the very core of building trust relationships is the ability to create and distribute public keys from a peer-generated private-public key pair. This is in the spirit of PGP. As mentioned before, in the traditional Internet security paradigm, certificates are created by certificate authorities (CAs), whose signature appended to a certificate guarantees the certificate’s content for any recipient that has access to the CA’s public key. In most cases this public key resides in a root certificate on the recipient’s system. In a P2P network we cannot require every participating peer to acquire, i.e., pay for, a CA-signed certificate in order to implement, for example, P2P TLS. In fact, we want P2P, zero-dollar-cost certificates. Users of the system should not be required to pay to have a PKI and the security associated with it.
P2P zero-dollar-cost certs will initially mean exchanging self-signed certificates, or certificates signed or cosigned by trusted third parties, much as is done with PGP. At the same time we do not want to prohibit the very strong security that is currently used on the Internet. So, we propose a trust spectrum that has official Internet CA-signed certificates near one endpoint, and self-signed certificates near the other. Both cases require the distribution of root certificates to verify the issued certificates. It is the issuer and the method of distributing the root certificates that determine where on the trust spectrum a particular method falls. The tendency has been to believe that the ad-hoc distribution of root certificates is less trustworthy than the formal Internet CA system. We think this assumption is incorrect.
If this distribution is in fact in-band and ad-hoc, then yes, the system can be attacked, albeit with difficulty. On the other hand, there are out-of-band methods of distribution that are extremely secure, such as infrared, eyeball-to-eyeball exchange. Consider Mom’s peer node acting as a CA Mediator in a family-based connected community: it has certificate-signing authority and an open source LDAP server such as can be acquired from openLDAP.org, and Mom has given every member of the CC a password and an LDAP account. Mom beams her root cert to each family member from her Palm Pilot, and family members then acquire their own certificates from Mom’s Mediator over TLS using PKCS#10, along with their password for authentication. Here, we have formed a very powerful and secure connected community. In this scenario Mom need not be aware of the technical underpinnings of the process she is enabling. Rather, Mom is securing her family’s P2P household: adding locks, preventing eavesdropping, inoculating against viruses, and enabling alarm systems. Putting security under the covers with these kinds of metaphors creates realistic expectations with respect to possible break-ins.
As we will see below, there are more points on the spectrum, each with its own degree of risk. In each instance this risk may be alleviated with out-of-band distribution, or with authentication using Crypto Based Identifiers and the “cocktail party protocol,” which is discussed a little later in this chapter. At the centralized end of our spectrum we include full Internet CAs.
At which point on the trust spectrum a connected community chooses to communicate is up to the participants in that community. We have no power over what function a peer performs in the network. Any peer can join a connected community and offer its services. This includes recognized CAs. The connected community members will assign a level of trust to that peer, and to one another. In the following figure we represent a trust spectrum based on X509.V3 certificates as identities. At the left we find ad-hoc, possibly anonymous identities and at the right more centralized identities. It is important to note that, as we move from left to right, all applications below and to the left of a certificate issuer are appropriate for that issuer’s signed certificates:

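[Figure: certificate issuers from left (ad-hoc) to right (centralized): Self-signed, Co-signed (PGP-like), CC Members as CA, Mediator as CA, Centralized CA. Example applications, from left to right: Chat, Discussions, Games, IPR, Financial Transactions.]
Figure 5-4. The Trust Spectrum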
One might question the value of self-signed certificates. After all, users of these certificates may be left open to the imposter-in-the-middle attack. If Sam receives an in-band copy of Joan’s self-signed root certificate, Sam has no way to guarantee that he in fact received the root certificate from Joan, and likewise for Joan. The intruder, Jeanne, can be in the middle of a conversation seeing everything in clear text, having given her “faked” self-signed root certificates to both Sam and Joan and pretending to be each of them. Since Jeanne possesses both Sam’s and Joan’s genuine root public keys, her presence is undetectable. Let’s assume that Joan is retrieving content from Sam, so Sam plays the role of a server and Joan a client:

Joan <---- TLS Handshake ----> Jeanne <---- TLS Handshake ----> Sam
1) Joan has Sam’s root certificate, but with his public key replaced by Jeanne’s public key and signed by Jeanne’s private key.
2) During the handshake Jeanne receives Sam’s server certificate and forges a new one, inserting the public key of a public/private key pair she has generated and signing it with her root private key.
3) Jeanne sends this to Joan instead of Sam’s real server certificate, and the handshake between Jeanne and Joan completes successfully.
4) Jeanne also completes a handshake with Sam using Sam’s real public key from his server certificate.
5) Once both handshakes complete, the transfer of data begins. Jeanne decrypts data sent in either direction and re-encrypts it with the key material negotiated for the other side of the two serialized TLS sessions.

While it takes a great deal of effort to steal Joan’s and Sam’s peer identities, it can be done using advertised, public information and information acquired as the imposter-in-the-middle. Such an intruder can fully participate in a connected community using this stolen role. This is a risk that some people may be willing to take for chat rooms, and it is clearly not as big a risk as blindly handing one’s credit card to a waiter in a restaurant. The risk can be completely avoided by out-of-band transfer of root certificates, or by in-band transfer of root certificates accompanied by CBIDs and the cocktail party protocol.
Thus, we argue that for a certain class of applications this behavior is perfectly acceptable, as long as users clearly understand the above threats. Again, for example, a family can form a connected community and do secure instant messaging among themselves. The underlying messages will be private, secured with TLS using, for example, 1024-bit RSA, 128-bit RC4, and SHA-1. The family need not worry that an imposter might try to watch their conversations. This is a cost/risk decision whose risk is extremely small: it would be unusual for a hacker to dedicate the time and effort required to get in the middle of the family’s network connections. The attacker might first have to break into an ISP’s network, and that is tough too.
If the imposter-in-the-middle attack is an unacceptable risk, and P2P zero-dollar-cost certificates are desired, we can move to a more secure spectrum point by exchanging certificates in person using infrared, or floppy disks, as we’ve already mentioned. And certainly, in a family-sized connected community, this is achievable and very secure.
We can increase security with cosigned root certificates, which are more difficult to forge. Suppose Joan requires that all certificates she uses be cosigned by Mary, has Mary’s public key on her system, and initiates a private communication with Sam. If Mark is the imposter in the middle and forges a certificate for Sam that claims to be cosigned by Mary, Mark must also have forged the copy of Mary’s certificate that is resident on Joan’s system.
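Joan’s cosignature policy can be sketched as a simple check: every required cosigner must have validly signed the certificate. To stay self-contained, this sketch uses textbook RSA over fixed Mersenne primes, which is hopelessly insecure and only stands in for real certificate signatures such as those produced with BouncyCastle or OpenSSL. The certificate bytes and the `joan_accepts` function are hypothetical.

```python
import hashlib

def toy_rsa_keypair(p, q, e=65537):
    """Textbook RSA from two primes. Toy only: tiny, no padding, insecure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)              # (public key, private key)

def _digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(private_key, data):
    n, d = private_key
    return pow(_digest(data, n), d, n)

def verify(public_key, data, signature):
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)

# Mersenne primes keep the example self-contained; real keys use large random primes.
MARY_PUB, MARY_PRIV = toy_rsa_keypair(2**89 - 1, 2**107 - 1)
MARK_PUB, MARK_PRIV = toy_rsa_keypair(2**61 - 1, 2**127 - 1)

def joan_accepts(cert_bytes, cosignatures, trusted_keys, required=("Mary",)):
    """Joan's policy: every required cosigner must have validly signed the certificate."""
    for name in required:
        sig = cosignatures.get(name)
        if sig is None or not verify(trusted_keys[name], cert_bytes, sig):
            return False
    return True

joan_keys = {"Mary": MARY_PUB}  # Mary's public key, already resident on Joan's system
sam_cert = b"CN=Sam; key=...sam-public-key..."
genuine = {"Mary": sign(MARY_PRIV, sam_cert)}
forged_cert = b"CN=Sam; key=...mark-public-key..."
forged = {"Mary": sign(MARK_PRIV, forged_cert)}  # Mark lacks Mary's private key

print(joan_accepts(sam_cert, genuine, joan_keys))    # True
print(joan_accepts(forged_cert, forged, joan_keys))  # False
```

As the text says, Mark’s only way forward is to replace the copy of Mary’s key on Joan’s system, which is exactly what out-of-band root distribution protects.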
Still, if this is not secure enough, connected community members can delegate certificate-signing authority to selected members of the community. This is similar to applying the PGP “Web-of-Trust” to P2P. Traditionally, if Joan wants to acquire a signed certificate from Bob, who is a CA, she generates her public and private key pair. Next she sends the public key, algorithm parameters and personal identification to Bob, all signed by her private key. Bob can then verify the signature with the enclosed public key, confirming that the two keys form a pair. But this may be a replay attack¹, and Bob needs evidence that the system on the other end actually possesses the private key. (*) The latter can be accomplished by Bob generating a 20-byte random number as a challenge. He sends it to the other end of the connection. The other end encrypts this random number with the private key and sends it back to Bob for verification. If the verification is successful, then the other end owns the private key, but we still need proof that it is Joan. At (*) Bob could have phoned Joan and asked, “What is your favorite color? Please encrypt that with your private key, making sure it is typed all in UPPERCASE, and send it to me.”
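The 20-byte challenge just described can be sketched as follows. Joan’s key here is toy textbook RSA (fixed Mersenne primes, no padding), chosen only so the example runs with the standard library; a real implementation would use a vetted library such as BouncyCastle or OpenSSL. The function names are hypothetical.

```python
import secrets

# Toy textbook-RSA key for Joan (insecure Mersenne primes, no padding): illustration only.
P, Q, E = 2**89 - 1, 2**107 - 1, 65537
N = P * Q                          # about 196 bits, larger than a 160-bit challenge
D = pow(E, -1, (P - 1) * (Q - 1))  # Joan's private exponent (Python 3.8+)

def make_challenge() -> bytes:
    """Bob: a fresh 20-byte random challenge for every request."""
    return secrets.token_bytes(20)

def respond(challenge: bytes) -> int:
    """Joan: apply the private key to the challenge (what the text calls encrypting it)."""
    return pow(int.from_bytes(challenge, "big"), D, N)

def check(challenge: bytes, response: int) -> bool:
    """Bob: undo the private-key operation with Joan's public key and compare."""
    return pow(response, E, N) == int.from_bytes(challenge, "big")

c1 = make_challenge()
r1 = respond(c1)
print(check(c1, r1))                # True: the responder holds the private key
print(check(make_challenge(), r1))  # a replayed old response fails a new challenge
```

Because Bob draws a new random challenge each time, a snooper who recorded an earlier session cannot replay Joan’s old response.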

1. Someone might snoop on the network, save the session, and then replay it to Bob pretending to be Joan, unless non-replayable evidence of private key possession is also sent to Bob.

Here’s another possibility. Suppose the software is open source, for example, from www.jxta.org. The connected community creator, Alan, could have downloaded the sources, generated jxta.jar, and additionally added his peer group’s rendezvous X509.V3 root cert to this jar. This rendezvous acts as a CA for the peer group. Then, using his Mac G4, Alan would generate several CDs, each containing the jxta.jar, and deliver them to the peer group members by registered mail. Each member then installs the software. While the software is being installed, Alan creates LDAP accounts for each peer group member, and assigns a default password. This password is communicated to the peer group members by telephone. Each member then connects to the rendezvous with TLS, using the root cert in the binary to verify the rendezvous certificate, and changes the assigned default password. The rendezvous in this case proxies LDAP requests for the peer group members. Next, the members send a CSR to the rendezvous over a TLS connection, and include their password as part of the CSR. The password then guarantees that each person is who they say they are unless a password is breached. But, hey, this is as good as real-world CA security.
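The rendezvous side of this procedure can be sketched as a password check that gates certificate issuance. This is a hypothetical sketch: a dictionary of salted PBKDF2 hashes stands in for the LDAP directory, the iteration count is an assumed value, and real PKCS#10 parsing and X509.V3 signing are left out (the “certificate” is a plain dict).

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # PBKDF2 work factor (assumed value)

def hash_password(password, salt=None):
    if salt is None:
        salt = secrets.token_bytes(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

class RendezvousCA:
    """Hypothetical rendezvous acting as the peer group's CA; a dict stands in for LDAP."""

    def __init__(self):
        self._accounts = {}  # member -> (salt, password hash)
        self.issued = {}     # member -> issued "certificate" (a dict here)

    def create_account(self, member, default_password):
        self._accounts[member] = hash_password(default_password)

    def _authenticate(self, member, password):
        record = self._accounts.get(member)
        if record is None:
            return False
        salt, expected = record
        _, candidate = hash_password(password, salt)
        return hmac.compare_digest(candidate, expected)  # constant-time comparison

    def change_password(self, member, old, new):
        if not self._authenticate(member, old):
            raise PermissionError("bad password")
        self._accounts[member] = hash_password(new)

    def handle_csr(self, member, password, public_key):
        """Issue a certificate only if the password in the CSR authenticates the member."""
        if not self._authenticate(member, password):
            raise PermissionError("CSR rejected: authentication failed")
        cert = {"subject": member, "public_key": public_key, "issuer": "rendezvous"}
        self.issued[member] = cert
        return cert

ca = RendezvousCA()
ca.create_account("joan", "default-from-phone-call")
ca.change_password("joan", "default-from-phone-call", "a much longer secret")
print(ca.handle_csr("joan", "a much longer secret", "<Joan's public key>")["subject"])  # joan
```

Storing only salted, slow hashes means that even a stolen copy of the account table does not directly reveal members’ passwords.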
Again, with Bob as the connected community’s CA, a procedure similar to the above can be followed. Once private key ownership is verified, Bob can issue a signed certificate to Joan because Bob has a way to verify Joan’s identity, e.g., her password. To verify that Bob indeed signed the certificate, Joan must have Bob’s public key. This reduces to the root cert distribution problem: the key can be embedded in the binary as above, or verified out-of-band, for example, with the cocktail party protocol. All of this makes the imposter-in-the-middle attack extremely difficult, if not impossible, barring torture or large payments of cash to extract secrets from CC members. There are those who claim that as long as secrets are involved there can never be 100% security. Among those who make this claim you will find the authors of this book.

5.3 More Building Blocks for P2P Security


In the previous section we pointed out how one can use reputation-based trust to build a secure P2P Overlay Network. Yes, we use expressions like “pretty good privacy” when the reputation model involves trust in a social context. With PGP there certainly can be cheaters, and they can be extremely difficult to detect. Again, it depends on the goal. PGP was originally applied to securing electronic mail. If one places a public key on an electronic bulletin board, in an LDAP directory, or just emails it to a mailing list, what does this imply? The public key is used to encrypt key material that is in turn used to encrypt, MAC and sign an accompanying email message. The recipient of the email message owns the private key and can decrypt and verify the email, unless the sender used a fraudulent public key that does not in fact belong to the recipient. In that case, the recipient will decrypt garbage. It’s a form of spam. There is a caveat here: Suppose Erik generates a fraudulent public key under the name of Ralph, and makes it available. Furthermore, suppose Sandy is Ralph’s girlfriend, and Erik is very jealous of this relationship. If Sandy uses Ralph’s fraudulent public key to send email to Ralph, then Erik can decrypt the private email that Sandy sent to Ralph if he can get access to the message. PGP attacks the problem of fraudulent keys by using a trust-based reputation system and co-signed public keys. In the above example, Sandy would have required Cathy’s signature on Ralph’s public key, and Sandy would also have a signed copy of Cathy’s public key on her system. Thus, Erik would have had to bribe Cathy to sign his public key, and Cathy would never do that. This model works well for small groups of people who know one another and share an implied social trust network. PGP achieves its goal of “Pretty Good Privacy.” The underlying cryptography is state-of-the-art and uncrackable as far as we know. Thus, for some small connected communities PGP-like “Webs-of-Trust” are sufficient. Everything comes down to how secure you want to be, always realizing that as we move to systems that are 99.99999999999% secure, the real weakness is the secrets used to secure the system, and so we all learn to live with some sense of insecurity.
In this section we point out several things that are necessary to move towards the goal of 99.99999999999% P2P security. They are well-known tools, and when possible we point out open source code that can be used to implement what is discussed. In general, we make the following recommendations up front:
1. OpenSSL security software: C sources for SSL/TLS, and other security protocols. See http://www.openssl.org
2. pureTLS: SSL and TLS JAVA software. See http://www.rtfm.com/puretls
3. BouncyCastle: JAVA cryptographic libraries. See http://www.bouncycastle.org/

Using the above source code one can build state-of-the-art security software, and implement all we discuss in what follows.

5.3.1 Securing Locally Stored Data


Let’s consider attacks on systems. There are those who will try to modify others’ data, steal this data, or eavesdrop on files looking for information that might be used in a malicious manner. Thus, security begins at home, and one must secure locally stored data against such attacks. In any secure system one should require that all data has integrity, i.e., an intruder’s modifications will always be detected. All such files will require MACs. Some data must itself be kept private for these integrity checks to work, because MAC generation requires a secret key that is combined with the data being hashed. One might keep this secret in his or her mind, and be queried every time a MAC must be generated. Clearly, this is impractical. Usually the user enters a password when a system is started, or types some long phrase, and from either of these a passphrase is generated. Suppose we use MD5 to hash the passphrase, thus yielding a 16-byte key. From information theory, an English passphrase needs about 20 words, or about 100 bytes of input, to carry the entropy of a 16-byte key [APPLIED CRYPTO].
Now, once the key is generated, it must be safely stored somewhere to be used by the system. One can use the password to encrypt and decrypt the passphrase in a secure way. The password must be secret, and sufficiently complicated to prevent dictionary attacks. When the password is entered at system start time, the passphrase is decrypted, the password discarded, and the passphrase is left somewhere in memory to be used for integrity checks and to secure other data mentioned below.
We know of no way to run a secure system without some secret in the memory of the running code. It is like a grain of sand hidden on a private beach. There will be intruders sneaking onto the beach on sunny days to take advantage of the warm water and sand. It is tough to find a grain of sand, but it can be done. This is why passwords should be changed frequently. The intruder needs to take a lot of the beach with him, and look through the sand carefully. That is to say, he needs to get a copy of the memory of the running system. Thus, one must write software that will not permit intruders to get into the system, force it to crash, and get a snapshot of the memory. Using JAVA is a good idea for this very reason.
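The arithmetic behind the “about 100 bytes” figure, and the MD5 step from the text, can be checked in a few lines. The figure of roughly 1.3 bits of entropy per character of English text is an assumption of this sketch, in line with commonly quoted estimates.

```python
import hashlib
import math

# Rough entropy of English text, about 1.3 bits per character (assumed figure).
BITS_PER_ENGLISH_CHAR = 1.3

def chars_needed(key_bytes, bits_per_char=BITS_PER_ENGLISH_CHAR):
    """Characters of English needed to carry key_bytes * 8 bits of entropy."""
    return math.ceil(key_bytes * 8 / bits_per_char)

def key_from_passphrase(phrase):
    """MD5 of the passphrase, as in the text: always a 16-byte key."""
    return hashlib.md5(phrase.encode("utf-8")).digest()

print(chars_needed(16))  # 99, i.e. roughly 100 bytes or about 20 five-letter words
print(len(key_from_passphrase("a long phrase that one can actually remember")))  # 16
```

MD5 follows the text. A modern design would prefer a slow, salted key derivation function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt`, which also makes the dictionary attacks mentioned above far more expensive.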
So, now we have a key of sufficient length to permit us to MAC data, and thus check for modification whenever this data is accessed. It is not difficult to write a data store that MACs data on write and verifies the MACs on read. One simply puts a software envelope around writes and reads, and the MACs are computed and checked there. A good example is to begin with JXTA’s index store:
http://platform.jxta.org/source/browse/platform/binding/java/impl/src/net/jxta/impl/xindice/core/
Then make envelopes around the writing and reading of indexed records, and do the MAC operations there.
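Such an envelope might look like the following sketch. The `MacStore` class is hypothetical and keeps records in memory; it uses HMAC-SHA256 from Python’s standard library, and a real store would persist the (data, tag) pairs, for example around JXTA’s indexed records.

```python
import hashlib
import hmac

class MacStore:
    """Hypothetical envelope around a record store: MAC on write, verify on read."""

    def __init__(self, key: bytes):
        self._key = key
        self._records = {}  # record_id -> (data, tag); a real store would persist these

    def _tag(self, record_id: str, data: bytes) -> bytes:
        # Binding the record id into the MAC stops an intruder from swapping records.
        msg = record_id.encode("utf-8") + b"\x00" + data
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def write(self, record_id: str, data: bytes) -> None:
        self._records[record_id] = (data, self._tag(record_id, data))

    def read(self, record_id: str) -> bytes:
        data, tag = self._records[record_id]
        if not hmac.compare_digest(tag, self._tag(record_id, data)):
            raise ValueError(f"integrity check failed for {record_id!r}")
        return data

store = MacStore(key=b"16-byte-demo-key")
store.write("vport:MobileAgent", b"<virtualPort document>")
print(store.read("vport:MobileAgent"))
```

An intruder who edits a stored record without the key cannot produce a matching tag, so the modification is detected on the next read.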
For some data, MAC’s are not enough. The data must also be private. Examples that must be private
are passwords, passphrases, and private keys. Both software and users will have other sensitive data to
which privacy can be applied. Again, in the spirit of Leonard Foner, it is better to put these security
precautions in the software first rather than retrospectively.
This means private data must be encrypted before it is MAC’d. With any of the above open source code, one will also get crypto libraries that support most modern ciphers, for example 3DES, RC4, AES, Camellia, etc. It is again straightforward to add encryption and decryption to the above software envelopes to ensure both the privacy and integrity of your local data.
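The encrypt-then-MAC ordering can be sketched as follows. Python’s standard library has no block cipher, so the keystream here is HMAC-SHA256 in counter mode, an illustrative stand-in only; in practice substitute a vetted cipher such as AES from OpenSSL or BouncyCastle. The separate encryption and MAC keys and the function names are assumptions of this sketch.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 16, 32

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in stream cipher: concatenated HMAC-SHA256(key, nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then MAC the nonce and ciphertext."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ciphertext = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def unseal(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the MAC before decrypting anything."""
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed; refusing to decrypt")
    return _xor(ciphertext, _keystream(enc_key, nonce, len(ciphertext)))

enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
blob = seal(enc_key, mac_key, b"my private key material")
print(unseal(enc_key, mac_key, blob))
```

Verifying the MAC before decrypting means tampered ciphertext is rejected without ever being processed.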
Any implementation of our P2P protocols should MAC all P2P Overlay Network system documents. The code will thus also assure that documents, whether originally created on a peerNode or acquired from other peerNodes and then locally stored, have not been modified after they were first written. But how do we assure that the documents we receive are in fact from the correct source? This is discussed in the next section.

5.3.2 Authenticating Document Sources


To assure that we receive documents from the intended source requires that we answer the following
two questions:
1. How can we be sure we connect to the intended source?
2. How can we be sure that the documents we receive are from this source? That is to say,
there is no “man-in-the-middle” attack underway.
Let us assume for the time being that two communicating peers have root X509.V3 certificates on their peerNodes that permit them to verify signed documents. What is the risk we are trying to avoid? The risk is forged virtualPort documents. Why? Applications use virtualPort documents to initiate communication between two peerNodes. Recall that system documents must be received from the peerNodes that create them. If we can guarantee that, then the succeeding steps taken to connect to a peerNode given its virtualPort document can also be secured.

Now, how are these documents found? A query is done using either the virtualPort name or the virtualPort UUID. The name is not guaranteed to be unique while the UUID is, and surely most queries will be done by “name” rather than “UUID.” If a peerNode knows the UUID beforehand, and there is no duplication problem, then while we can guarantee that the UUID lookup responds with the unique document, we still cannot guarantee that it was not forged. If uniqueness is not guaranteed, then we also need a mechanism to guarantee that the (virtualPort name, UUID) pairs in a connected community are unique. This is another issue we discuss just below. Here, we are concerned only with assuring that we are connected to the peerNode that created the virtualPort document. A simple solution is to use TLS or SSL.V3 connections, i.e., secure channels as mentioned in section 3.3.2.4.1. As previously stated, one should not invent new protocols when there are freely available implementations of secure protocols that have earned the stamp of approval of both deployment and continuous review by experts in the field. So, how can TLS solve the authenticated connection problem? Recall from section 5.1.2 that if peer1 is connecting to peer2 with TLS, then peer2 sends its X509.V3 certificate to peer1. If we require that this certificate contain, as part of the subject distinguished name, say in the common name field, a unique virtualPort suffix, then we have the following:
SubjectDN: O=www.aw.org, L=New York City, C=US, CN=Unique virtualPort suffix, OU=Tech
The “unique” virtualPort suffix is then appended to the virtualPort name in all virtualPort documents generated by the peerNode. Then, if the handshake succeeds and certificate verification is enabled, peer2 must have the private key associated with the public key in the X509.V3 certificate. Therefore, peer1 can extract the CN field from the received and verified certificate and compare the virtualPort suffix in the common name to the suffix of the virtualPort name in the virtualPort document. If they match, then peer2 indeed created the virtualPort document and is at the other end of the connection, since peer2 has carefully guarded its local data and, consequently, its private key. Note that any peerNode can send the virtualPort document, but only the generator of peer2’s virtualPort document possesses the private key. This is the proof that the connection is between peer1 and the creator of peer2’s virtualPort document, as long as root certificate distribution is secure.
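Peer1’s comparison can be sketched in Python. The sketch accepts a dictionary shaped like the one returned by `ssl.SSLSocket.getpeercert()`, whose `"subject"` entry is a tuple of relative distinguished names and which is empty when the certificate was not validated. The suffix value and the function names are hypothetical; a fixed-length suffix is assumed so that a short CN cannot match by accident.

```python
def common_name(peercert: dict):
    """Return the subject CN from a dict shaped like ssl.SSLSocket.getpeercert()."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None  # no CN, or an empty dict because the certificate was not validated

def vport_matches_peer(peercert: dict, vport_name: str) -> bool:
    """peer1's check: the verified certificate's CN must be the vport name's suffix."""
    suffix = common_name(peercert)
    return bool(suffix) and vport_name.endswith(suffix)

# Hand-built example mirroring the SubjectDN above, with a hypothetical suffix.
peer2_cert = {
    "subject": (
        (("organizationName", "www.aw.org"),),
        (("localityName", "New York City"),),
        (("countryName", "US"),),
        (("commonName", "9f3a61c2e8d54b07"),),
        (("organizationalUnitName", "Tech"),),
    )
}
print(vport_matches_peer(peer2_cert, "MobileAgent9f3a61c2e8d54b07"))  # True
```

In a live connection the dictionary would come from the TLS socket after a handshake with certificate verification enabled; without verification the check proves nothing.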

[Figure: during TLS handshaking, peer1 receives peer2’s certificate (CN = peer2 virtualPort suffix) and checks whether it matches the suffix of vportname = MobileAgent + suffix in peer2’s VirtualPort document received from the network.]
Figure 5-5. X509.V3 Certificate Authenticating VirtualPort Name


So, how do we resolve the problem of duplication? Peer3 may have queried for peer2’s virtualPort document, and submitted a Certificate Signing Request to the “CA” of the CC with peer2’s SubjectDN common name information. How does one prevent this forgery? Note that duplication permits spoofing another’s identity as well as the “man-in-the-middle” attack. The latter is difficult to achieve since the “man-in-the-middle,” say peer3, would have had to be between peer1 and peer2 before peer2 boots, and have assured that its duplicated virtualPort document is published instead of peer2’s. If mediators refuse to keep duplicate hash entries, then spoofing is a matter of luck if peer3 is not in the middle. In any case this shows that the weakness in the system is the CC “CA” granting a certificate to both peer2 and peer3 with the same “unique” virtualPort suffix.
First, this can be circumvented by using “Webs-of-Trust,” thus requiring peer2’s certificate to be signed by one or more trusted peerNodes whose “root” public keys peer1 has on its system. Thus, when peer1 receives peer2’s certificate with the “unique” virtualPort suffix, peer1 notes that Mary Ann has co-signed this certificate, and is 100% sure it is the only such certificate in existence.
Second, having a single trusted CA can solve the duplication problem. Each peerNode must have a unique identity that can be authenticated by this CA before the certificate is granted. The CA authenticates the peerNode, and generates a unique virtualPort suffix, placing it in the signed certificate.
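A minimal sketch of such a CA, assuming a hypothetical `authenticate` callback (for example, an LDAP bind) and using random UUIDs as suffixes. The point it illustrates is that the CA, not the requester, chooses the suffix and never hands the same suffix to two identities, so peer3 cannot obtain peer2’s suffix through a CSR. Actual X509.V3 signing is omitted.

```python
import uuid

class CommunityCA:
    """Hypothetical CC "CA" that, not the requester, chooses each virtualPort suffix."""

    def __init__(self, authenticate):
        self._authenticate = authenticate  # e.g. an LDAP bind; hypothetical callback
        self._suffix_of = {}               # authenticated identity -> its suffix
        self._owner_of = {}                # suffix -> identity, to refuse duplicates

    def issue(self, identity, credential, public_key):
        if not self._authenticate(identity, credential):
            raise PermissionError(f"cannot authenticate {identity!r}")
        suffix = self._suffix_of.get(identity)
        if suffix is None:
            suffix = uuid.uuid4().hex
            while suffix in self._owner_of:  # astronomically unlikely, but never reuse
                suffix = uuid.uuid4().hex
            self._suffix_of[identity] = suffix
            self._owner_of[suffix] = identity
        # A real CA would now sign an X509.V3 certificate with CN=suffix.
        return {"CN": suffix, "subject": identity, "public_key": public_key}

passwords = {"peer2": "pw2", "peer3": "pw3"}  # stand-in for the directory
ca = CommunityCA(lambda who, pw: passwords.get(who) == pw)
c2 = ca.issue("peer2", "pw2", "<peer2 key>")
c3 = ca.issue("peer3", "pw3", "<peer3 key>")  # peer3 cannot ask for peer2's suffix
print(c2["CN"] != c3["CN"])  # True
```

Re-issuing to the same authenticated identity returns its existing suffix, so a peerNode can renew its key without changing the virtualPort names it has already published.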
Finally, out-of-band distribution of root certificates, or in-band distribution with out-of-band verification of the root certificate’s source, can be used with self-signed certificates to accomplish the same thing. For example, peer2 creates a key pair, and a root certificate which includes the public key and is signed by the private key. Peer2 gives this root certificate to peer1 using infrared, eyeball-to-eyeball verification. Later, peer2 generates another certificate to be used with TLS that includes the unique virtualPort suffix, and signs it with its root certificate’s private key. This works fine.
For in-band distribution of root certificates the procedure is somewhat different. To explain this method we reintroduce Crypto Based Identifiers (CBIDs). Recall the discussion of CBID generation in chapter 3, section 3.2.2.2: a CBID is a secure hash of the public key, or of the X509.V3 certificate containing the public key.

5.3.3 CBID’s and the Sceau Protocol


For root certificates that are not co-signed we can generate the CBID and use “Secure Cocktail Effect Authentication” (Sceau) [Montenegro, Bailly]. Sceau is a more “user-friendly” form of the PGP fingerprint of a public key. The latter requires one to mumble hexadecimal digits derived from one’s public key for out-of-band verification. Sceau translates the CBID into a unique sentence, for example, “Apples are silver.” Thus, when peer1 receives peer2’s root certificate, the CBID is generated and the Sceau algorithm is applied. Peer1 then phones peer2 and asks for its sentence. Upon receiving the correct response, peer1 marks peer2’s root certificate as trusted, and it can be securely used with protocols like TLS and S/MIME.
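The flow from certificate to CBID to spoken sentence can be sketched as follows. This is not the Sceau algorithm: SHA-1 as the CBID hash and the tiny word lists are assumptions. Note that these three-word sentences encode only 8 bits of the CBID, far too few to resist a forger, so a real scheme must encode many more bits through larger vocabularies or longer sentences.

```python
import hashlib

# Hypothetical word lists; Sceau's actual vocabulary and encoding are not reproduced here.
SUBJECTS = ["Apples", "Rivers", "Tigers", "Clocks", "Violins", "Lanterns", "Pebbles", "Comets"]
VERBS = ["are", "seem", "become", "stay"]
ADJECTIVES = ["silver", "fishy", "quiet", "golden", "brave", "purple", "hollow", "swift"]

def cbid(cert_der: bytes) -> bytes:
    """CBID: a secure hash of the certificate (SHA-1 is an assumption here)."""
    return hashlib.sha1(cert_der).digest()

def cocktail_sentence(identifier: bytes) -> str:
    """Deterministically turn CBID bits into a short, speakable sentence."""
    n = int.from_bytes(identifier, "big")
    words = []
    for vocabulary in (SUBJECTS, VERBS, ADJECTIVES):
        n, index = divmod(n, len(vocabulary))
        words.append(vocabulary[index])
    return " ".join(words) + "."

def confirm_root(cert_der: bytes, sentence_heard_on_phone: str) -> bool:
    """peer1 marks peer2's root certificate trusted only if the sentences match."""
    return cocktail_sentence(cbid(cert_der)) == sentence_heard_on_phone

received = b"...peer2 root certificate bytes (DER)..."
print(cocktail_sentence(cbid(received)))
```

Both sides compute the sentence independently from the same certificate bytes, so a substituted root certificate would, with a sufficiently long encoding, yield a sentence that does not match what peer2 says on the phone.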
The Sceau protocol fits very well with social human behavior and communication. People meet at cocktail parties, exchange information, “network” as we say, and use this information in their day-to-day lives. This kind of protocol is effective because human beings need not learn the technical side of security; they just exchange sentences. A well-written user interface can exploit this to help integrate P2P security into most individuals’ normal lifestyles. John might be exchanging messages with Vivian, and Vivian will see something like, “Call John and ask him his cocktail party sentence.” If John replies appropriately, then Vivian can proceed. Otherwise the software will refuse to communicate with John. Yes, Vivian must recognize John’s voice, and certainly one might record John’s voice, redirect Vivian’s call to the bad guy, and have him replay, in John’s voice, a recorded message saying, “April fool’s day is fishy,” John’s sentence. Two comments here: this security is not for government secrets, and surely Vivian can use the cocktail party syndrome to have a spontaneous discussion with John. This is very good security: it solves the problem of authentication, and thus eliminates the possibility of “man-in-the-middle” attacks by the duplication of virtualPort documents.
How else can CBID’s be used on the P2P Overlay Network? One can, if desired, glue a TLS veneer on top of the overlay network. All communications can then be private, even those with mediators if, for example, the mediators’ public keys (root certificates) are embedded in the binaries of our implementation. Is this necessary? Might we simply want to ensure that some conversations are private, while the exchange of all system documents has source integrity and authentication?

5.3.4 Secure Document Publication


The P2P Overlay Network is a source of two kinds of documents: system documents and application content. System documents have two requirements. First, they must originate from the peerNode that created them, and second, they must not be forged. Application content can have any source, and may or may not be limited to a single source. Also, it may be permissible to modify application content, depending on its value and on copyright issues.
So, we have two considerations. What can we do to guarantee that a received document is as it was when originally created, be that just prior to being sent or in the distant past? And how can we verify that the sender is the originator of the document? The latter question was addressed in section 5.3.2: we use TLS as the underlying protocol, and we have a trusted suffix in the VirtualPort name. As pointed out there, we resolve this second issue using either TLS or something similar. What is important is to require a protocol whose connection initiation is at least equivalent to the TLS handshake. We suggest sticking with TLS, since it is very well tested and the code is publicly available on open source sites.
Our solution to the first issue of verifying the integrity of documents as originated uses digital signa-
tures along with either a web-of-trust or a CA to guarantee public key ownership. We also add an extra
twist of CBID’s to minimize the data that must be placed in a signed document. For the sake of discus-
sion let CBID0 be derived from the X509.V3 certificate of peer0, and D0 be a document that has the
following properties:
1. D0 contains CBID0,
2. The content is signed using peer0’s private key,
3. The signature algorithm information, and signature are appended to the document much
like is done with a signed certificate [RFC SIGNATURE].
When peer1 receives this document CBID0 is used to index a local public key store, thus retrieving the
X509.V3 certificate of which it is a SHA1 hash. The signature of the document is verified using the
algorithms specified in the document. If all of this succeeds, then the document is guaranteed to have
been signed by the owner of the private key associated with the public key, and thus the CBID. The
strength of this procedure is based on the trust placed in the X509.V3 certificate as belonging to the
originator of the document.
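The signing and verification steps above can be sketched end to end. To keep the example self-contained, it swaps in textbook RSA with tiny Mersenne-prime keys and no padding in place of a real signature algorithm. This is insecure and for illustration only. It also derives the CBID from the public key encoding rather than from an X509.V3 certificate. The names `sign_document`, `verify_document`, and the `"toy-RSA-with-SHA1"` algorithm label are hypothetical.

```python
import hashlib

# Toy textbook-RSA key pair from two Mersenne primes.
# Illustration only: no padding, far too small, NOT secure.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # modular inverse; needs Python 3.8+
PUBLIC_KEY, PRIVATE_KEY = (N, E), (N, D)
SIG_ALG = "toy-RSA-with-SHA1"

def cbid_of(public_key):
    """CBID stand-in: SHA-1 of the public key encoding (a real system hashes the certificate)."""
    n, e = public_key
    return hashlib.sha1(f"{n}:{e}".encode()).hexdigest()

def _digest(cbid, content, n):
    """Hash the signed part of the document (CBID plus content), reduced mod n."""
    h = hashlib.sha1(cbid.encode() + b"\n" + content.encode()).digest()
    return int.from_bytes(h, "big") % n

def sign_document(content, public_key, private_key):
    """Build D0: it contains CBID0, and the appended signature covers CBID0 and the content."""
    n, d = private_key
    cbid = cbid_of(public_key)
    signature = pow(_digest(cbid, content, n), d, n)
    return {"cbid": cbid, "content": content, "sig_alg": SIG_ALG, "signature": signature}

def verify_document(doc, key_store):
    """Use the CBID to index the local public key store, then verify the signature."""
    public_key = key_store.get(doc["cbid"])
    if public_key is None or cbid_of(public_key) != doc["cbid"]:
        return False                      # unknown signer or inconsistent store entry
    if doc.get("sig_alg") != SIG_ALG:
        return False                      # unsupported signature algorithm
    n, e = public_key
    return pow(doc["signature"], e, n) == _digest(doc["cbid"], doc["content"], n)

# Peer1's local public key store, indexed by CBID.
store = {cbid_of(PUBLIC_KEY): PUBLIC_KEY}
```

The design keeps only the CBID in the document, as the text describes; the much larger key or certificate is fetched from the recipient’s store, so a document from an unknown signer simply fails verification.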

[Figure: Peer_0 sends document D0 (CBID0, content, signature) to Peer_1; Peer_1 uses CBID0 to look up Peer_0’s public key in its local public key store, which also holds Peer_2’s through Peer_n’s public keys, and verifies the signature.]

Figure 5-6. Signed Documents and CBID’s


Thus, if all system documents are signed in this way, and there is either a CA or Web-of-Trust based PKI, then the above mechanisms can be used to verify who created a document, and that the immediate source of the document is this creator. In this way we satisfy the constraints imposed on system documents, which always contain a <source-exclusive> field; see chapter 4, section 4.2.3.
The above can be extended to content in general. Here the recipient’s software for digital music can determine whether or not the music is source-exclusive and, if so, whether it has been received from the exclusive source included in the meta-data associated with the music. Well-written software will refuse to play music that fails this check and will immediately destroy the data. Cheaters’ software will play the music. Note that digital rights management is covered in section 5.4.6.
There is a completely orthogonal approach to sharing data and guaranteeing its integrity. This approach assumes that there are peers with good behavior, that they can find one another, and that they can measure one another’s degree of “goodness.” This approach is an “integrity by reputation” system. There are many variations of this theme [LOCKSS, BOBBY AND ROB]. The LOCKSS system has several underlying assumptions, of which we list three:
1. Lots of copies keep stuff safe,
2. No long term secrets,
3. Assume a strong adversary.

Users of LOCKSS are national libraries and institutions that wish to have long-term digital archives of journals. Given all of the possible sources of digital data storage failure, from hackers to earthquakes to fires to hardware faults, LOCKSS uses a P2P system that bases data integrity on the polling of Archival Units (AUs). These are polled at a frequency greater than the assumed rate of loss. AUs are distributed among several peers known to the polling peer. The polling peer and the several peers each create a one-way hash of the AU, and the polling peer compares their results with its own. If there is unanimous agreement, this is called a landslide win, and polling is complete. If the poll results in overwhelming disagreement with the poller’s AU hash, then we have a landslide loss: the AU is corrected by updating the data from one of the peers in the poll, and another evaluation is done. If the poll is inconclusive, then an alarm is raised and human intervention is required. For more details see [LOCKSS].
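The three poll outcomes can be sketched as a small classification function. This is a simplified sketch: the threshold `loss_fraction` for “overwhelming disagreement” is a hypothetical parameter, SHA-1 is chosen only for illustration, and the real LOCKSS polling rules are more involved than a single comparison.

```python
import hashlib

def au_hash(au_content: bytes) -> str:
    """One-way hash of an Archival Unit (SHA-1 chosen for illustration)."""
    return hashlib.sha1(au_content).hexdigest()

def classify_poll(own_hash, peer_hashes, loss_fraction=0.8):
    """Simplified poll outcome; loss_fraction is a hypothetical threshold."""
    if not peer_hashes:
        return "inconclusive"
    disagree = sum(1 for h in peer_hashes if h != own_hash)
    if disagree == 0:
        return "landslide win"     # unanimous agreement: polling is complete
    if disagree / len(peer_hashes) >= loss_fraction:
        return "landslide loss"    # repair the AU from a peer, then poll again
    return "inconclusive"          # raise an alarm for human intervention
```

For example, a poller whose AU hash disagrees with every one of five voters gets a landslide loss, while a split of one agreeing and one disagreeing voter is inconclusive.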
This example shows a completely different approach, one that is being thoroughly researched at Stanford University with more than 50 nationwide beta-test sites. This approach is 100% in the spirit of P2P and eliminates all of the complexity required for certificates, etc. It requires lots of redundancy and a way to measure trust and develop reputations based on this trust. We feel it is really the future of P2P security for the masses. LOCKSS anticipates attacks from all possible sources and resolves the problem of data integrity with sufficient redundancy and volunteerism from enough good guys. It is modeled on social interaction, and that is something all of us understand and deal with on a day-to-day basis.
All of the above mechanisms can be directly applied to any P2P network, and in particular the P2P
Overlay network we have described in chapters 3 and 4. Given that a reasonably secure P2P Network
can be implemented, we still have the problem of denial-of-service (DoS) attacks, particularly on our
mediators, and to some extent on peerNodes. These attacks and solutions are presented in the next sec-
tion.

5.3.5 Denial of Service Attacks in P2P Overlay Networks
Mediators are critical to keeping our P2P Overlay Network running smoothly. And, like all Internet services, they are subject to many kinds of DoS attacks: ACP connection resource depletion, where a peerNode or misbehaving mediator tries to exhaust the connection resources of a mediator; UMP flooding; mediators denying P2P Overlay Network access by not correctly following the mediator protocol rules, for example, by sending an incorrect RoutingInfoResponse to either a mediator or a peerNode; Multicast Protocol abuse by collaborating peerNodes; abuse of mediator storage; and so on.
5.3.5.1 PeerNode Initiated DoS Attacks
Mediators have limited storage resources allocated to peerNode use, and this makes them a target for DoS attacks. With respect to the abuse of mediator storage, we have already imposed quotas on each peerNode, as discussed in section 4.1.3.2. These quotas place upper bounds on the number of messages that can be pending in the M-INBOX. Mediators can also impose a quota on the number of published documents per peerNode. Since each such document has a time-to-live (TTL), when the maximum is reached, further publication by that peerNode is refused until the TTL of one of its currently published documents expires, thus freeing a publication slot for that peerNode on the mediator. Recall that a mediator keeps copies of all meta-data it replicates onto its mediator-level view, along with the associated TTLs. Therefore, it can implement such a publication quota to prevent this DoS attack. This, in a sense, is one way to minimize SPAM at the mediator level.
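A per-peerNode publication quota with TTL-based slot recovery might look like the sketch below. The class name and the use of plain numeric timestamps are assumptions made for illustration.

```python
class PublicationQuota:
    """Hypothetical per-peerNode cap on published documents; slots free up as TTLs expire."""

    def __init__(self, max_docs: int):
        self.max_docs = max_docs
        self.expiry = {}   # peerIdentity -> expiry times of its live documents

    def try_publish(self, peer_id: str, now: float, ttl: float) -> bool:
        # Forget documents whose time-to-live has already expired.
        live = [t for t in self.expiry.get(peer_id, []) if t > now]
        if len(live) >= self.max_docs:
            self.expiry[peer_id] = live
            return False       # refuse until one of this peer's TTLs expires
        live.append(now + ttl)
        self.expiry[peer_id] = live
        return True
```

The quota is tracked per peerNode, so one abusive peerNode exhausting its slots does not affect publication by others hosted on the same mediator.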
The DoS attacks that involve an excess of network requests by clients or mediators are the class of problems for which client puzzles are an appropriate defense [RSA-Juels]. Almost all client DoS attacks on the Mediator Virtual Ports, see section 4.2.1, will be of this nature. The idea behind these puzzles is simple. A client or collection of clients is trying to exhaust the resources of a target system. Ari Juels, in his seminal paper on client puzzles, uses a restaurant reservation attack as an example. Here, the bad guy reserves all of the tables in a restaurant by telephone and then does not show up. The restaurant owner then decides to challenge all requests for reservations with a puzzle that takes at least an hour to solve. We are assuming that not everybody is bad here. Thus, the restaurant owner is given ample time to take good reservations and serve his clients.
Thus, client puzzles do not overburden the system servicing the network requests, but rather pass the burden on to the requesting client. Some suggest the use of signed requests to minimize this problem. This has three difficulties: it requires a PKI of some sort; it places a computational burden on the system servicing the requests, which must verify, for example, 1024-bit RSA signatures; and it prohibits anonymity, which is desirable in our P2P Overlay Network.
What is a typical client puzzle like? We describe Ari Juels’ example.
First, peer0 makes an ACP connection request, R, to mediator M0. The mediator begins with a secret, a time stamp, and the request R. These three elements are combined into a string of bytes of length N, S(N). The 160-bit SHA-1 hash, SHA-1(S(N)), is sent to peer0 along with S(M), which is M bytes of S(N), where M < N. In order to connect successfully, peer0 must recover S(N) by guessing the remaining N - M bytes until the SHA-1 hash of its guess equals SHA-1(S(N)). Clearly, the difficulty can be varied by changing N - M, and one might require the solution to such a puzzle for all connection attempts. The same can apply to excessive UMP messages from peer0. The mediator must keep a UMP-messages/second count for each of the peerNodes it hosts, and if the maximum is exceeded, then further UMP messages from this source must be accompanied by a solution to a proposed client puzzle. Continued abuse from a peerNode results in a service cutoff by the mediator.
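The puzzle exchange can be sketched as follows. Two assumptions are worth flagging. First, S(N) is produced by hashing the secret, time stamp, and request together (so N = 20 bytes and revealing part of S(N) does not reveal the secret). Second, only `hidden` = 2 bytes are withheld so the sketch runs quickly. All function names are hypothetical.

```python
import hashlib
import itertools

def _make_s(secret: bytes, timestamp: int, request: bytes) -> bytes:
    # Assumption: S(N) = SHA-1(secret || time stamp || request), so N = 20 bytes.
    return hashlib.sha1(secret + timestamp.to_bytes(8, "big") + request).digest()

def make_puzzle(secret: bytes, timestamp: int, request: bytes, hidden: int = 2):
    """Mediator side: return (SHA-1(S(N)), S(M)), where S(M) omits `hidden` bytes."""
    s_n = _make_s(secret, timestamp, request)
    return hashlib.sha1(s_n).digest(), s_n[: len(s_n) - hidden]

def solve_puzzle(target: bytes, s_m: bytes, hidden: int = 2):
    """Client side: brute-force the missing N - M bytes (at most 256**hidden guesses)."""
    for tail in itertools.product(range(256), repeat=hidden):
        guess = s_m + bytes(tail)
        if hashlib.sha1(guess).digest() == target:
            return guess
    return None

def check_solution(secret: bytes, timestamp: int, request: bytes, solution: bytes) -> bool:
    """Mediator side: stateless check by recomputing S(N) from its own secret."""
    return solution == _make_s(secret, timestamp, request)
```

Each additional hidden byte multiplies the client’s worst-case work by 256, while the mediator’s check remains a single hash. Because S(N) can be recomputed from the secret, the mediator need not store outstanding puzzles.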
Again, the assumption is that most peerNodes are good peers, and Mediators must keep well behaved
peer lists. The lists are indexed by peerIdentity, and each entry is a structure with information that can
be used to judge a peerNode’s behavior and trace the source of misbehavior. For tracing one keeps the
following information:
1. Overlay Network source routing information. This cannot be faked if more than one hop.
2. Source IP address. This may change with each attempt to connect, but the range is limited, and source IP addresses that never result in a real connection on the infrastructure do not cause excessive resource use: connection tables do not grow.
3. Trace route information for Internet routers used to reach the Mediator.

We use the above to help to identify the sources of bad behavior. (1) permits the system to find the
hosting mediator; (2) and (3) can be used by the hosting mediator in the same way. If one has the IP
address and the behavior is consistently bad, then one can at least isolate the problem to a NAT box in
the worst case, and cut off all service to peerNodes hosted by that NAT box. If the IP address changes
but the trace route information of the abusive addresses is the same, then one can cut off service to all
of the peerNodes on the offending IP subnet.
To judge bad behavior in this context we use message-window quotas, and keep statistics like:
4. The number of UMP messages received over a given time window, where a maximum number of messages/time-window has been configured. This is a quota. If it is exceeded, then a client puzzle is sent to enforce the messages/time-window constraint. If messages arrive without solutions or with incorrect solutions, the service is cut off.
5. The same count for the number of ACP messages/time-window arriving for a hosted peerNode’s M-INBOX. In this case, the mediators in the path will drop the messages, since they cannot send a client puzzle over this peerNode-to-peerNode open channel. A good rule of thumb is to drop the messages for 2^n × time-window, where n = 0, 1, ..., m, and m is increased for each consecutive time-window abuse. If the time-window is 10 seconds, then messages that arrive before the first 10 seconds have expired are dropped. If the message rate does not decrease, then the drop period becomes 20 seconds, and so on.
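The message-window quota with exponentially growing drop periods can be sketched as a small per-peerNode counter. The class name is hypothetical, and two policies are assumed here rather than taken from the text: a new window starts at the first message after the previous one ends, and the exponent resets once a full window passes without abuse.

```python
class WindowQuota:
    """Per-peerNode message-window quota with 2**n x time-window drop periods (sketch)."""

    def __init__(self, window: float, max_messages: int):
        self.window = window
        self.max_messages = max_messages
        self.window_start = 0.0
        self.count = 0
        self.n = 0                 # number of consecutive abusive windows so far
        self.drop_until = -1.0

    def accept(self, now: float) -> bool:
        if now < self.drop_until:
            return False           # still inside the drop period
        if now - self.window_start >= self.window:
            # The previous window ended without abuse: start fresh and forgive.
            self.window_start, self.count, self.n = now, 0, 0
        self.count += 1
        if self.count <= self.max_messages:
            return True
        # Quota exceeded: drop for 2**n time-windows, n grows with consecutive abuse.
        self.drop_until = now + (2 ** self.n) * self.window
        self.n += 1
        self.window_start, self.count = self.drop_until, 0
        return False
```

With a 10-second window, the first violation drops messages for 10 seconds and an immediate repeat for 20 seconds, matching the rule of thumb above.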
If a huge amount of abuse begins, then one might refuse all network service requests to the mediator virtual ports except those from peerNodes on the good peer list. If complete refusal is inappropriate, then peerNodes not on the good list will all be slowed down with client puzzles until the perpetrator is discovered or gives up. Here, each new request for a network service may be answered with the task of computing a solution to a puzzle that will take 5 to 10 seconds. In the meantime, the above list is rebuilt in an attempt to uniquely identify the bad peerNodes, that is, those that either do not solve the puzzles or solve them incorrectly. Once the list has converged, all service is cut off to those peerNodes not on the list.
If there is excessive and continual abuse on a specific P2P Overlay Network, then something Draconian, like requiring TLS connections for all overlay network activity, can be imposed until the good peerNode list is refreshed. Afterwards, this requirement is turned off and only service requests from those on the well-behaved peerNode list are accepted. This means that the appropriate PKI must be in place (CA or Web-of-Trust based); otherwise it cannot be accomplished. How would this be done? Mediators contact the peers on their good-peer lists and request that they accept only incoming TLS connections and, similarly, use TLS exclusively for their outgoing connections. They can spread the word by means of their connected communities until pure TLS channel connectivity is established. PeerNodes that refuse to follow this edict will be refused service.
5.3.5.2 Mediator Initiated DoS Attacks
The next challenge is misbehaving mediators. First of all, mediators are special, and in most situations, if a mediator is misbehaving, it has either been hacked or is having software or hardware problems. Still, in purely ad-hoc P2P Overlay Networks on the public network, mediator misbehavior, if possible, will definitely occur. How do we handle this? We suggest a LOCKSS-style reputation-based system. Again, we assume that most mediators are good and want the system to succeed. To this end we introduce a simple reputation model.
A mediator has one of three possible reputation values: bad, unknown, and good. All new arrivals are “unknown” and are given a trial period before being fully accepted as “good.” A “good” reputation is a real-time state that can be demoted to “bad” at any time. A “bad” mediator is ignored by some or all of the “good” and “unknown” mediators until it disconnects from the site. When the “bad” guy comes back, it is reassigned the newcomer’s “unknown” reputation. The “some or all” subset of the mediators is determined by who is willing to vote. At the same time, all mediators are continuously tested in a way that is indistinguishable from normal mediator activity, and are either upgraded from “unknown” to “good” or downgraded from “good” to “bad” as a result. Given this introduction, let’s now look at the details of this reputation-based model.
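The three-state reputation model can be sketched as a small state machine kept by a polling mediator for each mediator it observes. The trial length `trial_polls` is a hypothetical parameter. The rule that a failed poll also demotes an “unknown” mediator to “bad” is an assumption; the text only spells out the “good” to “bad” case.

```python
from enum import Enum

class Reputation(Enum):
    UNKNOWN = "unknown"
    GOOD = "good"
    BAD = "bad"

class MediatorReputation:
    """One mediator's reputation as seen by a polling mediator (sketch)."""

    def __init__(self, trial_polls: int = 10):
        self.trial_polls = trial_polls     # hypothetical trial period length
        self.state = Reputation.UNKNOWN    # every new arrival starts as unknown
        self.passed = 0

    def record_poll(self, correct: bool) -> Reputation:
        if self.state is Reputation.BAD:
            return self.state              # ignored until it disconnects
        if not correct:
            self.state = Reputation.BAD    # demotion can happen at any time
        elif self.state is Reputation.UNKNOWN:
            self.passed += 1
            if self.passed >= self.trial_polls:
                self.state = Reputation.GOOD
        return self.state

    def reconnect(self) -> None:
        """A returning mediator is reassigned the newcomer's unknown reputation."""
        self.state, self.passed = Reputation.UNKNOWN, 0
```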
Just as LOCKSS has Archival Units (AUs), our system will have Distributed Hash Table Units (DHTUs). Any DHT {key, value} element qualifies as a potential DHTU. For example, in section 4.2.3.4 we describe building a DHT element where meta-data is the key and {VirtualSocket, source} is the value. Similar DHT elements are associated with routes. What we now discuss applies to the site, regional, and global level mediators. Let’s assume without loss of generality that we are at the site level. Each mediator keeps a copy of the DHT elements it has replicated, i.e., those whose sources are the peerNodes it hosts. Recall that we have two polling schemes to validate mediator behavior: the first is for content and route lookups, and the second is for system document lookups. These schemes are transparent to all but the mediator, a proxy mediator, and a trusted peerNode (of which there may be many). They are defined in chapter 4, section 4.2.6.

There are periodic elections during which each mediator is permitted to express its opinion as to which mediator, if any, has been detected as “bad” during the previous polling intervals. Just as in real elections, an election place is selected and the voting period begins. We permit several hours, for example eight hours, for the election to be completed. At the end of the election, the votes are counted. We are voting to decide, first, whether there are bad mediators, and second, if so, which is the worst. The details of this procedure are as follows:
1. The initial election place is the mediator with the smallest UUID in the monotonically increasing list of UUIDs. The successive election place is always the successor of the previous election place. If an election place is down, then the successor rule is applied again. The MedToMed communication protocol’s voting command is used.
2. Each mediator participating in the vote (voting is a matter of free choice) selects the worst mediator from its own polling, or none if its polling results show that all polled mediators are “good” guys. The mediator signs its vote with its private key. Security considerations are discussed momentarily, depending on your reading speed.
3. When the “poll closes,” every voting mediator, using the MedToMed Protocol (chapter 4, section 4.2.4.5), retrieves all of the signed votes from the election place and counts the votes locally, verifying each signed ballot. We use a simple majority rule; the electoral college is not appropriate here. If a single mediator, Mbad, has more than 50% of the voting mediators voting against it, then Mbad wins the election.
4. Every voter creates a sub-token ring list consisting of those mediators that voted, and removes Mbad from this list if Mbad voted.
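Steps 1, 3, and 4 can be sketched as two functions: choosing the election place by the successor rule, and counting verified ballots with the simple majority rule. UUIDs are modeled as plain integers, and `verify` is a caller-supplied stand-in for checking each ballot’s signature; both are assumptions for illustration.

```python
import bisect
from collections import Counter

def next_election_place(uuids, previous=None, is_up=lambda u: True):
    """Smallest UUID first; afterwards the successor of the previous place,
    wrapping around and skipping mediators that are down."""
    ring = sorted(uuids)
    start = 0 if previous is None else bisect.bisect_right(ring, previous)
    for i in range(len(ring)):
        candidate = ring[(start + i) % len(ring)]
        if is_up(candidate):
            return candidate
    return None

def count_votes(ballots, verify):
    """ballots: (voter, worst_mediator_or_None, signature) tuples.
    Returns (Mbad or None, sorted members of the new sub-token ring)."""
    valid = {}
    for voter, choice, signature in ballots:
        if verify(voter, choice, signature):
            valid[voter] = choice              # one counted ballot per voter
    tally = Counter(c for c in valid.values() if c is not None)
    m_bad = next((m for m, votes in tally.items() if 2 * votes > len(valid)), None)
    sub_ring = sorted(v for v in valid if v != m_bad)
    return m_bad, sub_ring
```

Because each voter counts locally, every honest mediator that sees the same verified ballots reaches the same result and builds the same sub-token ring.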
It is possible that the mediator chosen as the election place is bad and will try to bias the election results in one way or another. The election place has the unique knowledge of the number of voters, and can, for example, decide to ignore some of the votes to say either that there is no bad mediator or that there is one. This is easy with the simple majority rule. Let’s imagine that there are 10 mediators, 7 voted, and a “bad” mediator was detected. This means that at least 4 mediators detected this “bad” guy. Let’s assume that exactly 4 mediators detected the “bad” guy. Then, given the 7 votes, to cheat and say there is no “bad” guy, the election place can return at most 6 ballots for each query, and at most 3 of those may be votes against the problematic mediator. Still, we assume that most mediators are good, and that the average voter turnout is tracked over a period of time. Then a simple rule can be used to throw out the election and immediately try another election place: we take the average turnout minus twice the standard deviation, and if the number of voters is less than the resulting number, a new election is held at the next election place. Clearly, the previous election place is a good candidate for “winning” the new election, i.e., for being the newly elected bad guy. Given our assumption about good mediators, even an undetectable falsified election will cause little harm.
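The turnout rule can be written as a one-line check. The sketch assumes the sample standard deviation and assumes that at least two past elections are needed before the rule applies; neither choice is specified in the text.

```python
import statistics

def turnout_is_suspicious(past_turnouts, voters):
    """True if turnout falls below mean - 2 * standard deviation of past turnouts,
    in which case the election is rerun at the next election place."""
    if len(past_turnouts) < 2:
        return False      # not enough history to judge
    threshold = statistics.mean(past_turnouts) - 2 * statistics.stdev(past_turnouts)
    return voters < threshold
```

With past turnouts of 7, 7, 8, 6, and 7, the threshold is about 5.6, so an election reporting only 5 voters would be thrown out while one reporting 6 would stand.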
Let’s assume that Mbad has been elected. There are two options the mediators can follow:
1. If human intervention is possible, then that is the best path. It may be that the mediator has
hardware problems, or has been hacked and simply needs to be taken offline, and either
repaired or rebooted.

2. The voting mediators initiate all activity on the locally formed new sub-token-ring, and
refuse to use Mbad. Most or all participants in the sub-token-ring will delete Mbad. This
implies possible content loss for the sub-token-ring mediators. But the complement of the
content accessed via the bad-guy is still there and reliable. In this case the mediator in this
group with the minimum mediator UUID starts a keep-alive token only including the vot-
ers. This coerces these mediators to rebuild the DHT using exclusively the site-level medi-
ators included in the keep-alive token. If Mbad is not in this new group, and Mbad did not
vote, then to make sure that Mbad doesn’t know about this new sub-token ring, the original
token ring is maintained but ignored by the sub-token ring’s members. This has another
interesting side-effect. Those mediators not in the sub-token ring and not the bad guy, i. e.,
they did not vote, will have no idea that the new sub-token ring has been formed. The sub-
token ring members will continue to be recipients of replicated DHT elements, and respond
to these requests. At the same time, the sub-token ring continues to poll the entire site. If
the bad-guy becomes a good-guy, then they can reconstruct the original token ring.

[Figure: site-level mediators (UUIDs 067, 083, 163, 192, 284, 317, 626, and 4091) on the keep-alive token ring, with a subset of them forming a sub-token ring.]

Figure 5-7. Sub-Token Ring for Polling


Such a voting system encourages all good mediators to vote; they have nothing to lose, since a single “bad” guy cannot vote out a good mediator. Of course, if there are only two mediators, then there is no reason to vote, but polling is still important to assure hardware integrity and to detect whether one of the mediators is “bad.” If polling shows that one is, it is ignored by the other, and option (1) is applied if possible.

Finally, how do we address the signed ballots? If a ballot is signed with a private key, then all mediators need copies of one another’s public keys. This again comes down to what we discussed in sections 5.1 and 5.2. One can have a very centralized CA certificate-granting PKI, a reputation-based “Web-of-Trust (WoT)” PKI, or variations of these. In a purely WoT-based P2P Overlay Network, it is even possible that voting partitions a given collection of mediators into multiple WoT-subnet P2P Overlay Networks. This comes about when the reputation of the public keys is exclusively WoT based, and not all WoTs intersect. Thus, after a vote, even if each member of each WoT collects all of the ballots, it can verify only those from the members of its own WoT, and so it forms its WoT-subnet from these trusted members. In this manner, voting also permits mediators to discover which trusted public keys belong to their WoT. Over time, WoTs can be joined if a new mediator arrives on the scene with public keys that have a transitive relationship between partitioned WoT-subnets. When a partition is formed, history is forgotten. The new mediator will then enlarge the subnet keep-alive token rings to include both subnets. After a vote, the transitive public key relationships will re-form the original set of mediators with the additional new member.
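Partitioning into WoT-subnets, and the later join through a new mediator, can be modeled with a union-find over trust relationships. This is a simplified model: each edge (a, b) is assumed to mean that a and b can verify each other’s ballots, and trust is treated as symmetric and transitive, as the join argument above requires.

```python
def wot_subnets(mediators, trust_edges):
    """Group mediators into WoT-subnets with union-find (simplified model)."""
    parent = {m: m for m in mediators}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]   # path halving keeps trees shallow
            m = parent[m]
        return m

    for a, b in trust_edges:
        parent[find(a)] = find(b)           # merge the two trust groups

    groups = {}
    for m in mediators:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())
```

With mediators 1, 2, 3 sharing public keys and mediators 4, 5 sharing theirs, two subnets result; a new mediator 6 trusted by both 3 and 4 merges them back into one.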

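[Figure: (a) original ring of mediators 1 to 5, with mediator 1 bad; (b) sub-token ring of mediators 2 to 5 after the vote; (c) WoT-subnets after the partition, where mediators 1, 2, and 3 share public keys and mediators 4 and 5 share public keys.]

Figure 5-8. WoT-subnets Partition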


The above two options require simulation and real-world testing, but given the results from the LOCKSS work, it is clear that this approach is a viable one. What we want to discuss next is not exactly a DoS attack but may turn into one: how to securely add a new mediator to an existing mediator level. Since one might have been able to detect Mbad as soon as it joined the site, and since preventing DoS attacks before they start is perhaps the best countermeasure, we propose such a mechanism.
We assume that a stable, well-behaved mediator view has been achieved at a given level. A new mediator, Mnew, arrives, contacts a neighbor, is added to a validation-keep-alive token ring, and begins to participate. Mnew does not realize that it is only seeing DHT replications from a select subset of the existing mediators. The idea is that if Mnew is indeed a bad guy, then we limit its bad behavior to a small number of the mediators and their hosted peerNodes. Certainly, a fairness algorithm is applied to make sure that the volunteers who validate new mediators are chosen in a democratic way. For example, a random number generator can be used to select n mediators from m > n mediators. Given this random collection of volunteers, V, they begin to use Mnew in the usual fashion. In the meantime, Mnew is being polled by V. After a site-level defined waiting period, if Mnew is well behaved, then it obtains a “good” reputation. The special keep-alive token ring is terminated, and Mnew is added to the regular keep-alive token ring that has been running in parallel with the now-terminated one. For practical purposes we suggest that only one validation-keep-alive token ring be permitted. Thus, if Mnew1 arrives while Mnew is being validated, then Mnew1 is added to the current validation-keep-alive token ring. Similarly, an upper bound is placed on the number of validations that can be in progress. When that upper bound is reached, no more mediator candidates are accepted, and the Mediator Greeting Command response is “refused.”
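The admission mechanism can be sketched as a single validation ring object. It picks n volunteers at random from the m existing mediators, admits candidates up to a bound, and promotes well-behaved candidates to the regular keep-alive ring. The class, its methods, and the parameters `n_volunteers` and `max_candidates` are hypothetical; the waiting period and polling are left out.

```python
import random

class ValidationRing:
    """The single validation-keep-alive token ring for a mediator level (sketch)."""

    def __init__(self, mediators, n_volunteers, max_candidates, rng=None):
        if not 0 < n_volunteers < len(mediators):
            raise ValueError("need 0 < n < m volunteers")
        rng = rng or random.Random()
        # Fairness: choose n of the m existing mediators uniformly at random.
        self.volunteers = rng.sample(sorted(mediators), n_volunteers)
        self.max_candidates = max_candidates
        self.candidates = []

    def greet(self, candidate):
        """Mediator Greeting Command response for a newly arriving mediator."""
        if len(self.candidates) >= self.max_candidates:
            return "refused"
        self.candidates.append(candidate)   # Mnew1 joins the same ring as Mnew
        return "accepted"

    def promote(self, candidate, keep_alive_ring):
        """After the waiting period, a well-behaved candidate becomes good."""
        self.candidates.remove(candidate)
        keep_alive_ring.append(candidate)
```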

[Figure: mediators (UUIDs 067, 083, 163, 192, 284, 317, 626, and 4091) on the keep-alive token ring; a joining mediator is attached to a separate validation-keep-alive token ring.]

Figure 5-9. Validation-Keep-Alive Token Ring


It is important to emphasize that the above somewhat ad-hoc, dynamic reputation-granting mechanism may, for some, be appropriate only for a non-Enterprise, public P2P Overlay Network. Still, it has an appealing beauty because of its simplicity and its independence from secrets, with the exception of the signed ballots, and these too can be WoT based. It is self-adapting in the spirit of pure P2P networking. Enterprises, on the other hand, will want to tightly control their mediators. For this case, we have discussed in this chapter how this can be accomplished by imposing a certificate authority security scheme on top of the P2P Overlay Network.

5.3.5.3 Securing Mediator Routing Tables
While we can secure the reception of system documents to guarantee the contained information, peerIdentity route spoofing is still possible. Recall from chapter 4, section 4.2.5.3, that each peerNode, upon connecting to its hosting mediator, hashes {KEY peerIdentity, mediator peerIdentity}. The P2P Overlay Network we have described is vulnerable to the same DoS attack that occurs when, either accidentally or maliciously, a host on an IP subnet uses an IP address that is already taken. In such a situation an interesting race condition arises, where one of the two hosts will receive all of the incoming packets destined to this IP address. The race condition is based on the router’s first received reply to an ARP request for the MAC address belonging to the IP address: first come, first served, as the expression goes. This attack is difficult to stop because it does not take a lot of expertise to assign a fixed IP address on most systems.
So, what can happen on our P2P Overlay Network? A peerNode may pass a peerIdentity document that contains an intentionally duplicated peerIdentity to its mediator as part of the mediator greeting command. When the above hash is done, this creates a forged mediator host binding that will be used in routing. Now, it is possible that the owner of this peerIdentity is active, so its routing hash is already in place when the second hash arrives at the same site-level mediator. How this is handled is implementation dependent. It can be as in the IP world, where the “first come, first served” rule applies, in which case the second hash is ignored and an error response is sent back to the hosting mediator using the MedToMed protocol. Then the false route is not in place. But there is a race condition, since the real peerNode may disconnect for long enough for its route hash to expire.
To resolve this problem the mediator can generate the peerIdentity that must be used by the peerNode. This means it must rewrite the peerNode’s peerIdentity document, and all virtual port documents must contain this peerIdentity. Furthermore, the mediator will bind the peerIdentity to the real transport address used by the peerNode. Finally, this peerIdentity is used as the hash key for the route to the peerNode via this mediator. Thus, any further ACP or UMP messages received from this peerNode can be validated for correctness of the source peerNode UUID. If the wrong UUID is being used, then the command associated with this message will be aborted, possibly without an error response. It may be that the first violation is acknowledged, and that there will be no responses to subsequent ones.
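The mediator-side binding and source check can be sketched as a small table. It is keyed by the transport address the peerNode actually connects from, acknowledges the first violation, and silently drops later ones. The class name, the use of `uuid.uuid4()` to generate peerIdentities, and the (IP, port) tuple for transport addresses are assumptions made for illustration.

```python
import uuid

class MediatorBindings:
    """Hypothetical table binding mediator-assigned peerIdentities to transport addresses."""

    def __init__(self):
        self.by_address = {}   # transport address -> assigned peerIdentity
        self.violations = {}   # transport address -> number of bad messages seen

    def greet(self, transport_address) -> str:
        """The mediator generates the peerIdentity that the peerNode must use."""
        peer_identity = str(uuid.uuid4())
        self.by_address[transport_address] = peer_identity
        return peer_identity

    def check(self, transport_address, claimed_identity) -> str:
        """'ok' if the source UUID matches the binding; otherwise the command is
        aborted: 'error' on the first violation, 'drop' (no response) afterwards."""
        if self.by_address.get(transport_address) == claimed_identity:
            return "ok"
        count = self.violations.get(transport_address, 0) + 1
        self.violations[transport_address] = count
        return "error" if count == 1 else "drop"
```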
If the peerNode persists in using a bad peerIdentity in its virtual port documents, then any peerNode that requests content from the misbehaving peerNode will find a contradiction between the peerIdentity in the received meta-data and the peerIdentity in the received virtualPort document, and should abort the transaction that is in progress. This is covered in the publish/query protocols in chapter 4, sections 4.2.3.3 and 4.2.3.4.
Another problem that cannot be easily detected by mediators is P2P Overlay Network SPAM. This is a peerNode-to-peerNode problem and must be managed within the context of secured connected communities. This, along with mechanisms for securing connected communities, is discussed in the next section.

5.3.6 Secure Connected Community
Recall that connected communities (CC) are a set of peerNodes that have a common purpose or inter-
est. They can be, for example, families, friends, gambling clubs, lonely hearts clubs, wine collectors,
etc. CC’s can be “walled gardens,” or “secure, walled gardens.” The CC’s are described in CC Docu-
ments and these are published and available in the PubCC. CC’s are described in detail in chapter 3,
section 3.4, and CC Protocols are found in chapter 4, section 4.4. It is a good idea to review that infor-
mation before continuing with this section.
The CC document has a policy field that defines three membership policies: Anonymous, Registered,
and Restricted. The first two permit one to form “walled gardens,” in the sense that anyone can join the
CC, and thus communicate, and exchange content within the CC. The registered membership policy is
for the convenience of the CC creator. It permits the creator to keep track of the membership for
personal reasons. This adds no extra security. The restricted policy limits membership with CC-defined
security that can require authentication to join and authorization to participate in the CC’s
activities. We first describe the restricted mechanism in the traditional X509.V3 PKI CA mode, and then
move on to the WoT-based CC.
Let’s assume that a CC’s P2P software arrives with a root certificate in the binary. An example might
be Project Jxta creating jar files for its current release that contain a root certificate generated by
the Jxta community web site. Second, the CC creator, CCc, is granted an X509.V3 “membership granting
certificate” (MGCert) by the Jxta community, signed by the private key associated with the community’s
public key in the distributed root certificate. The MGCert contains the name of the CC in the subject’s
DN as well as the URN for the CC membership granting code. The Jxta community guarantees the
uniqueness of the subject DN. Thus, CC names and URN’s are unique. The CC creator may in fact have
created the software for distribution and self-signed the MGCert. Finally, assume the software is
distributed much like a web browser.
CCc now creates a CC Document with a restricted policy type, and the URN pointing to CC member-
ship granting code that must be run to join the CC. Furthermore, the document has been signed with
CCc’s private key. Now, peerNode P0 wants to join the CC. On the public network, P0 retrieves the CC
Document and makes a TLS connection to the system that the URN specifies. CCc uses the MGCert as its
server certificate. Therefore, if the TLS handshake completes, the following are true:
1. The MGCert was granted by the community CA for this CC.
2. The owner of the MGCert possesses the private key corresponding to the public key in the
MGCert. This is the important fact: possession of that private key proves that the party
presenting the MGCert is its legitimate owner.
Next, P0 compares the CC Name in the CC document to the CC Name in the Subject DN of the
MGCert. If they match, then we know we have contacted CCc. The contents of the MGCert are
guaranteed by the CA’s signature. Next, we validate the signature on the CC document using the public
key in the MGCert. If that passes, then its contents are valid, and the restricted policy type can be used.
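The Python sketch below walks through P0’s checks. It is illustrative only. Textbook RSA over fixed Mersenne primes stands in for real X509.V3 signatures and is not secure. The `MGCert` and `CCDocument` classes, the `CN=` naming, and the URN are hypothetical. The sketch assumes the TLS handshake, which proves that CCc holds the MGCert’s private key, has already succeeded.

```python
import hashlib
from dataclasses import dataclass

# Textbook RSA over Mersenne primes: INSECURE, a stand-in for X509.V3 signatures.
def make_keypair(p, q, e=65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)              # (public key, private key)

def _digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(private_key, data):
    n, d = private_key
    return pow(_digest(data, n), d, n)

def verify(public_key, data, signature):
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)

@dataclass
class MGCert:                # hypothetical, heavily simplified certificate
    subject_dn: str          # carries the CC name, e.g. "CN=<CC name>"
    public_key: tuple        # CCc's public key
    signature: int = 0       # made with the community root's private key

    def tbs(self):           # the "to be signed" bytes
        return f"{self.subject_dn}|{self.public_key}".encode()

@dataclass
class CCDocument:            # hypothetical, heavily simplified CC document
    cc_name: str
    policy: str
    urn: str                 # where the membership granting code lives
    signature: int = 0       # made with CCc's private key

    def tbs(self):
        return f"{self.cc_name}|{self.policy}|{self.urn}".encode()

def accept_cc_document(root_public_key, mgcert, doc):
    """P0's checks once TLS has shown the server holds the MGCert's private key."""
    if not verify(root_public_key, mgcert.tbs(), mgcert.signature):
        return False  # the MGCert was not granted by the community CA
    if mgcert.subject_dn != f"CN={doc.cc_name}":
        return False  # the CC name differs from the MGCert subject DN
    return verify(mgcert.public_key, doc.tbs(), doc.signature)

root_pub, root_priv = make_keypair(2**61 - 1, 2**89 - 1)
ccc_pub, ccc_priv = make_keypair(2**107 - 1, 2**127 - 1)
cert = MGCert("CN=WineCollectors", ccc_pub)
cert.signature = sign(root_priv, cert.tbs())
doc = CCDocument("WineCollectors", "Restricted", "urn:example:cc-join")
doc.signature = sign(ccc_priv, doc.tbs())
print(accept_cc_document(root_pub, cert, doc))  # True
doc.policy = "Anonymous"                        # tamper with the signed document
print(accept_cc_document(root_pub, cert, doc))  # False
```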

So, if all of the above tests pass, P0 downloads and executes the code that enforces the membership
policy. Next, while maintaining the TLS connection, P0 uses PKCS#10 for a Membership Granting
Request. This request may require specific information for the membership to be granted, such as a
password or passphrase that has been given out secretly. We cannot be specific because this is CC
dependent. We can only state that the membership granting code may include such requirements.
The upshot is that CCc grants P0 an X509.V3 certificate that is used as a CC Membership Credential
(CCMC). The CCMC contains P0’s public key, the name of the CC in the issuer’s DN (the issuer being
the subject of the MGCert), and anything else CCc wishes to put in the CCMC’s attributes, for example,
to grant access control privileges to P0.
The access control privileges are very important. Here, to prevent misbehavior, degrees of membership
are a good idea. They can be as simple as newbie, basic, and privileged. After a trial period, the CC can
upgrade or downgrade these privileges as necessary. The privileges can have lifetimes, and these are
specified in the CCMC’s validity period (its start and end dates). Thus, a CCMC expires and must be
renewed with the same process, which requires the old CCMC to be included in the membership upgrade
request. Voting mechanisms similar to those described above can be used to evaluate the quality of
a CC member’s participation. Again, this can be done on a regular basis. Another possibility is that
each time a member becomes active and participates in the CC, if a quorum of CC members is active,
the software can call for a vote on one or more randomly chosen members. The CC creator can tally the
votes, and the results can be applied the next time a badly behaving member reapplies for
membership in the CC. There must be a minimum time between votes to prevent DoS voting attacks.
This can be taken as far as creating a CC Revocation List (CCRL) that is queried when an active mem-
ber does something like request content. This request is refused if the requesting member is on the
CCRL. This technique mimics Certificate Revocation Lists.
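Here is a Python sketch of how a member might check a requester’s CCMC before serving content. The three-level privilege ladder and the CCRL held as a set of CCMC serial numbers are both hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

LEVELS = {"newbie": 0, "basic": 1, "privileged": 2}  # hypothetical privilege ladder

@dataclass(frozen=True)
class CCMC:
    serial: int
    subject_peer: str
    level: str            # access privilege carried in the CCMC attributes
    not_before: datetime  # start of the validity period
    not_after: datetime   # end of the validity period

def authorize(ccmc, required_level, ccrl, now):
    """Member-side check before serving, for example, a content request."""
    if ccmc.serial in ccrl:
        return False      # revoked: listed on the CC Revocation List
    if not ccmc.not_before <= now <= ccmc.not_after:
        return False      # expired (or not yet valid): the member must renew
    return LEVELS[ccmc.level] >= LEVELS[required_level]

t0 = datetime(2004, 1, 15, tzinfo=timezone.utc)
cred = CCMC(42, "uuid-P0", "basic", t0 - timedelta(days=30), t0 + timedelta(days=60))
print(authorize(cred, "newbie", set(), t0))                      # True
print(authorize(cred, "privileged", set(), t0))                  # False
print(authorize(cred, "basic", {42}, t0))                        # False
print(authorize(cred, "basic", set(), t0 + timedelta(days=90)))  # False
```

Because the privilege level travels inside the signed CCMC, a downgrade after a vote takes effect when the member renews.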

Figure 5-10. CCMC Granting Process (P0 opens a TLS connection to the CC Creator and verifies the
MGCert with the root certificate; once the TLS handshake completes, P0 sends a Membership Granting
Request and receives its CCMC.)


Thus, armed with the CCMC, and with the MGCert as a CC root certificate, P0 can contact any other CC
member and use the CCMC to authenticate itself in the CC. Because both members possess a
CCMC/MGCert pair, they can use TLS with two-way handshake authentication, which also keeps the
exchange private. The contacted member verifies P0’s CCMC with its MGCert during the handshake,
extracts the access privileges, and permits CC membership activities consistent with those privileges.
Next, how might one form WoT-based CC’s? The easiest way to create such a CCWoT is to hold a
signing party where each member signs the others’ WoT (PGP) certificates, so that everyone has
everyone’s public key signed by someone they trust. This is very loose. Still, it is extremely secure if
the signing parties are held out-of-band. Once the initial CCWoT is formed, a new member, Mnew, can
join by contacting any member, say Mn. Mnew receives, out-of-band, a dump of all of the co-signed
keys as well as its own public key signed by Mn. It then has sufficient keys both to present its own key
to any member for acceptance and to accept other members’ connections. Why? If Mnew contacts Mo,
then Mo has Mn’s public key signed by someone it trusts, and thus can authenticate Mnew’s public key,
which Mn signed. Similarly, if any member contacts Mnew, that member’s public key can be verified
with the keys received from Mn. In all of these cases a WoT (PGP) version of TLS can be used for the
authentication. This CC has a single access control: all members are created equal. But individual
members can apply their own access control based on the cosigners of a WoT (PGP) certificate. If a
special signature is present, it can grant more or fewer privileges. This is reputation-based access
control.
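The acceptance rule can be sketched as a search over signatures. This is a simplification of PGP trust. Every key that becomes valid is also treated as a trusted introducer, in line with the “all members are created equal” CC above. `max_hops` is a hypothetical policy limit on chain length.

```python
from collections import deque

def key_is_valid(target, trusted, signed_by, max_hops=2):
    """Can a member that directly trusts `trusted` authenticate `target`'s key?

    signed_by -- dict: key ID -> set of key IDs that signed that key
    max_hops  -- hypothetical limit on the length of the signature chain
    """
    if target in trusted:
        return True
    signs = {}                                   # signer -> keys it has signed
    for key, signers in signed_by.items():
        for signer in signers:
            signs.setdefault(signer, set()).add(key)
    frontier = deque((key, 0) for key in trusted)
    seen = set(trusted)
    while frontier:                              # breadth-first over signatures
        key, hops = frontier.popleft()
        if hops == max_hops:
            continue                             # chain would become too long
        for signed in signs.get(key, ()):
            if signed == target:
                return True
            if signed not in seen:
                seen.add(signed)
                frontier.append((signed, hops + 1))
    return False

# Mo trusts Mx directly; Mx signed Mn's key at the signing party;
# Mn signed the newcomer Mnew's key out-of-band.
signed_by = {"Mn": {"Mx"}, "Mnew": {"Mn"}}
print(key_is_valid("Mnew", {"Mx"}, signed_by))              # True
print(key_is_valid("Mnew", {"Mx"}, signed_by, max_hops=1))  # False
print(key_is_valid("Mevil", {"Mx"}, signed_by))             # False
```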
In this way we can securely create, join, and authenticate CC membership. Each member is granted a
CCMC, which contains its access privileges and can be used in conjunction with the MGCert to
establish TLS/ACP/ONP connections on the P2P Overlay Network. The next issues in securing a CC are
how to secure content and P2P Overlay Network Multicast Groups. SPAM attacks are of particular
importance. Nothing prevents a peerNode from claiming, by notification to a mediator, to be a member
of a CC (chapter 4, section 4.2.3.1). Afterwards, this same peerNode can publish any documents in the
CC. How this is handled is discussed in the next section.
5.3.6.1 P2P Overlay Network SPAM Attacks
Since publication and the querying of content are required primary application features in a P2P
network, we do need mechanisms to control the content in a CC. If the reader is not familiar with
either the notification mechanisms used between peerNodes and mediators to enable CC membership
functionality on mediators, or with the publish/query protocols, then it is a good idea to briefly
review chapter 4, section 4.2.3.1, and chapter 4, sections 4.2.3.3 and 4.2.3.4, before continuing.
Because this is a P2P network, publication must be defined in a decentralized way to support purely
ad-hoc, anonymous P2P CC’s. This is the most general approach and is what our protocols specify. On
the other hand, one can still centralize the implementation of these protocols if desired, at the cost of
performance. Why? There are two steps to the procedure:
1. The mediator must securely access the MGCert for this CC as well as a proxy CCMC. This
can be managed either with a TLS connection to an LDAP directory where the lookup is
bound to the CC UUID, or by having the mediator securely acquire the MGCert/CCMC
out-of-band.
2. The notify command will be required to use TLS, and thus, with two-way authentication set
(client and server in TLS terminology), both the mediator and the peerNode must be
authenticated for the notification to succeed.
Since secure connections are required between the peerNode and the mediator as well as between the
mediator and the database containing the MGCert and proxy CCMC, the two TLS handshakes are
computationally expensive. Keeping this in mind, we discuss some possible SPAM attacks.
The first attack is a legitimate CC member publishing SPAM. This kind of attack is the responsibility
of the peerNode members of the CC and is independent of the site-level mediators. Each participating
CC member must note the reception of SPAM from a CC member and keep a history of these
occurrences, binding the SPAMer’s peerNode UUID to the violations. Identifying SPAM beyond this,
that is, beyond CC members receiving content, recognizing it as SPAM, recognizing its source, and
keeping histories, can be extremely complicated given its volume. There is plenty of literature, and
many companies are attacking this problem [REFS on SPAM BAYSIAN]; these techniques can be
applied to CC’s by their members. P2P content has an advantage over email SPAM: all content must be
published in a CC. The members are known, and SPAMers can be identified and expelled from a CC. A
CC with either an anonymous or a registered policy is more vulnerable because no authentication is
required to publish content. Still, general content does not just arrive at a peerNode in a CC. Rather,
it is published as meta-data, then looked up and accessed from its source. This puts an additional
burden on the originating source: it must store this content until it is propagated through lookup and
query. Thus, SPAM can be caught early on, and the originator can be put on the bad guys’ list. Hence,
the protocols we use provide a window for rejecting SPAM without being overwhelmed by it. To this
end, we first discuss how to boot a detected SPAMer out of a CC.
Just as for the mediators in section 5.3.5, periodic votes can be taken using the CC creator as the
polling place. All of the mechanisms are in place for signing the votes with the private key associated
with the CCMC, and in this case the CC creator can also tally the votes. If a CC member “wins” the
vote, then this member’s membership can be terminated, or a warning can be issued together with a
lowering of this member’s access privileges. In either case there are two ways to take action:
1. The bad member’s CCMC will expire within the CC’s defined expiration time. When a
renewal is then requested, it is either denied or granted with lower access privileges. Recall
that the CCMC is required for CC activities, and that without it the bad member can neither
query content from other members nor have its own content accessed by them.
2. After the vote is completed, the CC members can request the results and then refuse to
cooperate with the bad member until its CCMC has expired. In that case the bad member
must follow action (1) to renew its membership.
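Here is a Python sketch of one voting round as the CC creator might tally it. Several details are assumptions: the one-vote-per-member rule, the `expel_at` and `warn_at` thresholds, and the `verify_sig` callable, which stands in for checking a signature made with the voter’s CCMC private key.

```python
from collections import Counter

def tally_votes(ballots, members, verify_sig, expel_at=0.5, warn_at=0.25):
    """Tally one round of misbehavior votes (sketch).

    ballots    -- list of (voter, accused, signature) tuples
    members    -- set of current CC member IDs
    verify_sig -- callable(voter, accused, signature) -> bool (hypothetical)
    Returns dict: accused -> "terminate" | "warn_and_downgrade".
    """
    counted = Counter()
    voted = set()
    for voter, accused, sig in ballots:
        if voter not in members or accused not in members or voter in voted:
            continue                  # non-members and repeat voters are ignored
        if not verify_sig(voter, accused, sig):
            continue                  # forged or altered ballot
        voted.add(voter)
        counted[accused] += 1
    actions = {}
    for accused, votes in counted.items():
        share = votes / len(members)
        if share >= expel_at:
            actions[accused] = "terminate"
        elif share >= warn_at:
            actions[accused] = "warn_and_downgrade"
    return actions

# Stand-in for real signature checking, for this example only.
fake_verify = lambda voter, accused, sig: sig == f"sig:{voter}:{accused}"
members = {"A", "B", "C", "D"}
ballots = [
    ("C", "A", "forged"),   # rejected: bad signature
    ("A", "D", "sig:A:D"),
    ("B", "D", "sig:B:D"),
    ("C", "D", "sig:C:D"),
    ("D", "A", "sig:D:A"),
    ("A", "C", "sig:A:C"),  # rejected: A already voted this round
    ("X", "D", "sig:X:D"),  # rejected: X is not a CC member
]
result = tally_votes(ballots, members, fake_verify)
print(result)  # {'D': 'terminate', 'A': 'warn_and_downgrade'}
```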
The next attacks apply only to CC’s whose membership policy is restricted; in anonymous and registered
CC’s no authentication is required to publish in the first place. If a CC has the restricted membership
policy and authentication is proxied by the mediator as described above, then these attacks are not
possible. So assume that the membership policy is restricted but that the mediator does not proxy
authentication. Recall that the CC document is acquired in the public CC, and that this document
contains all of the information necessary to notify a site-level mediator so that this peerNode can
become active in that CC. The only site-level CC privileges this grants are the right to publish and the
right to look up content in that CC. Finally, we suppose in general that nothing is done to prevent
peerIdentity theft as described in section 5.3.5.3. If, however, the mediator grants the peerIdentities
as in section 5.3.5.3, then it can examine publish commands for illegal peerIdentities and reject them.
Now let’s describe the possible ways to hack publication.
Given that a non-CC member has the CC document, the peerIdentity of a true member, and the
virtualPort document where the true member has published data, there are two possible attacks:
1. A non-CC member hacks the data of a publication command to publish bogus content.
Recall that the publication command carries “{peerIdentity, virtual port UUID} =
{virtualSocket} of the originating peerNode” as part of its data. The virtual port UUID is in the
Virtual Port document, which must be accessed as a member of the CC. Therefore, for the
sake of this discussion, we suppose that this document has been acquired by a former member
who has been banned from the CC and is seeking revenge. There may be many ways to
steal this information. The fake content is then published. In this case, requests for the
content will go to the real member and will be summarily rejected because the content
does not exist.
2. Let’s suppose the malicious peerNode has acquired the CCMC of a legitimate member.
Note that a non-member cannot acquire the CCMC if all CC communication is done under
the cover of TLS, as we suggest, unless a legitimate member’s system has been hacked.
Suppose the fake peerNode publishes content in this CC using the duplicate virtualSocket
in the publish command data and the duplicate peerIdentity in the UMP header, and, again,
there is no peerIdentity security check as discussed in section 5.3.5.3. Then, with a small
probability, there will be a route to this peerNode hashed at the site level and bound to the
peerIdentity. In this case, CC members will attempt to connect to the fake member to access
the content. This will not work because the fake member does not have the private key
belonging to the public key in the CCMC, and the TLS handshake will fail.
One possible P2P application is P2PEmail. We are all aware of the impact SPAM has on Internet-based
email. Any P2PEmail application implemented with these protocols can almost entirely eliminate
SPAM from outside sources. One simply requires two-way TLS authentication in a restricted
peerGroup. Therefore, if a peerNode wishes to send email to other peerNodes, it must connect to them
one at a time using TLS/ACP/ONP. The peerNode has joined the CC, acquired the necessary
credentials, and can be removed if it sends SPAM. This is an ON/OFF decision. SPAMers are caught as
originators of bad email; they are removed from the CC, and in the meantime their email is dropped
into the junk mail bucket. There is no way to simply give out your email address. What you give out is
permission to join a “gated community.”
There certainly will be P2P applications that are not 100% content based. Yes, content is exchanged as
part of what makes the application interesting, but not as an end. Typical examples are P2P games, and
IM. We discuss securing these kinds of applications in the next section.
5.3.6.2 Securing Connection Based P2P Applications
What kinds of connection-based applications are possible? There are the typical person-to-person
games like chess, checkers, go, poker, golf, and adventure. We also have IM and chat rooms that might
involve group multicast. This breaks down into one-to-one connections, group multicast connections,
and sometimes combinations of both. For example, two individuals may meet in a chat room and then
create a private conversation in the same chat room context. In all cases the attacks are DoS, SPAM,
or attempts to hack the security protocols. In what follows we use a chat room as a metaphor for these
kinds of applications.
DoS in these situations is an attempt to overwhelm the P2P Overlay Network with traffic so that the
applications become unusable. Whether these applications use UMP or ACP, these kinds of attacks are
addressed in section 5.3.5.1.
SPAM, on the other hand, can be dealt with using the techniques described in section 5.3.6.1. The
connected community is a powerful mechanism for controlling misbehavior. If one decides to have a
chat room in a CC with either an anonymous or a registered policy, then one accepts the risk of
receiving SPAM. It’s a “laissez-faire” chat room. These have their uses and certainly an entertainment
value. On the other hand, if one forms a specific CC with the restricted policy, all that has been said in
the previous section applies: one can quickly identify and remove misbehaving members.
Another issue is privacy. While we’ve adequately discussed how to have a private two-way connection,
be it for content, conversation, or a game, we have not yet shown how to create a private chat room
using group multicast. A primary security requirement is that a channel is secured end-to-end. By this
we mean that the plaintext is never revealed after it leaves the sender until it arrives at the ultimate
destination. Some secure connections by having a proxy behind a firewall maintain two TLS
connections, one with the sender and the other with the recipient. In this case, the ciphertext becomes
plaintext on the proxy, in a system buffer, for example. Historically, infrastructures that proxy security
have regretted this decision. An example is the Wireless Application Protocol (WAP). Such approaches
are not secure enough and ultimately lead to the demise of the associated protocols.

Figure 5-11. Proxied “Secure” Channels (Sender to Proxy over TLS, then Proxy to Receiver over TLS:
the data is ciphertext on each TLS link but plaintext inside the Proxy.)


If a private chat room wants end-to-end privacy and reliable delivery of every message, then it can be
done with ACP/ONP. This creates $O(n^2)$ TLS connections. Each of the n participants needs a
connection with the other n-1 participants, which is $n(n-1)/2$ connections in all. If n is small, then
this is acceptable. What counts as small is a moving target, since systems keep getting faster and
bandwidth keeps increasing. What can be said is that the “$O(n^2)$ TLS connections solution” will
always hit a scaling barrier, and so it is not a good idea. A better approach is for a single mediator to
act as a chat room hub and take responsibility for caching messages and retransmitting them on
request, while always respecting end-to-end security. We use the multicast StarNode for this purpose.
Then, when 100% reliable delivery is a requirement, peerNodes can request missing messages from the
StarNode chat room hub. Thus, we use the Multicast Mediator Protocol (MMP, as described in chapter 4,
section 4.2.7) as our delivery mechanism. How this is securely accomplished is discussed next.
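The scaling argument is simple arithmetic. The sketch below compares a full mesh with a hub. It assumes, as a simplification, one connection per participant to the StarNode hub.

```python
def mesh_connections(n):
    """Distinct TLS connections when every participant connects to every other."""
    return n * (n - 1) // 2

def per_participant(n):
    """Connections (and TLS handshakes) each participant maintains in the mesh."""
    return n - 1

def hub_connections(n):
    """Simplified hub model: one connection per participant to the StarNode."""
    return n

for n in (5, 50, 500):
    print(n, mesh_connections(n), per_participant(n), hub_connections(n))
# 5 10 4 5
# 50 1225 49 50
# 500 124750 499 500
```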
First, let’s define the mechanism for securely authenticating members of a chat room. Recall that,
within a CC, a chat room multicast group, M0, is registered by a peerNode, say P0. This registration is
propagated to the site-level Multicast StarNode for M0. At the application level, P0 publishes a
multicast virtualPort document identifying itself as the creator of M0. When a second peerNode looks
up this document, it must first authenticate itself with P0 in the CC using its CCMC and TLS. If either
peerNode is not a member of the CC, the authentication fails and the process must be restarted.
Assume all is well. During this initial TLS session, a TLS master_secret0 is created and known to both
peerNodes. When the handshake is finished, each peerNode keeps master_secret0 while the TLS session
is terminated. This master secret, which we call master_key0 from here on, is then used to create the
key material necessary for the chat room’s private, secure communication. It is associated with the
original TLS session identifier, and thus we have the pair {sessionID0, master_key0}. Here, we use a
variation of TLS. Under normal circumstances, TLS derives separate keys for each direction from the
master secret: a client write key and a server write key for encrypting the data records, along with
corresponding MAC secrets for the record digests. This permits two encryption channels with different
symmetric keys. The client’s write key is the server’s read key, and the server’s write key is the client’s
read key.
For multicast this does not work; we require a single key. After the multicast key material is created
from master_key0, the two participating peerNodes use it to generate TLS data records that they
multicast. When a third CC member looks up the multicast virtualPort document and joins the
multicast group, it can contact either of the two existing members to authenticate itself into the CC.
Over this initial TLS connection, after the handshake is completed and before the connection is closed,
it receives master_key0 in a TLS Data Record, as well as a list of the peerIdentities of the peerNodes
that have also joined the multicast group. It too can then create the necessary key material from the
shared master_key0 to multicast TLS data records within this group using the MMP. Thus, in general,
once the secure CC multicast group has been created, any member can be contacted for authentication
and the {sessionID0, master_key0} information.
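The book’s actual derivation appears in Appendix II. As a stand-in, the sketch below uses the TLS 1.2 P_SHA256 expansion from RFC 5246 to turn master_key0 into a single group MAC key and a single encryption key. Several details are assumptions: the label, the key lengths, and the use of sessionID0 as the seed (real TLS mixes in the client and server random values).

```python
import hashlib
import hmac

def p_sha256(secret, seed, length):
    """TLS 1.2 P_SHA256 data expansion (RFC 5246, section 5)."""
    output = b""
    a = seed                                              # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]

def multicast_key_material(master_key0, session_id0, enc_len=16, mac_len=32):
    """Derive ONE MAC key and ONE encryption key shared by the whole group.

    A normal TLS key_block is split into client and server write keys; here
    the block feeds a single multicast channel instead. The label and the
    use of sessionID0 as the seed are assumptions made for this sketch.
    """
    label = b"multicast key expansion"   # hypothetical label
    block = p_sha256(master_key0, label + session_id0, mac_len + enc_len)
    return {"mac_key": block[:mac_len], "enc_key": block[mac_len:]}

master_key0 = bytes(range(48))           # stand-in 48-byte master secret
session_id0 = bytes.fromhex("a1b2c3d4")  # stand-in session identifier
peer_1 = multicast_key_material(master_key0, session_id0)
peer_3 = multicast_key_material(master_key0, session_id0)  # received master_key0 later
print(peer_1 == peer_3, len(peer_1["mac_key"]), len(peer_1["enc_key"]))  # True 32 16
```

Every member that holds {sessionID0, master_key0} derives byte-for-byte identical keys. This is exactly what a single multicast channel needs.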

Figure 5-12. Shared Secret Key TLS Multicast (Peer_1 and Peer_2 complete a TLS handshake, share the
Master_key, and exchange TLS data records using keys derived from it; Peer_3 joins by completing a
TLS handshake with Peer_1, receives the shared Master_key in a TLS data record, closes the TLS
connection, and then multicasts TLS data records to Peer_1 and Peer_2.)
The creator thus propagates the active members and the {sessionID0, master_key0} pair to
authenticated multicast group members. The creator also publishes multicast group content describing
the active group membership. These mechanisms permit other interested CC members to participate
in the multicast group. The details of creating the multicast key material from the shared master key
are in Appendix II.
In the above approach no rekeying is required as each new member joins the chat room, and, as
previously mentioned, chat room behavior can be extended to cover other similar group activities that
must be private. We know that if keys are used over an excessively long period of time, then the
sessions are more vulnerable to cryptanalytic attacks. Thus, such a chat room can decide to rekey:
assign an active peerNode member as the key generator, shut down the chat room to synchronize
rekeying, and begin a new session as we have just described, with a new {sessionIDi, master_keyi}
pair. Since session-dependent data from before the shutdown may still be in transit, the old
{sessionIDk, master_keyk} pairs can be kept for a short period of time to extract the plaintext from the
previous session.
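Keeping old pairs for a short time can be sketched as a small key ring. The grace period and the injectable clock are hypothetical details chosen for the example.

```python
import time

class SessionKeyRing:
    """Current {sessionID, master_key} pair plus briefly retained old pairs.

    grace_seconds is a hypothetical policy value: how long a retired key may
    still decrypt records that were in transit when the chat room rekeyed.
    """

    def __init__(self, grace_seconds=30.0, clock=time.monotonic):
        self.grace = grace_seconds
        self.clock = clock
        self.current = None   # (session_id, master_key)
        self.retired = {}     # session_id -> (master_key, time retired)

    def rekey(self, session_id, master_key):
        """Start a new session; keep the previous pair for the grace period."""
        if self.current is not None:
            old_id, old_key = self.current
            self.retired[old_id] = (old_key, self.clock())
        self.current = (session_id, master_key)

    def key_for(self, session_id):
        """Key for an incoming record's session, or None if unknown or expired."""
        if self.current is not None and self.current[0] == session_id:
            return self.current[1]
        entry = self.retired.get(session_id)
        if entry is None:
            return None
        key, retired_at = entry
        if self.clock() - retired_at > self.grace:
            del self.retired[session_id]  # grace period over: discard the old key
            return None
        return key

now = [0.0]                               # fake clock for the example
ring = SessionKeyRing(grace_seconds=30, clock=lambda: now[0])
ring.rekey("session0", b"master_key0")
ring.rekey("session1", b"master_key1")    # session0 retired at t = 0
now[0] = 10.0
print(ring.key_for("session0"))           # b'master_key0'
now[0] = 45.0
print(ring.key_for("session0"))           # None
print(ring.key_for("session1"))           # b'master_key1'
```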
Clearly, rekeying in this manner is very important if intellectual property rights are being discussed,
or if, for example, the chat room is a private “gaming room” where money is exchanged. Such
examples abound. Another possible approach is to rekey each time a new member joins the group.
There are group key agreement algorithms for doing this, and a popular one is Group Diffie-Hellman
(GDH) [GDH]. It is computationally expensive and is recommended for small groups only. We give a
brief explanation; see the reference for details.
The goal is to generate a Diffie-Hellman (DH) group private key from a collection of public keys. This
private key can be used to exchange a master_secret like the one above, which in turn can be used for
sending TLS-like data records using the MMP. Recall how basic Diffie-Hellman works. Pick a large
prime number p and a generator g < p of the multiplicative group modulo p (because p is prime, g is
automatically relatively prime to p). One can pick g as small as possible. It is also best if (p-1)/2 is
prime [SCHNEIER]. Given two principals A and B, each generates a large prime number, a and b,
respectively, while g and p are public information:
A sends $g^{a} \bmod p$ to B
B sends $g^{b} \bmod p$ to A
A and B each then compute $g^{ab} \bmod p$, which is the shared private key: A computes
$(g^{b})^{a} \bmod p$ and B computes $(g^{a})^{b} \bmod p$.
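A toy run of the exchange shows that both sides arrive at the same value. The numbers are far too small for real use. Following the text, the secret values are primes.

```python
p, g = 23, 5             # toy public parameters; 23 is a safe prime, 5 generates mod 23
a, b = 7, 13             # A's and B's secret primes

A_pub = pow(g, a, p)     # A sends g^a mod p to B
B_pub = pow(g, b, p)     # B sends g^b mod p to A

key_A = pow(B_pub, a, p) # A computes (g^b)^a mod p
key_B = pow(A_pub, b, p) # B computes (g^a)^b mod p
print(key_A, key_B, pow(g, a * b, p))  # 10 10 10
```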
Given the private key, A and B can exchange symmetrically encrypted data with integrity checks using
algorithms such as RC4 and SHA-1. For example, again, the master_key material can be used to
generate the secret keys to multicast TLS-like data records. The GDH algorithm works as follows:
Assume for this explanation that we already have three members in our group, A, B, and C,
and that the ordering is determined by their peerIdentities, which are monotonically increasing.
Then the current private key is $g^{abc} \bmod p$. We leave out the $\bmod p$ in the remainder
of this discussion. Dividing an exponent by x means multiplying it by the inverse of x modulo
p-1, which exists because the secret primes are chosen not to divide p-1. Now, D joins the group
and computes a suitable large prime number, d. For the sake of generality, we can assume that D
contacts any one of A, B, or C. D will be given the complete multicast group list. Then D contacts
C, and the following occurs:
1. C computes $g^{abcc'/c} = g^{abc'}$, where c' is a new large prime number, and sends this to D.
2. D contacts A and B, sending each of them $g^{abc'}$, and computes $g^{abc'd}$ locally. The
latter is the new private key.
3. A sends $g^{abc'/a}$ to D.
4. B sends $g^{abc'/b}$ to D.
5. C sends $g^{abc'/c}$ to D.
6. D computes the collection of public keys, $\{g^{(abc'/a)d}, g^{(abc'/b)d}, g^{(abc'/c)d}\}$, and
sends the first member of the collection to A, the second to B, and the third to C, respectively.
7. A computes $(g^{(abc'/a)d})^{a} = g^{abc'd}$, yielding the private key. B and C do the same, and
we have rekeyed with the GDH protocol.
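The exponent arithmetic in steps 1 to 7 can be checked with toy numbers. This sketch assumes the safe prime p = 1019 with generator 2 and small secret primes. It implements “dividing” an exponent by x as raising to the inverse of x modulo p - 1.

```python
p = 1019         # toy safe prime: (p - 1) / 2 = 509 is prime
g = 2            # a generator of the multiplicative group mod 1019
order = p - 1    # exponents only matter modulo p - 1 (Fermat's little theorem)

def strip(value, x):
    """Compute value^(1/x): raise to the inverse of x modulo p - 1."""
    return pow(value, pow(x, -1, order), p)

a, b, c = 101, 103, 107         # current members' secret primes
old_key = pow(g, a * b * c, p)  # g^{abc}
c_new, d = 109, 113             # C's fresh secret c' and newcomer D's secret

g_abc2 = strip(pow(old_key, c_new, p), c)  # step 1: g^{abc c'/c} = g^{abc'}
key_D = pow(g_abc2, d, p)                  # step 2: D computes g^{abc'd} locally
from_A = strip(g_abc2, a)                  # step 3: g^{abc'/a}
from_B = strip(g_abc2, b)                  # step 4: g^{abc'/b}
from_C = strip(g_abc2, c)                  # step 5: g^{abc'/c}
to_A, to_B, to_C = (pow(v, d, p) for v in (from_A, from_B, from_C))  # step 6
key_A = pow(to_A, a, p)                    # step 7: each member restores its exponent
key_B = pow(to_B, b, p)
key_C = pow(to_C, c, p)
print(key_A == key_B == key_C == key_D == pow(g, a * b * c_new * d, p))  # True
```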
Given this shared private key, the group can begin to multicast as described above using TLS Records
after exchanging the master_key material. We note that DH is vulnerable to a man-in-the-middle
attack, since the public keys are exchanged in the open. The attack works as follows:
A sends $g^{a}$ to B, and C is in the middle.
C intercepts $g^{a}$ and $g^{b}$, and sends $g^{c}$ to A and to B.
C computes $g^{ac}$ and $g^{bc}$, as do A and B, respectively.
Thus, A and C, and C and B, share private DH keys, and all further ciphertext communication
between A and B is revealed in plaintext to C, with neither A nor B knowing that C is in the
middle. Similarly, C can re-encrypt the plaintext and send ciphertext to both A and B.
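The attack is easy to reproduce with toy parameters, p = 23 and g = 5. The secret values are arbitrary.

```python
p, g = 23, 5
a, b, c = 7, 13, 17          # secrets of A, B, and the attacker C

g_a, g_b, g_c = pow(g, a, p), pow(g, b, p), pow(g, c, p)

# C intercepts g^a and g^b and forwards g^c to both A and B instead.
key_A = pow(g_c, a, p)       # A believes this key is shared with B
key_B = pow(g_c, b, p)       # B believes this key is shared with A
key_CA = pow(g_a, c, p)      # C's key with A: g^{ac}
key_CB = pow(g_b, c, p)      # C's key with B: g^{bc}

print(key_A == key_CA, key_B == key_CB, key_A == key_B)  # True True False
```

A and B end up with different keys, each shared only with C. Signing the exchanged DH values, as described next, is what closes this gap.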
All CC group members exchange CCMC’s and have the MGCert to verify these service certificates.
Therefore, every exchange of DH public keys can be signed by the sender and the signature verified by
the receiver. In this way, the public key information sent in steps 1-6 above can be signed and, as such,
is not vulnerable to a man-in-the-middle attack. In steps 1-6 above, 3n connections are required.
We can also do similar rekeying with our TLS mechanism, because the CC TLS authentication
mechanism is already in place. We prefer periodic rekeying for performance reasons and to minimize
the number of rekeying episodes. Frequent rekeying can in fact turn into a DoS attack by a misbehaving
CC member.
This concludes our description of securing connected communities. We have been strongly biased in
three directions. First, we permit very open CC’s that are either anonymous or registered. Here
authentication is not important, and any document can be published in these spaces. The only possible
attack in these situations is publishing unwanted content. Even in this case, if peerIdentities are
granted by the mediators (see section 5.3.5.3), then at least the bad peerNode can be identified and its
content ignored by the peerNodes accessing it. We also pointed out that reputation-based models can
be constructed, based on content quality and the recommendations of peerNodes in a Web-of-Trust, to
create and share lists of the peerNodes that publish desirable content.
Second, when a CC is restricted, we can apply sufficient torque to guarantee authentication; to control
publication, so that if SPAM is published, it is published by a known CC member who can be detected
and dealt with according to each CC’s policies, which may involve LOCKSS-like voting; and to secure
connection-based applications that rely on the MMP. We can even take this one step further and
permit publication only by a peerNode that has authenticated itself as a CC member to its hosting
mediator.
Third, we secure mediators at each level using polling methods to detect misbehavior; the mediators
being polled cannot differentiate polling from normal mediator activity. Final decisions are made by
distributed voting in a way that extends what is done in LOCKSS. We’ve also addressed securing
mediator/peerNode route bindings.
In the next section we take up some of the issues surrounding digital rights management (DRM). This
is a difficult area to discuss because it is politically and legally loaded. Neither politics nor copyright
law is appropriate for this book. We will make technical suggestions for solving the DRM problem, but,
like nuclear energy or even a lighted match, these techniques can be used for the well-being or the
detriment of society as a whole.

5.4 Digital Rights Management on P2P Overlay Networks


At the heart of the design of our P2P Overlay Network is the organization of peerNodes into connected
communities. These communities can be open and anonymous, or closed, gated communities that
require one to pass security checks before entering. The idea is that it is up to the implementors to use
these tools as they see fit. Clearly, neighbors wanting to share family photos or simply chat together
must have this right in a way that is both autonomous and anonymous. By the latter we mean that the
CC to which they belong may be invisible to the outside world, as, for example, one can do with an
802.11a/b/g network with the appropriate encryption and security measures, as we discussed in the
previous chapter. This is very easy to do with current technology and will only become easier as the
technology improves.
On the other hand, one can use the tools we have described for commercial purposes and make it
extremely difficult to violate copyright laws. A content provider can create its own CC and put in place
its own mediator hierarchy, with selected mediators acting as certificate authorities that grant X509.V3
certificates only to preregistered customers. It can then have complete control over all open channels
and the data that is transmitted. As long as no one has hacked the content provider’s binaries, there is
also complete control at the sending and receiving endpoints with respect to the transmitted content.
So, more precisely, what must we do to create a totally secure connected community using the
protocols and security mechanisms we’ve described? Let CP0 be the content provider for our secured
connected community, CCsec. To secure its content, CP0 does the following:
1. CP0 creates the implementation of our protocols and places its X509.V3 root certificate in
the binary it ships.
2. Similarly, CP0 installs several mediators and grants each of them a signed X509.V3
Mediator Service Certificate (MSCert). As explained just below, this permits the various
site-level mediators to use TLS with all of their hosted peerNodes and with one another.
We therefor can have TLS connections between all of CP0’s site-level mediators, as well as between
each such mediator and its hosted peerNodes since each peerNode has CP0’s root certificate. We need
not worry about up-level mediators that are part of the global infrastructure because in their roles as
routers, they are oblivious to the data they route, which will always be secure for CP0’s sub-infrastruc-
ture in the global P2P Overlay organization.
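To make the trust relationship concrete, here is a minimal Python sketch of the check that a site-level mediator or peerNode performs on a peer's MSCert: was it signed by CP0's root key?

The sketch makes the following assumptions:

- It uses toy, insecure textbook RSA with small fixed primes.
- It uses a hypothetical `MSCert` record in place of real X509.V3 certificates and a TLS library.

Only the signing relationships mirror the text.

```python
import hashlib
from dataclasses import dataclass

# Toy, INSECURE textbook RSA used only to show who signs what. Real MSCerts are
# X509.V3 certificates checked inside a TLS handshake; these fields are hypothetical.


def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest_int(data, n), d, n)


def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest_int(data, n)


@dataclass
class MSCert:
    subject: str        # e.g. "M_1", a site-level mediator
    subject_key: tuple  # the mediator's public key (n, e)
    signature: int = 0  # CP0's signature over subject and key

    def signed_bytes(self) -> bytes:
        return f"{self.subject}|{self.subject_key}".encode()


# CP0's root key pair; the public half plays the role of the rootCert in every binary.
ROOT_PUBLIC, ROOT_PRIVATE = make_keypair(2147483647, 1000000007)


def issue_mscert(subject: str, subject_key: tuple) -> MSCert:
    cert = MSCert(subject, subject_key)
    cert.signature = sign(cert.signed_bytes(), ROOT_PRIVATE)
    return cert


def accept_peer(cert: MSCert, root_public=ROOT_PUBLIC) -> bool:
    """The check either TLS endpoint makes: was this MSCert signed by CP0's root?"""
    return verify(cert.signed_bytes(), cert.signature, root_public)


if __name__ == "__main__":
    m1_public, _ = make_keypair(1000000009, 998244353)
    print(accept_peer(issue_mscert("M_1", m1_public)))  # True
```

Because every peerNode ships with the same root public key, any two parties holding CP0-issued MSCerts can authenticate each other without a prior exchange.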

[Figure: mediators M_1 to M_4, each holding rootCert and its MSCert_i, and peerNodes P_1 and P_2, each holding rootCert, all linked by TLS connections.]
Figure 5-13. CP0's TLS Secured Infrastructure
Next we must secure peerNode-to-peerNode communication, and here we use CCsec membership. To
this end, a selected few of CP0's mediators also act as peerNodes that are CCsec creators. Since any
peerNode that is a member of CCsec must be able to securely communicate with any other member, each
such peerNode must have all of the MGCerts that have been granted by CP0 to its mediator/peerNodes, MPi,
i = 1,...,n. These mediators are aware of one another, even across the expanse of the planet, and can
communicate with TLS; therefore they can securely exchange MGCerts. Hence, we can assume the following:
1. Each MPi has the complete list, MGCerts = {MGCertj: j = 1,...,n}. Recall, these are the
community root certificates, and are not public information outside of CCsec.
2. When a peerNode joins CCsec, then it will securely receive both its signed CCMC, and
MGCerts. Again, the peerNode possesses CP0’s root cert and can thus verify the creator’s
MGCert under the TLS handshake.
Consequently, all communication within CP0’s P2P Overlay infrastructure is secure.
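The membership check that the MGCerts list enables can be sketched the same way.

In this minimal sketch, a member accepts another peerNode's CCMC only if two conditions hold:

- The CCMC names one of the community roots in MGCerts.
- That root's signature verifies.

The sketch relies on two assumptions:

- Textbook RSA with tiny keys stands in for real certificates. It is insecure and for illustration only.
- The CCMC dictionary layout is hypothetical.

```python
import hashlib

# Toy, INSECURE textbook RSA, only to show the trust relationships.


def make_keypair(p, q, e=65537):
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest_int(data, n), d, n)


def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest_int(data, n)


# Two hypothetical CCsec creators, MP_1 and MP_2, each with a community root key.
MG1_PUBLIC, MG1_PRIVATE = make_keypair(2147483647, 1000000007)
MG2_PUBLIC, MG2_PRIVATE = make_keypair(1000000009, 998244353)

# The complete MGCerts list that every member receives (under TLS) when it joins.
MGCERTS = {"MGCert_1": MG1_PUBLIC, "MGCert_2": MG2_PUBLIC}


def ccmc_body(member: str, member_key: tuple, issuer: str) -> bytes:
    return f"{member}|{member_key}|{issuer}".encode()


def issue_ccmc(member, member_key, issuer, issuer_private) -> dict:
    """A creator MP_i signs a CCMC for a joining peerNode (hypothetical format)."""
    body = ccmc_body(member, member_key, issuer)
    return {"member": member, "key": member_key, "issuer": issuer,
            "sig": sign(body, issuer_private)}


def is_ccsec_member(ccmc: dict, mgcerts=MGCERTS) -> bool:
    issuer_key = mgcerts.get(ccmc["issuer"])
    if issuer_key is None:  # not signed by any known community root
        return False
    body = ccmc_body(ccmc["member"], ccmc["key"], ccmc["issuer"])
    return verify(body, ccmc["sig"], issuer_key)
```

The verifier rebuilds the signed bytes from the certificate's fields rather than trusting any stored copy. As a result, altering a field invalidates the signature.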

[Figure: mediators M_1 to M_4 hold rootCert, their MSCert_i and the full set MGCert_1 to MGCert_4, which they exchange among themselves; peerNodes P_1 and P_2 hold rootCert and MGCert_1 to MGCert_4; all links are TLS connections.]
Figure 5-14. TLS CC Secured Communication
Finally, a user's system must become a peerNode in CP0's infrastructure. This requires downloading
the software from a web site, or purchasing the software, say on a CD. There are security issues that
must be addressed here. If one acquires the software from a web site, then an account can be set up,
and this is secured by standard Internet protocols. The latter are attackable, and that is not for us to
address. Our only warning is never to assume that what you are doing is secure if you are using an
open kiosk for Internet access [KIOSK ATTACK]. If you purchase a CD, then again, the software will
be unusable until it is appropriately registered with the originator, and this registration should be under
TLS with the provided certificates. In both cases, the software should be signed and the signature must
be verified before the software becomes usable. This is not 100% foolproof, since it is possible to modify
the software's binary before using it. Modifying a binary is trivial, since it can be done in any editor,
but modifying it to change its behavior is tough. Here, we are imagining saving the signature, then
changing the code, and at the same time, whenever the signature is to be verified either locally or
remotely, guaranteeing that the original signature is used in this procedure.

The user has the software and next joins CCsec. Any one of the MPi is contacted, account verification
may be done after the TLS handshake completes, and the peerNode becomes a member of CCsec. The
peerNode can now access content. Clearly, seeding peerNodes must be present and must have already
published content. All of the meta-data lookup and access is done as described in this chapter. Also, the
public CC is private with respect to CP0 and its mediators. As we said, all site-level communication
is authenticated and done with TLS.
Let's imagine that peerNode P1 accesses some music in an MPEG file from P2. Remember, this access
is authenticated within CCsec. First, the file is signed by CP0. The software itself arrives with an MPEG
player. Before accessing the MPEG file, P1 is given its next symmetric key by its mediator under TLS.
This symmetric key is bound to the CBID of the CCMC. Before the symmetric key is used, the player
verifies that this peerNode has the private key associated with the public key in the CCMC, using a
random challenge that must be encrypted with (that is, signed by) the private key. If this challenge fails,
the music is immediately destroyed. Otherwise, this symmetric key can be used to both encrypt/decrypt
and play the music. The music is encrypted as it is being received, byte by byte. In parallel, the plaintext
data is hashed. Finally, with the hash and before playing the music, the MPEG player verifies the
signature of the plaintext MPEG file with CP0's public key. If the signature is not valid, then the music
is unplayable and is discarded. The symmetric key can have a TTL, and be renewed with the mediator
after a given number of plays.
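The player-side steps above can be sketched as follows. The sketch covers four steps:

- the proof-of-possession challenge;
- hashing the plaintext incrementally as it arrives;
- checking CP0's signature;
- a play-count limit on the symmetric key.

The sketch rests on these assumptions:

- Toy textbook RSA stands in for real public key cryptography.
- The names `KeyLease`, `answer_challenge`, `challenge_ok` and `receive_and_verify` are hypothetical.
- The symmetric encryption of each chunk is deliberately left out.

```python
import hashlib
import secrets

# Toy, INSECURE textbook RSA; key sizes, the lease class and the play-count TTL
# are hypothetical illustrations of the steps above, not CP0's real protocol.


def make_keypair(p, q, e=65537):
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


CP0_PUBLIC, CP0_PRIVATE = make_keypair(2147483647, 1000000007)   # content signer
PEER_PUBLIC, PEER_PRIVATE = make_keypair(1000000009, 998244353)  # key in P1's CCMC


def cp0_sign_file(data: bytes) -> int:
    n, d = CP0_PRIVATE
    return pow(digest_int(data, n), d, n)


def answer_challenge(challenge: bytes, private_key) -> int:
    """peerNode side: 'encrypt' the random challenge with its private key."""
    n, d = private_key
    return pow(digest_int(challenge, n), d, n)


def challenge_ok(challenge: bytes, response: int, ccmc_public) -> bool:
    """Player side: check the response against the public key in the CCMC."""
    n, e = ccmc_public
    return pow(response, e, n) == digest_int(challenge, n)


class KeyLease:
    """A symmetric key usable for a fixed number of plays, then renewed."""

    def __init__(self, key: bytes, max_plays: int):
        self.key = key
        self.plays_left = max_plays

    def use(self) -> bool:
        if self.plays_left <= 0:
            return False  # must be renewed with the mediator
        self.plays_left -= 1
        return True


def receive_and_verify(chunks, cp0_signature: int, cp0_public=CP0_PUBLIC) -> bool:
    """Hash plaintext chunks as they arrive, then check CP0's file signature."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)  # encrypting the chunk under the lease key is omitted
    n, e = cp0_public
    return pow(cp0_signature, e, n) == int.from_bytes(hasher.digest(), "big") % n


if __name__ == "__main__":
    challenge = secrets.token_bytes(16)
    print(challenge_ok(challenge, answer_challenge(challenge, PEER_PRIVATE), PEER_PUBLIC))
```

Hashing chunk by chunk gives the same digest as hashing the whole file. This lets the player verify CP0's signature without first buffering the file.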
What about payment? We assume the peerNode has an account with CP0. Upon receiving the music,
and verifying the CCMC really belongs to this peerNode, the peerNode contacts its mediator register-
ing the purchase. The mediator has hooks into CP0’s billing system. Finally, the peerNode from which
the music was acquired is given a coupon for a reduction on its next purchase. This is registered in a
similar way into the billing system.
We view the above description as an architectural discussion. We cannot guarantee it is 100% watertight
without an implementation and testing. Still, it does point out the mechanisms we have architected
into our P2P Overlay network to assure the secure exchange of content.
This ends our discussion on DRM and security.

Chapter 6 Wireless and P2P

With the arrival of the Wireless Application Protocol (WAP) in 1997-1998, Internet
Web content made its initial debut on mobile phones on GSM networks. In the
beginning this was primarily in Europe; next, WAP was deployed in Australia
and Asia, and it finally arrived at several carriers in the United States. These
latter deployments included other bearer networks such as CDPD and
CDMA¹. While WAP opened the doors to wireless access to the Internet, it
also showed the weaknesses that were inherent in the wireless networks of
that epoch. In particular, these networks had very low bandwidth, lost data
due to noise, and had high network latency. 9600 bits per second was
the maximum GSM data access rate, the WAP browser content
was minimal, and the mobile phones made it cumbersome to navigate with
their small displays and keyboards. Still, WAP showed us that wireless, mobile
web content access was possible even under these highly constrained conditions.
It was exciting and the future was promising. WAP's major problem was that
it was over-marketed. It created expectations that the technology could not
deliver, and consumers became disenchanted. P2P must pay attention to
this latter message. At the same time, in Japan, another approach to getting
web based content to the same devices was being taken. NTT-DoCoMo introduced
iMode in February of 1999, and the number of subscribers to this service
grew geometrically over the next few years. In fact, by December of the
same year, iMode had more than 500,000 subscribers. It was clearly the harbinger
of things to come. What was the difference between these services, and how
do these differences relate to P2P? What did iMode anticipate that WAP did
not? At the WAP Forum we found the world's Telcos, handset manufacturers,
and Original Equipment Manufacturers (OEMs) attempting to divide up the
wireless marketplace to drive their revenue. Political battles locked WAP into a
TELCO protocol and service paradigm. Its backers refused to look elsewhere, to
accept the work being done in the IETF, the W3C (World Wide Web Consortium)
and the Java Community Process (JCP) standards bodies to adapt their respective
technologies to wireless networks. Because iMode was not caught up in this
global industrial struggle, NTT-DoCoMo could take advantage of its marketplace in
Japan to experiment with mobile phone services and standards driven by its
users' needs, good market analysis, the rapid evolution of mobile handsets,
and the recent innovations in the IETF, W3C and JCP that were applicable to
wireless networks. As a consequence, NTT-DoCoMo took the technical steps
that changed forever the way small, wireless devices are viewed. Among
other things, NTT-DoCoMo's iMode embraced Java, and made it possible to
download small Java applications from partners' web sites and to execute
them locally on constrained, mobile devices. Because these applications were
written in Java, they would run across the Java enabled device space. A special
small-footprint Java was created to make this possible. Now, suddenly, the
application space for these devices was open ended. It is this fact, along with
the arrival of P2P in 2000 and the launching of the Jxta open source project
on www.jxta.org in April of 2001, that makes it a straightforward engineering
task to introduce P2P into wireless networks. When we also consider the
upsurge of Bluetooth and the wide deployment of 802.11a/b/g hotspots, we
have set the stage for the arrival of P2P across this enormous and growing
device space of mobile phones, PDAs, laptops and sensors. We can thank
NTT-DoCoMo for having the courage to go it alone and for opening our
imaginations to many of these possibilities.
¹ GSM, CDPD, and CDMA are explained later in this chapter.

6.1 Why P2P in the Wireless Device Space?


Why would one wish to introduce P2P into the wireless world? When we view what is happening to
the Internet from 30,000 feet, we see a single network. My mobile phone can receive Voice over
IP (VoIP) phone calls as well as text messages, digital radio, and appropriately sized images, screen
savers, and ring tones, that is to say, multi-media content from potentially any system on the Internet.

What does P2P offer here? A very strong market force is the ability to establish communities of peers,
and to restrict communication to the community members. These communities may arise in an ad-hoc
manner or be well organized: for example, Howard Rheingold's ad-hoc smart mobs
[www.smartmobs.com], or a family wishing to communicate privately in a device-independent way. As
a participant in a smart mob, one can carry all sorts of wireless devices, from mobile phones to wrist
watches to bracelets, and can discover one's compatriots with common interests in, say, a shopping
mall. In a family, Mom has her hybrid PDA/mobile phone and can communicate with Dad or the kids
because they are similarly equipped: Dad in his car and their children in the local library, connected to
the Internet. This goes even further because the various appliances in one's home, e.g., alarms, air
conditioning, refrigerator, VCR, and TV, can participate in the family communication as community
members. These communities are as diverse as people's interests, and wireless communication is the
enabler. They are at the heart of P2P.
Also note that such communities need not be comprised of human beings and their personal systems.
They can be made up of the devices themselves. BBN Technologies has a Sensor Systems Technologies
Group that is using Jxta and agent based computing to coordinate fields of sensors and
robots [www.bbn.com/sip/ssw.html]. It is a natural fit in the wireless world. Sensors and robots may in
fact be on a battlefield where single points of failure must be tolerated. Using P2P, if a sensor or robot
is “taken out,” then the connected community regroups and continues to be effective.
What does Java have to do with this? Its “write once, run anywhere” paradigm has simplified the task
of getting P2P software into different devices. The Java 2 Platform, Micro Edition (J2ME) is specifically
targeted to very small devices. One need only implement the Java Application Manager (JAM) and its
associated virtual machine, the K Virtual Machine (KVM), on each device. The JAM then permits
one to download small Java applications to be locally stored and executed. Jxta has done exactly this
with its Jxta Micro Environment (JXME) implementation. It requires a Jxta rendezvous as a surrogate
to provide connectivity into the Internet and the associated discovery of other peer nodes. Given this, a
device running JXME can, for example, participate in chat rooms, and exchange multi-media content
with any Jxta peer node on any system. The big advantage is only one application needs to be written,
and the identical Java class runs across all of the J2ME enabled devices.
P2P permits an independence that was not previously found on the Internet. We've been hammering
this issue home throughout this book. It returns the network to the users. What can be more appropriate?
With the evolution of sensors and their placement in buildings, automobiles, clothing, etc., integrating
them into our personal, day-to-day lives has the real, near-term possibility of creating a truly
sentient Internet that may represent our daily comings and goings as a personal network presence. We
will be able to be passive observers of the “network be-ins,” gatherings of people with common interests
to play, to work, and to investigate our world from this point of view, or we will be able to actively
participate. The Network is the User will be the P2P paradigm.
In order to understand P2P technical requirements in the wireless space we must first have at least a
basic understanding of the wireless infrastructures. This includes mobile phones and their associated
networks, 802.11a/b/g, and Bluetooth. For similar reasons we will take a quick look at wireless sensors.
Good engineers need to know the capabilities of the deployment targets of their software to create
responsive, well performing systems and applications.

6.2 Introduction to the Wireless Infrastructures


A basic understanding of wireless infrastructures will help the software engineer or architect to create
P2P applications that are suitable for wireless P2P (W-P2P). The infrastructure's use of bandwidth is
very different from that of Ethernet. Wireless communication is always at least partially point-to-point and
requires antennas, base stations, and their associated software and wireless protocols. Two wireless
devices on the same network cannot simply send and receive data packets over the air. That is to say, if
two mobile phones (handsets) are adjacent to one another, they cannot directly receive each other's RF
signals. Base station receiver/transmitters are always necessary and, depending on the wireless bearer,
a large supporting infrastructure may be required to enable the communication. To explain this we will
give an overview of the Global System for Mobile Communications (GSM) infrastructure, which is the
most widely deployed mobile, wireless system in the world.
Because GSM has an upper bound of 9600 bps, a bandwidth-increasing augmentation of the GSM
infrastructure, the General Packet Radio Service (GPRS), was deployed beginning in 2000. We also
give an overview of GPRS. Where GSM is second generation (2G) mobile phone technology and GPRS is
2.5G, we will also discuss 3G wireless networks. The first 3G network was deployed by NTT-DoCoMo to
its customers using Wideband Code Division Multiple Access (W-CDMA) in October of 2001. This
service was called FOMA and followed a pilot project that began in May of the same year. Now, on to
the description of GSM.

6.2.1 Overview of GSM


The basic GSM infrastructure is shown in figure 6-1 below. This topology is divided into cells and in
each cell one typically finds an antenna which is called the Base Transceiver Station (BTS). The BTS
is part of a complex called the Base Station System (BSS) which also includes a Base Station Control-
ler (BSC). The BSC supports multiple BTS’s. The handset or mobile station (MS) communicates with
the BTS using two channels. One channel is for sending and the other for receiving. As the handset
moves from cell to cell, a handoff is required to acquire the BTS belonging to the next cell. This handoff
is managed by the BSC. The BSC also manages radio frequency allocation and controls the RF power of
the signals between the BTS and the handset.

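[Figure: handsets reach BTS antennas grouped under BSCs in the BSS; the BSS connects to the MSC, which works with the VLR, HLR, AC and EIR, reaches the PSTN through the GMSC, and reaches the IP Network through the NAS.]
Figure 6-1. The Basic GSM Infrastructure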

Given this communication between the handset and the BTS, a great deal needs to be done to actually
place a phone call or send packet data either to another handset, a phone in the Public Switched Tele-
phone Network (PSTN), or to a system on the Internet. To accomplish these tasks the Mobile Switch-
ing Center (MSC) is used. Mobile switching centers are the central control for coordinating
communication between the handset and the GSM infrastructure. The MSC has implementations of
the GSM protocols that support handset to handset communication; a gateway into the PSTN (GMSC);
and a router into the Internet for data packets. Each subscriber has a “home” MSC and an associated
Home Location Register (HLR) that is a data base for subscriber account and identifying data that
must be bound to the handset as soon as it contacts the base station system. This data includes services
like call forwarding, voice mail and caller ID, as well as billing information and information about
your current location in the GSM infrastructure.
If a subscriber is visiting an MSC, i.e., one that is not the subscriber's assigned, home MSC, then the visited
MSC contacts the subscriber’s HLR to make a copy of the subscriber’s account information and stores
this temporarily in its Visitor Location Register (VLR). Similarly, the subscriber’s HLR stores the
identity of the current VLR. The handset’s phone number or MSISDN is the identifier linked to the
subscriber’s HLR entries. Thus, when a mobile phone number is called, the MSISDN does not identify
a handset, but rather identifies the HLR entries for that MSISDN.
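Here is a minimal sketch of the visitor registration just described, with plain Python dictionaries standing in for the HLR and VLR databases.

- The HLR keys its records by MSISDN, reflecting the point that an MSISDN names HLR entries rather than a handset.
- The VLR keys its records by IMSI.

The sketch assumes one simplification: it removes the copy held by the previous VLR directly. The real exchange uses MAP messages over SS7.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Simplified location registers; field names and the cancel-old-copy step are
# illustrative assumptions, not the MAP procedures themselves.


@dataclass
class HLREntry:
    imsi: str
    profile: dict                      # services, billing data, ...
    current_vlr: Optional[str] = None  # which VLR holds a copy right now


@dataclass
class VLR:
    name: str
    visitors: Dict[str, dict] = field(default_factory=dict)  # IMSI -> profile copy


class HLR:
    def __init__(self):
        self.by_msisdn: Dict[str, HLREntry] = {}  # the MSISDN names these entries

    def register_visit(self, msisdn: str, vlr: VLR, vlrs: Dict[str, VLR]) -> None:
        """Called when a subscriber appears in a visited MSC's area."""
        entry = self.by_msisdn[msisdn]
        old = vlrs.get(entry.current_vlr) if entry.current_vlr else None
        if old is not None and old is not vlr:
            old.visitors.pop(entry.imsi, None)      # drop the stale copy
        vlr.visitors[entry.imsi] = dict(entry.profile)  # temporary copy in the VLR
        entry.current_vlr = vlr.name                # HLR records the current VLR
```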
There are two more components of the MSC that are important to discuss. The first is the Authentica-
tion Center (AC) that is used to authenticate the subscriber and to protect the subscriber’s account from
illegal access. The second is the Equipment Identity Register (EIR), which is used to identify the handset and
prevent unwarranted use in case it is stolen.

As is shown in figure 6-1, in GSM, IP packets are sent over a circuit-switched connection from the handset
through the MSC to the Network Access Server (NAS), which routes them into the Internet IP network. In
this case the handset makes a PPP connection to the NAS to send the data.
To see how all of this works together, let's assume that Lois is making a PSTN phone call to Joan, whose
handset is active. Thus, we know that Joan has an associated mobile switching center, MSCj, and base
station system, BSSj. Joan's phone number or MSISDN is sent to a GMSC. The MSISDN identifies
Joan's HLR, and the GMSC sends the MSISDN to Joan's HLRj. The HLR looks up Joan's International
Mobile Subscriber Identifier (IMSI) and sends it to MSCj. MSCj contacts its VLR, and the VLR
sends the Mobile Station Routing Number (MSRN) back to HLRj. HLRj in turn forwards the MSRN
back to the GMSC. The GMSC then routes the call to MSCj. MSCj signals Joan's phone by
means of her BSSj and its base station controller, resulting in Joan's handset ringing with her favorite ring
tone. This is shown in figure 6-2.
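The eight steps of figure 6-2 can be traced with a small simulation. This is a sketch under stated assumptions:

- The classes are hypothetical.
- The MSRN format is invented.
- Each network message is reduced to one line of text.

```python
from itertools import count
from typing import Dict, List, Tuple

# Walk-through of the call-setup steps; message names are simplified,
# and the MSRN format below is made up for illustration.


class VLR:
    def __init__(self, msc: str):
        self.msc = msc
        self._serial = count(1)
        self.roaming: Dict[str, str] = {}  # MSRN -> IMSI for the pending call

    def allocate_msrn(self, imsi: str) -> str:
        msrn = f"{self.msc}/msrn-{next(self._serial)}"
        self.roaming[msrn] = imsi
        return msrn


class HLR:
    def __init__(self, subscribers: Dict[str, Tuple[str, VLR]]):
        self.subscribers = subscribers  # MSISDN -> (IMSI, serving VLR)


def set_up_call(msisdn: str, hlr: HLR) -> List[str]:
    trace = [f"1. PSTN -> GMSC: MSISDN {msisdn}",
             f"2. GMSC -> HLR: MSISDN {msisdn}"]
    imsi, vlr = hlr.subscribers[msisdn]  # HLR lookup keyed by MSISDN
    trace.append(f"3. HLR -> {vlr.msc}: IMSI {imsi}")
    trace.append(f"4. {vlr.msc} -> VLR: request routing number")
    msrn = vlr.allocate_msrn(imsi)
    trace.append(f"5. VLR -> HLR: MSRN {msrn}")
    trace.append(f"6. HLR -> GMSC: MSRN {msrn}")
    trace.append(f"7. GMSC -> {vlr.msc}: route call to {msrn}")
    trace.append(f"8. {vlr.msc} -> BSS: page handset, ring")
    return trace


if __name__ == "__main__":
    joan_vlr = VLR("MSC_j")
    hlr = HLR({"+15550199": ("001010000000002", joan_vlr)})
    print("\n".join(set_up_call("+15550199", hlr)))
```

The GMSC never learns where the handset is until the HLR hands back an MSRN. This is why the called number identifies HLR entries rather than a location.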

[Figure: call setup among the PSTN, GMSC, HLR, MSC, VLR and BSCs, with numbered arrows 1 to 8 carrying the MSISDN, IMSI and MSRN and finally routing the call.]
IMSI: International mobile subscriber identifier
MSISDN: MS ISDN (called number)
MSRN: Mobile station routing number
Figure 6-2. Making a Call

Finally, we want to introduce the reader to the over-the-air, GSM data transmission specifications.
GSM uses both Time Division Multiple Access and Frequency Division Multiple Access. This forms a
matrix in frequency and time as is seen in figure 6-3.

[Figure: the 25 MHz uplink band (890-915 MHz, MS transmit) and the 25 MHz downlink band (935-960 MHz) are each divided into FDMA channels 1 to 124 of 200 kHz; each channel carries TDMA frames of 4.615 ms made up of time slots TN0 to TN7.]
Figure 6-3. FDMA and TDMA

To support FDMA, GSM runs in two frequency bands, 900 MHz and 1800 MHz. Suppose we are running at
1800 MHz. There is 75 MHz allocated from 1710-1785 MHz for sending, and 75 MHz allocated from
1805-1880 MHz for receiving. This yields a total of 374 channels of 200 kHz for sending and the same
for receiving. A handset is assigned two channels by the BSC. For 900 MHz, with 25 MHz in each
direction, a total of 124 channels, each of 200 kHz, is supported.
Overlaid on FDMA is TDMA. TDMA is a continuous sequence of 8 time slots, 0 - 7, on each channel.
Each time slot lasts 0.577 ms, for a total frame length of 4.615 ms. Along with the channel assignment
are two time slot assignments, one for downlink and the other for uplink. When a call or data transfer is
in progress, time slots are reserved even if there is neither voice nor data activity. It is important to
understand that the spectrum is
shared by both voice and data, and that voice is the big revenue earner for mobile phones. The implica-
tions are discussed in our section on GPRS. But in general, voice will take precedence over data during
peak telephone hours.
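The channel counts and frame length above follow from simple arithmetic. This sketch rests on two assumptions:

- Carriers are spaced 200 kHz apart, with one carrier's worth of spectrum left as a guard band. This reproduces both 124 channels (900 MHz) and 374 channels (1800 MHz).
- A GSM time slot lasts 15/26 ms, which rounds to 0.577 ms.

```python
from fractions import Fraction

# Carrier and frame arithmetic for the figures above. The one-carrier guard
# band is an assumption that reproduces the 124 and 374 channel counts.

CARRIER_SPACING_KHZ = 200


def carriers(low_khz: int, high_khz: int, guard: int = 1) -> int:
    """Number of 200 kHz carriers in one direction of a band."""
    return (high_khz - low_khz) // CARRIER_SPACING_KHZ - guard


SLOT_MS = Fraction(15, 26)  # one GSM time slot (burst period)
FRAME_MS = 8 * SLOT_MS      # eight slots, TN0..TN7

if __name__ == "__main__":
    print(carriers(890_000, 915_000))        # 124  (GSM 900 uplink)
    print(carriers(1_710_000, 1_785_000))    # 374  (GSM 1800 uplink)
    print(carriers(890_000, 915_000) * 8)    # 992  slot channels per direction
    print(round(float(SLOT_MS), 3), round(float(FRAME_MS), 3))  # 0.577 4.615
```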
When we add to this complexity the problems of high latency between FDMA/TDMA cells, and loss
of signal due to noise, one begins to understand the problems encountered in sending circuit switched
data on GSM. These problems are surmountable even to the extent that TCP/IP can be used on these
networks. The tactics required to accomplish this are discussed later in this chapter in section 6.2.4. A
great deal of work has been done in the IETF in this area. And certainly as one moves from 2G to 2.5G
and finally to 3G, as we will point out, real progress has been made with respect to mobile wireless
data bandwidth.
One cannot discuss GSM without including the Short Message Service (SMS). SMS permits multiple
sources, including handsets, to send short text messages to a handset using a store-and-forward
mechanism that approaches real-time delivery. It continues to be extremely popular and a driving revenue
source for TELCO carriers. SMS demonstrates that inter-personal communication is the service that
users desire above any other. This communication embraces implicit connected communities, and thus
assures the success of P2P applications in this domain.
6.2.1.1 Short Message Service (SMS)
The Short Message Service originated in 1991 on GSM networks in Europe. It permits messages of 160
seven-bit characters (from the GSM default alphabet, which largely overlaps ASCII) or 140 eight-bit
bytes to be sent to handsets, and makes use of the GSM infrastructure as described in the previous
section. SMS has the salient feature of being able to send short messages to a mobile handset while a
telephone call is in progress. The early popularity of SMS was an unexpected phenomenon. After all, it
only offered the ability for users of GSM mobile phones to exchange simple text messages in near real
time. Yet, one recalls the French school system having to disallow mobile phones in classrooms because
SMS was being used to exchange answers to examinations. Like any other service with such popularity,
the uses of SMS grew over the 1990s.
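The 160-character and 140-byte limits describe the same 1,120 bits: 160 × 7 = 140 × 8. The sketch below packs seven-bit characters into octets, least-significant bit first, the way SMS packs 7-bit data.

It assumes the message uses only letters, digits and space, whose GSM default alphabet codes coincide with ASCII. Other characters would need a table lookup, which is omitted.

```python
# Letters, digits and space share their codes between ASCII and the GSM
# default alphabet; anything else would need a lookup table (not shown).
GSM_SAFE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")


def pack_septets(text: str) -> bytes:
    """Pack 7-bit characters LSB-first into octets, as SMS does for 7-bit data."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits currently in acc
    for ch in text:
        if ch not in GSM_SAFE:
            raise ValueError(f"{ch!r} needs a GSM alphabet table lookup")
        acc |= ord(ch) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial octet, high bits zero
    return bytes(out)


if __name__ == "__main__":
    print(pack_septets("hellohello").hex())  # e8329bfd4697d9ec37
    print(len(pack_septets("a" * 160)))      # 140
```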
The store-and-forward server for SMS is called the SMS Center (SMSC), as shown in figure 6-4. Initially
the right-hand side of the figure is what was deployed. With the arrival of E-mail servers in the
enterprise, and then at ISPs, the left-hand side of figure 6-4 came into being. A special protocol running
on top of TCP/IP that connected E-mail servers, web sites, voice mail and FAX systems, etc., to the
SMSC caused an explosion of SMS services. Another reason for the success of SMS is its
responsiveness. SMS is tightly integrated into GSM and other mobile networks using the
Signal Transfer Point (STP) and the Signaling System 7 (SS7) [ETSISS7] mobile infrastructure
protocol. The STP uses SS7 to communicate with multiple components associated with the MSC. The
SMSC also communicates with the STP using SS7.

Figure 6-4. SMS Infrastructure Overview [SMS]

SS7 is embodied in what is called the Mobile Application Part (MAP) of the GSM infrastructure. MAP
defines a set of protocols that use SS7 for communication. SMS in turn makes use of MAP to send
short messages. First, each MSISDN identifies its SMSC as an entry in its HLR. Thus, to send a short
message from a handset, H1, to another handset, H2, the MSC must first find a route to H2's SMSC. The
MSC uses MAP to contact H2's HLR and get the route to the destination SMSC. Using MAP, the MSC
then sends the route along with the short message to the STP, which in turn sends the short message to
the SMSC.
Next, the SMSC needs to send the message to H2. MAP is used to request the route to the current MSC
for H2 from H2's HLR. Given the route, the SMSC sends the short message to the current MSC, which in
turn sends it to the handset. Since the MSC uses the BSS to communicate with the handset, the message
is sent using the short message delivery point-to-point and forwarding mechanisms and the GSM
air interface protocols, i.e., FDMA/TDMA. Message delivery may in fact fail for many reasons. One
example is that a handoff is in progress and the handset's registration with its current MSC has not yet
completed. Thus its HLR does not yet know the current MSC. In the case of such a temporary failure,
the SMSC notifies the HLR that there is a short message waiting for H2. When the registration is
completed, the HLR will notify the SMSC that the short message can be sent. This notification uses a
service center alert. When the SMSC receives this alert, it sends the short message.
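The store-and-forward behaviour with a service center alert can be sketched as below. The classes are hypothetical and drastically simplified:

- The HLR is a dictionary of current MSCs, and `None` marks a registration still in progress.
- Appending to `delivered` stands in for sending the message through the MSC and BSS.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


class SMSC:
    def __init__(self, hlr: "HLR"):
        self.hlr = hlr
        self.waiting: Dict[str, deque] = defaultdict(deque)
        self.delivered: List[Tuple[str, str, str]] = []  # (MSC, MSISDN, text)

    def submit(self, msisdn: str, text: str) -> bool:
        msc = self.hlr.current_msc.get(msisdn)
        if msc is None:  # e.g. registration after a handoff still in progress
            self.waiting[msisdn].append(text)
            self.hlr.message_waiting[msisdn] = self  # "short message waiting"
            return False
        self.delivered.append((msc, msisdn, text))
        return True

    def service_center_alert(self, msisdn: str) -> None:
        msc = self.hlr.current_msc[msisdn]
        queue = self.waiting.pop(msisdn, deque())
        while queue:
            self.delivered.append((msc, msisdn, queue.popleft()))


class HLR:
    def __init__(self):
        self.current_msc: Dict[str, Optional[str]] = {}
        self.message_waiting: Dict[str, SMSC] = {}

    def complete_registration(self, msisdn: str, msc: str) -> None:
        self.current_msc[msisdn] = msc
        smsc = self.message_waiting.pop(msisdn, None)
        if smsc is not None:
            smsc.service_center_alert(msisdn)  # the service center alert
```

The SMSC holds the message itself. The HLR keeps only a "message waiting" flag and tells the SMSC when to retry.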
When GPRS was introduced into the GSM infrastructure, its increase in bandwidth also motivated changes
to messaging to improve the available services. In fact, a new service called the Multimedia Messaging
Service (MMS) [MMS] was introduced in 2002. MMS lets one send messages longer than the 140
eight-bit bytes of a single SMS: the handset is notified of a waiting message by SMS, and then retrieves
the message as a stream of octets over a data connection. These longer messages can include text, audio
and images. MMS uses the Multipurpose Internet Mail Extensions (MIME) [MIME] that have been
deployed on the Internet for email since the early 1990s. Clearly, MMS anticipates the arrival of 3G
mobile networks and their concomitant megabit per second bandwidths.
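A related mechanism, concatenated SMS, carries a long message as several ordinary SMS messages. The user data of each part begins with a small header that identifies the part.

The sketch below assumes the commonly used concatenation header with an 8-bit reference number. That header takes six octets: length, element id 0x00, element length 3, reference, total and sequence. This leaves 134 octets of 8-bit data, or 153 seven-bit characters, per part.

```python
from typing import List

SINGLE_7BIT, SEGMENT_7BIT = 160, 153  # characters
SINGLE_8BIT, SEGMENT_8BIT = 140, 134  # octets; 6 octets go to the header


def segments_needed_7bit(n_chars: int) -> int:
    """How many SMS messages a 7-bit text of n_chars characters needs."""
    return 1 if n_chars <= SINGLE_7BIT else -(-n_chars // SEGMENT_7BIT)


def segment_8bit(payload: bytes, ref: int) -> List[bytes]:
    """Split 8-bit data into SMS user-data fields with a concatenation header."""
    if len(payload) <= SINGLE_8BIT:
        return [payload]
    parts = [payload[i:i + SEGMENT_8BIT] for i in range(0, len(payload), SEGMENT_8BIT)]
    if len(parts) > 255:
        raise ValueError("too many parts for a one-octet part counter")
    total = len(parts)
    # header: length=5, element id=0x00, element length=3, ref, total, sequence
    return [bytes([0x05, 0x00, 0x03, ref & 0xFF, total, seq]) + part
            for seq, part in enumerate(parts, start=1)]
```

The receiver groups parts by reference number and reassembles them in sequence order. This is why every part repeats the reference and the total.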
In the next section we briefly discuss GPRS, and the changes it requires in the GSM architecture.

6.2.2 Overview of GPRS


Recall that to send digital data on a GSM network a circuit switched telephone call is required. The
data thus passes through the Mobile Switching Center as is described in figure 6-1. The upshot is that it
can take from 15 to 20 seconds to initialize a data sending session. This delay, in fact, was one of the
WAP killers: the user always became impatient. Similarly, SMS was popular because it is based on a
point-to-point protocol using SS7 signaling which avoids the circuit switching setup.
GPRS permits a handset to have a network presence, an always-on network connection, that is lacking
in GSM. One can literally “ping” a GPRS enabled handset if one knows its IP address. What is
interesting is that this was accomplished without hardware modifications to the pre-existing
infrastructure. New components were added, and some software changes were required in the base
station system (BSS). Figure 6-5 shows the GPRS additions to the GSM infrastructure.

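[Figure: the GSM components of figure 6-1 (BTS, BSC, MSC, VLR, HLR, AC, EIR, GMSC, PSTN) with the new SGSN attached to the BSCs and the GGSN linking the SGSN to the IP Network.]
Figure 6-5. GPRS Architecture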

The first additional component is the Serving GPRS Support Node (SGSN). The BSS sends the GPRS
packets to the SGSN. The SGSN in turn uses a second new component, the Gateway GPRS Support
Node (GGSN), to send and receive data from external networks such as the Internet. In the case of the
Internet, the SGSN and GGSN use the GPRS Tunnel Protocol (GTP) to encapsulate the IP packets
they send to one another.
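For illustration, here is a minimal sketch of GTP encapsulation of an IP packet. It makes two assumptions:

- It uses the later GTPv1 user-plane header from 3GPP TS 29.060: 8 mandatory octets and no optional fields. Early GPRS deployments used the GTPv0 header instead.
- It ignores the UDP/IP transport that carries each frame between SGSN and GGSN.

```python
import struct

GTPU_UDP_PORT = 2152            # GTP-U port (3GPP TS 29.060); transport not shown
G_PDU = 0xFF                    # message type for an encapsulated user packet
HEADER = struct.Struct("!BBHI")  # flags, type, payload length, TEID = 8 octets


def encapsulate(teid: int, ip_packet: bytes) -> bytes:
    flags = 0x30  # version 1, protocol type GTP, no optional fields
    return HEADER.pack(flags, G_PDU, len(ip_packet), teid) + ip_packet


def decapsulate(frame: bytes):
    flags, msg_type, length, teid = HEADER.unpack_from(frame)
    if flags >> 5 != 1 or msg_type != G_PDU:
        raise ValueError("not a GTPv1 G-PDU")
    if flags & 0x07:
        raise ValueError("optional header fields not handled in this sketch")
    return teid, frame[HEADER.size:HEADER.size + length]
```

The tunnel endpoint identifier (TEID) lets the GGSN and SGSN tell apart the many handsets whose packets share the same tunnel path.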
Each handset thus has an IP address once it has contacted the SGSN and in most cases DHCP is used
to assign this address. For all intents and purposes, as far as the user is concerned, the handset has a
real Internet presence and uses the Internet protocols to communicate with both other handsets and
services on the Internet itself. What about the increased bandwidth? How is that accomplished?
If one takes another look at figure 6-3, one sees that each FDMA channel has eight TDMA slots. Since
the data capacity for each slot is up to 14.4 kbps, the designers of GPRS networks assumed that a
bandwidth in excess of 100 kbps would be obtainable if all eight slots were assigned to a single
mobile handset. But there were a couple of issues. First, and importantly, this spectrum is shared with
voice calls, and at voice call prime time one could not assign that many time slots to GPRS devices and
also handle the voice call volume. And yes, voice is the primary source of revenue for mobile phones.
A second issue was that the GPRS handsets actually burned the user's hand during pilot testing in 2000:
the phone electronics generated too much heat at high bandwidths. As a compromise, in 2000 a
maximum of two slots was assigned to a handset, yielding about 28 kbps as the maximum bandwidth.
This is 3,500 bytes per second, and that's not so bad given the screen sizes and memory constraints of
these devices.
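The bandwidth figures above are simple slot arithmetic. Here is a small sketch, assuming the 14.4 kbps per-slot rate quoted in the text. Two slots give 28.8 kbps, or 3,600 bytes per second, which the text rounds to about 28 kbps and 3,500 bytes per second.

```python
SLOT_BPS = 14_400  # per-slot data rate quoted in the text


def gprs_rate_bps(slots: int) -> int:
    if not 1 <= slots <= 8:
        raise ValueError("a TDMA frame has eight slots")
    return slots * SLOT_BPS


def seconds_to_send(n_bytes: int, slots: int) -> float:
    return n_bytes * 8 / gprs_rate_bps(slots)


if __name__ == "__main__":
    print(gprs_rate_bps(8))                           # 115200 bps, all eight slots
    print(gprs_rate_bps(2) // 8)                      # 3600 bytes/s, two-slot compromise
    print(round(seconds_to_send(30 * 1024, 2), 1))    # 8.5 s for a 30 KB image
```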
Still, more bandwidth is desired for better image and video services. As we all know, handsets are now
multi-functional. They serve as phones and mini-cameras; with them one can access web sites that serve
up customized web pages, play interactive games, and have P2P support using Java. To meet these ends,
3G networks are being deployed. We will concentrate our 3G discussion on W-CDMA after a brief
introduction to CDMA.

6.2.3 Overview of CDMA and W-CDMA


Code Division Multiple Access (CDMA) is used by many carriers world-wide. Taken along with
GSM/GPRS these two bearer networks support the majority of the mobile, cellular handsets in the
world. It is also important to note that these over-the-air standards are supported by more and more
devices, including PDAs and laptops.
The CDMA infrastructure’s topology is functionally similar to that of GSM. Both networks are cellu-
lar, and there are base stations, and mobile switching centers to mediate the mobile communication.
But unlike GSM, in CDMA all handsets in a cell share the same band in each direction (one band for the
uplink and one for the downlink) rather than being assigned separate frequency channels and time slots.
This bandwidth is 1.228 MHz. Each mobile device is assigned a unique code to identify its
two channels that are modulated onto the 1.228 MHz band.
CDMA is a spread spectrum technology, which means that it spreads the information contained in a
particular signal of interest over a much greater bandwidth than the original signal. With CDMA,
unique digital codes, rather than separate RF frequencies or channels, are used to differentiate
subscribers. The codes are shared by both the mobile phone and the base station, and are called
pseudo-random code sequences. Since each user is separated by a unique code, all users can share the same
frequency band (range of radio spectrum). This gives many unique advantages to the CDMA technique
over other RF techniques in cellular communication [www.protocols.com].
Using voice as an example, suppose a user is making a CDMA phone call. The voice is digitized, assigned its special code, and then modulated at the CDMA frequency. When one is talking on a phone, the voice frequency ranges from about 300 to 3,400 Hz. This is an analog signal. It is digitally encoded at 9,600 bps and then spread to 1.228 million chips per second (Mcps) by combining it with the special codes, which adds redundancy. These codes include device-specific identifiers. The spread signal is then transmitted in the 1.228 MHz band, where it is added to the other signals present on the downlink channel. On the receiving end the same code is used to decode the incoming signal: the 9,600 bps digital signal is extracted, and the original analog signal is reconstructed. CDMA can be used for data as well as voice. In this case, the digital data is directly encoded at 9,600 bps.
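The sharing of one band by many users can be illustrated with a toy direct-sequence spreading example. This is a minimal Java sketch under loudly simplifying assumptions, not the actual CDMA air interface: it uses short orthogonal Walsh codes with a spreading factor of 8 (spreading 9,600 bps to 1.228 Mcps implies roughly 128 chips per bit), a noiseless channel, perfect synchronization, and no pseudo-random scrambling. The class and method names are our own.

```java
import java.util.Arrays;

/** Toy direct-sequence CDMA: two users share one channel, separated by orthogonal Walsh codes. */
public class CdmaSketch {

    /** Builds an n x n Walsh-Hadamard matrix (n a power of two) with entries +1/-1. */
    static int[][] walsh(int n) {
        int[][] h = {{1}};
        for (int size = 1; size < n; size *= 2) {
            int[][] next = new int[size * 2][size * 2];
            for (int r = 0; r < size; r++) {
                for (int c = 0; c < size; c++) {
                    next[r][c] = h[r][c];
                    next[r][c + size] = h[r][c];
                    next[r + size][c] = h[r][c];
                    next[r + size][c + size] = -h[r][c];
                }
            }
            h = next;
        }
        return h;
    }

    /** Spreads each data bit (1 mapped to +1, 0 mapped to -1) into code.length chips. */
    static int[] spread(int[] bits, int[] code) {
        int[] chips = new int[bits.length * code.length];
        for (int i = 0; i < bits.length; i++) {
            int symbol = bits[i] == 1 ? 1 : -1;
            for (int j = 0; j < code.length; j++) {
                chips[i * code.length + j] = symbol * code[j];
            }
        }
        return chips;
    }

    /** Sums the chip streams of all transmitters, as the shared air interface does. */
    static int[] channel(int[]... streams) {
        int[] sum = new int[streams[0].length];
        for (int[] s : streams) {
            for (int i = 0; i < sum.length; i++) sum[i] += s[i];
        }
        return sum;
    }

    /** Recovers one user's bits by correlating the composite signal with that user's code. */
    static int[] despread(int[] signal, int[] code) {
        int[] bits = new int[signal.length / code.length];
        for (int i = 0; i < bits.length; i++) {
            int corr = 0;
            for (int j = 0; j < code.length; j++) corr += signal[i * code.length + j] * code[j];
            bits[i] = corr > 0 ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        int[][] codes = walsh(8);            // spreading factor 8, chosen for readability
        int[] alice = {1, 0, 1, 1};
        int[] bob = {0, 0, 1, 0};
        int[] air = channel(spread(alice, codes[1]), spread(bob, codes[2]));
        System.out.println(Arrays.toString(despread(air, codes[1])));  // [1, 0, 1, 1]
        System.out.println(Arrays.toString(despread(air, codes[2])));  // [0, 0, 1, 0]
    }
}
```

Each receiver correlates the summed signal with its own code; because distinct Walsh codes are orthogonal, the other user’s contribution cancels out exactly here. In a real system, noise and imperfect synchronization make this cancellation only approximate.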
The 3G wideband CDMA (W-CDMA) standard realizes a much higher bandwidth and has a number of methods to increase its system capacity. W-CDMA theoretically yields up to 2 Mbps and makes highly efficient use of the radio spectrum. Current W-CDMA deployments permit 384 Kbps, and the stable bandwidth is about 120-150 Kbps. This allows simultaneous access to several voice, video and data services. Ericsson has been the leader in this technology and provided a prototype W-CDMA system to NTT-DoCoMo in the fall of 1998. W-CDMA, like GPRS, gives the mobile device a true network presence, and thus rapid data connections between mobile devices and the Internet using IP.
We’ve now discussed 2G, 2.5G and 3G mobile wireless networks. It’s clear from our discussions that the goal of the evolution of these networks and their handsets is to give them true Internet connectivity, which, combined with small handheld devices, provides a unique blend of multimedia services. One can use current phones as digital cameras and send the images immediately to friends by email. If the friend is within a few meters, then Bluetooth or infrared can be used to transmit a photo. This is perfect territory for P2P networks to capture. We discuss this in more detail in section 6.6.
Let’s now look at the mobile phones themselves. Just like the cellular mobile infrastructures, these devices have constrained architectures that the software engineer must take into consideration to write effective, user-centric, and easy-to-use P2P applications.

6.2.4 Introduction to Mobile Phones and their Application Environments


The majority of today’s software engineers do not have to worry about applications running on a constrained device. Desktop systems and laptops routinely have a gigabyte of memory, CPU speeds in excess of 1 gigahertz, multiple gigabytes of disk storage, large color displays, 10 to 100 Mbps Ethernet connections, etc. Small mobile devices lie at the other end of this spectrum because of device size and, more importantly, battery power limitations.

To start our discussion, we describe the capacities of current mobile phones and where these capacities are anticipated to be in the next year, and we give the reader an idea of their feature sets. This sets the stage for those who wish to architect service solutions and ultimately program these devices; it’s important to have the right expectations. The high-end phones give us an indication of the evolution of mobile handset computing. Writing P2P software for such devices will be easier because there will be less dependency on the mediator as a surrogate for storage and processing.
To begin with, the major constraint of these mobile devices is the battery. One gets about 4 hours of “talk time” and roughly 10 days of standby time. Applications vary with respect to power consumption, since they can use almost all or very few of a device’s capabilities. In any case, power is clearly limited for applications too. For users, the most obvious characteristics are the screen size and whether or not it supports color. Color screens are the norm, and better phones now support 16-bit color (65,536 colors). Sizes vary, but on average screens are getting larger; it is not unusual to have a PDA-like screen of about 170x220 pixels. The tendency is for mobile phones to become PDA/phone hybrids. With respect to applications, memory size is extremely important. The powerful phones of the late 20th century had 256 Kbytes of RAM and 1 megabyte of flash. Now, high-end phones, referred to as IMT-2000 or 3G phones, have 16-32 Mbytes of RAM and 32-64 Mbytes of flash memory. The footprint for the average phone is quickly converging to 16 Mbytes of RAM and 32 Mbytes of flash. This is very important for applications because it permits them to have about 4 Mbytes of heap along with 20 Mbytes of flash as the application file system’s permanent storage. Finally, as mentioned in the previous section, the W-CDMA phones have excellent bandwidth.
The feature list is rich. Features like the following are now common in these phones:
1. W-CDMA network service. For example, the i-Mode FOMA phone achieves 384 Kbps downlink and 64 Kbps uplink.
2. One or two CMOS cameras. The i-Mode FOMA phone has a 320,000 pixel outer camera for scenery and a 110,000 pixel inner camera for portraits.
3. One or two speakers.
4. Java support and over-the-air provisioning of Java applications.
5. SMS, MMS (including video streaming), fax, an Internet browser, email, and games, all arriving as prepackaged, in-phone applications, as well as digital radio reception.
We see from the above that mobile phones are feature rich to attract customers, and this trend is growing. Most of these features are available through the core software that is delivered with the device. Each phone is a full computer with a CPU, buses, and RAM. It has an operating system, and while this OS is small, it is no different from any other OS with respect to functionality. It requires device drivers for managing flash memory, cameras, RF, the screen, keyboard, joysticks, sound, the network interfaces, etc. It has a complete network stack, and today this always includes the IP suite of protocols: TCP, UDP, HTTP, and SSL/TLS are the minimum complement. IMAP email is also finding its way to mobile phones since it is much better suited to these devices than POP3. The better phones have XML parsers and support XHTML Basic browsers. Others without XML support use stripped-down versions of HTML like HTML basic [W3C]. XML will be the default in all 3G devices.

The OS must have code to manage the bearer networks such as GSM/GPRS, CDMA and W-CDMA. GSM phones have Subscriber Identity Modules (SIM cards), and SIM cards are finding their way into 3G phones too. A SIM card always contains the user’s identity information for a given Telco operator. Thus, if one acquires a new phone, the same SIM card is used, and this provides a “pluggable” user identity. A SIM card interfaces with the phone’s OS, and is really a tiny onboard computer with its own CPU, applications, RAM and storage. For example, SIM cards have full crypto suite support for RSA, ECC, SHA-1, RC4, MD5, etc., and therefore the mobile phone can use the in-silicon SIM card cipher code to do its crypto computations. See the Java Card description in section 6.2.5 for more details. The same SIM cards manage SMS/MMS with hooks into the phone’s OS, and have access to the phone’s display driver for running small UI-based programs. With the addition of a SIM card one has a parallel, dual-processor system in the phone.
A special toolkit, the SIM Tool Kit, provides APIs both for the SIM card functions that access the phone’s operating system and for SIM card access by applications running in the phone. Thus, programming one of these “beasts” is a really fun task that requires lots of special expertise. It is not unlike the assembly language programming of an earlier epoch, and a joy to experience for those willing to give it a try!
From the perspective of P2P we are interested in applications on these devices that can interact with our P2P Overlay Network. While it is highly likely that such a P2P application will ultimately be part of the core software on a mobile phone when it is purchased, in reality this is a long way into the future. It is important to understand that, once a technology’s market value is understood in the context of a device, the turnaround time for adding a new resident application to a shipping device can be between 14 months and two years. Resident software in mobile phones cannot have bugs that cause the phone to crash, because one cannot readily replace or patch these applications.
Also, since these applications reside in the phone’s flash memory, the flash memory “real estate” they occupy is costly, even as the size of this memory grows. Many applications compete for this space, as is evident from the feature sets of the above phones. Music players, cameras, web browsers, video display, email, calendars, games, and IM are just a few of the competitors for the flash memory store. There is a way around this problem.
In 1998 Sun Microsystems launched the Doja project with NTT-DoCoMo. The goal was to write an application execution environment that resides in the mobile phone along with a small Java virtual machine, and that can then download and run Java applications. As mentioned earlier in this chapter, the application execution environment is called the Java Application Manager (JAM), and the first small-footprint virtual machine is the KVM. The unique innovation was that small Java applications could, in fact, be downloaded over the network, stored temporarily in the small space reserved for them in flash memory, and executed. The downloading, flash memory management, and class loading from flash memory are done by the JAM. It thus provided, for the first time, a very flexible testing environment for a collection of applications that could eventually be loaded onto a web site, and then accessed and run by millions of Doja users. The JAM also permits the user to manage the reserved Java application storage on the device.
As a final remark about Doja, in 1999-2000 the initial Java applications were limited to 10 Kbyte jar files. This is small by today’s standards. There are at least three reasons for this limit. The first is that over-the-air (OTA) application provisioning takes time. The early NTT-DoCoMo infrastructure had a bandwidth of 9600 bits per second, and, again, these networks had high latency and data loss due to noise. I-mode is always concerned with a good user experience; waiting too long for an application to download was not tolerable, and still is not. The second reason was the limited flash memory size in the early phones. It was 1 megabyte in high-end phones, and this memory is also where the core phone software is stored. Even so, good developers wrote exciting software within what seems to most of today’s software engineers an unreasonable size limitation. The third reason is that mobile phones have very limited RAM, because RAM consumes power. Even if application code can be executed directly from flash memory, the heap size is restrictive, and since executing from RAM is much faster than executing from flash memory, a large portion of the RAM is allocated to the resident operating system when the phone is turned on. Consequently, a conservationist approach to programming, keeping programs small and minimizing long-lived heap allocations, is a requirement for these devices. I-mode has increased the maximum jar file size to 30 Kbytes for its 3G W-CDMA phones, which have significantly more flash memory and network bandwidth. As we will see below, even much larger Java applications are being downloaded to 3G phones.
Finally, a limited number of Java applications can be downloaded, stored in flash, run, and either kept or deleted at the user’s whim, thereby making flash memory real estate reusable. There are many technical issues that must be discussed to better understand the Java 2 Micro Edition (J2ME) and the development cycle of J2ME applications for the mobile phone. These are covered in section 6.2.5.
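The JAM’s role as custodian of a small, reusable flash area can be modeled in a few lines. The following Java sketch is hypothetical: the JamStore class, its method names, and the 50 Kbyte reserved area are illustrative assumptions, not the Doja or MIDP API; only the 10 Kbyte per-jar limit comes from the discussion above. It shows the two checks that bound an installation, the per-jar size limit and the remaining reserved flash, and how deleting an application frees space for the next one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of the JAM's reserved flash area for downloaded Java applications.
 * Names, limits and policy are illustrative assumptions, not the Doja or MIDP API.
 */
public class JamStore {
    private final int capacityBytes;   // flash reserved for downloaded applications
    private final int maxJarBytes;     // per-application jar limit set by the carrier
    private final Map<String, Integer> installed = new LinkedHashMap<>();

    public JamStore(int capacityBytes, int maxJarBytes) {
        this.capacityBytes = capacityBytes;
        this.maxJarBytes = maxJarBytes;
    }

    /** Bytes of reserved flash currently occupied by installed jars. */
    public int usedBytes() {
        int used = 0;
        for (int size : installed.values()) used += size;
        return used;
    }

    /** Accepts the jar only if it respects the per-jar limit and fits in the free space. */
    public boolean install(String name, int jarBytes) {
        if (jarBytes > maxJarBytes) return false;                  // too big for this phone class
        if (usedBytes() + jarBytes > capacityBytes) return false;  // user must delete something first
        installed.put(name, jarBytes);
        return true;
    }

    /** Deleting an application frees its flash for reuse. */
    public boolean delete(String name) {
        return installed.remove(name) != null;
    }

    public static void main(String[] args) {
        JamStore store = new JamStore(50 * 1024, 10 * 1024);     // hypothetical 50K area, 10K jar limit
        System.out.println(store.install("game", 10 * 1024));   // true
        System.out.println(store.install("big", 30 * 1024));    // false: exceeds the 10K jar limit
        for (int i = 0; i < 4; i++) store.install("app" + i, 10 * 1024);
        System.out.println(store.install("extra", 5 * 1024));   // false: reserved area is full
        store.delete("game");
        System.out.println(store.install("extra", 5 * 1024));   // true once space is reclaimed
    }
}
```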
It is also worth noting that Qualcomm has an OTA technology called BREW. BREW applications are usually written in C or C++, compiled, and downloaded in binary format to a mobile phone with a supporting execution environment. BREW has a substantial following. From our point of view, BREW has one drawback: it requires different binaries for the different instruction sets found on different mobile phones. It does not have the write once, run anywhere feature of Java applications because it does not generate machine-independent byte codes as Java does. Thus, BREW applications must be ported to mobile phones with differing instruction sets. For this reason we concentrate on Java on the mobile phone. Coming from a long experience with distributed programming, we truly see this as the wave of the future. Java simplifies application development by requiring only a single jar file per application, with no porting, for all of the mobile phone platforms that support J2ME.
We’ve pointed out that the mobile phone software programmer must be aware, and in fact almost paranoid, when it comes to an application’s performance on these devices. Since almost all of these applications will use the network, they must be tolerant of data loss due to noise, low bandwidth, and high latency. Also, the network is shared, and an application must be a good network neighbor and not hog bandwidth. As much as these networks have evolved over the last decade, they are still constrained; they are not Ethernet. Bad network use leads to unhappy users, and applications that make users unhappy do not get used. Applications must also be frugal with battery power: every bit that goes onto the network, and every instruction that gets executed by the CPU, consumes the battery’s power. Finally, there are not gigabytes of flash memory for application storage, and OTA application downloading must be rapid. Keeping applications small is therefore another maxim for software engineers in this domain, although “small” is becoming less and less constrained. On W-CDMA networks with a downlink bandwidth of 120 Kbps one can download a 128 Kbyte Java application in 10-12 seconds. That is a large application for such a small device, especially since the Java runtime libraries are already part of the core software packaged in each device. Certainly, many different applications can be written that will beautifully integrate with our P2P Overlay Network’s infrastructure. What some of these applications might be, and how this is accomplished, is discussed in the following sections.
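The download figures quoted in this section are easy to sanity-check. The short Java sketch below computes the ideal transfer time, size in bits divided by link rate, and deliberately ignores protocol overhead, latency, and retransmissions; the class and method names are ours.

```java
import java.util.Locale;

/** Back-of-the-envelope OTA download times: ideal case, no protocol overhead or retransmissions. */
public class OtaDownload {

    /** Ideal transfer time in seconds for jarBytes over a link of bitsPerSecond. */
    static double idealSeconds(long jarBytes, long bitsPerSecond) {
        return jarBytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        // Early i-mode: a 10 Kbyte jar over a 9,600 bps bearer.
        System.out.println(String.format(Locale.ROOT, "10 KB @ 9.6 kbps: %.1f s",
                idealSeconds(10 * 1024, 9_600)));
        // 3G: a 128 Kbyte jar over a 120 kbps W-CDMA downlink.
        System.out.println(String.format(Locale.ROOT, "128 KB @ 120 kbps: %.1f s",
                idealSeconds(128 * 1024, 120_000)));
    }
}
```

Under these assumptions a 10 Kbyte jar takes about 8.5 seconds at 9,600 bps, and a 128 Kbyte jar about 8.7 seconds at 120 Kbps. The 10-12 seconds quoted above for the latter is consistent with the overhead that a real, noisy link adds to this ideal figure.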

6.2.5 Java Across the Wireless Device Space


Java is a ubiquitous programming language because of its “write once, run anywhere” paradigm. As the systems on which Java can run have multiplied, the Java paradigm has been stretched. Small wireless devices created an opportunity for Java applications that required some innovative thinking to bring Java to different platforms, and in particular to those platforms that are incorporated into small wireless devices. To this end, different Java configurations have been defined for different devices. For large systems we have the Java 2 Standard Edition (J2SE) and Java 2 Enterprise Edition (J2EE), with which most of us are familiar. Similarly, for small devices we have the Java 2 Micro Edition (J2ME). All of these “editions” are defined by “expert groups” in the Java Community Process, an open standards body managed by Sun Microsystems, Incorporated [JCP].
The space of micro devices is large, and thus it is necessary to have different Java virtual machines for different devices. Some devices have floating point and some do not; processors and memory sizes differ, etc. To accommodate these differences, device configurations have been defined to which a specific virtual machine can be targeted. With respect to J2ME there are three such configurations. They are as follows:
1. The Connected Limited Device Configuration (CLDC), which targets devices like mobile phones and smaller PDAs.
2. The Connected Device Configuration (CDC), which is targeted at more powerful devices that typically have 32-bit processors, for example, the larger PDAs and set top boxes.
3. Java Card, which targets smart cards, a minuscule but quite powerful platform. With respect to their hardware footprint, Java Cards, like SIM cards, follow the smart card growth path. A short description of smart card hardware specifications is as follows [www.gemplus.com/techno/chipware]:
a) 32K or 64K bytes of flash memory, with 1M of flash memory anticipated in the near future.
b) 256 bytes to 2K bytes of SRAM
c) 8-bit, 16-bit and 32-bit CPUs
d) 8K to 64K bytes of ROM
e) Crypto Processor for cryptographic calculations
f) Random Number Generator for cryptographic calculations
The CLDC targets a KVM with minimal characteristics. This does not prevent one from writing an implementation for somewhat larger devices; i.e., these characteristics define a lower bound for implementations and for the basic Java libraries that CLDC requires. In practice, CLDC is for low-end mobile devices with 512K of RAM. The RAM must accommodate the OS and the KVM as well as heap space for Java applications and libraries. The actual CLDC specification targets devices with much smaller constraints to permit a variety of implementations. For our purposes, i.e., applications that include a P2P Overlay Network implementation, only the more powerful devices fitting this configuration are of interest. The second configuration, CDC, is for high-end mobile devices with a minimum of 2MB of RAM, for example, high-end PDAs and some 3G phones. As mentioned before, the KVM implements the CLDC configuration, and other VMs exist, such as IBM's J9 for CLDC. These virtual machines are interfaces between running Java applications and the underlying system. Beyond byte code execution, they provide basic Java runtime support such as is found in java.lang.*, java.util.*, and java.io.*. There is also support for the network and security. CLDC does permit applications to be multithreaded [CLDC 1.1]. Again, we must emphasize that the boundaries defined in the specifications are flexible. There are, and will be, CLDC implementations on mobile phones with 16 Mbytes of RAM. This is because devices are evolving rapidly and CLDC permits extensions of its libraries.
To extend a configuration into a complete runtime environment, J2ME also defines profiles, each containing a set of classes or APIs. There are several profiles to support the different configurations: the Mobile Information Device Profile (MIDP), PDA Profile (PDAP), Foundation Profile (FP), Personal Basis Profile (PBP), and Personal Profile (PP). Besides profiles, J2ME also uses optional packages to provide more functionality that need not be associated with a particular configuration or profile. Examples of such optional packages include the PDA Optional Packages for the J2ME Platform (PIM), the Java APIs for Bluetooth (BTAPI), the Wireless Messaging API (WMA), etc.
Among the profiles, MIDP was the first to be finished and adopted [JSR37]. It required 256KB of non-volatile memory, 128KB of volatile memory for the Java heap, 8KB of non-volatile storage for application-created persistent data, a 96x54 pixel display, a keypad, and support for HTTP 1.1. The applications developed for MIDP devices are called MIDlets. There are several MIDP development environments, such as CodeWarrior Wireless Studio, Borland's JBuilder, and Sun's J2ME Wireless Toolkit. MIDlets are much like other Java programs, except that they must not halt the VM and are not entered through main(). Instead, they extend javax.microedition.midlet.MIDlet and implement its lifecycle methods startApp(), pauseApp() and destroyApp().
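Because a MIDlet is driven by the device rather than by main(), its shape is easiest to see as a small state machine. The sketch below is a plain-Java stand-in that runs on a desktop JVM, so it does not import the real javax.microedition.midlet.MIDlet class. It mirrors the MIDP states (Paused, Active, Destroyed) and the callback names startApp(), pauseApp() and destroyApp(boolean), but the MiniMidlet base class, the Ams driver, and the PeerChat example application are hypothetical simplifications; for instance, the real startApp() and destroyApp() may throw MIDletStateChangeException, which is omitted here.

```java
/**
 * Plain-Java stand-in for the MIDP MIDlet lifecycle, runnable on a desktop JVM.
 * MiniMidlet, Ams and PeerChat are hypothetical; the real base class is
 * javax.microedition.midlet.MIDlet and the real driver is the device's AMS/JAM.
 */
public class LifecycleSketch {

    enum State { PAUSED, ACTIVE, DESTROYED }

    /** Hypothetical stand-in for javax.microedition.midlet.MIDlet. */
    static abstract class MiniMidlet {
        protected abstract void startApp();
        protected abstract void pauseApp();
        protected abstract void destroyApp(boolean unconditional);
    }

    /** Hypothetical AMS: the only party allowed to drive state transitions. */
    static class Ams {
        private final MiniMidlet midlet;
        private State state = State.PAUSED;   // a newly constructed MIDlet starts Paused

        Ams(MiniMidlet midlet) { this.midlet = midlet; }

        State state() { return state; }

        void start() {
            if (state != State.PAUSED) throw new IllegalStateException("start from " + state);
            midlet.startApp();
            state = State.ACTIVE;
        }

        void pause() {
            if (state != State.ACTIVE) throw new IllegalStateException("pause from " + state);
            midlet.pauseApp();
            state = State.PAUSED;
        }

        void destroy() {
            if (state == State.DESTROYED) return;
            midlet.destroyApp(true);
            state = State.DESTROYED;
        }
    }

    /** A tiny application: it has no main() and never calls System.exit(). */
    static class PeerChat extends MiniMidlet {
        final StringBuilder log = new StringBuilder();
        protected void startApp() { log.append("connect;"); }   // e.g. rejoin the overlay
        protected void pauseApp() { log.append("release;"); }   // free network and RAM while paused
        protected void destroyApp(boolean unconditional) { log.append("save;"); }
    }

    public static void main(String[] args) {
        PeerChat app = new PeerChat();
        Ams ams = new Ams(app);
        ams.start();
        ams.pause();
        ams.start();
        ams.destroy();
        System.out.println(app.log + " " + ams.state());  // connect;release;connect;save; DESTROYED
    }
}
```

The design point is that the application only reacts; the AMS decides when it runs, is paused (for example, during an incoming call), or is destroyed, which is why a MIDlet must not halt the VM itself.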

The lifecycle of a MIDlet runs from discovery, through installation, update and invocation, to removal. A user discovers a MIDlet by accessing a network location with a mobile device and selecting from a list of MIDlets provided by the service provider. To start the installation, the application descriptor and its URL are passed to the device's Application Management Software (AMS), or JAM. Using the information in the application descriptor, the JAM can determine whether the device has enough memory for the MIDlet, much as PC installation software checks for available space before installing. The device then downloads the selected MIDlet via HTTP and converts the downloaded MIDlet(s) into a MIDlet suite. If the same MIDlet suite already exists on the device, the device notifies the user of the version difference between the existing and downloaded MIDlet suites and gets the user's permission either to overwrite or to keep the original MIDlet suite. When a MIDlet is executed by the user, the associated CLDC and MIDP classes are invoked to support it. MIDlet removal is very straightforward and must be confirmed by the user beforehand [MIDP2.0].
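The install-time decisions just described can be sketched from the application descriptor. MIDlet-Name, MIDlet-Version, MIDlet-Jar-Size and MIDlet-Jar-URL are standard JAD attributes; everything else in this Java sketch, including the InstallDecision values, the numeric version comparison, and the choice to check memory before the version, is our own simplification of what an AMS or JAM might do, not the MIDP-specified algorithm. The URL uses the reserved example.com domain.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of AMS/JAM install checks driven by a JAD application descriptor. */
public class JadCheck {

    enum InstallDecision { INSTALL, ASK_USER_UPGRADE, ASK_USER_DOWNGRADE, ALREADY_INSTALLED, INSUFFICIENT_MEMORY }

    /** Parses "Name: value" lines; only the first colon separates name from value. */
    static Map<String, String> parseJad(String jad) {
        Map<String, String> attrs = new HashMap<>();
        for (String line : jad.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) attrs.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return attrs;
    }

    /** Compares dotted versions such as "1.2" and "1.10" numerically, component by component. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\."), pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    /** installedVersion is null when no suite with this name is on the device. */
    static InstallDecision decide(Map<String, String> jad, long freeFlashBytes, String installedVersion) {
        long jarSize = Long.parseLong(jad.get("MIDlet-Jar-Size"));
        if (jarSize > freeFlashBytes) return InstallDecision.INSUFFICIENT_MEMORY;  // before any jar download
        if (installedVersion == null) return InstallDecision.INSTALL;
        int cmp = compareVersions(jad.get("MIDlet-Version"), installedVersion);
        if (cmp > 0) return InstallDecision.ASK_USER_UPGRADE;
        if (cmp < 0) return InstallDecision.ASK_USER_DOWNGRADE;
        return InstallDecision.ALREADY_INSTALLED;
    }

    public static void main(String[] args) {
        String jad = "MIDlet-Name: PeerChat\nMIDlet-Version: 1.1\nMIDlet-Jar-Size: 28000\n"
                + "MIDlet-Jar-URL: http://example.com/peerchat.jar\n";
        Map<String, String> attrs = parseJad(jad);
        System.out.println(decide(attrs, 100_000, null));    // INSTALL
        System.out.println(decide(attrs, 100_000, "1.0"));   // ASK_USER_UPGRADE
        System.out.println(decide(attrs, 20_000, null));     // INSUFFICIENT_MEMORY
    }
}
```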
Among other things, an implementation of the MIDP APIs provides UI widgets, game development support, control of the application’s lifecycle (start, stop, pause, continue, etc.), persistent data storage in non-volatile memory, and support for networking, X.509 v3 certificates, HTTPS, sound and tones.

Figure 6-6. The MIDP/CLDC Stack (MIDP layered on CLDC, which runs on the KVM)

We have barely touched the surface of the power of Java in mobile devices. Our goal has rather been to point out the flexibility and robustness of the model that MIDP/CLDC, along with Java Card, provides to system architects and programmers who are building a P2P Overlay Network that includes wireless devices. Clearly, from this point in time and into the foreseeable future, wireless communication will be a significant, growing, and ultimately dominant technology for completing the Internet. It permits cars, homes, people, desktops, stereos, TVs, offices, “you name it,” to merge into a single network infrastructure with unlimited potential for human communication. The latter is the real driving force that generates revenue.
As we continue in the wireless vein of discussion, we next bring into this context the providers of the
wireless networks and services, and discuss their current roles and what their roles might be in the P2P
space.

6.3 The Telco Operator/Carrier
The Telco operators, or carriers, provide the wireless infrastructure. This includes all of the hardware and software that permits a device that can communicate within their infrastructure to do so. It must be appreciated that this is both a huge investment and extremely costly to operate. For these networks there are two direct sources of revenue: voice and data. With respect to data, the current trend is to charge on a per-packet basis. The kinds of applications an operator wishes to see are those that will increase data traffic. As a consequence, one needs to dream up new applications that will be popular with subscribers and generate as much packet traffic as the constraints of the shared network allow. To attract interesting services and applications, operators form partnerships with both handset manufacturers and independent software vendors (ISVs). As we’ve already explained, there are different bearer networks, and not all handset manufacturers make devices that support all bearer networks; thus, liaisons develop. What goes on is quite complicated, because both the operator and the handset manufacturer have their own ideas about what users might want to see in a handset. In the end, the device that arrives is the result of negotiations between the interested parties. Such a device, along with the software and hardware to support the infrastructure, also has a predefined set of core applications. Each application is there because market analysis has shown that it will drive new revenue. After all, the goal of the handset manufacturer is to sell handsets to subscribers of a particular carrier’s network. Applications that arrive preinstalled in the phone are placed there by the manufacturer, have undergone full QA, and are trusted. OTA-provisioned applications open up many more possibilities, but their sources may be less controlled.
Once the network is opened up as a source of applications, one needs to be more cautious. There are features in the phone that not every application from every source should be able to access. A simple example is a user’s email address book, a well-known avenue for network attacks. So, how is this controlled? The model is fairly simple. Access to critical functionality in the phone is controlled by APIs into the phone’s operating system, and this access is reserved for applications that are known to be trusted. How is this trust established, and how can a carrier gain revenue from such a model?
Trust is usually established by the operator signing applications with its private key and associating a level of trust with the signature. This implies that a known set of public keys is already in the phone. The exception is the applications installed in the phone by the handset manufacturer. The manufacturer essentially writes the code that is bundled into a phone, and thus has full control over what the code can or cannot do. This is part of the negotiated deals between the two parties.
Otherwise, for OTA-provisioned Java applications, the operators can have relationships with ISVs and will sign some applications with their private keys in exchange for money from the ISV. This increases the value of the application in the marketplace, and the ISV also benefits financially. Clearly, the privileged applications are those that will be most popular with the users. The operator has a lot of power, since it controls the network and can restrict or permit applications to access addresses anywhere in the Internet. There are game servers, servers that are sources of music and video, etc., and these services are another source of operator and provider revenue.
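The sign-then-verify model can be illustrated with the standard java.security API on a desktop JVM. In this minimal sketch the operator’s key pair is generated on the spot, whereas on a real phone only the public key would be pre-installed; the SHA256withRSA algorithm, the 2048-bit key size, and the TrustLevel names are assumptions made for illustration rather than what any particular operator or handset uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.*;

/**
 * Sketch of operator signing: the operator signs a jar with its private key, and the phone,
 * which already holds the operator's public key, verifies the signature before granting trust.
 */
public class OperatorSigning {

    enum TrustLevel { TRUSTED_OPERATOR, UNTRUSTED }

    /** Runs at the operator: produces a signature over the jar bytes. */
    static byte[] sign(byte[] jar, PrivateKey operatorKey) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(operatorKey);
        s.update(jar);
        return s.sign();
    }

    /** Runs on the phone: checks the signature against the pre-installed operator public key. */
    static TrustLevel classify(byte[] jar, byte[] signature, PublicKey builtInKey)
            throws GeneralSecurityException {
        if (signature == null) return TrustLevel.UNTRUSTED;   // unsigned application
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(builtInKey);
        s.update(jar);
        return s.verify(signature) ? TrustLevel.TRUSTED_OPERATOR : TrustLevel.UNTRUSTED;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair operator = gen.generateKeyPair();

        byte[] jar = "pretend jar bytes".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(jar, operator.getPrivate());

        System.out.println(classify(jar, sig, operator.getPublic()));       // TRUSTED_OPERATOR
        byte[] tampered = "pretend jar bytes!".getBytes(StandardCharsets.UTF_8);
        System.out.println(classify(tampered, sig, operator.getPublic()));  // UNTRUSTED
        System.out.println(classify(jar, null, operator.getPublic()));      // UNTRUSTED
    }
}
```

Any modification of the jar after signing, or a signature made with a key the phone does not hold, leaves the application untrusted, which is what lets the operator sell signing as a service.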
If we add the JAM to this mix, then even more power is given to a signed application, since the JAM/KVM controls a Java application’s interface to the phone’s operating system. All application access to critical features is trapped by the JAM. If the application is not authorized to access, for example, the SIM card or the network, then a security exception is thrown. The UI will indicate to the user that the application is attempting to do something illegal, and the user can delete the application by means of the JAM. This is a very attractive revenue model for the operator, the handset manufacturer and the ISVs.
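The trapping behavior can be sketched as a single checkpoint through which every call to a critical feature passes. The Feature and Trust names and the trust-to-permission table in this Java sketch are hypothetical; real MIDP 2.0 implementations define their own protection domains and permission names. It uses the standard java.lang.SecurityException to signal the illegal access, as the text describes.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical JAM-like checkpoint that traps access to critical phone features. */
public class JamGuard {

    enum Feature { NETWORK, SIM_CARD, ADDRESS_BOOK, DISPLAY }
    enum Trust { OPERATOR_SIGNED, UNSIGNED }

    /** Illustrative policy: signed code gets everything, unsigned code only the display. */
    static Set<Feature> allowed(Trust trust) {
        return trust == Trust.OPERATOR_SIGNED
                ? EnumSet.allOf(Feature.class)
                : EnumSet.of(Feature.DISPLAY);
    }

    /** Every call into a critical feature passes through here; illegal access throws. */
    static void check(Trust trust, Feature feature) {
        if (!allowed(trust).contains(feature)) {
            throw new SecurityException(feature + " denied for " + trust + " application");
        }
    }

    public static void main(String[] args) {
        check(Trust.OPERATOR_SIGNED, Feature.SIM_CARD);   // passes silently
        try {
            check(Trust.UNSIGNED, Feature.ADDRESS_BOOK);
        } catch (SecurityException e) {
            // A real JAM would report this through the UI and offer to delete the application.
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```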
What, then, can P2P applications provide in this environment that is not already there, and, just as importantly, how can they create additional revenue for the operators, handset manufacturers and ISVs? It is certainly true that file-sharing-like P2P applications can be written for these devices so that J2ME applications, images, video, and music can be shared on the P2P Overlay Network between any two mobile phones, independently of device hardware differences and carriers. Recall, first, that J2ME applications are device independent, and second, that on the P2P Overlay Network, 3G and GPRS phones have built-in data roaming. Yet we find it more compelling to look at the additional power that can be given to users by means of the connected community aspect of our P2P Overlay Network infrastructure. All P2P network activity is done in the context of a CC. CCs provide a secure “walled garden” that keeps nonmembers from intruding on intra-CC communication. Consider the following statement:
Telstra has boosted capacity for its mobile phone networks around Australia to cater for
expected record call and SMS volumes on New Year's Eve and New Year's Day. Telstra Corpo-
rate Relations Manager, Maria Simpson, said more than 16 million SMS messages were
expected to be sent by customers on New Year's Eve and New Year's Day - outstripping last
year's record performance by two million messages. “Portable mobile base stations will be
installed and mobile phone capacity increased at various celebration locations and holiday
spots across Australia to help our customers connect on what is one of the most celebrated
nights of the year,” Ms Simpson said. “Our customers sent 14.2 million SMS text messages on
New Year's Eve and New Year's Day last year and we anticipate this record will be exceeded
again as these services continue to grow in popularity. Thousands of customers are also
expected to use Picture Messaging to send photos and personalized messages, ring tones and
icons to family and friends around the world [http://geekzon.co.nz Jan. 2, 2004],” Ms. Simpson
said.
People are primarily interested in communicating with one another. P2P applications foster this kind of global activity within connected communities. CCs are arbitrarily defined groups of individuals whose membership can be strictly controlled by the members. This tribal behavior benefits all concerned parties. The infrastructure is tied together by mediators, and if operators provide the mediators, this is another source of operator revenue. The CC members get built-in privacy, along with the sense of control and security that it brings. For example, eavesdropping on family conversations is no longer a concern. Surely families will want to include many kinds of Internet-wide communication capabilities within these CCs as their automobiles, offices and homes become part of this CC fabric. This can only increase the data traffic, and thus the operators’ revenue. Operators can also charge a small amount per supported CC. Again, we will have ISVs creating operator-signed P2P applications. Here we immediately have a way to deal with copyright laws, since operator-signed applications can be written and verified to respect copyright.
This is enough to tickle the imagination of those who wish to create business models around P2P in the wireless world. We’ve only scratched the surface of a very rich space of applications and services that P2P can provide in this domain. In section 6.6 we give more details about implementing a P2P Overlay Network that incorporates wireless and wired communication. Before doing that, we give a brief introduction to fixed wireless networks, which are also a large part of the wireless picture.

6.4 Fixed Wireless Networks


The fixed wireless network offers great mobility and flexibility for personal as well as commercial use. Now, at home, Mom, Dad and the kids can use an 802.11a/b/g-enabled Wireless LAN (WLAN) to access the Internet from anywhere in the house, garden or garage through inexpensive, easy-to-install access points that can be purchased from any neighborhood computer store. This is possible as long as there is wired access to the Internet somewhere on the property or within WLAN range. Even if the latter does not exist, with sufficient access points in place, homes and neighborhoods can form private WLANs for their own use. In hospitals, medical professionals obtain patients’ data at their bedsides and other hospital locations using WLANs and 802.11-enabled medical instrumentation. Warehouse workers can check a company’s inventories with wireless scanners connected to a database server. Hotels, coffee shops, cafes, airports, etc., are all becoming 802.11 enabled. More and more of these locations give free Internet access to all 802.11 devices in communication range, and are called free “hotspots.” What is the basis for this phenomenon? First, WLANs are more economical, since they do not require the installation of wired solutions everywhere. Second, the Internet is pervasive as a communication medium. This combination is especially attractive in areas that are difficult to reach or that have no wired infrastructure already in place. Thus, one expects to see a large deployment of Public WLANs (PWLANs) worldwide, particularly in Asia, where there is a huge economic motivation with concomitant growth in mobile and fixed wireless Internet access. In the end, it is the existing IEEE standards for WLANs that simplify the process of WLAN installation and use. This section gives a brief introduction to these standards and points out the potential of using such WLANs as P2P Overlay Networks.
The basic IEEE standard for WLANs (IEEE 802.11) is built on a cellular architecture that divides the
system into Basic Service Sets (BSSs), which are analogous to the cells in the GSM/GPRS systems
described before. Each BSS contains two or more wireless nodes, or stations, that can communicate with
each other. This forms an ad-hoc wireless network, shown in figure 6-7
[www.ydi.com/deployinfo/wp-80211-tutorial.php].

[Figure 6-7. BSS Ad-hoc Wireless Network: three nodes communicating within a single BSS]

802.11b is the most widely deployed of these networks; its nominal bandwidth is 11 Mbps. 802.11g is
becoming more popular, is backwards compatible with 802.11b, and has a nominal bandwidth of 54 Mbps.
Finally, 802.11a also has a nominal bandwidth of 54 Mbps. These ad-hoc networks have a range of about
30 to 50 meters, depending on the standard used and the antennas. 802.11g has the longest range and
802.11a the smallest.
While ad-hoc 802.11 networks are advantageous, they also have disadvantages. The nodes need protocols to
assign network addresses and to route traffic when nodes are out of range of one another. Also, there is
no way to interoperate with wired networks. To deal with these problems, and to manage the network in a
more structured fashion, we add Access Points (APs) to the wireless network. Each node is then
controlled by an AP, which is analogous to the base station in GSM/GPRS systems. An AP is really a hub,
and creates a star-shaped wireless topology in which nodes communicate with each other, or with wired
components, through the AP. In a home, one AP may be enough to form a WLAN for personal use, and in this
case the AP can have a wired connection to the Internet. Such APs also act as NATs, manage the IP address
space, and can use DHCP to acquire an outside IP address if necessary. Figure 3-8 in chapter 3 is a good
diagram of this kind of configuration. For a larger WLAN where range is a problem, several APs may be
connected together. This connection is wired, in most cases by Ethernet. Figure 6-8 shows two BSSs
connected by an Ethernet, which in turn reaches the Internet via a router.

[Figure 6-8. Connected BSS: BSS1 and BSS2, each with its own AP, joined by an Ethernet that connects to the Internet]

The IEEE 802.11 standard is limited in scope to the Physical layer and the MAC (Medium Access Control)
layer. The physical layer uses two RF technologies: Direct Sequence Spread Spectrum (DSSS) and Frequency
Hopping Spread Spectrum (FHSS). Both operate in the 2.4 GHz ISM band worldwide. The Industrial,
Scientific and Medical (ISM) radio bands were originally reserved internationally for the non-commercial
use of RF electromagnetic fields for industrial, scientific and medical purposes; they are now unlicensed
RF spectrum. The basic access method is Carrier Sense Multiple Access with Collision Avoidance
(CSMA/CA). CSMA lets a station sense whether the medium is free or busy, and the transmission of a packet
is delayed while the medium is busy. But if multiple stations sense that the medium is free and transmit
simultaneously, a collision occurs that the senders cannot detect. To reduce the chance of a collision, a
station transmits only after the medium has been free for a period of time called the Distributed Inter
Frame Space (DIFS), and, if the medium was busy, after a further random backoff. Each sent packet carries
a 32-bit cyclic redundancy check (CRC). When a packet is received, the receiver validates the CRC and
sends an acknowledgement if the CRC is correct. On receiving the acknowledgement, the sender knows that no
collision occurred. Otherwise, the sender retransmits the packet, again subject to the DIFS, until it
receives an acknowledgement or gives up after a number of unacknowledged retransmissions. CSMA is also
used in Ethernet with one small difference: sending nodes can detect collisions while they are
transmitting. For this reason Ethernet uses CSMA/CD (Collision Detection), which is specified in
IEEE 802.3.
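The sender-side behavior just described can be sketched in Java. This is a simulation under stated assumptions, not an 802.11 implementation: the radio is a hypothetical `Channel` interface that returns the frame as the receiver saw it (or `null` if it was lost), the DIFS wait is only noted in a comment, `java.util.zip.CRC32` stands in for the 802.11 frame check sequence, and the retry limit is a made-up value. The contention window bounds (31 to 1023 slots) are the 802.11b DSSS values, used for illustration; drawing a backoff before every attempt is a simplification.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;

/** Illustrative CSMA/CA-style sender: CRC-protected frames, ACK or retry with growing backoff. */
final class CsmaCaSketch {
    static final int CW_MIN = 31;     // 802.11b DSSS contention window bounds,
    static final int CW_MAX = 1023;   // shown for illustration only
    static final int RETRY_LIMIT = 7; // hypothetical retry limit

    /** Simulated medium: returns the frame as received, or null if nothing arrived. */
    interface Channel {
        byte[] deliver(byte[] frame);
    }

    /** Attempts used (-1 if the sender gave up) and total backoff slots waited. */
    static final class Result {
        final int attempts;
        final int backoffSlots;

        Result(int attempts, int backoffSlots) {
            this.attempts = attempts;
            this.backoffSlots = backoffSlots;
        }
    }

    /** Payload followed by its 32-bit CRC (big-endian in this sketch). */
    static byte[] frameWithCrc(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) crc.getValue())
                .array();
    }

    /** Receiver side: recompute the CRC and compare it with the trailing 4 bytes. */
    static boolean crcValid(byte[] frame) {
        if (frame == null || frame.length < 4) return false;
        int n = frame.length - 4;
        CRC32 crc = new CRC32();
        crc.update(frame, 0, n);
        return ByteBuffer.wrap(frame, n, 4).getInt() == (int) crc.getValue();
    }

    /** The contention window doubles on each retry, capped at CW_MAX. */
    static int contentionWindow(int attempt) {
        int cw = (CW_MIN + 1) << Math.min(attempt - 1, 10);
        return Math.min(cw, CW_MAX + 1) - 1;
    }

    static Result send(byte[] payload, Channel channel, Random rng) {
        byte[] frame = frameWithCrc(payload);
        int slots = 0;
        for (int attempt = 1; attempt <= RETRY_LIMIT; attempt++) {
            // A real station first senses the medium idle for DIFS, then counts
            // down a random number of backoff slots; here we only tally them.
            slots += rng.nextInt(contentionWindow(attempt) + 1);
            byte[] received = channel.deliver(frame);
            // The receiver ACKs only a frame whose CRC checks out; no ACK
            // (collision or corruption) means the sender retries.
            if (crcValid(received)) return new Result(attempt, slots);
        }
        return new Result(-1, slots);
    }
}
```

Doubling the contention window after each failed attempt spreads the retries of competing stations apart, which is what makes a repeat collision less likely.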
A WLAN provides a great deal of mobility since nodes may be allowed to join an existing BSS freely. This
is the concept of free “hotspots.” Even if access is not free, a simple password is often all that is
necessary to gain entry into the hotspot. This is exactly why 802.11 hotspots are so popular worldwide.
They use APs that have high-speed Internet connections, and thus provide Internet access for the active
BSS nodes. Figure 6-9 shows wireless hotspots in Lawrence, Kansas. Similar aerial photographs can be made
of most metropolitan areas in the world today.

Figure 6-9. 802.11b Hotspots in Lawrence, Kansas [Hotspots]

If there is an AP, then a newcomer node synchronizes with the AP by either passive or active scanning.
With passive scanning, the node waits for a signal (a beacon frame) from the AP. With active scanning,
the node actively looks for the AP by sending probe requests. For ad-hoc nodes, the same synchronization
is used with other BSS nodes.
Because the WLAN RF signal is generally not contained, it is detectable, and the transmitted data is
accessible to any observer within range who has an appropriate 802.11 device. As a consequence, IEEE
802.11 addresses security with Wired Equivalent Privacy (WEP). An authentication mechanism is designed to
prevent intruders from accessing the wireless network resources: only correct keys enable a connection
to the network. But this does not prevent passive observation of the network data. Eavesdropping on WLAN
traffic is meant to be prevented by encrypting it with the RC4 byte stream cipher as part of WEP. WEP
secret keys are 40 or 104 bits long; combined with a 24-bit initialization vector (IV), they form the
64- or 128-bit keys often quoted for WEP. WEP also uses a CRC as an integrity check on the data.
Unfortunately, even with the maximum key size, WEP can be attacked by a determined hacker passively
snooping on the network. For a 40-bit key, simple cryptanalytic attacks can recover the plaintext in
about 15 minutes, and for larger keys the attack scales only linearly. Thus, unfortunately for 802.11,
WEP is basically insecure. Fortunately, there are new standards, the Wi-Fi Protected Access (WPA), that
claim to address the weaknesses of WEP. For details of these attacks see
[http://www.isaac.cs.berkeley.edu/isaac/wep-faq.html].
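To make one of these weaknesses concrete, the following teaching sketch shows WEP-style encapsulation: the frame body and its CRC-32 integrity value are XORed with an RC4 keystream seeded by the IV concatenated with the secret key. The class and method names are hypothetical, the byte order of the integrity value is simplified, and nothing here should be used for real security. Because the ciphertext is just plaintext XOR keystream, two frames sent with the same IV and key reveal the XOR of their plaintexts to any passive observer.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Teaching sketch of WEP-style encapsulation; WEP is broken, do not use it. */
final class WepSketch {
    /** RC4: key-scheduling algorithm (KSA) followed by keystream generation (PRGA). */
    static byte[] rc4Keystream(byte[] key, int length) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) s[i] = i;
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xff)) & 0xff;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        byte[] out = new byte[length];
        for (int n = 0, i = 0, j = 0; n < length; n++) {
            i = (i + 1) & 0xff;
            j = (j + s[i]) & 0xff;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) s[(s[i] + s[j]) & 0xff];
        }
        return out;
    }

    /** Bytewise XOR over the length of the shorter array. */
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int k = 0; k < out.length; k++) out[k] = (byte) (a[k] ^ b[k]);
        return out;
    }

    /**
     * (plaintext || CRC-32 integrity value) XOR RC4(IV || secret key).
     * Decryption is the same XOR with the same keystream.
     */
    static byte[] encrypt(byte[] iv, byte[] secretKey, byte[] plaintext) {
        byte[] seed = ByteBuffer.allocate(iv.length + secretKey.length)
                .put(iv).put(secretKey).array();
        CRC32 crc = new CRC32();
        crc.update(plaintext, 0, plaintext.length);
        byte[] body = ByteBuffer.allocate(plaintext.length + 4)
                .put(plaintext).putInt((int) crc.getValue()).array();
        return xor(body, rc4Keystream(seed, body.length));
    }
}
```

The 24-bit IV space is small, so on a busy network IVs repeat, and every repeat hands an eavesdropper this plaintext XOR.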

The above brief introduction to WLANs (the 802.11 protocol) shows that it is straightforward to use
wireless nodes as peerNodes to form a dynamic wireless network on which the P2P Overlay Network can be
placed, in the case where APs manage the connectivity. Such wireless nodes use TCP/IP as the transport on
the wireless bearer network. Thus, from the point of view of the applications, they run exactly the same
as they would on the Internet, and the P2P application programmer need not do anything special for these
802.11 networks. The only exception is that some devices, like mobile phones, are resource constrained
when compared to a laptop, and application writers need to keep this in mind. The 802.11 peerNodes will
use exactly the mechanisms we have described in previous chapters to contact, and thereby be hosted by,
a mediator. We have been intentionally prudent in our design to ensure that the system documents and
protocols we have designed can be implemented across a large device space. This is not entirely true for
BSSs that do not have APs. Here we have mobile ad-hoc networks that, as mentioned above, need their own
protocols for discovery and routing amongst themselves at the bearer network level. They don't see each
other as IP devices. These protocols are below the ONP layer, but even so, the nature of these networks
does not yield a topology in which mediators can play a role. We discuss mobile ad-hoc networks in the
next section.

6.5 Mobile Ad-hoc


Here one imagines a collection of a few to possibly a few hundred 802.11 enabled devices, with no APs,
spread across multiple ad-hoc BSSs like the one shown in figure 6-7. Because of the limited range of
802.11, not all of the devices may be within RF range of one another. What we see are BSSs that overlap
by at least one node. If a BSS is isolated from all other BSSs, then no inter-BSS communication is
possible. This is shown in figure 6-10 below.

[Figure 6-10. Overlapped and Isolated BSS's: four BSSs (BSS1 to BSS4), some overlapping through shared nodes and one isolated]

Mobile ad-hoc networks without APs are called MANets. It has been argued that MANets are not P2P
networks: “Peer-to-Peer is based on an IP network, and mobile ad-hoc networks are based on a mobile radio
network [SCHOLLMEIER].” The same authors also say, “The most important difference between a mobile ad-hoc
network and a peer-to-peer network is the motivation to create such a network... the main reason for
using a MANet is to communicate with other users... the primary goal of most P2P users is to search for
data.” From the point of view of modern P2P networks, and of the direction in which P2P is evolving, these
authors are incorrect. Certainly, at its onset P2P was strongly influenced by Napster and similar
applications. Now, and we believe these authors will agree, P2P covers all aspects of communication
between peers. It has moved into the mobile phone and PDA world. In fact, an excellent paper by Mikael
Wiberg presented at HICSS-37 [Wiberg04] describes research that is a blend of MANets and P2P. Mikael's
MANet users search for other users who have common music interests. One can imagine people arriving at
meeting places like village squares and shopping malls with their MANet devices in hand, making network
contact with others, looking at their music interests, and, if they are the same, streaming music to one
another. The network contact can similarly form an ad-hoc, music-interest chat room. These networks are
truly MANets, and exhibit the multiple directions that P2P has taken since Napster hit the streets. P2P
is no longer confined to pure content exchange. In fact, MANets are in the spirit of P2P that was
imagined by people like Ian Clarke of Freenet and Gene Kan of Gnutella.
This is the primary purpose of connected communities in our architecture: connecting people with a
common purpose in a manner that is independent of the network stack and its routing protocols. This is
precisely why we introduced the P2P Overlay Network, with its application routing protocols on top of,
and independent of, the underlying network stack. A good example of this is the Siemens Corporate
Technology, Software & Architectures division's implementation on Jxta of “an Ad-hoc Awareness
application that allows mobile users to find people and required resources, such as devices and
services, in an ad-hoc, spontaneous way. The application runs on the Siemens SIMpad mobile device using
WLAN connectivity in the 802.11 ad-hoc mode, which means that no centralized access point is required.
Network resources are found in a fully decentralized fashion based on profiles that are stored directly
on mobile devices, the resource providers [SIEMANSJXTA].” Still, an understanding of how MANet routing
behaves will help P2P application designers and programmers create applications that tolerate the
routing delays that might be experienced on these networks.
To this end, note that on a MANet an underlying routing protocol is required to permit network
communication to cross BSS boundaries. Because of the ad-hoc behavior of MANets, routes can be very
dynamic, and as a consequence, periodic broadcasting of routing information is not generally used.
Rather, on demand routing protocols are preferred, because they avoid the bandwidth cost of every node
periodically broadcasting routing information across the MANet. On demand routing protocols, as the name
implies, discover routes when they are needed. In general these protocols work as follows. A node
broadcasts a routing request in its BSS to find a route to a particular destination. The broadcast
carries an identifier to prevent routing request loops, and a TTL or hop count to limit the propagation
of the request. If a member of the BSS knows the route, then it returns a route response to the requestor
and does not propagate the request further. Other nodes within range of the broadcast continue to
propagate the request if they do not also know the route. In either case, propagation persists until
either the route is found and a response is sent, or the hop count/TTL expires.
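The generic on demand discovery loop can be sketched as a level-by-level flood over a graph of radio neighbors. The assumptions are ours: the MANet is modeled as a hypothetical adjacency map, each node remembers the request keys (source plus identifier) it has already handled so that it never rebroadcasts the same request twice, and the first responder reached (the destination or a node with a cached route) ends the search. Timing, collisions and the return path of the response are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of on demand route request flooding with duplicate suppression and TTL. */
final class RouteRequestFlood {
    private final Map<String, List<String>> neighbors;                  // who is within radio range of whom
    private final Map<String, Set<String>> seenByNode = new HashMap<>(); // request keys each node has handled
    private long nextRequestId = 0;

    RouteRequestFlood(Map<String, List<String>> neighbors) {
        this.neighbors = neighbors;
    }

    /**
     * Floods one route request from source. Returns the hop count at which it first reaches
     * the destination or a node that already knows the route, or -1 if the TTL expires or
     * the network is partitioned.
     */
    int discover(String source, String destination, Set<String> knowsRoute, int ttl) {
        String key = source + "#" + nextRequestId++; // identifier carried by the broadcast
        markSeen(source, key);
        List<String> frontier = List.of(source);
        for (int hops = 0; ; hops++) {
            for (String node : frontier) {
                // A node that can answer sends a route response and stops propagating.
                if (node.equals(destination) || (hops > 0 && knowsRoute.contains(node))) {
                    return hops;
                }
            }
            if (hops == ttl) return -1; // hop count exhausted: the request dies
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String neighbor : neighbors.getOrDefault(node, List.of())) {
                    // Only the first copy of a request is rebroadcast; duplicates are dropped.
                    if (markSeen(neighbor, key)) next.add(neighbor);
                }
            }
            if (next.isEmpty()) return -1; // nobody left to ask, e.g. an isolated BSS
            frontier = next;
        }
    }

    /** Returns true the first time a node sees this request key. */
    private boolean markSeen(String node, String key) {
        return seenByNode.computeIfAbsent(node, n -> new HashSet<>()).add(key);
    }
}
```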
We discuss two flavors of on demand routing protocols that are variations on the above theme. The first
is Dynamic Source Routing (DSR) [DSR]. Here, as the route request propagates, each node in the path adds
itself to the list of nodes required to return to the source. When the request finally arrives at the
destination, or at a node that knows the rest of the route, the full source route is returned to the
requestor in the response. The acquired source route is then used to send data to the destination. It is
dynamic because the ad-hoc nature of the network implies that routes may be short lived. The downside of
DSR is that the routing response, and every packet sent to the destination, contains the complete source
route. For long routes this can be a large portion of each packet.
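DSR's route record can be sketched by letting each copy of the request carry the list of nodes it has crossed. The graph model and method names are hypothetical. `routeOverheadBytes` assumes, following the DSR source route option, that only the intermediate hops travel in each data packet (source and destination are already in the IP header) and that addresses are 4-byte IPv4 addresses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of DSR route discovery: each hop appends itself to the route record. */
final class DsrSketch {
    /**
     * Returns the full source route (source first, destination last), or an empty
     * list if no route is found within ttl hops.
     */
    static List<String> discover(Map<String, List<String>> neighbors,
                                 String source, String destination, int ttl) {
        Deque<List<String>> inFlight = new ArrayDeque<>();
        Set<String> seen = new HashSet<>(); // each node forwards a given request once
        inFlight.add(List.of(source));
        seen.add(source);
        while (!inFlight.isEmpty()) {
            List<String> record = inFlight.poll();
            String last = record.get(record.size() - 1);
            if (last.equals(destination)) {
                return record; // returned to the requestor as the source route
            }
            if (record.size() - 1 >= ttl) continue; // hop limit reached for this copy
            for (String neighbor : neighbors.getOrDefault(last, List.of())) {
                if (seen.add(neighbor)) {
                    List<String> extended = new ArrayList<>(record);
                    extended.add(neighbor); // the receiving node appends itself
                    inFlight.add(extended);
                }
            }
        }
        return List.of();
    }

    /** Bytes of source route carried by every data packet (intermediate hops only). */
    static int routeOverheadBytes(List<String> route, int addressBytes) {
        return Math.max(0, route.size() - 2) * addressBytes;
    }
}
```

For example, a route A, B, C, D carries two intermediate 4-byte addresses, so every data packet pays 8 bytes of source route, and the cost grows with each additional hop.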
The second flavor is Ad-hoc On Demand Distance Vector Routing [PERKINS] (AODVR). In this case, nodes
along the discovered route keep information only about the next hop in the route. When a routing
response arrives at the requestor, the requestor sends the packet to the first hop in the discovered
path, which forwards it to the next hop, and so on. The downside here is memory use at each intermediate
node: each such node must maintain next-hop routing information for all of the destination nodes it
knows in the network. Also, when routes to destinations no longer exist, routing responses are generated
and propagated with infinite hop counts, so that any subsequent attempt to reach such a destination
requires another routing request. This approach is preferred because memory is becoming a non-issue for
devices as small as mobile phones. Figure 6-11 demonstrates AODVR.
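By contrast with DSR, AODVR-style discovery leaves only a next-hop entry at each node. In this sketch (hypothetical class and graph model), the request flood records at each node the neighbor it first heard the request from; the response then walks this reverse path and installs forward next-hop entries. Data is forwarded using those entries alone, and `invalidate` models the infinite-hop-count response by removing the destination's entries, so the next send fails and a new routing request is needed. Real AODV also keeps sequence numbers and route lifetimes, which are omitted here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of AODV-style routing: nodes store next hops, not full routes. */
final class AodvSketch {
    private final Map<String, List<String>> neighbors;
    /** routingTable.get(node).get(destination) = neighbor to forward to. */
    final Map<String, Map<String, String>> routingTable = new HashMap<>();

    AodvSketch(Map<String, List<String>> neighbors) {
        this.neighbors = neighbors;
    }

    /** Floods a request, then installs next-hop entries along the reverse path. */
    boolean discover(String source, String destination) {
        Map<String, String> heardFrom = new HashMap<>(); // reverse pointer toward the source
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        heardFrom.put(source, source);
        while (!queue.isEmpty() && !heardFrom.containsKey(destination)) {
            String node = queue.poll();
            for (String neighbor : neighbors.getOrDefault(node, List.of())) {
                if (!heardFrom.containsKey(neighbor)) {
                    heardFrom.put(neighbor, node);
                    queue.add(neighbor);
                }
            }
        }
        if (!heardFrom.containsKey(destination)) return false;
        // The response retraces the reverse path; each node it reaches records the
        // neighbor the response came from as its next hop toward the destination.
        for (String node = destination; !node.equals(source); node = heardFrom.get(node)) {
            String previous = heardFrom.get(node);
            routingTable.computeIfAbsent(previous, n -> new HashMap<>()).put(destination, node);
        }
        return true;
    }

    /** Forwards hop by hop; each node consults only its own next-hop entry. */
    List<String> forward(String source, String destination) {
        List<String> path = new ArrayList<>(List.of(source));
        String node = source;
        while (!node.equals(destination)) {
            String next = routingTable.getOrDefault(node, Map.of()).get(destination);
            if (next == null) return List.of(); // no route: a new routing request is needed
            path.add(next);
            node = next;
        }
        return path;
    }

    /** Route no longer exists: treat it as an infinite hop count and drop the entries. */
    void invalidate(String destination) {
        for (Map<String, String> table : routingTable.values()) {
            table.remove(destination);
        }
    }
}
```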

[Figure 6-11. Routing from Node1 to Node2: a route from node1 to node2 across the overlapping BSS1, BSS2 and BSS3]

Another hop-by-hop routing protocol is Topology Dissemination Based on Reverse-Path Forwarding (TBRPF)
[TDBRF].
So, now we have a better understanding of routing on MANets. MANets can generate a lot of traffic within
a BSS. This is fine as long as the network is not so busy as to prevent timely access for all nodes.
Fortunately, network time is extremely fast compared to human time. A great deal can happen on a network
in the “blink of an eye” at 11 Mbps to 54 Mbps. The latter is 6.75 megabytes per second, which is about
4,500 packets per second if the MTU is 1500 bytes. That's a lot of bits!
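At the nominal 54 Mbps rate, and ignoring MAC, PHY and IP header overhead (real throughput is considerably lower), the per-second figures work out as:

```latex
\frac{54 \times 10^{6}\ \text{bit/s}}{8\ \text{bit/byte}} = 6.75 \times 10^{6}\ \text{byte/s},
\qquad
\frac{6.75 \times 10^{6}\ \text{byte/s}}{1500\ \text{byte/packet}} = 4500\ \text{packets/s}.
```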
MANets can use a non-mediated P2P Overlay Network where broadcast is the basis for discovery. All
system documents are broadcast across the MANet much like routing requests, and the PNToPN pro-
tocol is used to establish a direct connection. This is discussed in chapter 4, section 4.3.
Still, we imagine the day when MANets will arise and there will be at least one AP in the proximity
that is also a P2P Overlay Network mediator. This AP will be well known because its presence can be
propagated, that is to say, as soon as this AP is known in a single BSS, its existence will be known
across all overlapping BSS’s. Clearly, these AP-Mediators can have access to the Internet, and thus, the
entire P2P Overlay Network. The possibilities of these combinations are almost without limit. As a
consequence, we will discuss P2P for wireless networks in this spirit. We do this in the next section.

6.6 P2P for Wireless


We are writing this section with the assumption that almost all mobile devices have, or soon will have,
IP protocol stacks and a network presence. By a network presence we mean the following. First, if the IP
address is known, then it will be possible to IP “ping” any such device from anywhere on the Internet,
although the IP address may not, of course, be known everywhere. Second, when such a device is powered
“on,” it is on the network without having to make something like the circuit switched telephone call
that was required for pre-GPRS GSM phones. If the IP stack is not supported, then a protocol translating
gateway is required. There are already such gateways that permit small Bluetooth networks to glue
themselves to the Internet as P2P networks and actually run Internet applications like ftp transparently
using Jxta [BluetoothP2P]. IP-enabled mobile networks are the focus of what follows here.
Because these devices are still limited in their capabilities when compared to workstations and laptops,
mediators may also be required to play the role of a surrogate, a partner that can more efficiently
manage all of the resources that might be of interest to hosted peerNodes. Remember, these devices will
have access to the entire Internet. Certainly, there will be personal CCs to manage, for example, the
user's family, friends, clubs, or office CCs. Managing these CCs on each device is appropriate, since the
data involved does not require much flash memory space. On the other hand, there will be thousands of
connected communities to discover and to join on the Internet. These CCs have system document
descriptions that are more appropriately mediated than stored locally. Similarly, each CC may have
thousands of members with VirtualPort and PeerNode documents, and the same rule applies. Finally, there
is content. Mobile phones do not yet have hundreds of gigabytes of disk storage. Thus, it is appropriate
for mediators either to proxy the storage of content or to store it locally. Each of these cases
requires minor additions to our protocols, which are described in this section.
One requirement for peerNodes on constrained wireless devices is that mediators be ever present, highly
available systems. This is because mediators will be active participants in almost everything that these
peerNodes do on the P2P Overlay Network. Let's imagine that Jim is at his favorite Internet cafe with his
802.11g enabled laptop, that Gale is using her 3G mobile phone, and that they have a P2P IM application.
How, then, do discovery and communication take place? Jim in all likelihood will be behind a NAT whose
external IP address was acquired by DHCP. He will rely on a mediator for receiving ONP messages, and will
be hosted by that mediator over an underlying TCP/IP connection. On the other hand, Gale will have a 3G
network presence satisfying the two requirements just above. As a consequence, Jim will be able to send
ONP messages directly to Gale if her peerNode responds positively to the RequestDirectConnection PNToPN
command as described in chapter 4, section 4.3. Whether such responses are positive will be decided by
Telco carriers and their handset manufacturers, and will also be a function of the application being
used. For simple IM and chat rooms, we certainly recommend an affirmative response. On the other hand,
because of the limited capacity of Gale's device, streaming video or music to her may be prohibitive
and/or prohibited. The latter prohibition depends on whether copyright issues are involved. If they are,
then we have provided the mediator infrastructure to support DRM as discussed in chapter 5. In any case,
for Gale to communicate with Jim, his mediator is definitely required. Let's now return to Gale's
mediator requirements. They are a function of her device's capabilities and of carrier policies. We
concentrate on the former.

The peerIdentity document has an optional <deviceProfile> field for this purpose. The subfields that may
be present are determined by the device manufacturer, and these can be used by the P2P application
writer as required by the carrier. The peerIdentity document is included in the mediator greeting
command. The primary purpose of the profile information is to permit the mediator to scope the resources
it will provide for the peerNode. For example, some web content may have to be transformed to fit the
device's screen size. If the mobile device has sufficient flash memory, then it can be expected to store
a reasonable number of system documents locally. Much as with mail server quotas, but with more
intelligence, the default storage that a mediator allocates for a device can vary from device to device.
This allocation occurs the first time the device contacts the mediator, during the device's carrier
initialization phase. Thus, the device will need N commands.
First let's look at what the peerNode/mediator relationship permits with respect to the off-device
storage of system documents. Recall that we have three system documents: the peerIdentity, virtualPort,
and connected community documents. When a mobile device looks up a virtualPort document based on known
meta-data, the document will be retrieved directly from the remote peerNode on the P2P Overlay Network.
This may or may not involve a non-mediated connection. In either case, before retrieving the document,
the P2P software will know the space required for its local storage. If there is not enough space, then
three choices are available. First, it can remove local stale documents, if any exist. Second, it can
request mediated storage. If this fails, then, third, it will require user interaction to decide what
local data will be removed to free up enough space to store the document. Smart software will anticipate
the device's storage requirements and free up space ahead of time; involving the user in this process is
really a last resort. To accomplish the above tasks we need three additional PNToMed commands.
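The three choices can be expressed as a small decision procedure. This is a sketch under assumptions: `LocalStore` and the `Mediator` callback are hypothetical stand-ins for the device's document store and for the mediated storage request, sizes are in bytes, and stale documents are evicted in whatever order the set yields them (a real implementation would more likely evict the oldest first).

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of the three-step policy for making room for a fetched system document. */
final class StoragePolicy {
    enum Outcome { STORED_LOCALLY, STORED_ON_MEDIATOR, ASK_USER }

    /** Hypothetical view of the device's system document store (sizes in bytes). */
    static final class LocalStore {
        final long capacity;
        final Map<String, Long> documents = new LinkedHashMap<>();
        final Set<String> stale = new HashSet<>();

        LocalStore(long capacity) {
            this.capacity = capacity;
        }

        long free() {
            long used = 0;
            for (long size : documents.values()) used += size;
            return capacity - used;
        }
    }

    /** Hypothetical hook for a mediated storage request; true if the mediator accepted. */
    interface Mediator {
        boolean store(String documentId, long size);
    }

    static Outcome place(LocalStore store, Mediator mediator, String documentId, long size) {
        // First: remove stale local documents until the new one fits.
        Iterator<String> it = store.stale.iterator();
        while (store.free() < size && it.hasNext()) {
            store.documents.remove(it.next());
            it.remove();
        }
        if (store.free() >= size) {
            store.documents.put(documentId, size);
            return Outcome.STORED_LOCALLY;
        }
        // Second: ask the mediator to hold the document.
        if (mediator.store(documentId, size)) {
            return Outcome.STORED_ON_MEDIATOR;
        }
        // Third, the last resort: the user decides what local data to remove.
        return Outcome.ASK_USER;
    }
}
```

Returning `ASK_USER` instead of deleting anything further keeps the user as the last resort, as the text requires.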
The first command is a request for the peerNode’s available storage and is called the RequestStorage
Command:

Command: RequestStorage

Response: Size of storage in K bytes

The second command is used to store data on the mediator. This is somewhat complicated because the
application must be able to identify this data and retrieve it. Thus, for every object that is stored on
the mediator, the P2P software must generate two object identifiers: the first is for the local device,
and the second is for retrieving the object from the mediator. These may be the same, but that is
unlikely. The first must contain some meta-data that a user can interpret; for example, the virtualPort
name field from the virtualPort document might be MobileIM.GalesPeer.Gale@WinnieThePooh.com. The second,
on the other hand, can be a monotonically increasing integer. Also, since a mobile peerNode may have
multiple mediators, the local object identifier will be bound to a mediator document. From this
discussion it is clear that, over time, a large amount of local storage will be used to manage these
objects. To store these objects we have the StoreObject command:

Command: StoreObject ObjectIdentifier
ObjectIdentifier := System dependent unique identifier

Response: OK StorageAvailable | Failed ErrorMessage


StorageAvailable := Remaining storage in K bytes
ErrorMessage := Text explanation of failure
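The two-identifier scheme and the StoreObject response can be sketched as follows. The client keeps a map from the human-readable local name to a (mediator, integer) pair, allocating the integers per mediator; the mediator tracks a per-peerNode quota in K bytes and replies with `OK StorageAvailable` or `Failed ErrorMessage`. The class names, the error texts, and the assumption that the object's size accompanies the command are all hypothetical, since the grammar above does not specify how the object data itself is transmitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical bookkeeping for StoreObject on both sides of the PNToMed exchange. */
final class StoreObjectSketch {
    /** Where a locally named object actually lives. */
    static final class RemoteRef {
        final String mediatorId;
        final long objectId;

        RemoteRef(String mediatorId, long objectId) {
            this.mediatorId = mediatorId;
            this.objectId = objectId;
        }
    }

    /** Device side: binds a human-readable local name to (mediator, integer id). */
    static final class ClientIndex {
        private final Map<String, Long> lastIdPerMediator = new HashMap<>();
        final Map<String, RemoteRef> byLocalName = new HashMap<>();

        long bind(String localName, String mediatorId) {
            long id = lastIdPerMediator.merge(mediatorId, 1L, Long::sum); // 1, 2, 3, ... per mediator
            byLocalName.put(localName, new RemoteRef(mediatorId, id));
            return id;
        }
    }

    /** Mediator side: per-peerNode quota in K bytes. */
    static final class MediatorQuota {
        private long remainingK;
        private final Set<Long> stored = new HashSet<>();

        MediatorQuota(long allocatedK) {
            this.remainingK = allocatedK;
        }

        /** RequestStorage response: size of storage in K bytes. */
        String requestStorage() {
            return Long.toString(remainingK);
        }

        /** StoreObject response: "OK StorageAvailable" or "Failed ErrorMessage". */
        String storeObject(long objectId, long sizeK) {
            if (stored.contains(objectId)) return "Failed duplicate object identifier";
            if (sizeK > remainingK) return "Failed insufficient storage";
            stored.add(objectId);
            remainingK -= sizeK;
            return "OK " + remainingK;
        }
    }
}
```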

The third command is to retrieve stored objects from the mediator. It is the RetrieveObject command.

Command: RetrieveObject L(i), i = 1,2,...,N


L(i) := Object Identifier

Response: O(i), i = 1,2,...,N


O(i) := {ObjectIdentifier(i), Data(i) | Error}
ObjectIdentifier(i) := Requested Object Identifier
Data(i) := {size}Data
size := M, M the number of bytes that follow the right hand “}”
Data := M bytes
Error := "text" where text is the error message.

The authors realize that all text messages will need to be internationalized, but do not want to get into
the issue of I18N here; how to accomplish this is well known. The only restriction above is that the
error message must be a double-quoted string, to distinguish it from the object data size. If there is a
double quote in the error message, then it must be escaped, as is standard practice.
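A receiver can tell data from errors in the RetrieveObject response by the first character after each identifier: `{` introduces a byte count, and a double quote introduces an error string. The sketch below parses that format, assuming (the grammar does not say) that identifiers contain no whitespace, that whitespace separates an identifier from its data or error and one item from the next, and that a backslash escapes the next character inside an error string. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a parser for RetrieveObject responses: id {size}Data | id "error". */
final class RetrieveObjectParser {
    /** One O(i): data is non-null on success, error is non-null on failure. */
    static final class Item {
        final String id;
        final byte[] data;
        final String error;

        Item(String id, byte[] data, String error) {
            this.id = id;
            this.data = data;
            this.error = error;
        }
    }

    static List<Item> parse(byte[] in) {
        List<Item> items = new ArrayList<>();
        int p = skipSpace(in, 0);
        while (p < in.length) {
            int start = p;
            while (p < in.length && !isSpace(in[p])) p++;
            String id = new String(in, start, p - start, StandardCharsets.UTF_8);
            p = skipSpace(in, p);
            if (p >= in.length) throw new IllegalArgumentException("missing body for " + id);
            if (in[p] == '{') {
                int close = p + 1;
                while (close < in.length && in[close] != '}') close++;
                if (close >= in.length) throw new IllegalArgumentException("unterminated size for " + id);
                int size = Integer.parseInt(new String(in, p + 1, close - p - 1, StandardCharsets.US_ASCII));
                int from = close + 1;
                if (size < 0 || from + size > in.length) throw new IllegalArgumentException("truncated data for " + id);
                // Data is length-prefixed, so it may contain any bytes, including spaces or quotes.
                items.add(new Item(id, Arrays.copyOfRange(in, from, from + size), null));
                p = from + size;
            } else if (in[p] == '"') {
                ByteArrayOutputStream text = new ByteArrayOutputStream();
                p++;
                while (p < in.length && in[p] != '"') {
                    if (in[p] == '\\' && p + 1 < in.length) p++; // escaped: keep the next byte literally
                    text.write(in[p++]);
                }
                if (p >= in.length) throw new IllegalArgumentException("unterminated error for " + id);
                p++; // skip the closing quote
                items.add(new Item(id, null, new String(text.toByteArray(), StandardCharsets.UTF_8)));
            } else {
                throw new IllegalArgumentException("expected '{' or '\"' after " + id);
            }
            p = skipSpace(in, p);
        }
        return items;
    }

    private static boolean isSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    private static int skipSpace(byte[] in, int p) {
        while (p < in.length && isSpace(in[p])) p++;
        return p;
    }
}
```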
For content, both device-local and mediator-stored data are possible. Songs that play for 3 to 5 minutes
can take several megabytes of storage. Music videos are a different story: it may not be possible to
store a 3 to 5 minute music video on a mobile phone or some PDAs, since these can be fifty or more
megabytes in size. Similarly, if one considers the issues of copyright, it may be that in a carrier based
mobile infrastructure all copyrighted content is mediated and streamed to peerNodes under the constraints
of DRM. In the extreme, the content may never be stored on peerNodes; in other cases, because of the
storage restrictions on these devices, a very limited amount of content is stored locally, and what
remains is placed on mediators using the above StoreObject command.
This introduces the idea of Mediator-auto-publication. For the retrieval of content whose source is a
mediator, the mediators play dual roles and are also peerNodes. They generate meta-data for their
content and virtualPort documents for its retrieval, and appropriately hash this data across the mediator
site view in those CCs to which they belong. They can also create CCs, advertise them, and manage all
that any peerNode would manage in this case. In this manner, peerNodes, mobile or not, can look up the
meta-data using the query commands described in chapter 4, and similarly retrieve the content using the
ContentRequest command.
This leads to a very good mechanism for peerNodes, as CC members, to advertise this content. Content
providers can create copyright-free music and video “trailers.” These can be shared among peerNodes as CC
members, and thus promote the content. A trailer includes the meta-data required to download the
content. When another peerNode acquires a trailer and decides to buy the content, the peerNode from whom
the trailer was acquired gets credit towards a purchase.
These mechanisms can be used to integrate mobile peerNodes with constrained capabilities into a P2P
Overlay Network. This network may or may not extend into the Internet, and we see no reason why it
should not. One imagines a future where the Internet is without boundaries. Thus, as we have previously
stated, one can create a full, private and virtual representation of her or his network presence. One's
home, automobile, office and recreational activities will be reflected on the Internet and accessible
given the correct authorization. The infrastructure is a transparent background to one's day-to-day
activities. This leads to an almost unlimited number of applications, and of course, there is always the
question of administering and monitoring the P2P Overlay Network to keep it running optimally. These
issues are discussed in the next chapter.

Chapter 7 Applications, Administration, and Monitoring

P2P is a young but not new technology, and as a consequence, there are already many popular
applications in use. We all know the first P2P applications were used for file sharing. They were
primarily, but not necessarily, based on Gnutella or a variation thereof. For example, LimeWire and
BearShare use Gnutella, while Freenet, KaZaA, Morpheus, and Napster have their own protocols. With the
success of P2P file sharing opening the door to other possible P2P applications, as we've mentioned
several times, the list has grown to include areas such as grid computing, collaboration, distributed
search engines, sensors, robotics, and gaming. These are all appropriate for leveraging P2P computing's
advantages. Our goal in this chapter is to introduce new areas that are rich with possibilities for
creating exciting applications, rather than to laboriously list and describe applications like the
above. You will see that this approach also opens the door to innovative ways to both monitor and
administer P2P Overlay Networks. In this vein, our own research has led us to the idea of applying Java
mobile agent technology to P2P Overlay Networks. This has turned out to be an excellent underlying
technology for diverse applications in the P2P space. Thus, we begin this chapter with a thorough
discussion of Java mobile agents, and introduce a novel protocol, the P2P Java Mobile Agent Protocol
(P2P-JMAP), for their use on P2P Overlay Networks. We finish with a discussion of P2P Email.

7.1 Java Mobile Agent Technology


Java Mobile Agent technology has been with us since the mid-1990s. The question we ask ourselves is,
“Why hasn't this technology succeeded?” It seems so perfectly suited to applications that one might use,
for example, in the e-Commerce electronic marketplace. One has always imagined Java mobile agents with a
user supplied, or dynamically learned, itinerary visiting several computer stores to return with the best
price/performance fit given a buyer's preferences and price limitations. In the discussion that follows
we show that Java mobile agent applications, with their intrinsic code mobility and asynchronous
behavior, exhibit a resiliency when placed on the P2P Overlay Network that is missing in earlier
adoptions of this technology. This is because the P2P Overlay Network, by its very nature, views network
“outages” as “peer presence” events rather than “errors,” and can additionally encourage a
self-reorganization of the content peers share by explicitly migrating this content closer to the peers
that show interest in it. Agents active on such a P2P Overlay Network can adapt to this natural
self-reorganization of “peer presence” and content to give us more consumer-responsive applications.
Also, the application programmers using such an agent system need not be concerned with the underlying
real network impediments to communication, such as Network Address Translation (NAT) and firewalls. Such
agents have almost unlimited application domains, among which are mobile agent shopping, content sharing
with digital rights management, content collaboration workshops, catalogue searches, P2P Overlay Network
administration and monitoring, verification of copyright protection by a roving copyright mobile agent
SWAT team, creative gaming where mobile agents might morph both the difficulty and behavior of a game in
progress, etc. The marriage of mobile agent technology with P2P Overlay Networks is an ideal match that
will benefit both of these technologies.
If a mobile agent service is to succeed on the current Internet, multiple problems with the current
infrastructure must be solved. If mobile agents' itineraries are URIs that must be resolved to unique IP
addresses, this resolution is not always possible. Given the millions of systems that must be reached,
centralized name services and database registration are no longer feasible, and these services are at the
heart of centralized “discovery.” Name to address translations are already overburdened and are a
performance bottleneck in today's Internet [Cheriton88]. We've already shown that the P2P Overlay
Network's protocols and system document publication/query can solve this discovery problem without
centralized name services.
Similarly, mobile agent technology must be usable by people at home, in the office, or when they are
traveling. However, most home users do not have registered IP addresses. In addition, firewall traversal
is a requirement if one is to reach the Internet from the office. System mobility, e.g., for a laptop,
cellular phone, or PDA, usually requires a protocol like Mobile IP. Wouldn't it be nice to launch an agent
from an airport, and have it return to one's mobile “network home” when one arrives at a hotel? To do
this, communication must be independent of locality and IP addresses, since fixed IP addresses can no
longer be counted on as a means of locating a system. As we've discussed in chapter 3, the P2P Overlay
Network solves these problems.
Many studies have shown several benefits of mobile agents: they reduce bandwidth; they are more fault tolerant; they are flexible, in the sense that once a mobile agent service is in place, a mobile agent need not be installed at each point of service but rather can be created and launched across the mobile agent service infrastructure; and they can be either persistent or disposable as well as atomic or stateful. In spite of these benefits, mobile agents have not yet been accepted by the mainstream network application community [Kotzand99]. The two reasons most often cited are security concerns and denial-of-service attacks. These concerns are real and are being addressed by the Java and mobile agent communities. Java provides adequate security of execution if each agent is isolated in its own Virtual Machine (VM), but this has a serious impact on performance. On the other hand, running multiple agent applications under the same VM exposes them to both application state conflicts and denial-of-service attacks because of static variables and static synchronized methods [Czajkowski00, Liang98]. The Java Community Process has a Java Specification Request and an associated expert group working on the “Application Isolation Specification [JCP121]” to resolve these issues in a standard way. We do not intend to address these issues in this chapter. Rather, our goal is to show that combining Java mobile agent technology with the P2P Overlay Network protocols will help to eliminate some of the barriers to mobile agent deployment. Another aspect of security is the privacy, authentication, and integrity of migrating mobile agents. In chapter 5 we discussed all that is required to solve these problems on the P2P Overlay Network.
We prefer a Java mobile agent service on top of the P2P Overlay Network where agents are both lightweight and disposable. Because we want our agents to be able to traverse a suitable subset of the peerNodes without foreknowledge of an itinerary, we also suggest using the trust-by-reputation models discussed in chapter 5 to aid in evaluating discovered itineraries. In this latter sense, an agent traversing its initial itinerary may dynamically augment it based on the visited peerNodes’ opinions and the initiating peerNode’s confidence values for each member of the itinerary. These confidence values would have to be additional data that accompanies the itinerary during traversal.
Finally, on the P2P Overlay Network, since each peerNode plays the role of both “server” and “client,” implementing a mobile agent service is easier: the same service runs everywhere. Therefore, adding a new agent only requires writing the agent code. Given that an authentication service is in place when it is required, the new mobile agent will then run on each peerNode hosting a mobile agent service without any modification to these systems. We will show with Java examples that every peerNode can easily initialize, execute, and terminate agents. Since the P2P Overlay Network is by its very nature volatile, choosing the next hop in an itinerary is an inherent, real-time “peer presence” decision managed by the protocols discussed in chapter 4. The complexity of the underlying real-network behavior is completely invisible to the mobile agent service. In this way both elegance and quality of implementation are achieved.
7.2 Implementation of Java Mobile Agents on the P2P Overlay Network
In this section we describe our general-purpose, lightweight Java Mobile Agents, then define the P2P-JMAP protocol, and finally close with a case study done by the authors on the JXTA Platform. More details of the actual implementation can be found in a paper published by the authors at HICSS-36 [ChenYeager03] in 2003.

7.2.1 Disposable Java Mobile Agents


In the Java mobile agent community both persistent and disposable agents are used. The former have an itinerary to follow and an assigned task that they must take to completion, and they maintain state. They may be suspended for periods of time, and under most circumstances, barring a catastrophic system failure, they will attain their goal and return to their launching site. They stubbornly persist. On the other hand, because of the volatile nature of the P2P Overlay Network, it seemed appropriate to us to use what we call disposable Mobile Agents. These agents are stateless and computationally lightweight. Thus, such an agent is always started or restarted in “state 0” at its initial entry point, and either runs to completion or is terminated. It can never be halted midway and later resumed at that point on the next hop of the itinerary. While it is permitted to generate a return payload, this data is never fed back to the agent as it proceeds from peer to peer; rather, it is an accumulated return value. Furthermore, a hosting peerNode can choose not to execute a disposable agent and either forward it, i.e., send it to the next accessible peerNode on its itinerary, or dispose of it. This is very important because in P2P networks there is never a guarantee that a launched agent will return to its home peerNode.
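To make the disposable semantics concrete, here is a minimal Java sketch; it is not the prototype’s code, and the DisposableAgent interface, HostDecision enum, and HopCounterAgent class are hypothetical names. It assumes a host creates a fresh agent object from the agent class on every hop, so instance state never survives a hop (each run starts in “state 0”), and the only thing carried forward is the accumulated results list, which the agent appends to but never reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class DisposableAgentDemo {
    // Hypothetical contract: start() always begins in "state 0" and may only append to results.
    interface DisposableAgent {
        boolean start(Map<String, String> params, List<String> results);
    }

    // The three choices a hosting peerNode has for a visiting disposable agent.
    enum HostDecision { EXECUTE, FORWARD, DISPOSE }

    // A toy agent that records how many times *this instance* has run.
    static class HopCounterAgent implements DisposableAgent {
        private int runs = 0; // instance state; never survives a hop

        public boolean start(Map<String, String> params, List<String> results) {
            runs++;
            results.add("host=" + params.get("host") + "&runs=" + runs);
            return true;
        }
    }

    // Simulates one hop: the host re-creates the agent from its class each time.
    static boolean visit(Supplier<DisposableAgent> factory, String host,
                         HostDecision decision, List<String> results) {
        switch (decision) {
            case EXECUTE:
                return factory.get().start(Map.of("host", host), results);
            case FORWARD:
                return true;  // sent on unchanged to the next hop
            default:
                return false; // disposed of; the agent never returns home
        }
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        visit(HopCounterAgent::new, "peerA", HostDecision.EXECUTE, results);
        visit(HopCounterAgent::new, "peerB", HostDecision.FORWARD, results);
        visit(HopCounterAgent::new, "peerC", HostDecision.EXECUTE, results);
        System.out.println(results); // [host=peerA&runs=1, host=peerC&runs=1]
    }
}
```

peerB forwards the agent without running it, and peerC’s run starts from zero again, which is why both recorded runs report runs=1.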
In order to be launched, traverse its itinerary, and return home, our Java mobile agents require a proto-
col. This protocol, as mentioned in the introduction, is discussed next.

7.2.2 The P2P Java Mobile Agent Protocol (P2P-JMAP)


Rather than use an existing protocol for agent transport, we have chosen to create our own. The reasons
are clear. First, we are targeting the P2P Overlay Network rather than the traditional client/server net-
works where previous mobile agent protocols have been applied; and second, we want to take advan-
tage of the P2P Overlay Network protocols because the mobility features these protocols provide are
explicitly directed at P2P Overlay Networks and greatly simplify the task of traversing an itinerary
since the barriers imposed by the underlying real network will be of minimal concern to the applica-
tion programmer.
P2P-JMAP is still a prototype; it has been only minimally tested in a research environment. Before describing P2P-JMAP in detail, let’s look at the flow of events from when an agent is launched to when, and if, it returns home. This will both set the context for, and clarify, the detailed definitions.
After becoming active in the desired connected community, the first step is to create a virtualPort document describing the local communication channel used for receiving mobile agents. The virtualPort’s text name is the name of the local, listening virtual socket, and the virtualSocket is of type unicast or unicast-secure. To let other peerNodes in the same connected community look up this virtualPort, the virtualPort document is immediately published as described in chapter 4. While any meta-data that uniquely defines the virtualPort document can be used in the query PNToMed command to look up the document, to simplify our explanations only the text name is used here. Note that in chapter 5 we mentioned several ways to assure that this text name is unique within its connected community. This is very important to keep in mind for itinerary construction. Finally, a local daemon is started to listen for incoming mobile agent connection requests.
The construction of a mobile agent is just as straightforward. We create an ACP/ONP message whose elements include the home daemon’s virtualPort name, the itinerary to follow, and the Java class name and class byte codes to execute. The itinerary is a list of listening virtualPort names. This list may be static, learned at launch time, or a combination of both. The agent may also return home with additional virtualPort names to be evaluated for its next launch. We have the option of signing this message.
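Since the itinerary may be static, learned at launch time, or extended with names an agent brought home, a launcher needs some way to merge these sources before the next launch. The sketch below shows one hypothetical way to do it in Java; ItineraryPlanner and plan are illustrative names, not part of the prototype. It keeps first-seen order and drops duplicates, and, as an added assumption, it also drops the launcher’s own virtualPort name, since the agent returns there anyway.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ItineraryPlanner {
    /**
     * Combines a static itinerary, names learned at launch time, and names an
     * agent brought home. Order of first appearance is kept, duplicates are
     * dropped, and (an assumption of this sketch) the launcher's own
     * virtualPort name is removed.
     */
    public static List<String> plan(String homePort,
                                    Collection<String> staticMembers,
                                    Collection<String> learnedAtLaunch,
                                    Collection<String> returnedByAgent) {
        Set<String> merged = new LinkedHashSet<>();
        merged.addAll(staticMembers);
        merged.addAll(learnedAtLaunch);
        merged.addAll(returnedByAgent);
        merged.remove(homePort); // the agent goes home after the itinerary, not as a hop
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> next = plan("AgentService.home",
                List.of("AgentService.agent1", "AgentService.agent2"),
                List.of("AgentService.agent2", "AgentService.agent3"),
                List.of("AgentService.home", "AgentService.agent4"));
        System.out.println(next);
        // [AgentService.agent1, AgentService.agent2, AgentService.agent3, AgentService.agent4]
    }
}
```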
At launch time the application attempts to connect to a member of the itinerary. If at least one member is up, this succeeds, and the agent is sent to that member’s listening daemon, where it is processed. This continues from hop to hop until either the agent is disposed of or the itinerary is exhausted, in which case the agent is sent home. After the agent returns home, the initial itinerary may be reevaluated. Figure 7-1 demonstrates this process:

[Figure: an agent launched from peerNode 1 with the initial itinerary 2, 3, 4, 5; peerNode 2 is down, so the updated itinerary is 3, 4, 5.]

Figure 7-1. Agent and Its Itinerary

One can assign a time-to-live (TTL) to each member of the itinerary. For example, the TTL can be a simple integer counter that is decremented each time an attempt is made to connect to the listening virtualSocket of the associated itinerary member. If this counter reaches zero, i.e., the member could not be contacted after TTL attempts, then that member is removed from the itinerary. Let us assume in our discussion that positive integer counters are used for the TTLs. At any hosting peerNode, if the itinerary is non-empty, an iterative attempt is made to connect to each member, one at a time, and the mobile agent is sent by means of the first successful connection. Clearly, appropriate time-outs are taken after making a full itinerary traversal with no success. If all of these attempts fail given the TTLs and time-outs, then an attempt is made to connect to the home listening virtualSocket. If that fails, then the hosting peerNode decrements the TTL associated with the home listening peerNode, waits an implementation-dependent time-out, and will finally dispose of the agent when this TTL reaches zero.
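The traversal rule just described can be sketched in Java as follows. This is an illustrative sketch, not the prototype: Member, sendToNextHop, and the connect predicate (standing in for an attempt to open the member’s listening virtualSocket) are hypothetical, and the time-out between full passes is reduced to a comment. Removing the member that accepts the agent follows the rule that visited members leave the AgentDynamicItinerary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of TTL-driven next-hop selection at a hosting peerNode. */
public class ItineraryTraversal {
    /** One itinerary member: a listening virtualPort name and its remaining TTL. */
    static final class Member {
        final String port;
        int ttl;

        Member(String port, int ttl) {
            this.port = port;
            this.ttl = ttl;
        }
    }

    /**
     * Returns the virtualPort the agent was sent to, or null if the itinerary is
     * exhausted. Each failed connection attempt decrements that member's TTL; a
     * member whose TTL reaches zero is dropped. The member that accepts the agent
     * is also removed, since it is now being visited.
     */
    static String sendToNextHop(List<Member> itinerary, Predicate<String> connect) {
        while (!itinerary.isEmpty()) {
            int i = 0;
            while (i < itinerary.size()) {
                Member m = itinerary.get(i);
                if (connect.test(m.port)) {
                    itinerary.remove(i);
                    return m.port;
                }
                m.ttl--;
                if (m.ttl <= 0) {
                    itinerary.remove(i); // unreachable after TTL attempts
                } else {
                    i++;
                }
            }
            // A real service would wait an implementation-dependent time-out here
            // before making another full pass over the remaining members.
        }
        return null; // the caller next tries AgentHome, then a mediator, then disposal
    }

    public static void main(String[] args) {
        List<Member> itinerary = new ArrayList<>(List.of(
                new Member("peer2", 1), new Member("peer3", 2),
                new Member("peer4", 2), new Member("peer5", 2)));
        String hop = sendToNextHop(itinerary, port -> !port.equals("peer2"));
        System.out.println(hop + " " + itinerary.size()); // peer3 2
    }
}
```

With the itinerary of Figure 7-1, where peerNode 2 is down and has a TTL of 1, member 2 is dropped after one failed attempt, the agent is sent to 3, and members 4 and 5 remain.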
The following table specifies the P2P-JMAP message data format. The message data is composed of multiple fields, and the required elements are marked with a “*”. Each field has its own MIME content-type, which is “content-type: text/plain” unless noted otherwise.

Table 1: P2P-JMAP Message Data Format

Name Space               Field
AgentHome*               Home virtualPort name, TTL pair.
AgentLaunchName*         Unique agent identifier. This is system dependent and is used
                         for identifying each launched agent.
AgentFrom*               Host agent virtualPort name, or the current hop.
AgentTo*                 The destination agent virtualPort name, or the next hop.
AgentMediator            Known mediator virtualPort names to be used as staging areas
                         when an agent cannot return home. A staged agent is quiescent,
                         and is processed as the hosting mediator sees fit.
AgentDynamicItinerary*   The remaining itinerary. This is a comma-separated list of
                         “virtualPort name, TTL” pairs preceded by a member count and a “:”.
                         An example is 2: AgentService.agent1,3, AgentService.agent2,1.
AgentInitialItinerary    The itinerary at launch time. This includes the first AgentTo value.
AgentClass*              Mobile agent class, i.e., the class byte codes. This may be either
                         the Java class or a jar file containing the Java class.
                         Content-Type: application/octet-stream
AgentClassName*          Mobile agent class name.
AgentParameters          Initial set of parameters that can be passed to the mobile agent.
                         The parameter list is in http argument format:
                           <http argument list> := arg1=v1&arg2=v2&...&argn=vn
                         Each argi is an agent-dependent ASCII text string, and each vi is
                         an agent-dependent ASCII text string parameter value. Neither “=”,
                         “&”, nor <LF> (ASCII linefeed) is permitted in these text strings,
                         and a text string may not start with “{”.
                         Binary, base64-encoded arguments are also permitted. In this case
                         the argument format is:
                           arg={length N in bytes}N base64 encoded data bytes<LF>
                         For example: bytes={10}aN0f6XiYG+9<LF>
AgentResults             The accumulated return payload. The results are concatenated
                         lines Li, i = 1,...,n, each Li in http argument format:
                           <http argument list><LF>
                         Binary results are permitted as with parameters.
AgentVisitedList*        Agent pipe names of peers visited. As each successive peer is
                         visited, its agent daemon name is appended to this
                         comma-separated list.
AgentSignature           Subset of an X509.V3 certificate for signing this message. This is
                         the base64 encoding of the ASN.1 DER encoding of the following
                         fields from the certificate used to generate the signature itself:
                         issuer, subject, signature algorithm, digital signature. If this
                         element is not present, then the message is unsigned.
AgentError               A courtesy error return element:

                           Error code   Information    Meaning
                           1            Element name   malformed message
                           2            none           bad signature
                           3            none           unknown issuer
                           4            none           unknown subject
                           5            none           certificate expired
                           6            none           algorithm not supported
The AgentHome field holds the virtualPort name of the peerNode that launched the mobile agent, and the TTL to be used when attempting to send the agent home. Whenever a hosting agent service must send the agent home, this field is used to locate and connect to the home virtual socket and to manage connection failures. Also, see the AgentMediator field and its relationship to this field.
The AgentLaunchName, as a unique identifier on the launching system, lets that system’s agent service identify its agents. When multiple agents are launched in our prototype, this field gives us a way to build data structures that track these agents. In our prototype we used the following internal class to define such an object:

private class LaunchStatus {

    public LaunchStatus(String identifier,
                        String className,
                        long launchTime)
    {
        this.identifier = identifier;
        this.className = className;
        this.launchTime = launchTime;
        this.RTT = 0;
        this.returned = false;
        this.results = null;
    }

    String identifier;    // AgentLaunchName
    String className;     // AgentClassName
    long launchTime;      // System launch time in milliseconds
    int RTT;              // Round trip time for traversal
    boolean returned;     // True if agent has returned home
    String[][] results;   // The AgentResults
}

A hash table keyed on the identifier parameter was used to store instances of this class, enabling the agent service to track each launched agent. The launchTime is used to calculate both the RTT and the expiration time-out.
The AgentFrom and AgentTo specify the hosting and next hop agent services. This is a standard source
/ destination pair that is useful for monitoring and debugging.
The AgentMediator field may be used when the itinerary is exhausted and the home agent service is not responding. In this case a hosting agent service may not have the resources to let agents wait locally until the home service is reachable, and it can choose to dispose of the agent. As an alternative, we decided to add an optional staging area on mediators for agents waiting for a viable home connection, because mediators tend to be more stable systems with adequate resources to provide this service. In this case the mediator will require an additional port called the AgentStagingPort. The management of staged agents is up to the implementation. Mediator staging is optional, as is this parameter.
The AgentDynamicItinerary is the list of agent services still to be visited. Each itinerary member entry also has a TTL. This itinerary is traversed as discussed above.
The AgentInitialItinerary is the itinerary in its original form. It is an optional and useful bookkeeping parameter. In particular, since members of the AgentDynamicItinerary are removed from the itinerary either after being visited or when their TTL expires, this parameter, taken in conjunction with the AgentVisitedList parameter, permits a home service to evaluate the status of the itinerary traversal. This parameter is optional since an implementation of these services may decide to keep it as part of its own data.
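As a sketch of that evaluation, the hypothetical helper below, assumed to run in the home service, lists the members of the initial itinerary that never appear in the returned AgentVisitedList. TraversalStatus, splitList, and unvisited are illustrative names; the sketch also assumes the virtualPort names have already been extracted from AgentInitialItinerary (dropping the TTLs) and that both lists use the same names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TraversalStatus {
    /** Splits a comma-separated list element, ignoring blanks and surrounding spaces. */
    static List<String> splitList(String field) {
        List<String> out = new ArrayList<>();
        if (field == null) return out;
        for (String s : field.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Members of the initial itinerary that are absent from AgentVisitedList. */
    static List<String> unvisited(List<String> initialItinerary, String visitedListField) {
        Set<String> visited = new LinkedHashSet<>(splitList(visitedListField));
        List<String> missed = new ArrayList<>();
        for (String port : initialItinerary) {
            if (!visited.contains(port)) missed.add(port);
        }
        return missed;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("peer2", "peer3", "peer4", "peer5");
        System.out.println(unvisited(initial, "peer3, peer4, peer5")); // [peer2]
    }
}
```

The missed members are natural candidates to re-evaluate, or to drop, before the next launch.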
The AgentClass and AgentClassName are required by the Java classloader to load and run the Java
mobile agent. The following Java code is an example of how to use a classloader to run a mobile agent.

// A class loader for loading Java mobile agents
class agentClassLoader extends ClassLoader {
    private final byte[] classBytes;
    private final String agentName;

    public agentClassLoader(String className, byte[] bcodes)
    {
        classBytes = bcodes;
        agentName = className;
    }

    @Override
    public Class<?> findClass(String name) throws ClassNotFoundException {
        // We should only be called for our own classes
        if (!name.equals(agentName))
            throw new ClassNotFoundException(name);

        Class<?> c = defineClass(name, classBytes, 0, classBytes.length);
        resolveClass(c);
        return c;
    }
}

// Create an instance of the class loader passing the AgentClassName
// and AgentClass parameters
ClassLoader loader = new agentClassLoader(agentClassName, bcodes);

// Load the JavaMobileAgent class and create an instance of it
JavaMobileAgent torun = (JavaMobileAgent) loader.loadClass(agentClassName)
        .getDeclaredConstructor().newInstance();

// Run the class passing the AgentParameters (held in agentParameters)
Vector agentResults = new Vector(1, 1); // For returning the results
boolean success = torun.start(agentParameters, agentResults);

In the case where the AgentClass field is a jar file, this jar file can be signed, which provides a way to create collections of Java mobile agents that the signer of the jar file authorizes to run in a particular connected community. See the discussion on authorization below.
The AgentParameters field contains the parameters to pass to the mobile agent when it is called. The AgentResults field is the payload accumulated over each execution of the mobile agent. This may include newly discovered itinerary members for the next itinerary traversal.
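The http argument format of AgentParameters and AgentResults can be parsed as sketched below. This is a hypothetical stand-in for a helper like the prototype’s makeParamArray, whose code is not shown here; it returns a list of name/value objects rather than a String[][] so that binary values survive. Two assumptions are labeled in the comments: N in {N} is taken to be the decoded byte count, and a value beginning with “{” is treated as binary, which the format allows because text values may not start with “{”.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical parser for AgentParameters / AgentResults in http argument format. */
public class AgentParamCodec {
    /** One argument; exactly one of text or bytes is non-null. */
    static final class Param {
        final String name;
        final String text;
        final byte[] bytes;

        Param(String name, String text, byte[] bytes) {
            this.name = name;
            this.text = text;
            this.bytes = bytes;
        }
    }

    /** Encodes a binary argument as name={N}base64, taking N as the decoded byte count (assumed). */
    static String encodeBinary(String name, byte[] data) {
        return name + "={" + data.length + "}" + Base64.getEncoder().encodeToString(data);
    }

    /** Parses LF-separated lines, each of the form arg1=v1&arg2=v2&...&argn=vn. */
    static List<Param> parse(String field) {
        List<Param> out = new ArrayList<>();
        for (String line : field.split("\n")) {
            if (line.isEmpty()) continue;
            for (String pair : line.split("&")) {
                int eq = pair.indexOf('='); // first '=' only: base64 padding may contain '='
                if (eq <= 0) throw new IllegalArgumentException("malformed argument: " + pair);
                String name = pair.substring(0, eq);
                String value = pair.substring(eq + 1);
                if (value.startsWith("{")) { // text may not start with '{', so this is binary
                    int close = value.indexOf('}');
                    if (close < 0) throw new IllegalArgumentException("missing '}' in " + name);
                    int n = Integer.parseInt(value.substring(1, close));
                    byte[] data = Base64.getDecoder().decode(value.substring(close + 1));
                    if (data.length != n) throw new IllegalArgumentException("length mismatch in " + name);
                    out.add(new Param(name, null, data));
                } else {
                    out.add(new Param(name, value, null));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String field = "host=peerA&cuisine=mexican\n"
                + encodeBinary("data", new byte[] {1, 2, 3}) + "\n";
        for (Param p : parse(field)) {
            System.out.println(p.name + " -> " + (p.text != null ? p.text : p.bytes.length + " bytes"));
        }
    }
}
```

Splitting on the first “=” only matters for binary values, since base64 padding can end in “=”; text names and values cannot contain “=” at all.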
The AgentVisitedList is a list of the virtualPort names of the agent services visited during the traversal of the AgentDynamicItinerary.
The AgentSignature’s signature is computed using the signature algorithm and a private key to digitally sign the union of the bytes of the following fields:
AgentHome*, AgentMediator, AgentInitialItinerary, AgentClass*, AgentClassName*, and AgentParameters.
By default, only the three required elements (marked by *) need be part of the signature calculation in a minimally signed P2P-JMAP message. A valid signature guarantees that the signed content has not been modified during the itinerary traversal. To validate the signature, the hosting peerNode must have the launching peerNode’s valid X509.V3 certificate. The issuer and subject fields of this certificate must match the issuer and subject fields included in the AgentSignature parameter. This guarantees that the correct public / private key pair is being used.
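The signing and validation steps can be sketched with the standard java.security.Signature API. The algorithm name SHA256withRSA, the element map, and the reading of “union of the bytes” as concatenation in the order listed above are all assumptions of this sketch. In practice the hosting peerNode would take the PublicKey from the launching peerNode’s X509.V3 certificate (X509Certificate.getPublicKey()) after matching issuer and subject.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.LinkedHashMap;
import java.util.Map;

public class AgentSigner {
    // Order in which the signed elements are fed to the signature (assumed).
    static final String[] SIGNED_FIELDS = {
        "AgentHome", "AgentMediator", "AgentInitialItinerary",
        "AgentClass", "AgentClassName", "AgentParameters"
    };
    static final String ALGORITHM = "SHA256withRSA"; // assumption; not fixed by P2P-JMAP

    // Feeds every signed element present in the message; absent optional ones are skipped.
    private static void feed(Signature sig, Map<String, byte[]> message)
            throws GeneralSecurityException {
        for (String field : SIGNED_FIELDS) {
            byte[] value = message.get(field);
            if (value != null) sig.update(value);
        }
    }

    static byte[] sign(Map<String, byte[]> message, PrivateKey key)
            throws GeneralSecurityException {
        Signature sig = Signature.getInstance(ALGORITHM);
        sig.initSign(key);
        feed(sig, message);
        return sig.sign();
    }

    static boolean verify(Map<String, byte[]> message, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        Signature sig = Signature.getInstance(ALGORITHM);
        sig.initVerify(key);
        feed(sig, message);
        return sig.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        Map<String, byte[]> msg = new LinkedHashMap<>();
        msg.put("AgentHome", "AgentService.home,3".getBytes(StandardCharsets.UTF_8));
        msg.put("AgentClass", new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
        msg.put("AgentClassName", "Ping".getBytes(StandardCharsets.UTF_8));

        byte[] signature = sign(msg, pair.getPrivate());
        System.out.println(verify(msg, signature, pair.getPublic())); // true
        msg.put("AgentClassName", "Evil".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(msg, signature, pair.getPublic())); // false: content modified
    }
}
```

Plain concatenation cannot tell where one element ends and the next begins, so an implementation might prefix each element with its length before signing; the protocol as described does not specify this.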
As discussed above, a peer hosting a mobile agent service can abandon a visiting agent at any time
without notifying the originator. Also, given the possibility of a badly formed message, the AgentError
is an extremely helpful debugging tool and, in some cases, a simple courtesy to be used to notify the
agent’s home service of detected errors in format, or an invalid signature.
In order to test and validate the above mobile agent protocol, we wrote a Java mobile agent service that implemented the protocol, and then prototyped the communication layer to run on the Jxta 2.0 infrastructure. What we discuss below is independent of the latter infrastructure. Here we describe how to create a Java Mobile Agent Service on a P2P Network in general; thus this service can be implemented on the P2P Overlay Network as specified in this book.
To begin with, not all peerNodes will necessarily launch Java mobile agents. There will be cases where peerNodes only wish to provide mobile agent hosting services. For example, a peerNode may be willing to provide administrative monitoring information to authorized visiting mobile agents. This authorization can be done using the AgentSignature parameter. In this case, the launching peerNode’s X509.V3 certificate must be available on all hosting peerNodes. This is public key information and can be stored in and accessed from a database. The critical authentication information is the issuer’s root certificate and the launching peerNode’s private key. By granting a certificate, the issuer authorizes its owner to sign and launch mobile agents.
We have an AgentListener class, AgentListener.java, that listens for incoming connections in the connected community in which it runs. In our implementation it opens a thread and runs as a daemon beneath the JVM. This class runs the Java mobile agents and then, as discussed above, sends them on to the next available hop on the itinerary.
If a peerNode wishes to launch mobile agents from an application, then the application creates an instance of the AgentService class. This class extends the AgentListener class, and when it is instantiated it calls its superclass constructor to start a listener. The listener is required to permit mobile agents to return home. It may or may not host other mobile agents, at the application’s discretion. The AgentService class has all of the methods that are required to launch agents, track them, expire them by TTL, and, as discussed in the next paragraph, analyze the returned payload.
As well as extending the AgentListener class, the AgentService class also implements the AgentEventResponder interface. The implementation of AgentEventResponder requires a method that is called by the AgentListener when a locally launched mobile agent returns home. Before returning, the AgentService class’s constructor registers itself with its superclass as the object whose method is to be called. Figure 7-2 shows the class layout.

Figure 7-2. Agent Class Organization
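Stripped of the networking plumbing, the relationship in Figure 7-2 is an ordinary callback registration. The following self-contained Java sketch models it with hypothetical simplified classes (ResponderSketch, Listener, Service, and onAgentArrived are illustrative names). Only the method names acceptAgentResults and setEventResponder come from the prototype fragments that follow, and the String parameter types are assumed to match them.

```java
import java.util.ArrayList;
import java.util.List;

public class ResponderSketch {
    /** Callback contract; parameter types assumed to match the prototype fragments. */
    interface AgentEventResponder {
        void acceptAgentResults(String identifier, String agentResults, String peersVisited);
    }

    /** Stripped-down listener: only the home-return dispatch is modelled. */
    static class Listener {
        private final String homeVirtualPortName;
        private AgentEventResponder eventResponder;

        Listener(String homeVirtualPortName) {
            this.homeVirtualPortName = homeVirtualPortName;
        }

        void setEventResponder(AgentEventResponder responder) {
            this.eventResponder = responder;
        }

        /** Called for each arriving agent; returns true if it was a returning local agent. */
        boolean onAgentArrived(String agentHome, String identifier, String results, String visited) {
            if (eventResponder != null && agentHome.equalsIgnoreCase(homeVirtualPortName)) {
                eventResponder.acceptAgentResults(identifier, results, visited);
                return true;
            }
            return false; // a visiting agent: run it and forward it instead
        }
    }

    /** Launcher that registers itself with its superclass, as AgentService does. */
    static class Service extends Listener implements AgentEventResponder {
        final List<String> returned = new ArrayList<>();

        Service(String home) {
            super(home);
            setEventResponder(this);
        }

        public void acceptAgentResults(String identifier, String agentResults, String peersVisited) {
            returned.add(identifier + ": " + peersVisited);
        }
    }

    public static void main(String[] args) {
        Service service = new Service("AgentService.home");
        service.onAgentArrived("AgentService.home", "1", "", "peer3,peer4"); // returning agent
        service.onAgentArrived("AgentService.agent9", "7", "", "peer9");     // visiting agent
        System.out.println(service.returned); // [1: peer3,peer4]
    }
}
```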

To clarify the above descriptions we include Java code fragments rather than 4PL code since we are
discussing Java. The Java prototype implementation on Jxta is found in Appendix 3.
We begin with the constructor of the AgentListener class and the method for setting the event
responder callback.

public class AgentListener implements Runnable, VirtualSocketListener {

    /*
     * AgentListener: Listens for incoming mobile agents
     *
     * @param cc      ConnectedCommunity for agent traversal
     * @param vSocket VirtualSocket on which to listen
     */
    public AgentListener(ConnectedCommunity cc, VirtualSocket vSocket)
    {
        // the CC in which we open the virtualSocket
        this.cc = cc;
        homeCCName = cc.getConnectedCommunityName();

        // Connected Community Query Service
        this.queryService = cc.getQueryService();

        // Local virtual socket information
        this.vSocket = vSocket;
        this.homeVirtualPortName = vSocket.getVirtualPortName();

        // We are off and running: Save start time
        bootTime = System.currentTimeMillis();
    }

    AgentEventResponder eventResponder = null; // callback to AgentService
                                               // for locally launched agents
    /*
     * setEventResponder Sets the callback for a launched agent that has
     * returned home
     *
     * @param responder The object that has the responder method
     */
    public void setEventResponder(AgentEventResponder responder)
    {
        eventResponder = responder;
    }

    // ... run() and the remaining AgentListener methods
}

The next code fragment shows the call in the AgentListener class to the AgentEventResponder
acceptAgentResults method as implemented in the AgentService class. The call happens when an
agent returns home:

if (eventResponder != null &&
        agentHome.equalsIgnoreCase(homeVirtualPortName)) {
    eventResponder.acceptAgentResults(identifier,    // Unique identifier
                                      agentResults,  // Return payload
                                      peersVisited); // Peers visited
}

Next, let’s take a look at the constructor of the AgentService class, whose superclass is AgentListener. Note that it calls AgentListener.setEventResponder(this) to set the callback for locally launched agents that return home within their TTL lifetimes.

/*
 * Mobile Agent client. Launches agents.
 */
public class AgentService extends AgentListener implements AgentEventResponder {

    /*
     * AgentService: Starts the listener
     *
     * @param cc      ConnectedCommunity
     * @param vSocket VirtualSocket
     */
    public AgentService(ConnectedCommunity cc, VirtualSocket vSocket)
        throws Exception
    {
        super(cc, vSocket);

        // point the super class to our acceptAgentResults() method
        super.setEventResponder(this);

        // Get our virtual port name
        homeVirtualPortName = vSocket.getVirtualPortName();

        // Our home cc
        this.cc = cc;
        this.CCName = cc.getName();
    }

Our prototype first launches a Java Mobile Agent with the following method:

/* launchAgent Launches the agent "classname".
 *
 * @param agentClassName Agent class name (full path name)
 * @param itinerary      array of virtualPortNames
 *
 * @return Integer agent identifier ( < 0 if launch failed)
 */
public Integer launchAgent(String agentClassName,
                           String[] itinerary)
    throws IOException
{
    synchronized (syncObject) {
        int ID = -1;

        // load the byte codes from the jar file
        byte[] bcodes = getByteCodes(agentClassName);
        if (bcodes == null) {
            throw new IOException(agentClassName + " is not in the agents.jar");
        }

        // get the unique ID
        ID = ++agentID;
        String identifier = Integer.toString(ID);

        // launch the agent
        if (!sendAgent(itinerary, identifier, bcodes, agentClassName))
            return Integer.valueOf(-1);

        // add this agent to our active list
        long startTime = System.currentTimeMillis();

        // Status object
        LaunchStatus lStat = new LaunchStatus(identifier, agentClassName,
                                              startTime);

        // add to hashtable
        activeAgents.put(identifier, lStat);

        return Integer.valueOf(ID);
    }
}

Following in this vein, we next have the prototype implementation of the acceptAgentResults method in AgentService, which is called from AgentListener as discussed just above.

/*
 * acceptAgentResults Callback for the response. The AgentListener
 * calls back here
 *
 * @param identifier   AgentLaunchName as an identifier
 * @param results      AgentResults from the returning agent
 * @param peersVisited AgentVisitedList: peers visited on traversal
 */
public void acceptAgentResults(String identifier,
                               String results,
                               String peersVisited)
{
    synchronized (syncObject) {
        // Get the launch status from our active agents hash table
        LaunchStatus lStat = (LaunchStatus) activeAgents.get(identifier);

        // If null, then its TTL has expired and it has been removed
        // from the hash table
        if (lStat == null) {
            System.out.println(">> Oops .. agent not in active list");
            return;
        }

        // Turn the results into a String[][]
        String withVisitors = null;
        if (peersVisited == null || peersVisited.length() == 0) {
            // NO visited peers? Weird.
            withVisitors = results;
        } else {
            // Start with the visited peers in a http param string
            withVisitors = peersVisited;
            // Append the results
            if (results != null && results.length() > 0)
                withVisitors += "&" + results;
        }
        // create the parameter array
        String[][] rData = makeParamArray(withVisitors);

        // update the hashtable entry
        lStat.results = rData;
        lStat.returned = true;
        lStat.RTT = (int) ((System.currentTimeMillis() - lStat.launchTime) / 1000);

        // Replace in hash table
        activeAgents.put(identifier, lStat);
    }
}

AgentService also permits an application to track the progress of a launched agent to see if it has returned home yet. The caller passes a lifetime parameter, in seconds; if it expires, the launched agent is removed from the LaunchStatus hash table, activeAgents. This is shown in the following prototype code:

/* returnedHomeCheck Checks to see if an agent has returned home.
 * An agent still voyaging after lifeTime seconds is removed from the list.
 *
 * @param theID    Integer identifier of agent
 * @param lifeTime Expiration time in seconds.
 *
 * @return  0 not yet returned home
 *          1 returned home with results
 *         -1 Timeout expired. Removed from list.
 */
public int returnedHomeCheck(Integer theID, int lifeTime)
{
    synchronized (syncObject) {
        // Sanity check
        int ID = theID.intValue();

        if (ID <= 0 || ID > agentID) return -1;

        // Convert to string
        String agentIdentifier = theID.toString();

        LaunchStatus lStat = (LaunchStatus) activeAgents.get(agentIdentifier);

        String statusLine = "returnedHomeCheck for ID = " + agentIdentifier +
                            ", status[";
        // No longer in the list?
        if (lStat == null) {
            System.out.println(statusLine + "NOT A VALID ID]");
            return -1;
        }

        if (lStat.returned) {
            // Have results
            System.out.println(statusLine + "RETURNED HOME]");
            return 1;
        } else {
            // check expiration
            long voyageTime = System.currentTimeMillis() - lStat.launchTime;
            if (voyageTime > lifeTime * 1000L) {
                // remove the entry
                activeAgents.remove(agentIdentifier);

                // The agent is lost or late
                System.out.println(statusLine + "EXCEEDED LIFETIME]");
                return -1;
            } else {
                // Still out there
                System.out.println(statusLine + "VOYAGING]");
                return 0;
            }
        }
    }
}

Finally, we show a small example of a method that uses the above classes to launch an agent. In this example it waits on a timer for the agent to return home. It is passed instances of the ConnectedCommunity and VirtualSocket classes, the agent class name, the itinerary, and an instance of a class that has a method to analyze the returned results. First, it creates a new instance of the AgentService class. Second, it launches the agent and checks its status, and if the agent returns home within the prescribed time interval, it calls the payload analyzer to evaluate the results. The payload analysis depends on the agent that is launched: every agent has a different task and thus returns different results. If the agent successfully returns home, the method returns true; otherwise it returns false.

public boolean sendMobileAgent(ConnectedCommunity cc,
                               VirtualSocket vSocket,
                               String agentClassName,
                               String[] itinerary,
                               PayloadAnalyzer pla) {
    AgentService service;
    try {
        // Create a new instance of our AgentService
        service = new AgentService(cc, vSocket);
    } catch (Exception e) {
        System.out.println("Cannot start AgentService: " + e.toString());
        return false;
    }

    // Wait 15 seconds for every itinerary member.
    int timeToWait = itinerary.length * 15; // 15 second wait/member

    Integer ID = null;
    try {
        // Launch agent
        ID = service.launchAgent(agentClassName, itinerary);
    } catch (IOException ioe) {
        System.out.println("Launch failed: " + ioe.toString());
        return false;
    }

    while (true) {
        // Wait for return or timeToWait to expire
        try {
            Thread.sleep(5000); // 5 seconds
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            System.out.println("Thread Interrupted: " + ie.toString());
            return false;
        }
        // See if the agent has returned
        int check = service.returnedHomeCheck(ID, timeToWait);
        if (check == 0) continue; // still waiting
        if (check == -1) {
            // TimeToWait expired (or the launch itself failed)
            System.out.println("Agent " + agentClassName + ": TTL expired");
            pla.setNoResults(agentClassName, "TTL Expired");
            return false;
        }
        // Agent returned home: get http parameter list as a 2-dimensional
        // String array, and analyze the results
        String[][] results = service.getAgentResults(ID);

        // Give the results to the payload agent handler
        pla.analyze(results);
        return true;
    }
}

When we first implemented the prototype Java mobile agent services on Jxta, our initial mobile agent was called Ping. Ping is a generalized form of the traditional ping command: the Ping agent tries to visit all of the peerNodes on its itinerary. There was no return payload; only the AgentVisitedList was of interest. Next, to further illustrate the power of mobile agents on a P2P infrastructure, we wrote the “Poblano” mobile agent.
Poblano [POBLANO] is a reputation-based, distributed trust model for accessing and evaluating content on P2P networks. We wrote an e-Commerce Java mobile agent, the Poblano agent, that used the Poblano algorithms, the P2P-JMAP protocol, and several JXTA Shell commands to do a Poblano-based keyword search, within a Jxta peer group, on content distributed across the Jxta Overlay Network. The same concepts can be applied to the P2P Overlay Network and Connected Communities as described in this book.
While the Poblano agent is content independent, to explain Poblano we use the example of finding the best restaurants given a collection of restaurant guides distributed across the Jxta Overlay Network. On the Jxta Overlay Network one uses peerGroups in a manner similar to, but not equivalent to, our own Connected Communities. Our own CCs are much richer and handle communication differently. Still, at the conceptual level they are similarly motivated: both have the notion of a “walled garden.”
In the Jxta context, each peer that the Poblano agent queries is a member of the “Les-Gourmands”
peerGroup. In this example we rate restaurants by cuisine-type, price, quality, ambiance, and service.
A search is keyed by both a major and minor category. For restaurant searches, the major category is
“restaurant,” and the minor category is cuisine-type, for example, “restaurant,” “mexican.” In all that
follows ratings are on a scale of -1 to 4 as is required by Poblano.
For each known member of the “Les-Gourmands” peerGroup, a peer keeps two XML documents: the first for peer confidence, and the second for content confidence. The peer confidence XML document is the local peer’s rating of the peer to whom the content belongs. The other document is the peer’s rating of the content it contains. Each document has multiple keyword sets, one for each recommendation, based on “restaurant” and cuisine-type. In the peer confidence document, each keyword set has two entries: the path to the peer (in Jxta terminology its pipe name) and the local peer’s confidence in the peer, that is, the remote peer’s reputation. The content XML file has the corresponding remote peer’s restaurant guide ratings for the sub-categories listed above. Each restaurant rating has the restaurant’s name, price expressed as a number of “$,” quality expressed as a number of “*,” ambiance as quiet, noisy, or loud, and service expressed as a number of “+.” For each symbol the scale is one to five. Figure 7-3 gives peer and content confidence XML document examples. Note that Codat in the context of Jxta means code or data. We prefer the more general term content.

Codat_Conf.XML:

<?xml version="1.0"?>
<!DOCTYPE jxta:MA>
<jxta:MA xmlns:jxta="http://jxta.org">
<keywordset>
<keyword> restaurant </keyword>
<subword> mexican </subword>
<link>
<content>
Celia's, $$, ***, noisy, ++
</content>
<conf> 2.4 </conf>
</link>
</keywordset>
</jxta:MA>

Peer_Conf.XML:

<?xml version="1.0"?>
<!DOCTYPE jxta:MA>
<jxta:MA xmlns:jxta="http://jxta.org">
<keywordset>
<keyword> restaurant </keyword>
<subword> mexican </subword>
<link>
<agent> a1 </agent>
<conf> 3.5 </conf>
</link>
</keywordset>
</jxta:MA>

Figure 7-3. XML confidence files
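Each content rating packs five fields into a single line. As an illustration only, the following Java sketch parses a string such as Celia's, $$, ***, noisy, ++ from Codat_Conf.XML into its fields. The class name RestaurantRating is hypothetical, and we assume the fields are comma-separated in the order name, price, quality, ambiance, service, and that restaurant names contain no commas; this is not the Poblano agent's actual code.

```java
import java.util.Locale;

// Hypothetical value class for one rating string from a content confidence
// document, e.g. "Celia's, $$, ***, noisy, ++". Assumes five comma-separated
// fields (name, price, quality, ambiance, service) and no commas in names.
final class RestaurantRating {
    final String name;
    final int price;       // number of '$', one to five
    final int quality;     // number of '*', one to five
    final String ambiance; // "quiet", "noisy" or "loud"
    final int service;     // number of '+', one to five

    private RestaurantRating(String name, int price, int quality, String ambiance, int service) {
        this.name = name;
        this.price = price;
        this.quality = quality;
        this.ambiance = ambiance;
        this.service = service;
    }

    static RestaurantRating parse(String line) {
        String[] f = line.split(",");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + line);
        }
        String ambiance = f[3].trim().toLowerCase(Locale.ROOT);
        if (!ambiance.equals("quiet") && !ambiance.equals("noisy") && !ambiance.equals("loud")) {
            throw new IllegalArgumentException("unknown ambiance: " + ambiance);
        }
        return new RestaurantRating(f[0].trim(), count(f[1], '$'), count(f[2], '*'),
                ambiance, count(f[4], '+'));
    }

    // Counts a run of a single rating symbol and enforces the one-to-five scale.
    private static int count(String field, char symbol) {
        String s = field.trim();
        if (s.isEmpty() || s.length() > 5) {
            throw new IllegalArgumentException("'" + symbol + "' rating must be 1 to 5 symbols: " + s);
        }
        for (char c : s.toCharArray()) {
            if (c != symbol) {
                throw new IllegalArgumentException("unexpected symbol '" + c + "' in " + s);
            }
        }
        return s.length();
    }
}
```

For the Celia's entry in Figure 7-3 this yields a price of 2, a quality of 3, a noisy ambiance and a service of 2.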

A peer’s final “Poblano score” is based on the values in both documents and is computed when the mobile agent returns home. The agent’s only task is to gather information based on the original two categories with a set reputation lower bound for evaluation, e.g., only return the ratings for restaurants serving Mexican cuisine (the keywords “restaurant” and “mexican”) if the peer confidence is greater than 2.0.
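The agent's selection rule can be pictured as a simple filter. The sketch below is a hypothetical, simplified model (the Entry type and its fields are ours, not the Poblano or JXTA API): it keeps the content of keyword sets whose major and minor keywords match and whose confidence is strictly greater than the lower bound, mirroring the “greater than 2.0” example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the Poblano agent's selection rule.
final class PoblanoFilter {
    // One keyword set as the agent might see it on a visited peer.
    static final class Entry {
        final String keyword;    // major category, e.g. "restaurant"
        final String subword;    // minor category, e.g. "mexican"
        final double confidence; // on Poblano's -1 to 4 scale
        final String content;    // e.g. "Celia's, $$, ***, noisy, ++"

        Entry(String keyword, String subword, double confidence, String content) {
            this.keyword = keyword;
            this.subword = subword;
            this.confidence = confidence;
            this.content = content;
        }
    }

    // Returns the content of matching entries whose confidence exceeds lowerBound.
    static List<String> select(List<Entry> entries, String major, String minor, double lowerBound) {
        List<String> selected = new ArrayList<>();
        for (Entry e : entries) {
            boolean matches = e.keyword.trim().equals(major) && e.subword.trim().equals(minor);
            if (matches && e.confidence > lowerBound) {
                selected.add(e.content);
            }
        }
        return selected;
    }
}
```

The keywords are trimmed because the XML examples surround them with spaces.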
We launch the mobile agent with the two keywords described above, a discovered itinerary and the
above lower bound. The agent makes a tour, and returns home with the acquired information as a pay-
load, and the data is then evaluated. The restaurants are ranked given the initial conditions for the
search, and a new itinerary is built for the next search. This itinerary may contain recommended peers
not on the earlier itinerary.
The Poblano search Jxta shell commands are as follows:

JXTA> agent -i
- Shows the current poblano itinerary.
JXTA> agent -r
- Resets the poblano itinerary. This forces a full discovery of all mobile
agent daemon pipes belonging to the current peerGroup.
JXTA> agent -p PoblanoClassName -k keyword1 -s keyword2 -m confidence_value_lower_bound
- Invokes poblano search, launches a mobile agent named PoblanoClassName,
major keyword is keyword1, minor keyword is keyword2, and confidence value
lower bound is a real number in the interval [-1, 4]
For example:
JXTA> agent -p poblano -k restaurant -s mexican -m 2.0

The first time the above command is executed, Jxta attempts to discover all listening, uni-directional mobile agent communication pipes in the Les-Gourmands peerGroup using the Jxta search protocol. This collection of pipes forms the initial JMAP itinerary. Successive “agent -p” commands modify the initial itinerary, adding and deleting members given the information acquired on each itinerary traversal.
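To make the option syntax concrete, the following standalone Java sketch parses the arguments of the “agent -p” form and checks that the lower bound lies in [-1, 4]. It is hypothetical: it does not use the JXTA Shell's own command classes, it handles only the -p form (not -i or -r), and it treats the interval as closed.

```java
// Hypothetical standalone parser for: agent -p PoblanoClassName -k keyword1 -s keyword2 -m bound
// (arguments after the word "agent"; the -i and -r forms are not handled).
final class AgentCommand {
    final String agentClass;
    final String major;
    final String minor;
    final double lowerBound;

    private AgentCommand(String agentClass, String major, String minor, double lowerBound) {
        this.agentClass = agentClass;
        this.major = major;
        this.minor = minor;
        this.lowerBound = lowerBound;
    }

    static AgentCommand parse(String[] args) {
        String p = null, k = null, s = null, m = null;
        for (int i = 0; i < args.length; i += 2) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value for " + args[i]);
            }
            String value = args[i + 1];
            switch (args[i]) {
                case "-p": p = value; break;
                case "-k": k = value; break;
                case "-s": s = value; break;
                case "-m": m = value; break;
                default: throw new IllegalArgumentException("unknown option " + args[i]);
            }
        }
        if (p == null || k == null || s == null || m == null) {
            throw new IllegalArgumentException("usage: agent -p class -k keyword1 -s keyword2 -m bound");
        }
        double bound = Double.parseDouble(m); // NumberFormatException is an IllegalArgumentException
        if (!(bound >= -1.0 && bound <= 4.0)) { // written this way so NaN is rejected too
            throw new IllegalArgumentException("lower bound must lie in [-1, 4]: " + m);
        }
        return new AgentCommand(p, k, s, bound);
    }
}
```

Parsing the example arguments -p poblano -k restaurant -s mexican -m 2.0 gives the agent class poblano, the keywords restaurant and mexican, and a lower bound of 2.0.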
In the screen shot below, the agent command launches poblano.class to find all Mexican restaurants with a confidence value lower bound of 2.0. Prior to launching the agent, the Jxta discovery protocol found two listening mobile agents, agent3 and agent1. Agent3 is contacted, and our mobile agent is sent to that peer. Subsequently, and not shown in the screen shot, agent3 contacts agent1 and our mobile agent also visits agent1. The mobile agent is then sent home by agent1 with the accumulated restaurant guide information.
Two restaurants were found on agent3. The first, La Estrella, is accepted with a confidence value of 2.7, and a second restaurant, Que Lastima, is rejected. Additionally, two new peerNodes, pcAgent and rarelyHome, are recommended by agent3 and agent1, respectively, to be added to the itinerary. Finally, a new itinerary is generated for the next launch: agent3, agent1, pcAgent, and rarelyHome.

Figure 7-4. Poblano Screenshot
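The itinerary update in this example can be sketched as an order-preserving merge. The sketch is ours, not the Poblano algorithm: it keeps the current itinerary in order, removes any peers the caller has already decided to delete (how Poblano chooses them is not shown here), and appends newly recommended peers that are not already present. The class name ItineraryUpdate is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ItineraryUpdate {
    // Builds the next itinerary: current peers followed by new recommendations,
    // without duplicates and in first-seen order; a dropped peer is removed even
    // if it was also recommended.
    static List<String> next(List<String> current, Collection<String> dropped, List<String> recommended) {
        Set<String> merged = new LinkedHashSet<>(current);
        merged.addAll(recommended);
        merged.removeAll(dropped);
        return new ArrayList<>(merged);
    }
}
```

With the values from the screen shot, a current itinerary of agent3, agent1, no drops, and the recommendations pcAgent and rarelyHome, this produces agent3, agent1, pcAgent, rarelyHome.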

As the above examples point out, Java Mobile Agents are well suited to performing many tasks on P2P Overlay Networks in the background, in a non-intrusive and adaptable way. They are secured by signatures to assure both the integrity of the P2P-JMAP data and that only connected community approved agents can be run. This provides a mechanism to certify mobile agents. Certification in turn can be used to generate revenue. It can be used as a form of licensing to guarantee payment for mobile agent software that is being used; Independent Software Vendors can have a relationship with ISP’s or carriers, where the latter authorize applications, and in particular mobile agents, to be used in their infrastructure by means of a digital signature, and use implies payment. Moreover, when mobile agents are used for the administration and monitoring of the P2P Overlay Network, certification is also very important because it can guarantee that agents used in this context will be non-intrusive and respect the privacy of the systems that they are monitoring in the background. Here again, the mobile agents are certified as non-intrusive by the existence of the signature. In the next section we discuss the topic of monitoring and administrating the P2P Overlay Network with Java Mobile Agents.

7.3 The Management of the P2P Overlay Network with Java Mobile Agents
As we have thoroughly discussed, a great deal of attention is being given to developing the functionality of P2P networks, and this book is all about that topic. Yet while new P2P networks literally “hit the streets” nearly every week, we have seen very little effort to put in place a management paradigm that is appropriate for this technological sector. Why is this? We see one key reason: there is neither a common infrastructure nor interoperability between the existing infrastructures. Because we are primarily targeting interoperability by means of commonality, that is to say, a single P2P Overlay Network with its protocols, it is appropriate for us to propose a means of both monitoring and administrating any P2P network that implements the one we describe. To this end, we propose that a mediator-centric P2P Overlay Network is amenable to the active participation of Java Mobile Agents performing non-intrusive management tasks, and that, in fact, these agents can provide us a unified, single view of the state of such a topology. Thus, the P2P Overlay Network along with Java Mobile Agents can yield a single management view. For us, management therefore incorporates both P2P Overlay Network Management and the state of the mediators and hosted peerNodes. We are really going beyond what is commonly considered network management to incorporate the general health of the entire system. Yes, this is a large effort, and we do not intend to describe all of the solutions here in detail, but rather to point the way and put in place the groundwork. P2P networks are, after all, suitable digital eco-systems for supporting mobile agent technology, and it is natural to extend their role to aid in the careful management of the network of peerNodes and mediators that cohabit this environment.

While gathering administrative information on the P2P Overlay Network using Java Mobile Agents is a novel addition to one’s P2P Overlay Network management toolbox, the triage and analysis of this gathered information is the driving force, the “raison d’être,” for these Java Mobile Agents. We define what we call the P2P Overlay Network’s “management heartbeat.” This is the pulse of the P2P Overlay Network. And, just as a doctor first takes a patient’s pulse, we first take our P2P Overlay Network’s pulse. What exactly is this pulse, what are the upper and lower bounds for the pulse rate, and what does it imply with respect to the health of our network?
To begin with, so that peerNodes and mediators can consistently manage the P2P Overlay Network in our proposed model, the mediators will use the known management CC-UUID that is advertised to the peerNodes in the Mediator Document using the mediator greeting command. Then, to create and maintain the management heartbeat, each of our mediators will launch one or more Java Mobile Agents on a periodic timer with the MedToMed mobile agent command. This command will use ACP/ONP because of the unrestricted size of the P2P-JMAP Message. It is these Java Mobile Agents and the periodic data they collect that yield the P2P Overlay Network’s heartbeat whose pulse we can take. There is also the issue of how this data will be formatted. One might imagine using an SNMP Management Information Base, but we feel this is too heavy-handed. Another choice would be XML files with well defined name spaces. We prefer this latter approach. We are introducing new technology, and XML is the wave of the future.
To begin simply, let’s suppose that only a ping Java Mobile Agent is used. This Java Mobile Agent follows its itinerary of mediators, and the only information that is returned is the AgentVisitedList. This can be saved along with the AgentInitialItinerary. Comparing these two lists shows which mediators were active during the itinerary traversal. Each such entry will be time stamped using UTC as defined in rfc3339. The periodic itinerary traversals yield a regular “who is up” heartbeat. Then, for our stethoscope we use a simple management application named DoctorP2PNetwork. DoctorP2PNetwork uses the PNToMed protocol’s GetManagementData command as defined in chapter 4, section 4.2.3.7.
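Comparing the two lists is straightforward. This hypothetical Java sketch (the class and method names are ours) returns the mediators on the initial itinerary that are missing from the visited list, and renders one status entry in the layout of the <status> elements in the ping XML example; java.time.Instant prints UTC timestamps in an ISO 8601 form that is compatible with rfc3339.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class PingStatus {
    // Mediators on the initial itinerary that the ping agent did not reach.
    static List<String> unreached(List<String> initialItinerary, List<String> visited) {
        List<String> missing = new ArrayList<>();
        for (String mediator : initialItinerary) {
            if (!visited.contains(mediator)) {
                missing.add(mediator);
            }
        }
        return missing;
    }

    // One <status> entry; Instant.toString() yields a UTC time such as 2004-03-09T20:18:08.770Z.
    static String toXml(Instant when, List<String> initialItinerary, List<String> visited) {
        return "<status>\n"
                + "<time> " + when + " </time>\n"
                + "<initialItinerary> " + String.join(", ", initialItinerary) + " </initialItinerary>\n"
                + "<visited> " + String.join(", ", visited) + " </visited>\n"
                + "</status>";
    }
}
```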
Suppose, for example, that we have four mediators, M1, M2, M3, and M4, and a periodic ping mobile agent is being sent every ten minutes. The peerNode running DoctorP2PNetwork is active in the management CC, and sends a PNToMed command requesting the ping data using UMP/ONP with the following command format: “GetManagementData, ping.” Furthermore, for brevity we assume that only the three most recent ping status statements are returned as a response to this command, in XML format as follows:

<ping>
<mediatorName> M1 </mediatorName>
<uptime> 19 Days, 05:23:17 </uptime>
<status>
<time> 2004-03-09T20:08:08.88Z </time>
<initialItinerary> M2, M3, M4 </initialItinerary>
<visited> M2, M3, M4 </visited>
</status>
<status>
<time> 2004-03-09T20:18:08.77Z </time>
<initialItinerary> M4, M2, M3 </initialItinerary>
<visited> M4, M2 </visited>
</status>
<status>
<time> 2004-03-09T20:28:08.00Z </time>
<initialItinerary> M3, M4, M2 </initialItinerary>
<visited> M3, M4, M2 </visited>
</status>
</ping>

The <uptime> is how long M1 has been up and running. The format for uptime that we are using in this
example is DAYS Days, HH:MM:SS for the number of days, hours, minutes and seconds.
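A monitoring tool has to turn this uptime text back into a number before it can compare mediators. The following hypothetical Java helper parses the DAYS Days, HH:MM:SS format used in the example into a java.time.Duration; it assumes the plural “Days” is always used, as in the example.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the example's uptime text, e.g. "19 Days, 05:23:17".
final class Uptime {
    private static final Pattern FORMAT =
            Pattern.compile("\\s*(\\d+) Days, (\\d{2}):(\\d{2}):(\\d{2})\\s*");

    static Duration parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not DAYS Days, HH:MM:SS: " + text);
        }
        long days = Long.parseLong(m.group(1));
        int hours = Integer.parseInt(m.group(2));
        int minutes = Integer.parseInt(m.group(3));
        int seconds = Integer.parseInt(m.group(4));
        if (hours > 23 || minutes > 59 || seconds > 59) {
            throw new IllegalArgumentException("field out of range: " + text);
        }
        return Duration.ofDays(days).plusHours(hours).plusMinutes(minutes).plusSeconds(seconds);
    }
}
```

For M1’s uptime of 19 Days, 05:23:17 the result is 1,660,997 seconds.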
To the human observer it appears that there is a network partition between M2 and M3, as is seen in the 2nd status report. This kind of analysis can easily be done by software too. Assuming that DoctorP2PNetwork has this kind of intelligence, it would launch two ping Java Mobile Agents with {M2, M3} and {M3, M2}, respectively, as itineraries to verify its connectivity to these possibly partitioned mediators. If the ping agent visits both mediators in both cases, then it would next try the full itinerary to see if the problem has gone away. Also, and most importantly, a responsible party would be contacted with the analysis. Mediator itinerary traversal by Java Mobile Agents can become quite complicated as the number of mediators increases. Thus, one looks for repeating patterns that persist over time rather than transitory conditions. Software is far better able than a person to detect such possible problems, and can display the active mediator map on a monitor with possible errors highlighted at a glance. This is very much like air traffic control.
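The reasoning applied to the second status report can be automated. The sketch below is a hypothetical heuristic of ours, not part of DoctorP2PNetwork as specified: assuming the ping agent visits mediators strictly in itinerary order and stops at the first one it cannot reach, the suspect hop runs from the last visited mediator (or from the home mediator if none was visited) to the first unvisited one, and the two verification itineraries probe that hop in both directions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PartitionCheck {
    // Returns the suspected broken hop [from, to], or an empty list when the
    // whole itinerary was visited or the visited list is not an itinerary prefix.
    static List<String> suspectHop(String home, List<String> initialItinerary, List<String> visited) {
        if (visited.size() >= initialItinerary.size()) {
            return Collections.emptyList();
        }
        for (int i = 0; i < visited.size(); i++) {
            if (!visited.get(i).equals(initialItinerary.get(i))) {
                return Collections.emptyList(); // agent did not follow the itinerary in order
            }
        }
        String from = visited.isEmpty() ? home : visited.get(visited.size() - 1);
        String to = initialItinerary.get(visited.size());
        return Arrays.asList(from, to);
    }

    // Two ping itineraries that probe the suspect hop in both directions, e.g. {M2, M3} and {M3, M2}.
    static List<List<String>> verificationItineraries(List<String> hop) {
        return Arrays.asList(Arrays.asList(hop.get(0), hop.get(1)),
                Arrays.asList(hop.get(1), hop.get(0)));
    }
}
```

Applied to the second status report (home M1, initial itinerary M4, M2, M3, visited M4, M2), the suspect hop is M2 to M3, and the verification itineraries are {M2, M3} and {M3, M2}.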
What mobile agents permit us to do is use a single mediator as a focal point for mediator activity across the active mediators. Here, if one were to run a “systat” or Unix “ps,” one would see a global view where, rather than listing processes on the local system, one lists the mediators and their activity. The following table is a simple illustration of this concept of a Single Management View of peerNode and Mediator interaction. We call our proposed management system PM1, and we call the view the PM1 View.

TABLE 1-2. PM1 View 2004-03-11T20:08:08.88Z

TR1  MMap Size  Hosted peerNodes  N Active CC’s  Messages/second  uptime (DD:HH:MM)  Routing Errors
M1   5          347               23             10,000 (*)       23:19:57           39
M2   5          112               19             3,001            99:22:42           68
M3   4          239               8              7,118            100:22:11          17
M4   5          101               29             4,000            03:01:01           91
M5   5          7                 6              333              00:01:33           299

The PM1 View can be used for further management analysis, either by launching more mobile agents or by using DoctorP2PNetwork to directly interrogate a possibly troublesome mediator. In the above table the values appear static, but in reality they would be updated periodically by DoctorP2PNetwork. As in any management system, one would expect alarms to be set and triggered when the data collected crosses some critical boundary. We have tried to show this with the “(*)” flag in the messages per second column. In some management systems the alarm thresholds are set statically, that is to say, some person has an idea of what an alarm condition is, and sets it by means of a UI. We see this as inappropriate. One first needs to understand the normal behavior profiles of Mediators by collecting enough data to model the P2P Overlay Network’s behavior, say one week’s worth of data to initialize the profile boundaries, and then use simple statistical analysis to trigger alarms given a continuous collection of data. So, in the above situation, such a scheme would have set the alarm threshold for 20:08 to be the average messages per second taken from the profile for this time plus or minus 2 standard deviations. One imagines that a person would click on the entry and see the actual alarm values:
Thursday at 20:00:00 Average Messages per second = 9,200
Standard Deviation = 400 (ALARM ON)
The person should then be able to turn on a visual display to see a graph of normal versus current
behavior:

Figure 7-5. PM1 Profile with Alarm Condition (messages per second versus time, 19:00 to 20:10)


In figure 7-5 the blue line shows normal behavior in a “tube” bounded by plus or minus two times the
standard deviation of the modeled data. The red curved line is the observed data. At 20:00 the alarm
condition is set because the collected data exceeds the normal behavior plus two standard deviations.
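The profile-based threshold can be sketched in a few lines of Java. This is a minimal illustration under our own assumptions: the profile for a time slot is summarized by the mean and the population standard deviation of historical samples, k = 2 gives the plus or minus two standard deviation tube, and a value exactly on the boundary counts as an alarm, which matches the example above where the observed 10,000 messages per second equals 9,200 + 2 times 400. The class name AlarmProfile is hypothetical.

```java
// Hypothetical profile for one time slot, e.g. Thursday 20:00, built from a
// week or more of collected messages-per-second samples.
final class AlarmProfile {
    final double mean;
    final double stdDev;

    AlarmProfile(double mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }

    // Mean and population standard deviation of the historical samples.
    static AlarmProfile fromSamples(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (double x : samples) {
            sum += x;
        }
        double mean = sum / samples.length;
        double squares = 0;
        for (double x : samples) {
            squares += (x - mean) * (x - mean);
        }
        return new AlarmProfile(mean, Math.sqrt(squares / samples.length));
    }

    // Alarm when the observation reaches or leaves the band mean +/- k * stdDev.
    boolean isAlarm(double observed, double k) {
        if (stdDev == 0) {
            return observed != mean; // degenerate profile: any change is unusual
        }
        return Math.abs(observed - mean) >= k * stdDev;
    }
}
```

With a profile of mean 9,200 and standard deviation 400, an observation of 10,000 raises the alarm, while 9,500 stays inside the tube.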
Notice that we have avoided running mobile agent services on peerNodes. The P2P-JMAP protocol works fine in this case, but we do see possible privacy issues which we do not wish to get into here. It is a Pandora’s box. On the other hand, CC’s may certainly sign Java Mobile Agents and provide services for hosting them on the member peerNodes. This is a CC decision and avoids the aforementioned privacy problems. One can easily imagine Java Mobile Agents executing a mobile “whois” command across an itinerary supplied by a user or a chat room in such a CC. The possibilities are almost without limit.
Next we address a different application that we think will be quite exciting to explore in the context of
P2P Overlay Networks. This is P2P Email.

7.4 Email on the P2P Overlay Network


Email was server based before it was client / server based. We initially had mainframes on which email was received on networks like the ARPANET and TELENET. It was accessed first by serial line and later by ethernet connections to these servers using EtherTIPs (Ethernet Terminal Interface Processors). EtherTIPs supported modem connections from users’ terminals, and permitted those same users to connect to mainframes on the ethernet. The email clients ran on the mainframes, and data ultimately arrived via a serial line to a terminal display. In the early 1980’s, when desktop systems with bitmap displays and ethernet connectivity arrived, this paradigm changed. We then had true client / server email systems. The Simple Mail Transfer Protocol (SMTP), rfc821, and the standard format for Internet text messages defined in rfc822 were both published in August of 1982, and the Domain Name System rfcxxx in xxx of 198x. POP1, rfc918, was written in 1984, and the first IMAP, rfc1064, was written in 1988. Up until the mid-1980’s email was for the most part limited to universities and research centers. It soon moved into businesses with a multitude of mostly proprietary protocols and products, like ccMail, and their associated email clients. Interoperability was a serious problem. IMAP service was running at Stanford University as early as 1986, where the protocol was co-invented by Mark Crispin and Bill Yeager. Bill Yeager wrote the first Unix IMAP server at the same time. Here, Stanford users had true email clients with UI’s running on Xerox PARC Lisp- or D-machines and using IMAP to access the email on Sun servers. The first IMAP email client was written in Interlisp by Mark Crispin, and called MM-D, MM for Mail Manager and D for the D-machines. In 1988 Bill Yeager and Frank Gilmurray wrote MacMM, which was the first Macintosh IMAP client. This IMAP technology was in the public domain, and MacMM along with the Unix IMAP server were used worldwide by the end of the 1980’s, since they were freely accessible by public ftp from Sumex-AIM.Stanford.edu. Thus, by the end of the 1980’s the Internet software technology was in place for providing client / server email services to the masses. Now, it’s here: email is the major way that people communicate on the Internet. The logical next step given the advent of P2P is P2P Email. Why should this be done?
The first answer to the above question is better service. The second is that P2P Email, as we will describe it, is more suitable for email users. It is clearly an experiment, and we feel one that must be done. We want to remove the limitations imposed by most email services with respect to storage, protocol use, and bandwidth utilization or the number of times one can access her or his email. Finally, we are using protocols that were invented nearly twenty years ago. As chapter 2 points out, computers and networks have radically changed since that time. It is reasonable to ask if it is even appropriate to use client/server email given the computation power and disk storage in the average user’s hands, and the connectivity that exists today. Let’s first look at the latter limitations.
If one has a Mac G4 laptop or desktop, or a Windows PC, then the local disk storage is almost unlimited. We are talking about hundreds of gigabytes, and soon a terabyte, of storage on the average computer. Also, most email services charge a premium if one wishes to keep his or her email on the server. Most users do not want to pay this premium and, as a consequence, store their messages on their own systems. Thus, if we can devise a P2P Email delivery mechanism that also stores email on a user’s system, we have parity of storage. We can also devise schemes for peers to back up one another’s email at no cost. Here, they will simply give one another disk storage privileges using a bartering system: your 100 megabytes is equal to my 100 megabytes. Yes, there is an issue of security, but it is easy enough to solve with today’s universally available crypto algorithms: one simply encrypts and MACs the data locally before storing it remotely. Finally, if we can eliminate SPAM altogether, and we believe we can, we will have an attractive P2P Email system. Given these ideas, if we can convince ourselves that we can provide equal or better delivery service with P2P Email, then we have a definite winner. Let’s get to the technical details.
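The encrypt-and-MAC step can be done with the standard Java cryptography classes. The following is a minimal sketch, assuming AES in CBC mode with a random IV and HMAC-SHA256 under a separate key, applied as encrypt-then-MAC; the BackupSealer name, the default key sizes, and the IV, ciphertext, tag layout are our choices, not part of any P2P Email specification. Only the sealed bytes would be handed to a backup peer, and the owner keeps both keys.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper: seals email locally before a peer stores it remotely.
// Layout of sealed data: IV (16 bytes) || AES-CBC ciphertext || HMAC-SHA256 tag (32 bytes).
final class BackupSealer {
    private static final int IV_LEN = 16;
    private static final int TAG_LEN = 32;
    private final SecretKey encKey;
    private final SecretKey macKey;
    private final SecureRandom random = new SecureRandom();

    BackupSealer(SecretKey encKey, SecretKey macKey) {
        this.encKey = encKey;
        this.macKey = macKey;
    }

    static SecretKey newKey(String algorithm) { // "AES" or "HmacSHA256"
        try {
            return KeyGenerator.getInstance(algorithm).generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    byte[] seal(byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, encKey, new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            int bodyLen = IV_LEN + ciphertext.length;
            byte[] sealed = new byte[bodyLen + TAG_LEN];
            System.arraycopy(iv, 0, sealed, 0, IV_LEN);
            System.arraycopy(ciphertext, 0, sealed, IV_LEN, ciphertext.length);
            byte[] tag = mac(sealed, bodyLen); // the MAC covers the IV and the ciphertext
            System.arraycopy(tag, 0, sealed, bodyLen, TAG_LEN);
            return sealed;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checks the tag before decrypting; throws SecurityException if the stored copy was altered.
    byte[] open(byte[] sealed) {
        if (sealed.length < IV_LEN + TAG_LEN) {
            throw new SecurityException("sealed data too short");
        }
        int bodyLen = sealed.length - TAG_LEN;
        try {
            byte[] expected = mac(sealed, bodyLen);
            byte[] actual = Arrays.copyOfRange(sealed, bodyLen, sealed.length);
            if (!MessageDigest.isEqual(expected, actual)) { // timing-safe comparison
                throw new SecurityException("MAC check failed");
            }
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, encKey, new IvParameterSpec(sealed, 0, IV_LEN));
            return cipher.doFinal(sealed, IV_LEN, bodyLen - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new SecurityException(e);
        }
    }

    private byte[] mac(byte[] data, int len) throws GeneralSecurityException {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(macKey);
        hmac.update(data, 0, len);
        return hmac.doFinal();
    }
}
```

Checking the MAC before decrypting means a backup peer that alters the stored bytes is detected, something CBC encryption alone would not guarantee.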
We have a natural structure for allocating and organizing P2P Email accounts: connected communities. We assume that the majority of email users will want to form connected communities, and then restrict their email activity to these communities. Families, friends, colleagues, clubs, etc. will all have their private connected communities for the purpose of intra-community email. Individuals can belong to multiple communities, and thus we get an excellent built-in triage of one’s email by connected community. It is an automatic filter that will save users a great deal of time in sorting through their messages.
Here, we emphasize that the email is restricted to the connected community. Note also that our connected communities can have a very strong authentication mechanism for joining, becoming active, and sending messages within the community. This is described in chapter 5. With such a mechanism in hand, SPAM can easily be cut off since each member has a unique identity, and SPAMers can therefore be immediately identified and dealt with, i.e., kicked out of the community. With the INBOX organization we will be suggesting, a user can simply notify the peerNodes that make up her or his distributed INBOX to revoke email sending privileges from a SPAMer. This notification is done securely with authentication in the CC.
As in any email system, each user needs a unique email address to which email can be sent. We must say right up front that our system will not use SMTP to send email. Rather, all email will be sent in ACP/ONP messages to a virtualPort for which a virtualSocket will be open and listening in a connected community. The only restriction on this virtualPort is that it must be unique within the connected community. Recall that the connected community document defines a virtualPort UUID to be used for email in the connected community. On the other hand, we do not want to reinvent the email format, and want to take advantage of existing clients, so all email messages will be formatted following the rules for Internet mail, that is, rfc822 for the text header definitions, MIME for message header extensions and message body formatting, and S/MIME for secure email.

An email address has a text formulation which is bound to the unique peerNode address. The former is
for users’ convenience and the latter is for the use of the P2P Email software. These addresses along
with other email related information are distributed in an Address Book Document which is described
at the end of this discussion. The email address is defined as follows:
UserName@PeerName.ConnectedCommunityName
and this is bound to the unique email address:
UserName@PeerIdentity.CC-UUID
The UserName, PeerName, and ConnectedCommunityName must adhere to the Internet requirements
for email addresses as is mentioned in chapter 3. In particular, Unicode 3.1 characters are permitted
and when they appear in the email address, then they must be encoded as specified in the MIME
header extensions.
The email software has access to the PeerIdentity Document bound to the PeerIdentity, and this can be used to differentiate between identical text email addresses. This document can contain sufficient information to identify the user, and can also be signed with the user’s private key to authenticate the document. Thus, email addresses so defined are unique within connected communities. Similarly, a user’s peerNode will have accessed the connected community document and thus be able to construct the virtualPort document required to advertise the listening email service at the email virtualSocket on the peerNode.
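The binding between the two address forms can be pictured as a small value type plus a lookup keyed by the text form. The sketch below is hypothetical: the class names and the in-memory map stand in for the Address Book Document, whose format is described at the end of this discussion. Because two users can choose the same text address, the lookup returns every binding for it, and the caller must then use the PeerIdentity Documents to pick one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical binding of a human-readable P2P Email address to its unique form.
final class P2PEmailAddress {
    final String userName, peerName, communityName; // chosen by people
    final String peerIdentity, ccUuid;              // unique identifiers used by the software

    P2PEmailAddress(String userName, String peerName, String communityName,
                    String peerIdentity, String ccUuid) {
        this.userName = userName;
        this.peerName = peerName;
        this.communityName = communityName;
        this.peerIdentity = peerIdentity;
        this.ccUuid = ccUuid;
    }

    // UserName@PeerName.ConnectedCommunityName
    String textForm() {
        return userName + "@" + peerName + "." + communityName;
    }

    // UserName@PeerIdentity.CC-UUID
    String uniqueForm() {
        return userName + "@" + peerIdentity + "." + ccUuid;
    }
}

// Hypothetical in-memory stand-in for an address book.
final class AddressBook {
    private final Map<String, List<P2PEmailAddress>> byText = new HashMap<>();

    void add(P2PEmailAddress address) {
        byText.computeIfAbsent(address.textForm(), k -> new ArrayList<>()).add(address);
    }

    // All bindings sharing this text address; more than one means the
    // PeerIdentity Documents are needed to tell the users apart.
    List<P2PEmailAddress> resolve(String textAddress) {
        return byText.getOrDefault(textAddress, Collections.emptyList());
    }
}
```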

Given the above discussion and an adherence to the appropriate Internet standard documents for email header creation, one can make well formed email headers and messages from the information in a user’s peerIdentity and connected community documents. This permits us to use existing Internet email clients for our P2P Email system. The trick to accomplish this is simple and commonly used today by VPN applications for sending email on a VPN. One creates a small local email proxy that listens on the localhost at a configurable port. The email client preferences set the outgoing email service’s address to the localhost / port pair. Then the proxy code captures the message, parses the headers to extract the P2P Overlay Network information from the email addresses, and appropriately encapsulates it in an ONP/ACP message for each recipient peerNode specified in the To:, Cc: and Bcc: email addresses.
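The header-parsing step of such a proxy can be sketched as follows. This hypothetical Java helper collects recipient addresses from the To:, Cc: and Bcc: headers of an rfc822-style message, unfolding continuation lines first; it is deliberately simplified (it ignores comments, quoted local parts, group syntax and MIME encoded words), and the mapping of each address to an ONP/ACP envelope is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical header step of the local email proxy: collect the recipient
// addresses from the To:, Cc: and Bcc: headers of an rfc822-style message.
final class RecipientExtractor {
    // Either <addr@host> or a bare addr@host token.
    private static final Pattern ADDRESS =
            Pattern.compile("<([^<>\\s]+@[^<>\\s]+)>|([^\\s,<>\"]+@[^\\s,<>\"]+)");

    static List<String> recipients(String message) {
        // The header ends at the first empty line; continuation lines start with white space.
        String header = message.split("\\r?\\n\\r?\\n", 2)[0];
        String unfolded = header.replaceAll("\\r?\\n[ \\t]+", " ");
        List<String> out = new ArrayList<>();
        for (String line : unfolded.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("to") && !name.equals("cc") && !name.equals("bcc")) {
                continue;
            }
            Matcher m = ADDRESS.matcher(line.substring(colon + 1));
            while (m.find()) {
                out.add(m.group(1) != null ? m.group(1) : m.group(2));
            }
        }
        return out;
    }
}
```

Lines in the message body are never examined, so a body line that happens to begin with “To:” does not add a recipient.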
When one attempts to deliver a P2P Email message, an attempt is first made to contact each recipient at its peerNode’s connected community email virtualPort. If this succeeds, then the received email message is removed from the ONP/ACP envelope and stored locally in the P2P Email INBOX. What do we do if the recipient’s peerNode is not reachable? Because we are dealing with a P2P Overlay Network, and do not have 24x7 centralized email services, we need an alternative. To this end, each peerNode will have a distributed INBOX that is a list of peerNode email distribution points. These distribution points are willing to relay undeliverable P2P Email in a store-and-forward manner. Unlike traditional SMTP relays, the distribution point storage is passive rather than active. Here, the ONP/ACP messages are queued waiting for a request from the recipient. Each recipient will check for new mail in its distributed INBOX on a periodic timer, and actively retrieve such email when it is present.
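The delivery path just described can be modeled in a few lines. This is a hypothetical in-memory sketch, not the ONP/ACP implementation: DirectChannel stands in for an attempt to reach the recipient’s email virtualPort, and each DistributionPoint only holds envelopes until the recipient asks for them, which is the passive behavior described above. The sender here leaves a copy at every distribution point in the recipient’s distributed INBOX; that is our assumption, and it is also one way duplicate copies can arise.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Stand-in for contacting the recipient's email virtualPort directly.
interface DirectChannel {
    boolean tryDeliver(String recipient, byte[] envelope);
}

// Passive store: holds envelopes until the recipient retrieves them; never pushes.
final class DistributionPoint {
    private final Map<String, Queue<byte[]>> held = new HashMap<>();

    synchronized void deposit(String recipient, byte[] envelope) {
        held.computeIfAbsent(recipient, k -> new ArrayDeque<>()).add(envelope);
    }

    // Called from the recipient's periodic timer; returns and forgets queued envelopes.
    synchronized List<byte[]> retrieve(String recipient) {
        Queue<byte[]> queued = held.remove(recipient);
        if (queued == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(queued);
    }
}

final class P2PMailSender {
    // Returns true if delivered directly, false if left at the distribution points.
    static boolean send(String recipient, byte[] envelope, DirectChannel direct,
                        List<DistributionPoint> distributedInbox) {
        if (direct.tryDeliver(recipient, envelope)) {
            return true;
        }
        for (DistributionPoint point : distributedInbox) {
            point.deposit(recipient, envelope);
        }
        return false;
    }
}
```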

One notes that duplicate messages might appear at two different distribution points, and as a consequence the recipient may receive both copies. To solve this problem a list of the
SHA-1 hashes of the received messages’ message-ID