The Difficulty of Accountability
--------------------------------

In simple distributed systems, rudimentary accountability measures are often sufficient. If the list of peers is generally static and all are known to each other by hostname or address, servers can misbehave only under the threat of permanent bad reputation. Furthermore, if the operators of a system are known, pre-existing jurisdictional mechanisms such as legal contracts help ensure that systems abide by protocol.

In the real world, these two social forces -- reputation and law -- have provided an impetus for fair trade for centuries. Since the earliest days of commerce, buyers and merchants have known each other's identities -- at first through the immediacy of face-to-face contact, and later through postal mail and telephone conversations. This knowledge has allowed them to research the past histories of their trading partners, and to seek legal reprisal when deals go bad. Much of today's e-commerce uses a similar authentication model: clients (both consumers and businesses) purchase items and services from known sources over the Internet and the World Wide Web. These sources are uniquely identified by digital certificates, registered trademarks, and other addressing mechanisms.

Peer-to-peer technology blurs this distinction between client and server. It removes central control of such resources as communication, file storage and retrieval, and computation. The traditional mechanisms for ensuring proper behavior therefore no longer provide the same level of protection.

The problem of peer-to-peer identity is threefold. First, peer-to-peer technology increases the difficulty of uniquely and permanently identifying peers and their operators. Connections and network maps might be transient. Peers might be able to join and leave the system.
Participants in the system might wish to hide personal identifying information by distributing trust among many peers along both operational and jurisdictional lines. Second, even given a known identity, there may be no way to identify a particular individual, to associate a history with that identity, or to predict future performance. Third, individuals running P2P services are rarely bound by contracts, and the cost and time delay of legal enforcement would generally outweigh its possible benefit.

There are a number of different models on which to base a peer-to-peer system. As systems become more dynamic and diverge from real-world notions of identity, it becomes more difficult to achieve accountability and protect against resource degradation attacks.

The simplest type of peer-to-peer system has two main characteristics. First, the list of servers in the system is mainly fixed: new servers cannot be frictionlessly added to the system. Second, the identities of the servers and their operators are known, generally by DNS hostname or static IP address. Since the operators are known, they may have legal responsibility or economic incentive -- leveraged by the power of reputation -- to fulfill the protocols according to expectation.

An example of such a peer-to-peer system is the Mixmaster remailer. The original Mixmaster client software was developed by Lance Cottrell and released in 1995. [Cottrell, Lance. Mixmaster and Remailer Attacks. Web-published essay. http://www.obscura.com/~loki/remailer/remailer-essay.html] Currently, the software runs on about 30 remailer nodes, whose locations are published to the newsgroup alt.privacy.anon-server and at web sites such as efga.org. The software itself can be found at http://mixmaster.anonymizer.com. The remailer nodes are known by hostname and remain generally fixed.
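This static membership model can be pictured as a client-side configuration list. The hostnames and helper function below are hypothetical illustrations, not Mixmaster's actual configuration format; the point is that the protocol itself offers no join operation, so the list changes only when someone edits it by hand:

```python
# Hypothetical sketch of a static node directory for a remailer-style
# network.  Nothing in the protocol adds entries; an operator must
# learn of a new node out of band and edit this list manually.
KNOWN_NODES = [
    "remailer1.example.org",
    "remailer2.example.net",
]

def add_node(nodes: list[str], hostname: str) -> None:
    """Manual addition -- the only way the membership list changes."""
    if hostname not in nodes:
        nodes.append(hostname)

add_node(KNOWN_NODES, "remailer3.example.com")
print(len(KNOWN_NODES))   # prints 3
```

Because every client must be updated this way, the set of nodes stays small and slow-moving, which is exactly what keeps operators identifiable and accountable.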
While anybody can start running a remailer, the operator needs to spread information about the new node via out-of-band means (that is, by manually sending email -- it is not part of the protocol) to the web pages that publicize node statistics. The location of the new node is then manually added to each client's software configuration files. This process of manually adding new nodes leads to a system that remains generally static. Indeed, there are fewer than 30 widely-known Mixmaster nodes. [Electronic Frontiers Georgia List of Public Mixmaster Remailers. Web page. http://anon.efga.org/Remailers]

A slightly more complicated type of peer-to-peer system still has identified operators, but is dynamic in terms of membership. That is, the protocol itself has support for adding and removing participating servers. One example of such a system is Gnutella: it has good support for new users (who are also servers) joining and leaving the system, but at the same time the identity and location of each of these servers is generally known or knowable. [Gnutella Wall of Shame. Web site. http://www.zeropaid.com/busted/] [R. Dingledine, M. Freedman, and D. Molnar. "The Free Haven Project." 2000 Berkeley Workshop on Design Issues in Anonymity and Unobservability. http://www.freehaven.net/doc/freehaven.ps] These sorts of systems can be very effective, because they are generally easy to deploy (there is no need to protect participants' identities from one another) while still allowing many users to freely join the system and donate their resources.

Farther still on the scale of design and deployment difficulty are peer-to-peer systems that are dynamic in terms of participants and have pseudonymous server operators. In these systems, the actual servers that store files or proxy communication live behind digital masks that conceal their geographic location and other identifying features.
Thus, the mapping of pseudonym to real-world identity is not known. A given pseudonym may be pegged with negative attributes, but a user can simply create a new pseudonym, or manage several at once. Since a given server can disappear at any time and reappear as a completely new entity, these designs require either reputation systems or micropayment systems to provide accountability on the server end. An example of a system in this category is the Free Haven design: each server is contactable via a Mixmaster reply block and a public key, but no other identifying features are available.

The final peer-to-peer model on this scale is a dynamic system with fully anonymous operators. A fully anonymous server, in contrast to the pseudonymous servers described above, lacks even that level of temporary identity. Since an anonymous peer's history is by definition unknown, all decisions in an anonymous system must be based only on the information made available during each protocol operation. Peers cannot use a reputation system, since there is no opportunity to establish a profile of any server. This leaves a micropayment system as the only reasonable way to establish accountability. On the other hand, because the servers themselves have no long-term identity, such a system may be limited in the services it can provide: for instance, it would be very difficult to offer long-term file storage and backup.

As these models show, there are a number of design considerations beyond that of simple collaborative networking between nodes on the Internet. Below we describe two main goals -- privacy protection and dynamic operation -- which underscore the difficulty of achieving accountability. First, a focus of many systems is the goal of privacy-protecting file sharing or communication.
Privacy-protecting file sharing requires a mechanism for inserting and retrieving documents either anonymously or pseudonymously. Privacy-protecting communication, on the other hand, requires a means to communicate -- via email, telnet, FTP, IRC, HTTP, and so on -- without divulging any information that could link the user to his real-world persona.

Note that the Internet is not an ideal medium for anonymous communication and publishing systems: both passive sniffing and active attacks are easy. Email headers include the routing paths of messages, including DNS hostnames and IP addresses. Web servers normally see user IP addresses, and cookies on a client's browser may be used to store persistent user information. Commonly used online chat applications such as ICQ and Instant Messenger also divulge IP addresses. Network cards in promiscuous mode can read all data flowing through the local Ethernet, and the cards themselves often have world-unique MAC addresses, hard-wired during manufacture. Given all these possibilities, telephony or dedicated lines might be better suited to the goal of privacy protection, but the ubiquitous nature of the Internet has made it the only realistic choice.

Second, the dynamic nature and explosive growth of the Internet suggest that any peer-to-peer system must be similarly flexible and dynamic in order to sustain long-term use and growth. Similarly, the importance of ad-hoc networks will probably increase in the near future, paralleling the further development of inexpensive wireless technology and connectivity. The system must provide a mechanism for peers to join and leave smoothly without impacting functionality. Such a design also decreases the risk of system-wide compromise as more peers join the system. (This assertion assumes that servers run a variety of operating systems and tools, so that a single exploit cannot compromise most of the servers at once.)
The main goal of accountability is the following: we wish to maximize a server's utility to the overall system while minimizing the threat it poses. In other words, accountability is used to reduce the damage, intentional or not, that a server can inflict on the system. We can minimize the threat by incurring risk either equivalent to our benefit from the transaction or proportional to our trust in the parties involved -- approaches respectively described by the micropayment and reputation models.

The former technique uses a fee-for-service micropayment model. A server makes decisions based on fairly immediate information. Payments and the value of services are generally kept small, so that a server gambles only some small amount of lost resources on any single exchange. If both parties are satisfied with the result, they can continue with successive exchanges. Parties therefore require little prior information about each other, as the risk at any one time is small. Note that the notion of payment in a micropayment scheme may be distinct from any actual currency or cash.

The latter technique uses a reputation model. For a given exchange, a server risks an amount of resources proportional to its trust that the result will be satisfactory. As a server's reputation grows, other nodes become more willing to risk larger amounts with it, and the micropayment approach of small, successive exchanges is no longer necessary. Reputation systems require careful development, however, in the presence of impermanent, pseudonymous identities. If an adversary can gain positive attributes too easily and establish a good reputation, she can damage the system. Conversely, if a well-intentioned server can easily incur negative attributes from short-lived operational problems, she can lose reputation too quickly, and the system would lose the utility offered by these "good" servers.
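The contrast between the two models can be sketched in a few lines of code. All class names, fee amounts, and score weights below are illustrative assumptions, not part of any deployed protocol: a micropayment party risks at most one small fee per outstanding exchange, while a reputation-based party scales its stake with a pseudonym's accumulated history -- and a freshly minted pseudonym starts back at zero.

```python
class MicropaymentExchange:
    """Fee-for-service model: risk is capped at the fee for the one
    exchange currently outstanding, regardless of the peer's history."""

    def __init__(self, fee: int):
        self.fee = fee

    def max_loss(self) -> int:
        # The most a party can lose before deciding to stop trading.
        return self.fee


class ReputationLedger:
    """Reputation model: stake an amount proportional to trust built
    up over past exchanges with a pseudonym."""

    def __init__(self):
        self.scores = {}  # pseudonym -> trust score

    def record(self, peer: str, satisfactory: bool) -> None:
        # Failures are weighted more heavily than successes (an
        # illustrative choice; real systems must tune this carefully).
        delta = 1 if satisfactory else -5
        self.scores[peer] = self.scores.get(peer, 0) + delta

    def max_stake(self, peer: str, unit: int = 10) -> int:
        # An unknown (or freshly renamed) pseudonym gets zero stake.
        return max(0, self.scores.get(peer, 0)) * unit


ledger = ReputationLedger()
for _ in range(3):
    ledger.record("alice", satisfactory=True)
print(ledger.max_stake("alice"))      # prints 30: trust has grown
print(ledger.max_stake("mallory"))    # prints 0: no history, no risk
```

Note that a server which misbehaves and then re-registers under a new pseudonym lands in the "mallory" case above, which is why a reputation system must make positive attributes slow to earn relative to the damage a trusted peer can do.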
There is a third way to handle the accountability problem: ignore the issue and engineer the system simply to survive some faulty servers. Instead of spending effort on ensuring that servers fulfill their function, we can leverage the vast resources of the Internet for redundancy and mirroring. We might not know, and might be unable to find out, whether a server is behaving according to protocol -- whether that protocol means storing files and responding to file queries, forwarding email or other communications on demand, or correctly computing values or analyzing data. But if we replicate the file or functionality throughout the system, we can ensure that the system works correctly with high probability despite misbehaving components. Note that this approach works only for services provided by the network itself; it does not apply to exchanges of objects of value.

The question remains: where are we successful? That is, which servers in the peer-to-peer system are behaving according to protocol, and which offer "correct" information? This is a problem of trust, described in more depth in another chapter. Aspects of trust that make our situation particularly complicated include the problem of document naming (and the associated uncertainty about the actual contents of a document) and the development of efficient tests for distributed computations. More broadly, we need general algorithms to verify the behavior of decentralized systems -- for instance, Free Haven needs to address this issue to know how frequently to query for a given document to see whether it is still available.

In general, the popular peer-to-peer systems take a wide variety of approaches to solving the accountability problem. A few examples follow.

* Freenet dumps unpopular data on the floor, so people flooding the system with unpopular data are ultimately ignored.
Popular data is cached near the requester, so repeated requests won't traverse long sections of the network.

* Gnutella doesn't "publish" your documents anywhere except on your computer, so there's no way you can flood other systems. (This has great impact on the level of anonymity actually offered.)

* Publius limits the submission size to 100k. (It remains to be seen how successful this will be; they recognize it as a problem.)

* Mojo Nation uses micropayments for all peer-to-peer exchanges.

* Free Haven requires publishers to provide reliable space of their own if they want to insert documents into the system. This economy of reputation tries to ensure that people donate to the system in proportion to how much space they use.

In the next sections, we further consider the accountability problem and describe the micropayment, reputation, and other schemes used.
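The redundancy-and-mirroring approach described earlier can be quantified with a back-of-the-envelope calculation: if each server fails or misbehaves independently with some probability, the chance that every replica of a document is lost shrinks exponentially with the number of copies. A minimal sketch follows; the independence assumption is the load-bearing one, and it rarely holds exactly in practice (correlated failures, or an adversary targeting all replicas, break it).

```python
def availability(failure_rate: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies
    of a document remains retrievable, when each server fails or
    misbehaves independently with probability `failure_rate`."""
    return 1.0 - failure_rate ** replicas

# Even if half the servers misbehave, ten copies make loss unlikely.
print(f"{availability(0.5, 10):.4f}")   # prints 0.9990
```

This is why a system can "ignore" accountability for storage and retrieval: it buys correctness with copies rather than with trust in any individual server.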