
\documentclass[times,10pt,twocolumn]{article}
\usepackage{latex8}
\usepackage{url}
\usepackage{graphics}
\usepackage{amsmath}

\newcommand{\workingnote}[1]{}        % The version that hides the note.
%\newcommand{\workingnote}[1]{(**#1)}   % The version that makes the note visible.

\renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
\newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}

% If an URL ends up with '%'s in it, that's because the line *in the .bib/.tex
% file* is too long, so break it there (it doesn't matter if the next line is
% indented with spaces). -DH

%\newif\ifpdf
%\ifx\pdfoutput\undefined
%   \pdffalse
%\else
%   \pdfoutput=1
%   \pdftrue
%\fi

\begin{document}

\title{Reputation in P2P Anonymity Systems}

\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
Paul Syverson \\ Naval Research Lab \\ syverson@itd.nrl.navy.mil}

\maketitle
%\thispagestyle{empty}

\begin{abstract}

Decentralized anonymity systems tend to be unreliable, because users
must choose nodes in the network without knowing the entire state
of the network. Reputation systems promise to improve reliability by
predicting network state. In this paper we focus on anonymous remailers
and anonymous publishing, explain why these systems can benefit from
reputation, and describe our experiences designing reputation systems
for them while still ensuring anonymity. We find that in each case we
must first redesign the underlying anonymity system to support verifiable
transactions.

\end{abstract}

\section{Introduction}

In decentralized networks, many traditional means of ensuring
accountability between participants become unworkable.  As the
combined resources of individual participants outgrow those of a
(possibly nonexistent) central authority, top-down enforcement of good
behavior becomes unwieldy.  As the number of participants grows,
repeated interactions between participants can become
infrequent, and so individuals cannot rely on their own past
experience to recognize those who behave badly.  Under these
circumstances, reputation is one of the few workable mechanisms to keep hostile
and freeloading nodes from damaging the system.

But when the network's very purpose is to provide anonymity, reputation
itself becomes problematic. First, location protection enables an
attacker or freeloader to cheaply discard a pseudonym that has acquired
a bad reputation. Second, it is hard to detect or verify a participant's
behavior while at the same time maintaining
his anonymity. Third, reputation information can be exploited by
an adversary to reduce a participant's anonymity.

So we are left with a conundrum: if the network's stability relies
on the good behavior of individual nodes, reputation may be the
only way to achieve it.  But reputation data is hard to gather in
the presence of anonymity, and even when gathered, poses a potential
vulnerability for an attacker to exploit.

As with security, it is tempting but incorrect to think that
reputation is a simple matter of bolting an extra service to the side
of an existing system.  In the rest of this paper, we illustrate this
point with examples from reputation systems that have been designed
for use in networks that provide pseudonymity and anonymity.  First,
however, we outline the incentive structures involved in deploying
such networks.

\section{An Economics of Anonymity}

Unlike confidentiality (encryption), anonymity cannot be created by the
sender or receiver. Alice cannot decide by herself to send anonymous
messages --- she must trust the infrastructure to provide protection, and
others must use the same infrastructure. Anonymity systems use messages
to hide messages: senders are consumers of anonymity and also providers
of the cover traffic that creates anonymity for others. Thus users are
always better off on crowded systems because of the noise other users provide.

Because high traffic is necessary for strong anonymity, agents must
balance their incentives and find a common equilibrium, rather than
each using a system of their own.
The high traffic they create together
also enables better performance: a system that processes only light
traffic must delay messages to achieve adequately large anonymity sets.
But systems that process the most traffic do not necessarily provide the
best hiding: if trust is not well distributed, a high-volume system is
vulnerable to insiders and attackers who target the trust bottlenecks.
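
To make the traffic-versus-delay tradeoff concrete, consider a simple
threshold mix that forwards a batch only after collecting $b$ messages
(the parameters here are ours, chosen for illustration). If messages
arrive at rate $\lambda$, the expected wait to fill a batch is
\[
  E[\mathrm{delay}] \approx \frac{b}{\lambda},
\]
so a system with a tenth of the traffic must either delay messages ten
times as long or settle for anonymity sets a tenth the size.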

Anonymity systems face a surprisingly wide variety of direct
anonymity-breaking attacks \cite{back01,raymond00}. Adversaries
can also attack the efficiency or reliability of nodes, or
try to increase the cost of running them. All of these factors combine
to threaten the \emph{anonymity} of the system.  As Back et al.\ point
out, ``in anonymity systems usability, efficiency, reliability and cost
become \emph{security} objectives because they affect the size of the
user base which in turn affects the degree of anonymity it is possible
to achieve.'' \cite{back01}

Early work on the economics of anonymity \cite{econymics} has focused on
the incentives for participants to act as senders and nodes, providing
three results: First, systems must attract cover traffic (many
low-sensitivity users) before they can attract the high-sensitivity
users. Weak security parameters (e.g., smaller batches or lower latency) may produce
\emph{stronger} anonymity by bringing more users. But to attract this
cover traffic, they may well have to address the fact that most users do
not want (or do not realize they want) anonymity protection. Second,
high-sensitivity agents have incentive to run nodes, so they can be
certain their first hop is honest. There can be an optimal level of
free-riding: in some conditions these agents will opt to accept the cost
of offering service to others in order to gain cover traffic. Third,
while there are economic reasons for distributed trust, deploying
a completely decentralized system may involve coordination costs that
make it infeasible. A central coordination authority to redistribute
payments may be more practical, but could provide a trust bottleneck
for an adversary to exploit.

The reputation systems we discuss below would enable users 
to direct their traffic to reliable nodes --- thus giving
high-sensitivity agents incentive to provide {\it reliable} service, and making
the network as a whole more reliable.  Below, we examine several
designs for reputation systems for anonymity networks.

\section{An Example: Remailer Networks}

Remailer networks allow people to send and receive mail while protecting
their identities. Today's remailer networks use a handful of long-lived,
static servers with fairly uniform reliability. Currently deployed
reputation systems \cite{echolot} send periodic test messages through
each remailer to determine which are currently working. This ``pinging''
approach works well enough for a small static list of servers. However,
a network that is made of a small static set of remailers is potentially
vulnerable to a well-funded adversary in a variety of ways, e.g.,
denial of service or remailer compromise. %Any solution based only on
%hardening the nodes runs contrary to the basic premise of remailers
%that trust is distributed rather than mutually assured.
Thus, to better
resist a well-funded adversary, the remailer network must grow so it has
enough nodes to properly distribute trust. Pinging for reputation breaks
down in an environment where the network is made up of many transient
volunteer nodes. Worse, an adversary might render a growing,
dynamic network useless by volunteering a flood of unreliable remailers,
or might manipulate the reputation system to improve the standing of
remailers it owns.
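
To make the pinging approach concrete, the sketch below scores each
remailer by the fraction of recent test messages it delivered. The
interface and scoring window are our own illustration, not Echolot's
actual implementation.

\begin{verbatim}
# Hypothetical remailer pinger; the interface
# is illustrative, not Echolot's actual API.
import time
from collections import deque

WINDOW = 7 * 24 * 3600  # score over last week

class PingTracker:
    def __init__(self):
        # remailer -> (timestamp, delivered?)
        self.history = {}

    def record(self, rem, ok, now=None):
        now = time.time() if now is None else now
        self.history.setdefault(
            rem, deque()).append((now, ok))

    def reliability(self, rem, now=None):
        # Fraction of test messages delivered
        # within the window; None if untested.
        now = time.time() if now is None else now
        hist = self.history.get(rem, deque())
        while hist and hist[0][0] < now - WINDOW:
            hist.popleft()
        if not hist:
            return None
        good = sum(ok for _, ok in hist)
        return good / len(hist)

t = PingTracker()
t.record("mix1.example.net", True)
t.record("mix1.example.net", False)
print(t.reliability("mix1.example.net"))  # 0.5
\end{verbatim}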

The reputation system presented in \cite{mix-acc} aims to improve
remailer reliability. Remailer reputation is based on both positive and
negative performance. If it were based only on successes, an adversary
could pump up his reputation by sending his own messages through the
node while ignoring other messages. On the other hand, recording only
failures is also insufficient, because new unreliable remailers would be
rated the same or better than remailers that consistently perform well
\cite{cheap-pseudonyms}. In this design, each remailer in the message's
path passes back a receipt to the one behind it. Senders can successively
query for receipts to determine which remailer to blame for delivery
failure. However, a remailer might refuse to provide a receipt for a
particular message either because it didn't try to send the message, or
because it tried but was unable to obtain a receipt from the next hop. We
solve the problem of pinpointing failures by introducing a set of weakly
trusted global witnesses. These witnesses are contacted when the next
hop in the path refuses the message, allowing a remailer to prove that it
made a best-effort delivery attempt. Senders can also tell witnesses about
remailers that silently dropped messages (meaning they got a copy but did
not attempt to pass it on). These witnesses verify and tally failures, and
also send their own test messages to distinguish reliable remailers from
new ones that have not yet been tested. Reputations are %tabulated and
made available to client software, which can use them to choose reliable
remailers for sending anonymous mail.
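
The receipt chain can be made concrete with a small sketch. The query
interface below is invented for illustration, and the actual protocol
in \cite{mix-acc} differs in its details: the sender walks the path in
order and blames the first remailer that can show neither a receipt
from its next hop nor a witness attestation of a refused delivery
attempt.

\begin{verbatim}
# Illustrative blame assignment by walking the
# receipt chain; the interface is hypothetical.

def assign_blame(path, has_receipt, has_witness):
    # path: remailers in path order.
    # has_receipt(r): r shows a receipt from
    #   its next hop for this message.
    # has_witness(r): witnesses attest that r
    #   tried after the next hop refused.
    # Returns the first hop proving neither,
    # or None if every hop proved delivery.
    for r in path:
        if has_receipt(r) or has_witness(r):
            continue  # r provably passed it on
        return r      # r silently dropped it
    return None
\end{verbatim}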

This reputation system attempts to improve reliability in a long-term
sense, rather than giving provable delivery guarantees for each
message \cite{shuffle}.
On the other hand, it still relies both on proofs of correct behavior
to establish reputations, and trusted witnesses to determine and
keep track of them. These witnesses are both a trust bottleneck and a
communications bottleneck. Further, an anonymity-breaking adversary with
an adequate budget would do best to provide very good service, perhaps
while also mounting denial-of-service attacks against other high-quality
providers. The usual performance and efficiency metrics cannot
distinguish such an adversary from the honest nodes.

The reputation system in \cite{casc-rep} does
away with trusted witnesses and proofs in favor of self-rating groups
of remailers. Remailers are arranged in cascades (fixed paths through the
network, so batches of messages go through in synchrony).
New cascades are formed at a regular interval (e.g., daily),
and the formation of cascades is based on a communally generated
random value so that no set of collaborating remailers can predict which
remailer will be in which cascade, as long as at least one remailer is
honest. Remailers send test messages through their own cascades and
can also receive evidence of failure from client senders. Rather than
depending on proofs of remailer performance, a cascade fails when and
only when some member of that cascade has declared it to have failed. All
members of cascades that do not fail during an interval increase in
reputation; all members of cascades that fail decrease. To make it
harder for the head remailer of a cascade to fail undetectably or
selectively, each of the cascade members is responsible for a portion of
the messages that go through a cascade in each batch. In effect, each
member is the head for some of the messages. Similarly, the tail of
the cascade sends each outgoing message to each of the other cascade
members rather than just directly to the recipient. All then attempt to
deliver the message to the recipient. (Efficiency of communication for
final delivery can be improved by using delivery receipts when that is feasible.)
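
A rough sketch of the formation and scoring rules follows. The seed
derivation and score increments below are our own illustrative choices;
the actual design in \cite{casc-rep} is more involved.

\begin{verbatim}
# Sketch of cascade formation and scoring;
# seed derivation and score increments are
# our own illustrative choices.
import hashlib, random

def form_cascades(remailers, contribs, size):
    # contribs: one random value (bytes) per
    # remailer, committed before being
    # revealed, so that one honest contributor
    # makes the shuffle unpredictable.
    seed = hashlib.sha256(
        b"".join(sorted(contribs))).digest()
    rng = random.Random(seed)
    order = remailers[:]
    rng.shuffle(order)
    return [order[i:i + size]
            for i in range(0, len(order), size)]

def score_interval(cascades, failed, rep):
    # A cascade fails iff some member declared
    # failure; failing cascades' members lose
    # reputation, all others gain.
    for casc in cascades:
        bad = any(m in failed for m in casc)
        delta = -1 if bad else 1
        for m in casc:
            rep[m] = rep.get(m, 0) + delta
\end{verbatim}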

In both of these remailer reputation systems, it was necessary to redesign the
remailer protocol so we could track remailer behavior. In the first
case, we added receipts for each delivery to each remailer and to
the ultimate destination, and we added trusted witnesses to verify
delivery and record success or failure. In the second case, we used
the usual remailer cascade protocol for messages inside the cascade,
but we introduced a new protocol for sending messages into the cascade
and delivering messages from the cascade.

\section{An Example: Anonymous Publishing}

Free Haven \cite{freehaven-berk} describes a design for a publishing
system that can resist the attempts of powerful adversaries to find or
destroy any stored data. It provides anonymity for readers and publishers,
and also hides the locations of the servers that store and serve the
documents. Unlike related designs such as Freenet \cite{freenet},
the publisher of a document --- not the servers holding the document
--- determines its lifetime. To counter malicious or flaky servers,
publishers break documents up into shares, any adequately sized subset of
which is sufficient to reconstruct the document. Servers then trade these
shares around, allowing for servers to join and leave smoothly and also
providing a moving target for an adversary hunting a particular share.
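
The share-and-reconstruct property can be illustrated with a small
sketch. A scheme such as an information dispersal algorithm fits the
design above; the Shamir-style polynomial splitting below is not that
algorithm, only a minimal demonstration of the any-$k$-of-$n$
reconstruction property.

\begin{verbatim}
# k-of-n reconstruction via Shamir-style
# polynomial shares; only a demonstration of
# the reconstruction property, not Free
# Haven's actual splitting scheme.
import random

P = 2**127 - 1  # prime field modulus

def split(secret, n, k):
    # n shares; any k reconstruct the secret.
    cs = [secret] + [random.randrange(P)
                     for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P)
                   for i, c in enumerate(cs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 in GF(P).
    s = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        inv = pow(den, P - 2, P)  # Fermat
        s = (s + yi * num * inv) % P
    return s

shares = split(42, n=7, k=3)
assert reconstruct(shares[:3]) == 42
\end{verbatim}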

To prevent selfish or malicious users from filling up the available disk
space, all who would publish must also provide servers, and servers form
contracts to store each other's material for a certain period of time.
Because servers can cheat and drop data early, Free Haven employs a
reputation system to limit the damage done by servers that misbehave.
Successfully fulfilling a contract increases a server's reputation and
thus its ability to store some of its own data on other servers. This
gives an incentive for each server to behave well --- that is, as long
as cheating servers can be identified.

In such a dynamic and anonymous environment, it is very difficult to
reliably notice if a server drops data early. We might give the original
publisher the task of monitoring the availability of his documents; he can
then broadcast a claim that a particular document has been dropped early.
If we don't want to rely on the original publisher, we can assign a
random server as a ``shepherd'' for a document. Or for a more dynamic
solution, we can use a ``buddy system'', where publishers put in two
copies of each share, and the copies watch each other and broadcast a
complaint if either disappears.
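
A minimal sketch of the buddy idea follows; the share format and
lookup interface are invented for illustration.

\begin{verbatim}
# Sketch of the buddy system; the formats and
# lookup interface are invented.
from dataclasses import dataclass

@dataclass
class Share:
    share_id: str
    buddy_id: str      # the twin copy
    buddy_holder: str  # server contracted
                       # to hold the twin

def check_buddy(share, locate, complain):
    # locate(id) -> holding server, or None.
    # Broadcast a complaint naming the twin's
    # contracted holder if it has vanished.
    if locate(share.buddy_id) is None:
        complain(share.buddy_id,
                 share.buddy_holder)
\end{verbatim}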

Anyone wishing to claim misbehavior can present a signed contract as
an indication that the file should have been kept, but there are a number
of special cases where a signed contract is not sufficient proof to
pinpoint a particular server as the culprit, such as if the server traded
the share away before the contract expired. Because a claim cannot be
taken as absolute proof, servers are left with the grim task of trying
to determine the credibility of the claimant: if two or more servers
disagree, onlookers need a way to determine which to believe. Keeping
track of all claims ever made and tracking their results might give a
server enough information to make reasonable guesses about the validity
of each claim. This approach gets complex very quickly, and leaves lots
of holes for a smart adversary.

Providing a way to verify claims in Free Haven remains an open problem.
Given our experience designing the remailer reputation systems above,
it seems most promising to redesign the entire system from the ground up,
this time with verifiable claims in mind. Designing the original system
without a clear idea of the reputation and verifiability requirements made Free Haven's
design complex and brittle.

It is tempting to scale down the reputation system instead, so that
servers use only local information from their own transactions and
trades. In this scenario, a new server must separately convince each
server it encounters that it is reputable before that server will
allow it to publish any data. Because news about good and bad behavior
does not propagate, servers must be more conservative about offering
resources to unfamiliar nodes. To prevent stabilization into a static
network with a small number of nodes trading data among each other and
ignoring newcomers, many servers must explicitly risk resources on new
nodes. In contrast to a global gossip system, where the whole world learns
from a server's first interaction, each new server now has a chance to
waste the time and resources of every other server (the ``screw everyone
once'' attack).

Perhaps the most interesting aspect of Free Haven's reputation system is
that servers use their reputation capital to obtain proportional resources
from other servers. In order to build up to a reputation that allows
a server to store a certain amount of data in the system (think of it as a
credit limit), the server must previously have successfully stored an
equal amount of others' documents. Each server is able to commit at most his
current credit limit in new contracts, and successfully completing a
contract raises his credit limit. Because a server can cheat on at most
the amount of useful work he has already done, each server is forced to
perform at least 50\% useful work. We will say more about this notion of
``spending'' reputation and quantifying risk below.
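
The bookkeeping can be sketched as follows (the accounting below is
our own rendering of the idea): a server's earned limit grows as it
faithfully stores others' data, and bounds how much of its own data it
may have outstanding, so the amount it can cheat never exceeds the
useful work it has already done.

\begin{verbatim}
# Sketch of credit-limit accounting: cheating
# is bounded by useful work already performed.

class CreditLedger:
    def __init__(self):
        # limit: earned by storing others' data
        # outstanding: own data stored elsewhere
        self.limit = 0
        self.outstanding = 0

    def complete_hosting(self, size):
        # We kept someone else's data for the
        # full contract term: our limit grows.
        self.limit += size

    def publish(self, size):
        # Spend earned credit to store our own
        # data in the system.
        if self.outstanding + size > self.limit:
            raise ValueError("over credit limit")
        self.outstanding += size
\end{verbatim}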

\section{Reputation and Anonymity}

Although the examples described above associate non-transferable
reputation with long-lived pseudonyms (nyms for short), there are many
ways to vary this formula. Entities may be anonymous or have only
short-lived names; reputations may be short-lived or transferable;
and bindings between nyms, entities, and reputations may be varied.
We have already noted that in Free Haven, a server can both acquire and
spend reputation. A central register or registers can keep track of each
server's current reputation credit level, or each server can maintain its
own view of the system. It is a small step from here to consider systems
in which reputation can be paid in the form of coins or tokens. With such
a system, the same entity can maintain reputation even while changing
nyms. An entity that holds different nyms in different
contexts can benefit from this feature --- either to preserve reputation
independent of private key compromise, or to maintain perfect forward
anonymity (future compromises linking a nym to an entity will not reveal
previous transactions by the entity bearing that nym, even if all past
behavior under all nyms is logged \cite{unlinkable-serial-transactions}).

Along with the advantages of fungible reputation come some potential
problems. For example, an entity can obtain reputation in one context
where it has functioned well and spend it in another context where
it has not. Or the entity can transfer reputation to another entity
entirely. This capability might be controlled by using cryptographic
techniques that, e.g., bind all the reputation to the same nym or the
same entity without revealing which nym (or entity) that is. But there
are other concerns for this approach. For example, how do you diminish
reputations for bad performance in such a model? (If it is possible
to bind reputation tokens to an entity in an anonymity preserving way,
it may also be possible to bind them to a duration so that the amount of
reputation can be evaluated with respect to the time that the reputation
bearer has existed. Thus we can distinguish longstanding, low-performing
entities from untested ones.)

On the other hand, in some contexts transferable reputation may not be a
problem. For example, in a system like Free Haven that ``pays'' you for
good performance with system services from others, as long as the amount
of service credit taken from the system is no more than the amount of
service provided to it, does it matter if that credit is transferred
to others?

\section{Conclusion: New Directions, Misdirections, and Other Questions}

Reputation systems are already gathering momentum at the grassroots of the
Internet. Special-purpose systems --- some more {\it ad hoc} than others --- have
already been rolled into online auction services, messaging protocols, and
online discussion sites. But despite this growing body of experience at
building simple reputation systems and designing more complex ones for
anonymity systems, the questions remain as intriguing as the solutions.

How can we fine-tune a reputation system in response to a specific threat
model? In a relatively low-threat environment (e.g., tracking ISP uptime),
ordinary statistical models will suffice. But most statistical models
assume that data is biased at worst, not maliciously chosen by an adversary
who wants us to make a particular decision. At this point, the emphasis in
current research shifts from predicting behavior to minimizing risk. Is it
really necessary to abandon statistical rigor? The field of machine
learning has a rich history and a lot of experience at solving related
problems. Is there some way to adapt these solutions to an adversarial
context?

Similarly, what can we do when statements aren't verifiable, and where an
adversary can either lie about real interactions, or fabricate spurious
interactions and lie about those?  We could try to make credibility charts
and weight statements by credibility --- but a smart adversary could try to
trick our credibility calculations as well. If somebody were to find a way
to establish bounds on malicious influence in such a system, the range of
problems we could solve with reputation would explode overnight.

We have already seen how reputation can enhance privacy. Can we go one step
further, and assign reputation based on an expectation of protecting
privacy? In distributed privacy systems, the privacy provided is typically
based on an assumption that some subset of system components will both
perform duties correctly {\it and} will not reveal some parts of their data
and/or operations. A privacy-destroying adversary might offer very reliable
service while using information from various compromised system elements to
compromise privacy. Can we say anything meaningful about reputations based
on reliability at keeping secrets, or are we limited to making statements
about the probability of privacy compromise given the likelihood and
structure of component compromise?

What if we treat reputation as currency? Currency implies economics. How do
you get reputation currency? Do currency-based approaches always imply
transitive trust? Is the supply of reputation currency constant,
increasing, or decreasing? Does it expire, or slowly lose value over time?
Where, ultimately, does a currency come from --- a decentralized Federal
Reserve? Does it materialize as a side-effect of performing work for the
system? Is there credit? Or does currency only appear when the system is
bootstrapped, and if so, how? Is currency global, or do individual servers
mint their own currencies? Can we evolve a viable market that is scalable
based on these local currencies issued by individual servers? Can a global
currency be bootstrapped from local currencies?

The number and diversity of real-world reputation systems is staggering.
Through a byzantine mass of credit reports, product reviews, earnings
statements, flat-out gossip, and a thousand other information channels, we
try to convince one another of our honesty and competence. In the process
we expose far more detail about ourselves than we might wish. Online
reputation systems promise to be still more complex and difficult to build
than their real-world analogues, but they hold out the promise of enabling
decentralized interaction and protecting privacy in ways that today's
systems of trust cannot.

\bibliographystyle{latex8}
\bibliography{econp2p03}

\end{document}

