%% MIX cascade reputations

\documentclass{llncs}

\newcommand{\enc}[2]{\mathtt{enc}(#1,#2)}
\newcommand{\secret}{s}
\newcommand{\key}[1][]{K_{#1}}
\newcommand{\privatekey}[1][]{\key[#1]^{-1}}
\newcommand{\sign}[2]{\mathtt{sign}(#1,[#2])}
\newcommand{\tsbc}[1]{\mathtt{tsbc}(#1)}
\newcommand{\rand}[1]{\mathtt{rand}_{#1}}

\begin{document}

%\title{A Reputation System to Increase MIX Cascade Reliability}
%\title{Reliable MIX cascades with reputation}
%\title{Making MIX cascades reliable with reputation}
%\title{A Reputation System using MIX Cascades}
\title{Reliable MIX Cascade Networks\\ through Reputation}
%\title{Reliable MIX Cascade Networks using Reputations}
%\title{Reliability by Reputation in MIX Cascade Networks}
%\title{MIX Cascade Networks Reputed to Become Reliable}
\author{Roger Dingledine\inst{1} \and Paul Syverson\inst{2}}
\institute{The Free Haven Project
\email{(arma@mit.edu)}
\and
Naval Research Lab
\email{(syverson@itd.nrl.navy.mil)}}
\maketitle
\pagestyle{empty} 
  
\begin{abstract}

We describe a MIX cascade protocol and a reputation system that together
increase
the reliability of a network of MIX cascades. In our protocol,
MIX nodes periodically
generate a communally random seed that, along with their reputations,
determines cascade configuration. Nodes send test messages to monitor their
cascades. Senders can also demonstrate message decryptions to convince
honest cascade members that a cascade is misbehaving.
By allowing any node to declare the failure of its own cascade, we
eliminate the need for global trusted witnesses.

\end{abstract}

Keywords: anonymity, reputation, peer-to-peer, communal randomness

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Introduction}
\label{sec:intro}

Practical anonymous communication systems require high reliability.
Reliability can lead to efficiency because routes are more likely to
succeed. Reliability can also improve anonymity because senders need to
resend fewer messages, and because a reliable system draws
more users and thereby increases anonymity sets. Past
approaches to increasing remailer reliability have included writing
more reliable software \cite{RProcess}, building MIX protocols that
give provable robustness guarantees \cite{desmedt,flash-mix,mitkuro},
and building a reputation system to let users
choose paths based on the published scores for each node \cite{mix-acc}.
%We extend this last approach to remove some of its bottlenecks and
%provide deeper analysis.

The reputation system described in \cite{mix-acc} uses a
MIX-net in which nodes give receipts for intermediate messages. These
receipts, together
with a set of witnesses, allow senders to verify the
correctness of each node and prove misbehavior to the witnesses. Here
we investigate and solve two problems from that design:

\begin{itemize}
\item \emph{The mechanism for verifying a failure claim requires a set of
global witnesses, a threshold of which
need to be involved in confirming every failure claim. These
global witnesses create both a trust bottleneck and a communications
bottleneck.} Our new reputation system avoids these bottlenecks by making
each node a witness to its own cascade.
\item \emph{An adversary trying to do traffic analysis
can get more traffic by gaining a high reputation.}
We protect against such adversaries by choosing from a
pool of ``acceptable'' MIX nodes, and building cascades so we
can bound the probability that an adversary will control an
entire cascade.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Overview and Threat Model}
\label{sec:design}

We aim to improve the reliability of anonymous communication systems,
enabling their practical use in fields such as electronic commerce,
where transactions need to be efficient, reliable --- and,
frequently, anonymous.  We order MIX nodes into cascades, where each
cascade presents a fixed path through the network. This approach
allows us to decentralize the use of witnesses to detect failure as
introduced in \cite{mix-acc}, since each node can witness the traffic
through its own cascade. Further, using cascades allows us to better
resist traffic analysis from a pervasive adversary, since sending
traffic through fixed paths makes intersection attacks more difficult.
Our reputation system gives users
%It also allows users to get 
a more accurate picture of which nodes are currently up,
allowing them to choose routes more reliably and to know what level
of protection they're getting.
%and thus choose routes more reliably, even if down nodes are
%inclined to lie about their state.

Specifically, we aim to defend against two adversary goals.
An \emph{anonymity-breaking} adversary tries to discover linkability
between sender and receiver; to identify the sender or receiver of a
given message; or to 
trace a sender forward (or a receiver backward) to any messages.
A \emph{reliability-breaking} adversary tries to deny service to users.
We assume that our adversary can passively watch all traffic, and can
delay, modify, or insert some messages. We also assume that the adversary
has compromised some fraction of the participating MIXes.

Section~\ref{sec:random-selfbuild} shows the protocol by which MIXes
build themselves into cascades in a public and verifiable way, and
also describes the process of generating \emph{communal randomness} so
MIXes don't need to trust a central authority to build the cascades. We
periodically rebuild cascades to
reflect changes in reliability.

Inspired by a comment by Jim McCoy about using reputation
capital to regulate participants in DC-nets \cite{mccoy-dcnets},
we use a very simple reputation system. Any member of a cascade can
declare its own cascade to have failed. Nodes in a successful cascade
each gain one reputation point, whereas all nodes in a failed cascade
lose one point. 
Nodes that misbehave by incorrectly reporting cascade failure thus
damage their own reputations too. We describe this reputation system in
Section~\ref{sec:rep}, including some proposals for limiting the fraction
of adversary-controlled nodes, a discussion
of some pitfalls introduced by our simple reputation system, and our
technique for building cascades to reduce the chance that an adversary
will control an entire cascade.

Section~\ref{sec:when-to-fail} describes our modified MIX cascade
protocol, and also specifies how MIXes can decide when their cascade
has failed. In Section~\ref{sec:attacks}, we describe a variety of attacks
on the system and examine how well our design withstands these attacks.
Finally, we close in Section~\ref{sec:conclusion} with a discussion of
possible directions for future research.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Related Work} 
\label{sec:related}

\subsection{MIX-nets}

Chaum introduced the concept of a MIX-net for anonymous communications
\cite{chaum-mix}. A MIX-net consists of a group of servers, called MIXes
(or MIX nodes), each of which is associated with a public key. Each MIX
receives encrypted messages, which are then decrypted, batched, reordered,
stripped of the sender's name and identifying information, and forwarded on.
Chaum also proved security of MIXes against a
\emph{passive adversary} who can eavesdrop on all communications between
MIXes but is unable to observe the reordering inside each MIX.

One type of MIX hierarchy is a cascade.
In a cascade network, users choose from a set of fixed paths through
the MIX-net. Cascades can provide greater anonymity against a large
adversary, because in a free-route system an
adversary who owns many of the MIXes can use intersection attacks to
dramatically reduce the set of possible senders or receivers for a given
message \cite{disad-free-routes}.

Current research on MIX-nets includes stop-and-go MIX-nets
\cite{kesdogan}, distributed flash MIXes \cite{flash-mix},
and hybrid MIXes \cite{hybrid-mix}.
MIX cascade research includes real-time MIXes \cite{realtime-mix} and
%http://vip.poly.edu/mehdi/papers/00668973.pdf
web MIXes \cite{web-mix}.
%probably also isdn mixes?

\subsection{Robustness and Reliability in MIX-nets}

Previous work primarily investigates the \emph{robustness} of MIX-nets
in the context of a distributed MIX system \cite{flash-mix}. A MIX
is considered robust if it survives the failure of any $k$ of $n$
participating servers, for some threshold $k$. This robustness is
all-or-nothing: either $k$ servers are good and the MIX works, or they are
not good and the MIX likely will not work.

Robustness has been achieved primarily via zero-knowledge proofs of
correct computation.  Jakobsson showed how to use precomputation to reduce
the overhead of such a MIX network to about 160 modular multiplications
per message per server \cite{flash-mix}, but the protocol was later
found to be flawed \cite{mitkuro} by Mitomo and Kurosawa.  Desmedt and
Kurosawa's alternate approach \cite{desmedt} requires many participating
servers. Abe's MIX \cite{abe} provides \emph{universal verifiability} in
which any observer can determine after the fact whether a MIX cheated,
but the protocol is still computationally expensive. Neff recently
made further efficiency improvements to universally verifiable mixing
\cite{shuffle}.

Reliability differs from robustness in that we do not try to ensure
that messages are delivered even when some nodes fail. Dingledine et
al's reputation system for free-route MIX-net reliability \cite{mix-acc}
aims instead to improve a sender's long-term odds of choosing a MIX path
that avoids failing nodes. This work similarly attempts to provide more
reliable long-term service by identifying reliable MIXes and building
cascades from them.

% XXX later, we may want to combine these two paragraphs and put in
% a comment about the paper from Adam, Ulf, and Anton, emphasizing that
% security is a tradeoff with efficiency.
We note that reliability and robustness can be composed: a cascade or
distributed MIX with robustness guarantees can be considered as a single
node with its own reputation in a larger MIX-net.

%\subsection{Remailer Statistics}

\subsection{Approaches to MIX-net Reliability}

MIX-net protocols can give specific guarantees of robustness.
Under suitably specified adversary models, these
results may be quite strong,
e.g. ``this distributed MIX delivers correctly if no more than half of
its participating servers are corrupt.'' Such protocols
are often complicated and inefficient.

Levien's statistics pages \cite{levien} present another approach. They
track both remailer
capabilities (such as what kinds of encryption the remailer supports) and
remailer up-times, observed by pinging the machines in question
and by sending test messages through each machine or group of machines.
Such \emph{reputation systems} improve the reliability of MIX-nets by
allowing users to avoid choosing unreliable MIXes.

Instead of engineering the MIX-net protocol directly to provide
reliability, we make use of reputations to track MIX performance. In
this approach, we specify a set of behaviors that characterize a
functioning or failed MIX.  We are not likely to prove strong theorems
--- the goal of reputation is to make the system ``better'' without
guaranteeing perfection. Like Levien's reputation system for
free-route MIX networks, our published reputations enable users to
find better routes. Further, our reputation system works behind the
scenes to build more reliable cascades --- a process that may even be
entirely transparent to the users.
% However, by making publicly visible the
%output of our reputation system, we can both make the network better
%and facilitate better user choice of routes.

Reliability via protocol is the most well-studied approach, while
reliability via reputations in the form of Levien statistics is the
most widely used. Our work combines the two approaches: we modify the
MIX-net protocol to support easier detection of MIX failures and then
specify a suitable reputation system.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{How to Randomly Self-build Cascades}

\label{sec:random-selfbuild}
\label{sec:tsbc}

We periodically rearrange the nodes into cascades, so that
cascades reflect recent changes in reliability, and so nodes
in failed cascades can get back in a working cascade.
We choose to rearrange \emph{all} nodes, not just
those from failed cascades; otherwise reliable nodes will
concentrate in stable cascades and unreliable nodes in
unstable ones, making it difficult for new good
nodes to gain reputation.

Cascades rebuild with period $T$ (e.g., one day). By $T-a-b$, each
participant sends a sealed commitment to the Configuration Server
($CS$). At $T-a-b$, the $CS$ publishes the set of commitments.  By
$T-b$, participants should reveal to the $CS$.  At $T$, the $CS$
publishes the set of reveals, along with the configuration of cascades
for that round. Observers can verify that the $CS$ followed the
configuration scheme and used the contribution from each node.

The $CS$ gives each node $N$ a signed receipt $\sign{CS}{commitment,
  timestamp}$.  The period from $T-a-b$ to $T-b$ should be long enough
that anyone within reasonable clock skew can verify that the
commitment phase has truly closed.  Adequate time must be allowed for
nodes to submit their secrets and to make use of a certified delivery
service if their submissions are not accepted and receipts provided by
the $CS$ in a timely manner.  The $CS$ also provides a receipt for the
reveal. With both receipts, $N$ can prove that he should be included
in the next cascade configuration.

Commitment from $N$: $\sign{N}{N, \mathtt{IP, port, bandwidth pledge},
  \tsbc{\rand{N}}}$.

Reveal from $N$: $\sign{N}{N, \mathtt{IP, port, bandwidth pledge, } \rand{N}}$

Here ``$\tsbc{\rand{N}}$'' is $N$'s temporarily-secret bit-commitment
to his random value (to be explained below, in
Section~\ref{subsec:communal-randomness}).  The $CS$ then builds a
new set of cascades for that $T$, arranged according to an
unpredictable value communally generated by the participating mixes.
Thus, no one can control the configuration of cascades.  Bandwidth
might be divided into four categories: 10Kb/s, 100Kb/s, 1Mb/s, 10Mb/s.
Participants choose exactly one of these values for their bandwidth
pledge, and each of the four buckets is formed into a set of cascades
according to the algorithm described in Section~\ref{sec:rep}, using
the unpredictable communal value.

The communal value can be used as a seed to a PRNG, so the amount of
randomness required from each participant is quite small.  If $N$ is
worried that $CS$ will ignore his commitment or reveal (not provide a
receipt), he can use a certified mail delivery system
\cite{riordan-schneier,zhou96certified} to convince other people that
$CS$ is misbehaving. Alternatively, we can build our own certified
delivery system by pre-assigning a set of witnesses (perhaps fifteen)
from the previous $T$. A threshold of these witnesses is sufficient
to prove misbehavior. As long as the adversary does not control too
high a percentage of nodes then we can trust this system (cf.,
Section~\ref{sec:rep} for discussion of how we limit this).


The set of cascades is determined by the communal value that is
publicly verifiably committed when the $CS$ signs and posts the
commitments and their revealed values.  Because a node could still
control the resulting communal value by choosing whether to reveal its
value, the commitment is done such that if it is not revealed, the
$CS$ can uncover it after a predictable amount of computation. Any
committed value that is not revealed by $T-b$ should be computed by
the $CS$ to be revealed at $T$.


\subsection{Communal Randomness via Temporarily Secret Commitments}
\label{subsec:communal-randomness}

Communal randomness, or more precisely a communally determined unpredictable
value, is obtained by collecting random values from participating
MIXes. These are kept secret until everyone has committed so
that no one can predict the result of combining them. The committed
values can be uncovered without help from the committers if necessary
so that no one can alter the communal result in a predictable way by
failing to reveal what she committed. On the other hand, not all the
committed random values can be uncovered by an adversary in time for
him to craft a commitment that will yield a predictable result.

Various approaches to this capability have been published.  In
\cite{goldschlag98}, the committed unpredictability comes from a long
``delaying'' calculation on the result of a (fast) combination of the
inputs from participants. Another similar scheme is given in
\cite{syverson98}.  Both \cite{boneh00} and \cite{syverson98} give commitment
schemes that are individual, as
here; \cite{boneh00} also describes a zero-knowledge proof of a
well-formed commitment.  \cite{goldschlag-etal} further expands and
develops the work in \cite{goldschlag98} and \cite{syverson98}.

We follow the individual commitment approach of
\cite{goldschlag-etal,syverson98} because it is both shortcut
computable and easily commitment-verifiable. Anyone possessing a
secret (in this case the committer) can compute the committed result
quickly.  Once the committed value is revealed, anyone can easily
check that this is the committed value.  Commitments take the form
$\tsbc{\rand{N}} = \langle \enc{K}{\rand{N}}, w(K) \rangle$, the
encryption of $\rand{N}$ with $K$ followed by a TSBC function used to
commit to $K$.  \cite{rsw96} presents a function such that the
committer $N$ can quickly and easily calculate $w(K)$ as well as
$\enc{K}{\rand{N}}$.  Once $K$ has been revealed (or calculated) for
each of the entries, all the keys can be published by the $CS$. Anyone
can quickly verify the communal unpredictable value by performing the
encryptions and computing their amalgamation.

Neither shortcut computability nor easy
commitment-verifiability hold for the collective commitment that first
appeared in \cite{goldschlag98}. The application of commitments to
communal unpredictability is not the central focus of \cite{boneh00}
and is only described briefly.  It is also only described for two
parties producing a random bit; thus only one commitment from one
party is needed to run the protocol. With multiple parties committing
to many bits, malleability of the timed-commitments
\cite{dolev91} may become a factor if the revealed values are
simply XORed together to produce the result.
% (Basically, malleability
%means that given a commitment to $x$, someone can produce a commitment
%such that the other committed value bears a predictable relation to
%$x$, even if he cannot find out anything about $x$ itself.)
The well-formedness proofs may ultimately play a role in the
non-malleability of the commitments since this is similar to how
non-malleability of commitments is proved; we leave this as future work.
Instead, we avoid the issue of malleability by ensuring that the
amalgamation of the revealed values is not affected by correlations
among those revealed values.
For example, if the revealed values are
concatenated in the order the commitments were posted and then
hashed with a well-designed hash function, the result should
be unpredictable if even one of the committed values is
unpredictable.

We could make use of the well-formedness proofs in
\cite{boneh00} to prevent nodes from submitting malformed
commitments without detection, but the computational overhead is not
necessary.
Whether a commitment is malformed can
be confirmed after a predictable amount of
computation. Once it is known to be malformed, the result can either
be discarded or it can be used as if it had been well-formed (details
in \cite{goldschlag-etal,syverson98}).
Because inputs come from nodes with a vested interest in maintaining
reputation, we can use the reputation system to ensure that malformed
commitments are rare. All nodes must commit to participate in the
network, but to minimize the number of malformed inputs or committed
but not overtly revealed inputs, only commitments from high reputation
nodes will contribute to the communal unpredictable value. We might
use inputs from the top half of the nodes (all bandwidths), to make
sure we have enough inputs to make collusion or compromise of the
result unlikely. Any node that commits but does not reveal or puts in
a malformed commitment loses reputation. To minimize ongoing problems
from a misbehaving node, the random input from that node is not used
for a fixed number of subsequent rounds.

If all nodes contributing to the communal value
have revealed secret values that can
be verified as committed during the entry phase, these can be combined
by whatever means we use to yield a random result, e.g., concatenation
and hashing as above.
If not, we use the delaying calculation to uncover those not revealed.
If all such revealed values correspond to
well-formed TSBC entries, the result still remains easily
commitment verifiable. If any of them are malformed, the
communally determined value will only be verifiable by the
corresponding delaying calculation (with parallel or probabilistic
speedup, once it has been done initially, cf.\
\cite{goldschlag-etal,syverson98} for details).


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Reputation System and Cascade Configuration}
\label{sec:rep}

We would like to develop as simple a scoring system as
possible --- simple systems are easier to analyze for security, and they
allow the users to more easily understand the implications of a given
score.  Our system decrements the reputation of all nodes in a failed
cascade, and increments the reputation of all nodes in a successful
cascade. Thus we don't have to worry about pinpointing the cause of
failure. The hope is that reputations will give a general sense of the
reliability of nodes over the long term.

However, because a node's behavior affects the reputation of its
cascade-mates, this introduces a new attack on the reputation system.
When a cascade fails that has fewer bad nodes than good nodes, it does
more damage to the overall reputation of good nodes than bad. Since
bad nodes can intentionally fail a cascade, they can exploit this
vulnerability to gradually reduce the relative reputation of the good
nodes.\footnote{`Good' and `bad' refer to nodes that are honest or
that are part of the adversary, respectively. Because a bad node may
be reliable to break anonymity or reliability,
% Sometimes a bad node
%may be reliable to compromise anonymity.
% Thus `good' and `bad' 
they do not necessarily correspond to `reliable' and `unreliable'.} The
bad nodes ``creep upward'' on the reputation spectrum, eroding
the reputation of nodes above them --- our visual simulations led us
to refer to this attack as the \emph{creeping death}.

Because the bad nodes can gain reputation faster, they can eventually get
to any point on the reputation scale.  Rather than complicating
the reputation system to prevent our adversary from performing the
creeping death attack, we intentionally keep it simple and develop
an algorithm for building cascades that minimizes impact from this
attack. 

%In particular, we have three goals for cascade configuration:
%
%\begin{itemize}
%\item Neutralize effect of creeping death. \emph{Heuristic: choose
%    nodes evenly from the whole space of acceptable nodes.}
%\item Decrease risk of anonymity-breaking attack. \emph{Heuristic: do
%    not choose all nodes in cascades predictably, especially not from
%    any small fraction of the spectrum.}
%\item Increase cost of reliability-breaking attack. \emph{Heuristic:
%    choose nodes from the high-reputation end of the spectrum.}
%\end{itemize}
%
%These goals are clearly at odds, based on the heuristics described for
%each of them.

Since bad nodes can position themselves anywhere (reputation-wise) to
increase the chance of winning a whole cascade, the optimal strategy
to protect anonymity chooses nodes for each cascade
entirely at random.
% The adversary's ability to position himself
%doesn't help him capture a whole cascade.
But we also want to
increase the cost of reliability breaking, so that the adversary can
only affect cascades likely to be highly desired for use if he runs
reliable nodes himself; simply getting nodes into the system should have
less impact. We thus combine these approaches
and begin choosing cascade nodes randomly (using the random seed
from the method of Section~\ref{sec:tsbc}) from an adequately large set
of nodes, but still of the highest possible reputation. (See
Section~\ref{subsec:cascade-building} for details.)

While the reputation system reduces the impact of an adversary that
merely gets nodes accepted into the system, the creeping death attack
allows a resourceful adversary to quickly move up on the reputation
spectrum. An adversary with many nodes
can still succeed at breaking reliability and anonymity.
We prevent our adversary from
creating a multitude of identities and flooding the system with them
(an attack known as \emph{pseudospoofing})
with an identity-based barrier to entry: a web
of trust like Advogato \cite{advogato}. A web of
trust 
%does not require any controlling or monitoring authority, and
%still
% As opposed to a centralized identity scheme
%(e.g. drivers licenses),
allows us to limit the number of nodes an adversary can get certified.
\cite{advogato} gives a proof that the number of bad nodes accepted by
the web is limited by the number of honest members that might assign
trust to the adversary (\emph{confused} nodes) --- not
the number of nodes the adversary creates.  If we pick the seeds of the
web carefully and make some assumptions about the number and position
of confused nodes, we can bound the fraction of bad nodes in the system.

Alice should certify Bob based on whether she believes him to be
trustworthy (to be a real person with good intentions). If she certifies
based on expected performance, the adversary can simply run convincingly
reliable nodes. Certification aims only to bound the total percentage
of adversary-owned nodes in the system.

On the other hand, we might ask Alice to not certify Bob if she believes
he might be unreliable. However, this imposes a greater burden on Alice,
and also doesn't account for the fact that Bob's behavior can change over
time (whereas certification is a one-time event). Instead, the reliability
of an admitted node will be determined by the reputation system. Nodes
that are the most reliable over time will have the highest reputations.

\subsection{Building Cascades}
\label{subsec:cascade-building}

It is tempting to believe that some alternative reputation system or
cascade algorithm can reduce or simplify the problem introduced by
creeping death. For instance, we might punish nodes incrementally more
as they have more failures on record. But this is exactly the
problem --- honest nodes \emph{do} fail more often than adversary nodes.
For any pattern that we look for, our adversary can arrange it so
honest nodes fit that pattern better or more often than bad nodes.

Consider cascades with only two nodes.
%, and assume cascades with two good
%nodes or two bad nodes try to succeed.
In cascades with one good and one bad,
the bad node can hurt only one good node, so by construction bad nodes
do equal damage to good nodes.  But even this system falls prey to the
creeping death.  Since a cascade can only fail if one of its nodes writes
the ``We failed'' certificate, the both-bad case will \emph{never} fail
(the nodes simply never write the certificate), whereas a both-good case
will sometimes legitimately fail. Thus every both-bad cascade can stop
functioning immediately, yet the bad reputations increase faster than
the good reputations over time.

%We might change the cascade building system so it builds cascades with
%only two nodes. That way there are three cases: honest-honest,
%bad-honest, and bad-bad. In the honest-honest and bad-bad cases,
%presumably the node succeeds. In the bad-honest case, the bad node can
%hurt only one good node, so by construction bad nodes can only do
%equal damage to good nodes. (To maintain anonymity, users would have
%to chain cascades. Basically we're building a buddy system in a free
%route MIX environment.)  But even this system falls prey to the
%creeping death. Since a cascade can only fail if one of its nodes
%writes the ``We failed'' certificate, then a bad-bad case will
%\emph{never} fail (the nodes simply never write the certificate),
%whereas a good-good case will sometimes legitimately fail. So in this
%case every bad-bad cascade can stop functioning immediately, yet the
%bad reputations increase faster than the good reputations over time.

%It appears that we cannot easily do things to reduce or prevent
%creeping death; however,
While we cannot easily reduce or prevent creeping death, we can choose
cascades to produce acceptable and predictable risk and reliability
despite creeping death. Reasonable anonymity protection
may require chaining cascades into longer paths.

We order the nodes by reputation, and choose nodes for the first cascade
randomly from within a pool of nodes at the top of the reputation
spectrum. (Randomness is obtained from the seed chosen in the method of
Section~\ref{subsec:communal-randomness}.)  Next, add to the pool enough
next-highest reputation nodes to maintain its size and pick another
cascade at random. This continues until the last cascade for which an
adequate pool size can be maintained. At that point, the remaining
nodes are formed into cascades at random. 

How do we decide this pool size? Assume the following notation:

\begin{itemize}
  
\item $p$ = fraction of nodes that are bad,
\item $s$ = scare factor: acceptable probability of adversary-controlled path,
\item $r$ = range: size of the pool from which nodes are chosen for a
  single cascade,
\item $l$ = length of a single cascade,
\item $c$ = chain length: number of cascades chained together,
\end{itemize}


For determining the range, we assume a worst case for adversary
distribution, because of creeping death. That is, we always assume the
%adversary resides entirely in the nodes not yet chosen in cascade
%formation. This means
adversary resides entirely in the pool of nodes from which we're
picking a cascade. This means

$$
\left(\frac{p}{r}\right)^{lc} =s \;\;\; \mbox{ and so} \;\;\; r=
\frac{p}{s^{\frac{1}{lc}}}$$

For example, suppose that cascades are of length $4$, there are
not more than 20\% bad nodes altogether, and it is acceptable that one
of every hundred thousand paths (cascade chains) is completely bad ---
meaning messages through it are compromised.  Also assume that we chain
three cascades to reduce the odds that all nodes traversed are bad.\footnote{A
path length of 12 would be absurd with current remailers; but the whole
point of this design is to improve reliability so long paths are feasible.}
Then

$$r = \frac{.2}{(10^{-5})^{\frac{1}{12}}} = 0.522$$

Thus the first cascade must come from nodes in the top 52.2\% of the
reputation spectrum. The next cascade must be chosen
from the same pool, minus the four nodes of the first cascade and
plus the four next highest reputation nodes. Once the lowest reputation
node has been added to the choice pool, the remaining nodes are just
chosen at random until all the cascades are formed.

A chained cascade path is the same as a single long cascade of length
$lc$ with respect to the odds that all nodes in it are bad, but they
are not the same in general. When a chained cascade fails, only the
offending subcascade is removed from the system for that period and its
nodes decremented in reputation. Also, users may choose to chain
cascades or not, may not always choose to chain in the same way,
or may choose a longer or shorter chain. A user may choose not to
chain because of the computational overhead, the latency, etc. This
choice will afford him improvements in those areas, but at an
increased risk to anonymity. Users should be made aware of the risks.
Our system allows an easy explicit presentation of the relative risks
and tradeoffs.  It also allows us to adjust these at the system level.
For example, if we wish to reduce $r$, we can weaken $s$, or adjust
$l$ or the recommended $c$. Of course if $p$ is high enough, $s$ strong
enough, and we limit $lc$ to some practical bound,
$r$ may exceed $1$ and no network is feasible. Or $r$ may be
close enough to $1$ to render the reputation system largely moot --- but
at least we can calculate this and react accordingly.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{The Cascade Protocol, or, When to Fail Your Cascade}
\label{sec:when-to-fail}

Opportunies for misbehavior in cascades fall into three classes:

\begin{enumerate}
\item Entry point: Incoming messages might not be accepted.
\item Inside the cascade: Messages might be replaced with dummy
messages.
\item Exit point: Messages might not be delivered.
\end{enumerate}

Each MIX can test its cascade by sending and
receiving messages using ordinary-looking external addresses ---
but spoofing or maintaining plausible external addresses is hard.
Instead, we protect against this ``selectively process the test messages''
attack by relaying traffic through other nodes in the cascade and allowing
them to undetectably insert test messages.
All $l$ nodes in the cascade (typically $l$ might be 4 or 5) accept
$\frac{1}{l}$ of the total traffic, and deliver the messages to the
head of the cascade. The head publishes a snapshot of the batch (a set
of hashes of each message) as he processes it.

A sender Alice
can ask for the snapshot to verify that her message got into the
batch. If not, she concludes that either the head or the node
she used was dishonest, and goes to a different node or cascade.
As an optimization, nodes that accept messages can give Alice a receipt
if they accept her message. If her message does not make it into the batch,
Alice can broadcast the message and the receipt to the other nodes in
the cascade; an honest cascade member will determine that the receipt
should have been honored and fail the cascade.

Because all the messages to the head come from other nodes in the
cascade, these nodes can insert indistinguishable test messages into
the batches. If a test message does not make it to the tail, its sender
fails the cascade. Since other nodes can't tell which messages are test
messages, dishonest nodes risk being caught if they replace even one
message with a dummy message.
% To prevent the head from selectively
%dropping messages from certain senders, cascade nodes address some of
%their test messages from previous senders.
Thus our protocol detects misbehavior at the entry point (Goal 1).

In the naive delivery design, the tail delivers messages and also
broadcasts them to the other nodes in the cascade. Every node attempts
delivery. Since the tail can't tell who wrote a test message, he must
deliver every message to every node in the cascade or risk failing the
cascade. To prevent the tail from selectively dropping messages based
on destination, nodes address some of their test messages to previous
recipients. Thus, the tail must deliver even to a user not known to be
running a node. (This reliability increase must be balanced with possible
spam abuse.)
A more efficient design assumes a PKI which includes all recipients.
In this case, we shortcut the need for broadcasting
when the original delivery attempt produces a signed receipt; we
discuss this more in Section~\ref{subsec:receipts}.
By delivering outgoing traffic to all cascade members, our protocol
detects misbehavior at the exit point (Goal 3).

A dishonest head can
publish a correct batch snapshot but replace its (or a conspirator's)
portion of messages with dummy messages. Because it knows its portion
contains no test messages, all of those
messages will be undetectably lost. We solve this by supporting external
test messages as well. Alice might become suspicious because the cascade
accepts messages but doesn't deliver them; she can
send a test message,
wait a while, and then reveal to everybody how the message should have
decrypted. If at least one node in the cascade is honest, he will agree
that he didn't see the message and fail the cascade. Thus we detect
misbehavior after the batch snapshot is published. (Goal 2).

Senders may want to chain cascades for stronger anonymity.
To make chaining cascades more robust, nodes consider
delivery to a second cascade as a special case. We can
specify the entire cascade rather
than a single node as the next hop in the chain. Each node from the
first cascade chooses a random node in the second cascade and attempts
delivery. Nodes may verify that their message is included in the next
cascade's batch, and claim misbehavior if not. With this
modification, the head of a cascade must be able to detect duplicates
in the batch; however, since all nodes must already detect duplicates to
foil replays, this presents no extra burden.
% replays can't cause false failures, because all nodes remember messages
% they've seen. if they remember them only by hash, are they maintaining
% forward anonymity?

Since senders exposing a faulty cascade have no reason to chain their
test messages through another cascade, some nodes need to explicitly send
external test messages to other cascades and verify their delivery.

Test messages using real addresses help foil time-based intersection
attacks. In a standard MIX network, an adversary with information about
what users are active at what times can quickly narrow down the set of
suspects based on when traffic is seen. An active adversary works even
faster by knocking out suspects until traffic stops. Because cascade
nodes send messages too, an address might get mail at other times as well.

Users worried about profiling should send each message through a
different cascade, so an adversary who owns a few cascades cannot read all
messages. Users worried about message linkability should send all messages
through one cascade: a single compromised cascade can reveal linkability.

A decentralized algorithm would allow users to keep a similar anonymity
set across cascade reconfigurations, further blocking time-based
intersection attacks. On the other hand, an adversary targetting a
specific user benefits from this predictable behavior. We leave this as
future work.

Plaintext messages are distinguishable and so are less reliable. Further,
since test messages with convincing plaintext are hard to write, nodes
are unlikely to address tests to a recipient without a known public key.
Optionally, outside users could contribute convincing plaintext messages
to be used as test messages by a node; that way it could send a plaintext
test message to the recipient and verify its delivery.

%FIXME - analyze the bandwidth overhead of receipts vs no receipts?
%  (i did it in one of my mails, but not cleanly)

\subsection{Delivery Receipts}
\label{subsec:receipts}

Message recipients can give the tail a receipt
when he delivers a message. The tail first attempts to get a receipt,
which he can use to prove that he delivered the message. If he does not
get a receipt (e.g., because the destination address refuses to provide a
receipt or does not exist), he
broadcasts the message to the other members of the cascade, who try to
deliver. In any case they now know that he followed the protocol.

An unhappy sender Alice can contact some cascade node $N$
and claim ``$T$ didn't deliver
my message'', along with a demonstration of the remaining decryptions
%unwrapping the remaining onion
between $N$ and $T$. If $N$ remembers what he passed to $N+1$, Alice
can show $N$ what it should have looked like when it got to $T$.

If $N$ has already heard from $T$ about its attempt to deliver the
message, he knows $T$ is
blameless. If not, he can query $T$ for a receipt --- if $T$ has no
receipt, the cascade failed (either the message never got to $T$, or he
did not deliver it).

Delivery receipts detect misbehavior as long as one of the nodes in the
cascade is honest. If the
honest node is the tail $T$ and the message makes it that far, then the
message will be delivered (cases 1 and 2 handle if the message doesn't
make it that far). If $T$ is bad, either he delivers the message to an honest $N$
and it gets delivered, or he does not and Alice can convince $N$ that
the cascade is misbehaving.

If a recipient is not configured to return a receipt, the delivery still
gets through --- in this case, the message gets broadcast to the other
cascade members, and each of them attempts delivery. In a sense, the
use of receipts is just a bandwidth optimization.

Most related work assumes public keys for recipients are known to all
parties. Without this PKI, the exit node can forge a signed receipt from
the recipient. But since we don't need to link keys to external
identities, a sender Alice can include Bob's signature verification key
in her message, allowing any cascade node to verify Bob's receipt in a
decentralized fashion.
%Our protocol works without
%a widespread PKI, but works better with one.

\subsection{Capacity-attacking Adversary}

Since nodes can refuse incoming messages by falsely claiming to be
full, the number of messages processed by a cascade is at least
proportional to the number of honest nodes in that cascade. By
inserting indistinguishable test messages into its own batches, each
node verifies that the rest of the cascade is successfully decrypting
and passing on that fraction of its bandwidth promise, as well as
guaranteeing that the batch provides a minimum level of anonymity.
Nodes should pad if they don't have a full batch fraction;
by failing the cascade if the amount of traffic coming in from the
previous hop is outside suitable thresholds, each node verifies that
the rest of the cascade is spending (even if wasting) its entire
bandwidth promise.

By not accepting any messages and then not delivering the corresponding
dummy traffic, bad nodes can spend slightly less bandwidth than
good nodes.  But if those bad nodes are delivering messages for the
honest cascade members, they are doing no more damage than if they
simply had not signed up that period.  Thus our design frustrates
a \emph{capacity-breaking adversary} (a special case of the
reliability-breaking adversary).

\subsection{Resource Management and Reputation Servers}

Because the highest-reputation cascade cannot process all of the traffic,
cascades publish available capacity information, including the expected
wait or available quality of service for messages. Users compare
reputation and available QoS from each cascade, thus balancing load
across the cascades.

A group of redundant reputation servers ($RS$) serve current node
state. Nodes give each $RS$ hourly \emph{heartbeat} updates, and deliver
failure messages immediately. Because each node signs and timestamps each
certificate, an $RS$ can at most fail to provide the newest certificate.
Each $RS$ works with the others to ensure correct
data, perhaps by successively signing certificate bundles.
Senders download the entire bundle of certificates
if it is small enough, else they must 
query through the MIX-net or query via Private Information Retrieval
\cite{MALKIN-THESIS} to privately download a random subset. Otherwise,
the adversary could use that information to aid intersection attacks.

Actual deployment is still a complex problem; we leave this as future
work.
%Since deploying the Reputation Servers is still a complex problem,
%we leave this as future work.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Attacks and Defenses}
\label{sec:attacks}

\subsubsection{Attacks on Anonymity}

\begin{description}
\item \emph{Have enough nodes to own an entire cascade.} By using a web
of trust, building cascades from a large enough pool of reliable nodes,
and suggesting a safe minimum chain length,
we control the chance that this attack will succeed.
\item \emph{Gain high reputation to read more traffic.} Similarly, our
cascade building algorithm blocks this attack.
\item \emph{Replay attack, message delaying, etc.} We rely on the standard
defenses offered by MIX-net protocols.
\item \emph{Trickle attack.} If one node in the cascade is honest,
at least $\frac{1}{l}$ of the traffic will be legitimate in every batch.
\item \emph{Intersection attack.} Using MIX cascades rather than
free routes helps to defend against intersection attacks from very large
adversaries \cite{disad-free-routes}. By encouraging users to pad with
dummy messages when not sending traffic, and to continue using
similar anonymity sets across cascade configurations, we can further
complicate intersection attacks. However, a complete solution to the
intersection attack remains an open problem.
\item \emph{Influence cascade configuration externally.} Our algorithm for
generating communally random uncertainty resists individuals and groups,
as detailed in Section~\ref{sec:random-selfbuild}.
\item \emph{Compromise the cascade configuration server.} Because the output
of the $CS$ is publicly verifiable, incorrect behavior can be detected.
\item \emph{Knock down uncompromised cascades to get more traffic.}
While the adversary can knock down other cascades, the low chance of
owning an entire path limits the success of this attack.
\end{description}

\subsubsection{Attacks on Capacity and Reliability}

\begin{description}
\item \emph{Flood nodes with messages.} If this becomes a problem, we
  can integrate a ticket service such as that in \cite{web-mix},
  limiting the number of messages a given identity can generate for
  each batch. Other solutions include proofs of work and other
  micropayment schemes \cite{oreilly-acc,dwork95,breadpudding}.
% \item ask for lots of tickets
\item \emph{Knock down many cascades.} We assume that our adversary is
  not strong enough to knock down all or most of the cascades in the
  system. If he only knocks down some, service continues as normal.
\item \emph{Block commitments to the Configuration Server.} If the
  adversary launches a denial of service attack against the $CS$, the
  participants can together use a decentralized algorithm to simulate
  the $CS$ (all of its operations are public and verifiable).
\item \emph{Flood the $CS$ with commits.} We only consider commits
  from nodes which have been certified in the web of trust, and we
  only need to actually use those commitments from relatively
  high-reputation
nodes.  % see Section~\ref{sec:tsbc}.
\item \emph{Refuse commitments at the Configuration Server.} Because
  we allow certified delivery to the $CS$, refusing commitments can be
  detected by any observer.
\item \emph{Refuse incoming messages as a cascade member.} This attack
  can work to reduce capacity; but the node is still spending most of
  its pledged bandwidth on correctly processing messages from other
  nodes, as well as generating dummy traffic in place of the refused
  traffic.  This attack is very inefficient.
\item \emph{Selectively process only test messages.} Our protocol
  addresses this possibility in Section~\ref{sec:when-to-fail}.
\end{description}

\subsubsection{Attacks on Reputations}

\begin{description}
\item \emph{Beat the web of trust.} The security of the web of trust
is perhaps the most critical assumption of our system. If an adversary
can get a lot of nodes certified without spending effort for each one,
he can start widespread attacks on anonymity, reliability, capacity,
and reputations. On the other hand, getting just a few extra nodes
certified does not buy him much.
\item \emph{Internal selective DoS --- creeping death.} The adversary
can use the creeping death attack (fail the cascade if good nodes lose
more reputation than bad nodes) to gain any position in the reputation
spectrum. Our cascade building algorithm makes this technique ineffective
at breaking anonymity; further, it may prove very expensive in terms of
resources to move a large number of nodes to the top of the reputation
spectrum.
\item \emph{External selective DoS --- knock down the high-reputation
cascades.} Our ``all for one and one for all'' approach to
reputation makes selective denial of service even easier than in
\cite{mix-acc}. Previously, to knock down a reliable node you needed to
successfully flood it or cause it to not process messages correctly. Now
you simply have to locate and knock down the
weakest member of its cascade. However, this vulnerability is acceptable:
removing one cascade does not deny service to the system as a whole,
and as above it does not get you much closer to breaking anonymity.
\end{description}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Future Directions}
\label{sec:conclusion}

We have described a protocol for improving reliability of anonymous
communication networks, based on a MIX cascade design and a simple
reputation system. There are a number of directions for future research:

\begin{itemize}
%\item A working proof-of-bandwidth algorithm (analogous to a
%proof of work) would allow us to reduce our reliance on the web of trust.
\item Better approaches to generating convincing destinations
for dummy traffic, or reducing the bandwidth overhead of the current
approaches, would make the overall design more reasonable.
\item Improved cascade configuration algorithms would allow us to provide
stronger anonymity and reliability.
\item More research on the scope and characteristics of the creeping
death attack might give insight on how to defeat it.
\item More analysis on the attack-resistance of our reputation system
might yield stronger proofs or a better design.  For instance, can we
guarantee bounds on work performed by the adversary under various models?
\item Adapting this design to free-route MIX networks would allow it to
be tested with a wider deployment in the current remailer system.
\end{itemize}

%The improved efficiency and reliability from our MIX-net cascade protocol
%and accompanying reputation system help smooth the way toward practical
%anonymous communication.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section*{Acknowledgements}

We thank Daniel Barkalow and Nick Mathewson for intelligent discussions
about the reputation systems and cascade configuration; Ian Goldberg
for thoughts on cascade configuration; Geoff Goodell and Marc Waldman
for edits;
Paul van Oorschot for a helpful conversation about hash functions; and
David Molnar for early discussions about the design and for help with
Section~\ref{sec:related}.

\bibliographystyle{plain} \bibliography{casc-rep}

\end{document}


\workingnote{
Another feature of creeping death is not just whether the
bad nodes can position themselves at whatever reputation level they want
but whether they can prevent cumulative reputations from rising,
effectively capping the long-term performance of honest nodes.

Without going into details, using only cumulative reputations it seems
that if more than 21\% of the nodes are bad, the bad nodes can get in
cascades with good nodes often enough to cause them to fail at least
half the time, preventing their reputations from growing.
However, because of the combined cumulative and short-term reputation
system we introduced above, the situation is slightly better. For example,
if a live bad node can only fail a third of the time it is active or
it will return to probation, the percentage of nodes required to
prevent a rise in cumulative reputation is much higher.

%FIXM - expand and smooth this subsec.
%The following trick comments out the whole section for now


%Therefore we need some sort of external monitoring system to enforce
%good behavior of cascades, so we don't let the bad-bad case slip through.
%Let's say we pair up two-node cascades, so there are four nodes thinking
%about each other. One possibility: if both nodes of a pair vote out
%the other pair, then both of the other pair lose reputation. (Say a
%bad-good cascade which disagrees can't vote somebody out.) Yet we still
%have creeping death, where bad-bad cascades *always* vote out good-good
%cascades, and good-good cascades don't always vote out bad-bad cascades.

%(Aside: can we assume that the percentage of bad nodes is so small
%that the chance of getting bad-bad often enough to affect things is
%'too small'? Eg if bad = 1/10, then only 1 in 100 cascades will be
%bad-bad. That might be enough to just let slip through the cracks?)

We can at least argue a simple bound on the fraction of adversary-owned
nodes (or is there a cascade formation strategy better than our current
one-three approach) such that the bad guys can creeping death to the
top, but they can only stay at the top if they let reputations generally
increase.

Assume $q$ of the nodes are bad. In order to push reputations down,
the adversary needs to be able to touch every cascade more than half
the time. If he can't touch cascades half the time, reputations will
trend upward.

In the first approximation, $.5^{\frac{1}{4}} = 84\%$ is the chance of
having all 4 nodes good in a cascade at least half the time (so
$q=.16$). With a ratio less than that, the bad guys simply don't have
enough nodes to touch all the cascades often enough.

It's actually even better than that, since an all-good cascade has 4
good people whereas a 1-bad/3-good cascade has only 3 good people, and
a 2-bad/2-good only 2.

Eg if $q=.21$, and we fail nodes where we have either 1 or 2 bad people in
them, then we will be failing 41.4\% of the cascades with 1 bad (hitting
31\% good nodes), and 16.5\% of the cascades with 2 bad (hitting 8.25\%
good nodes). Thus we'll only be touching 39.3\% good nodes out of the 79\%
total, or 49.77\% of the good nodes---we are touching less than half
the good nodes typically.

But with $q>.21$, it looks like our adversary can touch each honest node at
least half the time. Since failing the above two types of cascades (3 honest
and 2 honest) does no more damage to bad reputations than to good reputations,
the bad nodes go down at most as quickly as good nodes, so they can continue
that behavior indefinitely.


\subsection{how to manage reputations, types of reputation}
\label{subsec:rep-logistics}

% - reputation works with lower bandwidth pledges but not
%   higher.

%reputation for good service, reputation for consistently revealing,
%reputation for not being a pseudospoofer (see web of trust, below),
%...
%
%\subsection{how to use reputations to pick cascades}

}

\workingnote{
We might try to get a bound on the
fraction of adversary-owned nodes that we might see in "live" cascades
(those used by actual users). With such a bound, we can make
statements about the level of reliability or anonymity our system can
provide.  We place new or unreliable nodes in a probationary phase.
Such nodes do not participate in live cascades (those used by
actual users). Nodes in this phase effectively perform a ``proof of
bandwidth and of computation'' to demonstrate that they can reliably
pass bits and peforms acceptable work.

We originally intended to use this phase to force an adversary to
spend resources for each node he signs up, thus limiting the number of
nodes that he can afford to get past the probationary period. In other
words, while we might start out with an arbitrary fraction of
adversary-owned nodes, we hoped to end up with a known bound on that
fraction. We abandoned that route for two reasons:

\begin{itemize}
\item A large or skilled adversary can easily match resources with a group
of volunteers, particularly if the adversary can compromise
other machines around the Internet and commandeer their resources.
\item The adversary can cheat at any of our decentralized proof of bandwidth
schemes by skimping on all the bandwidth between his own
nodes. If he controls $n$ consecutive nodes in a cascade, he need only do work
for one of them (or two, if his nodes are scattered rather than on the same
network). If he owns all the nodes in a cascade, he does no work at
all. An adversary who owns most of the nodes in the system
will often be able to pass proof of bandwidth tests without spending
any bandwidth.
\end{itemize}

By combining the web of trust model with our probationary phase, we
can ensure that people who pass the probationary phase are likely to
be running reliable nodes. We divide all the participants into the
live cascades and the probationary cascades, by placing a ``bar'' on
the reputation spectrum.

We could place the bar at some fixed minimum reputation, so all nodes
with reputations under a certain value $r$ (e.g., five) are matched with
each other and process only dummy messages. Thus any node which
demonstrates its reliability gets used for the live cascades.
In this case, a
reliability-breaking adversary can get above the bar, fail half the
time, and always be assured of staying above the bar.

We could counter this by dividing such that some fraction $L$ of them
(eg 20\%) are below the bar, except we make sure that all nodes with
reputations less than $r$ are also below the bar. Because the bar is a
moving target, our reliability-breaking adversary can at his most
damaging match the performance of the nodes around him---if he does
less, then he will eventually slip below the bar.  Placing the bar
higher than all low-reputation participants guarantees us some minimum
level of reliability. This approach ``wastes'' the people who are
above reputation $r$ but still below the bar; in particular, if the
reputations of participants generally trend upward, new participants
may have to do months of good service before they get above the bar.
Also, absent a significant reliability-breaking adversary, as it
becomes less worth it to try to reach the rising bar, those that
have been performing reliably for some time will suddenly find
themselves in the bottom twenty percent. If they get frustrated and
leave, the whole reputation system could quickly crash.

We could counter that by having only the last so many days count
towards reputation, e.g., thirty. This effectively puts an upper bound
on reputation. While some features of creeping death are complicated
thereby, reliably providing service for years matters no more than
providing it for thirty days.  Having such a short memory for node
performance makes it easier to degrade the
reputation of established nodes in particular and the reliability of
the system in general.

Our suggested solution is to use both cumulative and short-term reputations.
The cumulative reputation builds as previously described. To
participate as a live cascade, a node must have a cumulative reputation
$R$ of, e.g., five.
%FIXM what? "(as well as an adequate trust score)"
These can
build indefinitely; thus over time it will become increasingly expensive
for an adversary to affect a long-term reputation.

A node must also have an adequate short-term reputation $S$
measured both over the sliding window of the past $W$ days and the
past $A < W$ active days in that period. For example, if $S = 5$,
$A = 15$, and $W = 50$, in order to avoid going back to
probation the node must be active for at least $5$ of the last $50$
days and essentially fail no more than a third of the active days
within the last $50$. Days and reputation are calculated over both
probationary and live activity.

Probationary proof-of-bandwidth is done in cascades with other nodes
that are on probation from within the same reputation quadrant (or the
highest reputation nodes from below to complete a partial cascade).
Thus new probationary nodes have limited effect on the
probationary status of high reputation nodes that have been
temporarily inactive or recently buggy.

%FIXM
%We may want to put the people with reputation 0 into their own category,
%so if there are many of them, they don't keep dragging everybody in
%the probationary phase back down to 0. Since pseudospoofing is less of
%a problem due to our web of trust system, we can even consider letting
%people go negative.
%% should think more about this

Finally, since we've added a web of trust system to reduce damage from
pseudospoofing, can we remove our reputation system and let the web of
trust do all the work?  There are two reasons for using a reputation
system in addition to the web of trust:

\begin{itemize}
\item A separate reputation system requires less effort and attention
from human operators. Operators don't need to certify based on expected
reliability, but rather only on perceived honesty. Because unreliable
nodes aren't used for live cascades, operators don't have to worry about
revoking certification for later poor reliability.
\item By choosing one member of each cascade from the high end of the
reputation spectrum, we make it harder for an adversary to get all
the nodes in a cascade. Such cascades provide stronger anonymity. Without
a reputation system, we would have no mechanism for comparing and choosing
the higher-reputation nodes.
\end{itemize}

}


%Further, if we divide the set of participants into
%quadrants by reputation (if there are 4 nodes in a cascade), and build
%cascades one node from each quadrant, then in order to win a cascade,
%the adversary now needs to have a good chance of getting picked from
%each of the four quadrants. (The simple strategy of
%creeping all his nodes in one block to the top will
%mean
%Yet, an adversary who can get a foothold
%on many different cascades can break reliability. For example, if he
%runs a quarter of the participants and he places them all at the
%lowest (cheapest) quadrant, he gets a node in every single cascade. To
%make this harder, we modify the cascade configuration algorithm: we
%begin at the top and build cascades with random choices from an
%acceptably wide range at the top and then repeat going down the
%spectrum until all the cascades are formed.

%We need some mechanism for limiting the fraction of
%adversary-controlled participants in the MIX network, and for making
%sure that we don't use unreliable nodes in our cascades; we discuss
%this issue more thoroughly in Section~\ref{subsec:pseudospoofing}.
%Section~\ref{subsec:cascade-building} considers the issue of the
%creeping death attack in more detail and explains our cascade
%formation and reputation algorithms. 
%We go on in
%Section~\ref{subsec:rep-logistics} to discuss the logistics of how to
%retrieve and update reputations from a centralized or distributed
%reputation server, as well as the types of reputation that we need to
%remember for the MIX network.


%\subsection{Pseudospoofing and consistently unreliable nodes}
%\label{subsec:pseudospoofing}