\chapter{System Design and Overview}

\section{System Summary}

The overall system consists of the publication system, which is responsible for
storing and serving documents; and the communications channel, which is responsible
for providing confidential and anonymous communications between parties. This
chapter focuses on the design of the publication system as a back-end for the
communications channel.

The agents in our publication system are the {\bf author}, the {\bf server}, and the
{\bf reader}. These agents are layered over the communications channel; currently they
communicate with one another via addresses which are implemented as remailer reply blocks.
Authors are agents that produce documents and wish to store them in the service;
servers are computers which store data for authors; and
readers are users (inside or outside the servnet) who retrieve documents from the service.

Free Haven is based on a community of servers (which as a whole is
termed the `servnet') where each server hosts data from the other servers
in exchange for the opportunity to store data of its own in the servnet.
The servnet is dynamic: data moves from one server to another every so often,
based partly on chance and partly on each server's trust of the others. Servers
transfer data by trading. This means that the only way to introduce a new file
into the system is for a given server to use (and thus provide) more space on its
local system. This new file will migrate to other servers by the process of trading.

Each server has a public key and one (or perhaps more) remailer reply
blocks, which together can be used to provide secure, authenticated, pseudonymous
communication with that server.
Every machine in the servnet remembers information for some
of the machines in the servnet (the public key, a remailer block or
address, and some characterization of trust for each machine).  

Only machines in the servnet are allowed to insert files into the network.
The amount of storage space a given machine can use is limited by the
amount of space it is willing to host -- this limitation is loosely enforced by
other servers losing trust in a server that `drops' data.

Authors assign an expiration date to documents when they are published; servers
make a promise to maintain the availability of a given document until its
expiration date is reached. The trust system is used to keep track of which other
servers are likely to keep this promise.


\begin{figure}[htc]
\begin{center}
\leavevmode
\label{design-overall}
% \epsfxsize=3in
\epsfbox{fh-design-overall.eps}
\caption{Structure of a Free Haven server}
\end{center}
\end{figure}

The big picture for the structure of a Free Haven server is shown in Figure \ref{design-overall}.
The control center is located in the `Haven Module': it includes a number of
vital operations such as the trading module and the trust module. The `Comm
Module' is responsible for communications with other Free Haven servers;
this communication is currently performed via a mixnet, but the system is
modular enough that the communications channel could be replaced simply by
replacing part of the Comm Module (without affecting the rest of the server).
Both the Haven Module and the Comm Module speak to the Node Database, which is
a master list of all known servers and all known shares. This Node Database
integrates trust information with incoming information about new shares and new
servers, to provide answers to a wide range of questions. For instance, the
Comm Module might ask the Node Database for the mixnet address associated with a
given public key (pseudonym); or the trust module inside the Haven Module might
ask the Node Database for a list of candidate servers who might be interested in
a share with a certain size and expiration date.

These three modules (the Comm Module, the Haven Module, and the Node Database)
are sufficiently distinct that they are designed to run on separate computers
within an intranet. This means that a single Comm Module can service multiple
Haven Modules, or a single Node Database can be used for several different (fully trusted)
servers.

The Haven Module controls a number of aspects of operation of the
system as a whole:

\begin{itemize}
\item {\bf Storing documents:}
When an author wishes to publish a document, she breaks the document into
shares, where a subset (any $k$ of $n$) is sufficient to reconstruct
the document, and then for each share, negotiates for some server to
publish that share on the servnet. The share module is responsible for
handling the arrival of new documents, and maintaining a list of current
documents on that server's Node Database.
\item {\bf Supplying documents:}
When a reader wishes to retrieve a document from the servnet,
she requests it from any server, including a location and key which can be
used to deliver the document in a private manner. This server broadcasts
the request to all other servers, and those which are holding shares for
that document encrypt them and deliver them to the reader's location.
The share module within the Haven Module is responsible for receiving document
requests, identifying which shares are located on the server, and supplying
these shares to the reader.
\item {\bf Expiring documents:}
The share module within the Haven Module is responsible for recognizing when
shares have expired, as well as maintaining sufficient space on the system
for incoming shares and scratch space for trade requests.
\item {\bf Handling trade requests:}
The servers trade shares around behind the scenes. The trade module
within the Haven Module is responsible for recognizing trade requests,
asking the trust module to evaluate the reliability of the server and
fairness of the offer, and choosing a new share to offer in return.
\item {\bf Initiating trade requests by share:}
The trade module is also responsible for periodically identifying shares
which have been on the server for a sufficient duration. The trade module
should query the trust module to determine a suitable server to which to
offer the share.
\item {\bf Initiating trade requests by server:}
The trade module should also periodically query the trust module to determine
if there are any servers which should be `tested' to increase confidence in the
trust module's measure of trust in that server.
\item {\bf Handling server introductions:}
One of the most important parts of the design of Free Haven is the
capacity to seamlessly integrate new servers.
The trust system is responsible for recognizing when requests and broadcasts
are arriving from unknown servers, and then initiating the correct messages to
the Comm Module and other servers to arrange to receive contact information
for those new servers.
\item {\bf Expiring old servers:}
Unavailable servers need to be flagged in the database such that the Comm Module
does not include them in broadcasts, since continued
mail bombardment from the mixnet to a closed account will eventually slow
down the service (as well as anger a lot of systems administrators). 
The trade module is responsible for recognizing when requests are ignored
and informing the trust module (which will pass it on to the Node Database).
The Comm Module takes care of reactivating a server which has been
flagged as expired; it notices this when a message arrives from that server.
\item {\bf Initiating trust broadcasts:}
Periodically, the trust module will select a server from the Node Database
and broadcast its current opinion of that server's reliability. This broadcast
allows other servers to keep informed of actions which do not affect them
directly.
\item {\bf Handling trust broadcasts:}
When a trust broadcast arrives regarding a given server, the trust module must integrate this new
information into its evaluation of that server. This is based in part on the
current trust of the broadcasting server, the current trust of the subject of
the broadcast, and the actions described in the broadcast, if any.
\item {\bf Initiating buddy checks:}
To maintain some level of accountability, shares employ what is
essentially the `buddy system': servers which drop shares or are
otherwise unreliable get noticed after a while, and are trusted less.
The buddy module within the Haven Module is responsible for keeping track
of the location of the buddy of each share the server currently possesses.
Periodically, it should perform a document request for this share, to verify
continued availability of this share.
\item {\bf Handling buddy checks:}
Buddy checks are identical to normal document requests, as above. They
must be identical to prevent servers from replying only to buddy
checks and thus trick other servers into believing that they are still
supplying the document to readers as well.
\item {\bf Initiating buddy broadcasts:}
When the buddy module concludes that the buddy for one of that server's current
shares is lost, it can choose to perform a `buddy broadcast', which is essentially
a notification to the trust modules of other servers that it believes the share
to have been lost by a given server.
\item {\bf Handling buddy broadcasts:}
The trust module is responsible for receiving and interpreting buddy
broadcasts.  These should modify the trust of the subject of the
broadcast, based again on current trust of the broadcasting server,
the current trust of the subject of the broadcast, and other relevant
factors.
\end{itemize}

% \input ./fh-design-comm.tex
% empty because it's now a separate chapter

\section{Supported Operations}

There are two operations which any publishing service must provide: adding a
document to the system, and retrieving a document from the system. Free Haven
provides these operations, and also supports a number of behind the scenes
operations such as trading, expiration, and rudimentary support for enforcing
accountability without sacrificing anonymity.

%\subsection{Notation}
%
%We developed a notation for describing movement of shares between servnet
%nodes and location of shares with respect to servnet nodes, to help in
%concisely characterizing the results of a given operation. We refer the
%reader to the Glossary of Notation in Appendix \ref{app:notation} for a more thorough
%summary of the notation.

\subsection{Storing}

When an author (call her Alice) wishes to publish a new file into Free Haven, she
must first identify a Free Haven node which is willing to store this
document for her. The exact method of identifying and contacting this
node is outside the scope of the Free Haven design per se; some possibilities
include:
\begin{itemize}
\item Alice is running a node herself. In this case, she would presumably be
willing to store her own file.
\item Some of the nodes are publically contactable, via a website or some other
public interface. These nodes are willing to introduce Alice to other
private nodes, are willing to store Alice's file, or are willing to sell Alice
some space on their systems.
\item Some of these nodes have reply blocks (and their associated key) published
on some site associated with Free Haven, and might be willing to publish files
that Alice delivers through the mixnet.
\end{itemize}

Alice may choose to deliver her file to the given node (if she herself isn't
the node) via an anonymous remailer or other one-way anonymous communications channel, to
provide strong author-anonymity as described in Chapter \ref{chap:anon}.
Alice may also choose to encrypt the file before transmitting it -- since
any reader can retrieve any file if he knows its lookup key (as below), we
leave this level of privacy up to Alice's discretion. Alice may also choose
to divide the file into new files if it's too long, or pad it if it's too short,
based on what sort of file sizes the server she finds is willing to accept.
(Prudent choices from the server's perspective are discussed in more depth
below under Trading.)

When the publishing server (call him Phil) wants to introduce the new file
into the servnet, he breaks the file into shares using
Rabin's information dispersal algorithm \cite{rabin-ida},
%{\footnote {\tt
%http://www.eecs.harvard.edu/\~{}india/ida/}}
and then for each share, he finds a machine on the
% (more information
%about the IDA, including the math behind it, can be found in Appendix
%\ref{app:ida})
servnet which he trusts and which is willing to make the trade for that share.
More precisely, he performs the following steps.

\begin{enumerate}
\item Break the file $F$ into shares $f_1, \dots, f_n$ where any $k$ shares are
sufficient to recreate the file.
\item Generate a public/private key pair $(PK,SK)$ to be used for signing each of the
shares.
\item For each share, build a data segment
as described below under `Composition of a Share'
and sign this segment with SK.
%In this case, index is some deterministic value which indicates which share this is.
\item Enter the shares into the local server's space.
\end{enumerate}

Really, Alice may perform steps 1 and 2 herself, and thus build the shares
herself. A well-behaved server should be able to handle doing the IDA process
itself as well as receive shares that have already been split. Note that
accepting pre-built shares seems to require more trust of Alice than receiving her
file, examining it, and then building a set of shares from it.
% XXX should examine further

% discuss k (say, $k=\frac{n}{2}$).
The choice of the robustness parameter $k$ is a crucial part of adding a document
to the system. A large value of $k$ relative to $n$ makes the file more brittle,
because it will be unrecoverable after a few shares are lost. On the other hand,
a smaller value of $k$ implies a larger share size, since more data is stored
in each share. This parameter $k$ should probably be chosen based on some compromise
between the importance of the file and the size and available space.

A more considered value for $k$ should be available based on the results of
modelling the mixnet and servnet to determine how many documents are lost in
what sort of situations.
The brittleness of documents in the servnet is affected by a variety of
factors, including percentage of misbehaving nodes, reliability of the
network itself, trading frequency, and the rate at which new servers
join the servnet and old servers disappear.

\subsection{Retrieving}

In order to retrieve a file, a client must first know the $PK$ which
was used to sign the shares.  He learns this from a post to Usenet or
some similar external means (or because he was the original author of
the document) -- see Section \ref{sec:directories} on how to integrate
Directory Services with Free Haven.  From here, he must locate a servnet
server that is willing to do the query for him.  The server can be
contacted over its remailer address, but presumably there will be public
cgi's available which automate the process of doing the document query.
The reader comes up with a key pair $(PK,SK)$ for this transaction,
as well as a one-time remailer reply block.  The servnet server does a
broadcast of ``(`request', $PK_{file}$, $PK_{client}$, reply block)''
to all servnet nodes that it knows about. These broadcasts can be queued
and then sent out in bulk to conserve bandwidth (since the mixnets are
going to introduce some delay as it is, adding a bit more by sending
requests perhaps once an hour will not significantly affect the latency).

Each server that receives the query will check to see if it has any
shares with the requested $PK_{file}$, and if it does it will encrypt
each share in the enclosed public key $PK_{client}$, and then send the
encrypted share through the remailer to the enclosed address. These
shares will magically arrive out of the ether at their destination;
once enough shares arrive, the client recreates the file and is done. If
not enough shares arrive, then the client has failed to obtain the file
(most likely, it is lost).  
%Hopefully this case will never happen,
%since we'll take active steps to maintain availability 
%for each share in the face of compromised servers.

The broadcast request can optionally specify which share index is desired.
This allows clients who only want one share to be able to query them
specifically. For instance, a node wanting to confirm that a given share
is still `alive' might query only for that share. This extension must be
implemented with care, though -- if a particular operation (such as buddy
checking, below) is implemented by always querying for a specific share
of a document, then nodes may be able to differentiate between different
classes of queries (e.g., querying to retrieve a document and querying to
determine if a document is still available) and selectively respond. This
would allow them to convince an auditor that the server is obeying its
protocol, while at the same time the server does not answer any document
requests from actual readers attempting to retrieve those documents.
This means that in simple implementations, querying by specific share
should be disallowed; in more complex implementations, all operations
should support (and have an equal probability distribution for) querying
either by entire document or by specific share.

Additionally, the broadcast document request can optionally be signed
by the servnet node performing the broadcast. This has a number of
ramifications. First of all, it introduces more complexity in the
problem described above, namely in trying to ensure that queries cannot
be selectively answered by a server attempting to gain trust without
serving documents to readers. On the other hand, it allows slightly more
accountability on the part of servers: this might be useful in the case
of denial of service flooding attacks, wherein an attacker might flood
a given node with document requests with the intent either of spamming
some helpless victim, or of saturating the bandwidth resources of the
server. In this circumstance a server might gain some defense by dropping
incoming requests that are not signed by a trusted peer server.

%More formally, in a document request an anonymous reader sends a
%message to every servnet node with a description of the
%document it wants and a reply block. Then every node
%which has shares of this document encrypts the corresponding shares,
%and sends them to the specified reply block.

%$$\forall A_?,   \emptyset \rightarrow A_? : \{ T, \Phi.key, <B>, E \} $$

%$$\forall i \forall A_{\Phi_i},  A_{\Phi_i} \rightarrow <B> : \{ E(\Phi_i) \} $$

%In this case, a message appears `out of the ether' to every node in the system.
%For each node that contains a share of the document $Phi$, this node encrypts
%the document using the specified encryption function $E$, and then delivers that
%document to the reply block described by $<B>$.

\section{Expiration}

Each share includes an expiration date. This is an absolute (as opposed
to relative) timestamp which indicates the time after which the hosting server
may delete the share with no ill consequences.
This means that if an operator wants to cease providing his node as a
Free Haven server, the protocol provides him with a polite way of
exiting the system: wait until all the shares he has expire. One way
to hasten this would be to trade away for a very large share that
expires soon -- see below under Trading for more details on this.

Expiration dates should be chosen based on how long the
introducing server wants the data to last, compromising based on size of
file and likelihood of finding a server willing to make the trade. Servers
should be wary of accepting shares that they will not be able
to trade away easily, because then they might be committed to keeping
that file on their system until it expires.

This last point has strong implications for the stability of shares in the
system -- shares which are nonstandard either in size or in duration may
well be more fragile than shares which are closer to average. One reason
for this is that shares with particularly long durations are simply subjected to
more chances to be destroyed over the course of their lifetime. A more subtle
reason for this is that a very large share may be difficult to successfully
trade away; indeed, it may turn out that servers which accept extraordinarily
large shares have a greater tendency to be unreliable. (Perhaps, for instance,
the server accepts such shares due to negligence on the part
of the server operator, which might be indicative of other stability problems.)

\subsection{Revocation}

The ability to revoke or unpublish shares would provide much greater
flexibility to the Free Haven system. Specifically, this would allow
a much more realistic emulation of an actual read-write filesystem,
where published documents could be updated as newer versions became
available. Indeed, it also allows political dissidents who publish under
their real name to realize their mistake and unpublish the documents.

Revocation could be implemented by allowing the author to come up with
a random private value $x$, and then publishing $H(x)$ inside each
share. If the author wanted to unpublish a document, he would broadcast
an unpublish request along with his original value $x$ (and also $H(x)$
for the sake of convenience and efficiency), and all servers which were
currently holding shares of the document would expire them.

However, there are a number of extra attacks this allows on the system:
\begin{itemize}
\item It complicates the buddy system greatly, since we are not sure that
the unpublish request would reach the buddy of a given share. Indeed,
an adversary might send unpublishing requests to some members of the
servnet and not others, in an attempt to cause havoc in the trust system,
or even to try to gain insight into the current location of some shares.
\item Authors might use the same hash for new shares, and thus `link'
documents. Adversaries might also use the same $H(x)$ even though they
are unaware of the value of $x$: this would cause artificial linking,
as observers might conclude that the publisher of the original document
also published the later documents.
\item The presence of an unpublishing tag $H(x)$ in a share assigns
a sort of `ownership' to a share that is not present otherwise. This
may have subtle implications towards publisher and reader anonymity --
for instance, a publisher who remembers his $x$ has evidence on his
computer that he was associated with that share, thus breaking perfect
forward author-anonymity.
\end{itemize}

In addition, if revocation exists, then 
%the Church of Scientology (or
%other relevant intelligence agency) 
a corrupt police force or intelligence agency
has an incentive to track down the
original author of the document, because chances are good that he still has
the value $x$ which would allow them to remove the document from Free
Haven. Even if the author immediately destroys his $x$, the adversary has
sufficient reason to suspect that he still has it that it is worthwhile
for them to spend resources tracking him.

This problem can be ameliorated by making the unpublishing tag
optional.  This means that the share itself will make it clear whether
that share can be unpublished, so if no unpublishing tag is present,
there should be no reason to try to track down the author.  If an
adversary wishes to create a pretext to hunt down the publisher of a document,
however, he can still republish the document {\em with} a revocation
tag, and use that as `reasonable cause'.

An alternate solution, given that we are willing to accept some amount of
linkability in exchange for the ability to perform document revocation,
is to simply allow the publisher to remember the key $K$ with which he
originally signed each share of the document. Thus instead of generating a
separate value $x$ simply for the sake of revocation, we could allow the
publisher to revoke his document by transmitting $\sigma_K$(`revoke').
We can further allow the `non-revocable' notion on a share by allowing an
explicit tag called {\tt {<revocable>}} which if present would indicate that
`revoke' messages should be honored. Publishers could similarly yield their
original key $K$ to some trusted agency. However, this alternate mechanism
for revocation is still susceptible to many of the above attacks.

Because the ability to revoke shares potentially puts the original
publisher in increased physical danger, as well as increasing the set
of attacks on the servnet infrastructure, we chose to leave revocation
out of the current design.

\section{Accountability and Redundancy}

Without some sort of server accountability, shares could get swallowed by
malicious nodes and nobody would ever notice. This lack of accountability has two ramifications.
First of all, over time shares will disappear, eventually causing
files to be unrecoverable. Secondly, malicious nodes can continue to
be malicious if there are no mechanisms in place for identifying and
excising ill-behaved nodes.

One solution to this would be for the publisher of each share to maintain
his own local copy, and periodically query the servnet for the share. If
the share has disappeared, he will simply drop some of the data currently
on his system and insert his backup copy of the share. (At this point, he
might stop trusting the server to which he traded that share.) However,
this makes the entire system very fragile, as well as making it slow to
react to malicious nodes.

A better solution would be to somehow keep track of the current location
of each share.  Thus a given server would know exactly when
a share disappeared and which server was responsible for it at the
time. Broadcasting knowledge of bad nodes would quickly update the
servnet on the presence and behavior of suspected malicious nodes,
limiting the amount of damage that can be done. This solution could be
implemented by assigning a fixed and permanent `shepherd' to each share
upon publication (it would be permanent because the share is signed with
the shepherd inside); every time a share is traded around the servnet,
then both servers involved in that trade would send updates to the
shepherd indicating that one of them had relinquished responsibility
for the share and the other had taken responsibility for it. These updates would
allow the shepherd to maintain a good idea of the current location of
its associated share.

However, having a static location for keeping track of location
information for each share defeats the purpose of having them scattered
throughout the servnet. Specifically, this provides a centralized target
for anyone wishing to learn more about (or attack) a given share. (One might
argue that as Free Haven scales, some form of directory capability is going
to be necessary, and thus maintaining knowledge of location of shares will
be necessary anyway. However, this is not the case: a directory service can
be implemented without any notion of the location of shares. See Section
\ref{sec:directories} below for more details on Directory Services.)

One possible solution to this is to associate pairs of shares in a given
document with each other, and use the `buddy system'. In this case, each
share would be responsible for maintaining information about the location
of the other share. When a share moves, it should send notification to
the other indicating this move.  More correctly, the server sending the
share should send notification, and then after the transfer the server
receiving the share should send notification.  Each notification includes
a `receipt' as described below under Receipts.

Periodically, a server holding a given share should query for its buddy, to
make sure its buddy is still alive.  Should its
buddy stop responding, then the remaining share (or more correctly, the host
currently holding that share) is responsible for
announcing which node had responsibility for it when it disappeared,
as described below under Section \ref{sec:design-trust}.

We considered allowing abandoned shares to optionally spawn a new share
if their buddy disappears. This would make the service much more robust,
since shares that are destroyed can be regenerated.  But it is also
possible that it could cause an exponential population explosion of
shares: if a share is out of touch for a little while but isn't dead,
then both shares will end up spawning new copies of themselves. This is
a strong argument for not letting shares replicate.

Another more subtle problem with having both shares with non-fixed
locations is that the communications channel we have chosen currently
has nontrivial latency.  This means that a share might have already been
traded away by the time notification arrives from the other share. We
could solve this by having some sort of locking mechanism on a share, such
that they are forced to alternate trading. However, this solution is prone
to attacks which `stagnate' a share or otherwise attack its location.

Instead, we solve this problem by maintaining `forwarder' addresses
after a trade has taken place. Since these forwarder addresses take
up very little space, and since servers must retain receipts for the
given share anyway, these forwarder addresses are kept until the share's
expiration date. (Actually, this forwarder address can be parsed out of the
receipt, so servers really don't need to keep the forwarder as a separate
value.)

Note that when a buddy notification comes in, the forwarder is checked
and the notification is forwarded if appropriate. This forwarding is {\em
not} done in the case of a document request (which includes a buddy check
operation), since this document request has presumably been broadcast
to all nodes in the servnet -- this means that the node on the other
end of the forwarder will also get its own copy of the request.

We have attempted to distinguish between the design goals of robustness
and accountability.  The fact that a document cannot be lost until
a certain threshold of its shares have been lost provides a simple
robustness.  Accountability, in turn, is provided by the buddy checking
and notification system among shares, which protects against malicious or
otherwise ill-behaving nodes.  Designers can choose the desired levels of
robustness and accountability independently of each other.  Robustness
can be increased by increasing the number of shares for a document or
decreasing the associated $k$ parameter; accountability can be increased
by increasing the number of buddies or the frequency of checking.

\subsection{Buddy Numbering}

The choice of how to number buddies during share creation is a complex one.
Our original approach was to create two near-identical buddies for each share
of the document; these buddies would be the same except one would be labelled
as buddy 0 and the other as buddy 1. However, we soon realized that this contradicts
the above goal of using buddies solely for accountability and using extra shares
solely for robustness. The better alternative is to match pairs of shares during
document creation: 0 and 1 are matched, 2 and 3 are matched, etc. 

Since shares have information about the location of their buddies, there are
a number of attacks that a server can launch to collect both buddies. Once
both buddies are located on the same malicious server, that server can delete
them without fear of any repercussions. Thus once a given share of a document
happens to arrive at a malicious node, that node can do trade requests to try
to obtain the buddy. A more subtle approach would be for a conspiring node or
set of nodes to do the trade requests instead, lest the other server become
suspicious.

An alternative approach to buddy numbering would be to build a chain: share $i$
has $i+1$ as its buddy, $i+1$ has $i+2$ as its buddy, and so on. This might
make the system more robust, since the buddies are connected to each other as
more than just pairs. On closer inspection, however, this approach makes the
document much more vulnerable to deletion: once an adversary obtains even a
single share, he can iteratively obtain control over each new share in the
`chain', eventually controlling a sufficient percentage of the document that
he can effectively destroy it. Further, after he owns some share $i$ and also $i+1$,
he can delete share $i$ because he controls $i+1$ -- this is the only share which is
watching $i$.

An attempt to salvage this idea might be to build a doubly-linked
chain, wherein a given share $i+1$ watches both $i$ and $i+2$. 
However, this approach
greatly increases the complexity of share monitoring, and does not give
any clear advantage over the simpler schemes -- in fact, it may even make the
exploit easier for the adversary since he can attack the chain from both ends.

Building an effective accountability system is a complex and challenging
problem. The buddy system is a good start, but it will require a lot more
thought and engineering genius before it can reliably enforce good behavior
for servers.

\input ./fh-design-share.tex

\section{Trading}

Share trading is an integral part of the structure of the Free Haven
network.  There are a number of reasons why servers trade:

\begin{itemize}
\item {\bf Greater anonymity}: if trades are common, then there is no
reason to assume that somebody offering a trade is the publisher of
a share. Even if the mixnet successfully protects the identity of a
servnet operator, nodes could still start rejecting trades from other
nodes based on content of shares they previously provided in trades.
%``what they've provided in the past''.
\item {\bf More dynamic}: frequent trading makes adding and dropping
nodes transparent. If shares were just traded once, servnet nodes would
have to support extra protocols for dropping out of the network politely
(negotiating a new keeper for the share, informing the publisher of the
move, etc). This support for a dynamic network is crucial, since many of
the participants in Free Haven will be well-behaved but transient relative
to the duration of some of the longer-lived shares.
\item {\bf Can handle longer expiration dates}: long-lasting shares would
be difficult to trade away and rely on, if trading them involved finding
a server that promised to be up and available for the next several years.
\item {\bf Accomodates ethical concerns from servnet operators}: if
there's a particular piece of data with which an operator does not wish to be
associated and he notices that he is storing a share of that data, frequent
trading makes it easy and unsuspicious to trade it away. Operators that
do not have this flexibility would end up just dropping data they don't
like, or not participating in the servnet.
\item {\bf Provides a moving target}: we rely on the security of the
mixnet to protect the identity of servnet operators. However, if trading
doesn't happen, an organization will learn that a given document lives
at the other end of a certain mixnet address, and that it has five years
to break it. Encouraging shares to move from node to node through the
mixnet means that there is never any specific target to attack.
\end{itemize}

Trades are considered `fair' based on the two-dimensional currency of
`time*duration', plus a function of the preferences of the servers
involved in the trade. For instance, one server might consider a 1
megabyte share that expires in 2 months
to be roughly equivalent to a 2 megabyte share that expires in 1 month.
However, another server might be biased towards short-lived files, since
it prefers to have the opportunity to leave the servnet at short notice.
However, the economics don't have to work this way: it could well be that
a given server requires a trade in its advantage before it's willing to
complete a transaction, because it doesn't have much trust in the server
offering the trade.  Alternatively, it could be that a server would be
willing to make trades where it ends up with more megabyte-days; such
servers might build up a reputation of being easy to trade with.

\begin{figure}[htc]
\begin{center}
\leavevmode
\label{design-trade}
% \epsfxsize=3in
\epsfbox{fh-design-trade.eps}
\caption{Trade handshake timing diagram}
\end{center}
\end{figure}

%$$ A_{\Phi_i} \rightarrow B_{\Lambda_j} : \{ T_{T_1}, \Phi_i,
%Description(\lambda_j)
%\}$$

%$$ B_{\lambda_j} \rightarrow A_{\Phi_i} : \{ T_{T_2} ,\lambda_j, H(\Phi_i)
%\}$$

%$$A_{\Phi_i} \rightarrow B_{\lambda_j} : \{ T_{T_3}, H(\Phi_i),
%H(\lambda_j), receipt \}$$

%$$B_{\Phi_i} \rightarrow A_{\lambda_j, \Phi_i} : \{ T_{T_4}, H(\Phi_i),
%H(\lambda_j), receipt \}$$

When a server $A$ wants to make a trade (frequency of trade should be a
parameter set by the server operator), it chooses another server $B$ from
its list of known servers (based on trust and history), and offers a
share $\Phi_i$ along with some set of hints describing the share that it is
interested in receiving in return.
If $B$ is interested, it responds with a share $\lambda_k$
of its own. The negotiation is finalized by each server sending an
acknowledgement of the trade (including receipt, as described below)
to the other. In the above diagram, there are also two other parties
involved: $C$, which is holding $\Phi_j$, the buddy of $\Phi_i$; and
$D$, which is holding $\lambda_l$, the buddy of $\lambda_k$. The `buddy
system', plus the accountability which it provides, is described in more
detail above, in the Accountability and Robustness section.

If the offered share $\Phi_i$ is not acceptable, $B$ can simply not
respond, or it can respond with some description of why it refused the
trade. This gives some indication of which trades might be accepted in
the future. If $B$ chooses not to respond, however, he runs the risk of
having $A$ conclude that he is a dead node; see Section \ref{subsec:node-exp} below on Node
Expiration.

By providing the receipt on the third round of the trading handshake,
$A$ makes a commitment to store the share $\lambda_k$. Similarly, the
receipt that $B$ generates on the fourth round represents a commitment
to store the share $\Phi_i$. $B$ could attack $A$ by failing to continue
the protocol after the third line: in this case, $A$ has committed to
keeping the share from $B$, but $B$ has not committed to anything. At this
point, $A$'s only recourse is to broadcast a complaint against $B$ and hope
that the Trust system does its job to cause others to recognize that $B$
has misbehaved.

%XXX why do we have both messages going? why is the timing the way it is?

We suggest that when a server $A$ trades a share to a server
$B$, server $A$ should keep a copy of the share around for a while,
just in case $B$ proves untrustworthy. This will increase the amount
of overhead in the system by a factor of two or so, and provide greatly
increased robustness.  In this case, when a query is done for a share,
the system responding should include a flag for whether it believes itself
to be the `primary provider' of the data, or just happens to have a copy still
lying around. This flag will help with accountability.

One of the design ideas that we considered and rejected was to produce
variable-sized shares, and distribute these shares proportionally to
other servers based on trust, reliability, or response
time \cite{davidweaknesses}.
%{\footnote {\tt http://freehaven.net/archives/freehaven/dev/Feb-2000/msg00001.html}}
This way we would provide more shares to servers who are fast and likely to keep them
safely, so document requests would be faster and more reliable. On the other hand, due
to the expected frequency of trading, shares will be mixed relatively thoroughly around
the servnet -- which server the share was first traded to should not make any
difference. Indeed, this process also seems to add a lot of complexity to the system:
micromanaging the behavior of each servnet node is not conducive to a powerful,
flexible, and decentralized network. 

\section{Receipts}

Note that we really are not treating the receipt as {\em proof} of a transaction,
but rather as an indication of a commitment to keep safe a given share. This is because
the most a given server can do when it detects a server misbehaving is to broadcast
a complaint about it, and hope the Trust system handles it correctly.

When server $A$ trades share
$\Phi_i$ to server $B$ in exchange for share $\lambda_k$, then server $A$ will generate
a receipt for $\lambda_k$ with the following entries:
\begin{itemize}
\item ``I am'': $A$
\item ``I traded to'': $B$
\item ``I gave away'': $H(\Phi.key), i, \Phi_i.exp$, size
\item ``I received'': $H(\lambda.key), k, \lambda_k.exp$, size
\item ``Timestamp'': $\tau$
\end{itemize}

This entire set of five elements is signed by server $A$. This signature
means that $B$ is not able to forge this receipt. This means that $A$ really must have
created the receipt, so if a broadcast is performed by $B$ (or any other node) complaining
about the behavior of $A$, then furnishing this receipt along with the complaint will
provide some rudimentary level of `proof' that $B$ is not fabricating its complaint.

Note that the expiration date of both shares is included within the receipt, and
the signature makes this value immutable. Thus, other servers observing a receipt
can easily tell whether the receipt is still `valid'. The size of each share is also
included, so other servers can make an informed decision about how influential this
transaction should be on their trust of the two servers involved in the trade.

%\section{Adding new servers}

%One of the most important parts of the design of Free Haven is the capacity to seamlessly
%integrate new servers. These servers need to be able to join and participate in the
%servnet simply by 

\input ./fh-design-trust.tex

\section{Directory Services}
\label{sec:directories}

One issue we have not yet addressed is the question of directory services.
In particular, how do people get a directory of what's listed in the Free
Haven? How is this directory maintained? Doesn't it need to be secure,
distributed, and anonymous, just like the other documents, so we should
put it inside Free Haven?

A document directory has no business being inside Free Haven, for several reasons:

\begin{itemize}
\item The nature of the directory is that it's always changing, whereas the data
in Free Haven is designed to be immutable.
\item Free Haven emphasizes storage, not accessibility. A directory is generally
intended to be retrieved frequently. Indeed, if the communications medium we
choose has high latency, retrieving a directory will be a very slow process
anyway.
\end{itemize}

We expect that a number of independent directory services will pop up, in
many different jurisdictions. These directories can be updated (via remailers)
by servers as they put new documents into the Haven, either truly anonymously
or signed by a key that a server uses solely for announcing new documents.
(This might even develop a sort of economy of trust for how easily a directory
service believes a new submission.)

If a given directory service is shut down, then the others will persist.

Servnet operators can build up their own directory services based on what
shares pass through their system. If servnet operators help to synchronize
and verify shares listed on an external directory service (informing the
service of new shares or incorrect current shares), this may make the service
more stable and useful.

Indeed, even if there are no directory services at all the system will
continue to perform its basic function: as a reliable mechanism for anonymously
storing data. Even if only the original publisher of a document knows how to
retrieve it, then the Free Haven system is still performing a useful service.
%to specification.

\input ./fh-design-ui.tex


