\documentstyle[twoside, fancyheadings, doublespace, fullpage, epsf]{article}
\setstretch{1.5}
\textheight 8.8 in
\voffset 0.2 in

\lhead{Michael J. Freedman ({\it mfreed@mit.edu}) \\
6.199 AUP Proposal}
\rhead{Free Haven Project \\
March 3, 2000}

\begin{document}

\pagestyle{fancy}
\pagenumbering{arabic}

\section{Motivation: The Free Haven Project}

The Free Haven Project is motivated by the following
assumptions.\footnote{Roger Dingledine.  Free Haven
Abstract.
http://www.seul.org/archives/freehaven/dev/Feb-2000/msg00067.html}  

\begin{quote}

The internet is moving in the direction of increasing freedom
of information, and increasingly blurred national boundary lines.
At the same time as a strong sense of global community is growing,
technical advances have provided greatly increased bandwidth and
an enormous amount of computing power and well-connected storage.
However, the increases in speed and efficiency have not brought
comparable increases in privacy and anonymity on the internet -- indeed,
governments and especially corporations are beginning to realize that
they can leverage the internet to provide detailed information about the
interests and behaviors of existing or potential customers.  Court cases,
such as the Church of Scientology's lawsuit against Johan
Helsingius{\footnote {\tt http://www.penet.fi/press-english.html}}
or the more recent OpenDVD debate{\footnote
{\tt http://www2.linuxjournal.com/articles/culture/007.html}}
(and subsequent arrest of DeCSS author Jon Lech Johansen),
demonstrate that the internet currently lacks an
adequate infrastructure for truly anonymous publication or distribution
of documents or other data.

Commercial enterprises as well as free software projects
are hoping to help solve this problem: examples include Zero Knowledge
Systems, a company building its own closed-source private network
of low-latency mixnets, and the Freenet project, a group of Internet
programmers designing a network that will duplicate frequently retrieved
information and thus make it difficult to delete information. Most
current works suffer either from closed or unfinished source. More
importantly, though, their designs sacrifice anonymity for accessibility.
The Free Haven Project aims to design and deploy a system which uses a
secure mixnet for communication, and which emphasizes distributed, reliable,
and anonymous storage over efficient retrieval. Some of the problems we
address include providing sufficient accountability without sacrificing
anonymity, building trust between servers based entirely on their observed
behavior, and providing user interfaces that will make the system usable for
end-users.
\end{quote}

My specific role in the Free Haven Project is the design and implementation
of the communication module, incorporating mixnets or another anonymous
communications channel into the larger system design to ensure anonymous
broadcasts and point-to-point communications between servers.  


\section{Protocols and Modules}

The Free Haven design is based on a community of servers - termed the
``servnet'' - where each server hosts data from the other servers in
exchange for the opportunity to store data of its own in the
servnet. Inherent to the anonymity and reliability of the system are
several design requirements. 

\begin{itemize}
\item Users may anonymously insert files into the servnet.
\item Users may anonymously retrieve files from the servnet.
\item Servers may be added and dropped from the servnet.
\item Files must be recoverable given server failure.
\item The current location of files should not be known.
\item The system must be decentralized to maintain efficiency, security,
and reliability.
\end{itemize}

An anonymous communications medium is necessary to allow the anonymous
insertion and retrieval of files in the servnet.  Also, any broadcasts
and point-to-point communications between servnet nodes need to be
performed anonymously to maintain privacy of a node's identity.

An information dispersal algorithm and trust network should be used to
ensure file recovery robustness. Given a file broken into $n$ shares,
any $k$ shares should be sufficient to recreate the file.  Therefore, files
should be recoverable even given the loss of a large number of shares
due to passive server failure or active adversary attack.  A trust
network should also be developed to enforce good practice by nodes:  Node A's 
trust in node B affects A's willingness to store shares on B.  Thus, the
damage an adversary can perform is limited, as bad behavior reduces
trust, thereby reducing trading between nodes.

\section{Communications Module}

A mixnet will be used to carry messages between servnet nodes over
anonymous channels.  We must provide an interface between these nodes
and the anonymous channels.  To ensure the design's modularity and
adaptability in the open-source paradigm, the code base should allow
the easy incorporation of different mixnet implementations.  

\subsection{Mixnets}

Mixnets provide a fundamental means of communication anonymity.
Proposed by David Chaum in 1981,\footnote{David Chaum. Untraceable
electronic mail, return addresses, and digital pseudonyms. {\it
Communications of the ACM}, 24(2), Feb 1981.
ftp://ftp.csua.berkeley.edu/pub/cypherpunks/papers/chaum.digital-mix.gz} 
a mixnet is a collection of 
servers which relay messages while removing identifying information at
each hop.  A sender dispatches a message into the mixnet medium, the
message is transferred between several mixnet nodes to ensure anonymity,
and finally the message is relayed to its receiver.

A number of current mixnet implementations are currently available.  The 
Mixmaster remailer uses {\tt sendmail} for data
transmission.\footnote{Mixmaster Remailer.  
http://www.obscura.com/\~{}loki/}  Babel uses the STMP transport
protocol.\footnote{Ceki Gulcu and Gene Tsudik. Mixing E-mail with
BABEL. {\it Proceedings of the ISOC Symposium on Network and
Distributed System Security}, Feb 1996.}  Both Onion 
Router\footnote{Onion Router. http://www.onion-router.net/} and Zero
Knowledge Systems' Freedom\footnote{Zero-Knowledge
Systems. http://www.zeroknowledge.com/} provides TCP/IP-level transport.  

\subsection{Mixnet Routing}

Routing information through the mixnet is available by reply blocks or
persistent identification of the destination.  Reply blocks supply
next-hop info to each mixnet node, persistent IDs may be used to lookup
similar routing blocks.

Several practical designs of mixnets use the concept of ``onion
routing'' to ensure anonymity.  First, the sender chooses a route
through the mixnet:  Source S $\rightarrow$ Node A $\rightarrow$ Node B
$\rightarrow$ Node C $\rightarrow$ Destination D. Second, the sender
signs the message with its private key $pk_S$ and possibly encrypts via
the destination's public key $PK_D$.  Then, the sender encrypts the
message and next-hop information along the mixnet in reverse:  encrypt
with $PK_C$, then encrypt with $PK_B$, and finally encrypt with $PK_A$.
Therefore, only the proper node in the mixnet can decrypt the top layer
of the message, exposing next-hop information and the proper onion
message to relay.  In other words, the encryption layers are unwound (or
peeled like an onion) during mixnet transmission, before the message is
relayed to its destination.  The destination decrypts the message via
$pk_D$ and verifies the sender's signature.


\subsection{Comm API}

The mixnet communication interface should supply a few primitives:

\begin{verbatim}
     Node* get_available()             /* Returns a list of nodes with which */
                                       /* server knows how to communication  */

     int broadcast(char* message)      /* Sends message to all available     */
                                       /* nodes. Return error code upon      */
                                       /* failure.                           */

     int send(Node D, char* message)   /* Sends some message to node D       */
                                       /* through anonymous mixnet.  Return  */
                                       /* error code upon failure.           */
    
     int send(char* addr, char* file)  /* Sends the recovered file to the    */
                                       /* specified nym or address           */

     void receive(char* message)       /* Pulls message from incoming queue, */
                                       /* dispatches to haven.               */

     void generate_statistics()        /* Provide method of supplying        */
     int get_statistics(Node D)        /* statistical mixnet information to  */
     int* get_statistics(Node* list)   /* haven, to be used to properly      */
                                       /* timeout communications with other  */
                                       /* nodes across high-latency mixnet.  */
\end{verbatim}

Our initial implementation of the communications module shall utilize
the open-source Mixmaster remailer.  The communications API, however,
provides a layer of abstraction such that any viable mixnet may be
incorporated, either by ourselves or other developers.

Layered encryption, via the onion-routing methodology as described,
shall be performed by the communications module.  Upon receiving a
message or share from the haven module, the comm module will determine
the necessary encryption protocols given the type of mixnet used. The
message or share will be signed and encrypted over a reverse trace of
the desired mixnet route.  The module will then proceed to send the
message off into the mixnet.  

Encryption is performed in the comm module for two reasons: 
First, any mixnet-specific information is abstracted away from the
haven module.  Second, broadcasts can be requested by the haven module
by merely calling a {\tt broadcast} primitive, as opposed to querying
for a list of available nodes, extracting each node's route and
encrypting the message according to that route, then performing a {\tt
send} operation on each message.  A single broadcast call decreases
the quantity of socket-level requests from $O(n)$ to $O(1)$, where $n$
is the number of available nodes.

\subsection{Comm State}

The communications module has several structural requirements:

\begin{itemize}
\item Outgoing Queue:  Messages to be sent into the mixnet.
\item Incoming Queue:  Messages received from the mixnet, to be
dispatched to the haven module.
\item Node Database:  Store information on all-known servnet nodes.
\begin{itemize}
\item Server Name (for ease of reference)
\item Public Key
\item Mixnet Type (i.e., Mixmaster, ZKS)
\item Routing Info (reply blocks, persistent IDs)
\item Statistics (for timeout and availability considerations)
\end{itemize}
\end{itemize}

\subsection{Further Areas of Research}

Several aspects of mixnets need to be further studied for
incorporation into the Free Haven system.  

\begin{itemize}
\item {\bf Latency:}  Mixnets have high latency, in the range of
several to many hours for slower Mixmaster remailers.\footnote{Mixmaster 
Statistics. http://anon.efga.org/Remailers/mlist}  Although TCP-based
systems provide very low latency, current implementations do not
incorporate reply blocks according to our needs.
\item {\bf Lossyness:} Mixnets drop a significant percentage of packets
during transmission due to a low node reliability.  A robust information
dispersal algorithm should still allow us to recover files.  Still, this
has possible effects on the trust network when loss is not due to server
failure, but rather to an unreliable communications medium.
\item {\bf Alternate Routing Schemes:}  If the chosen route during a
mixnet goes down, the message is lost in transmission.  An alternative
implementation would be ``garlic routing.''  A garlic packet looks like
an onion packet until it is unwrapped, whereby a node finds several
onions to transmit, instead of the normal single onion.  Reliability is
therefore increased through redundancy.  Additionally, reply block size
will only grow linearly with the total number of nodes in onion and
garlic routes. However, this modification would likely require
reimplementing a mixnet, and its actual benefit is yet unknown.
\item {\bf Efficient Broadcasts:} The Free Haven model uses broadcasts
for such things as file reconstruction requests, trust meta-data
announcements, key refresh notices, and key revocation notices.  The
following requirements are necessary for the broadcast mechanism through
the mixnet:  
coverage, authenticity, message similarity among all nodes, termination
awareness, low computational complexity, and low communication
complexity.\footnote{David Molnar.  Notes of Broadcast Requirements.
http://www.seul.org/archives/freehaven/dev/Feb-2000/msg00052.html} 
\end{itemize}
We wish to continue to examine the existing code base for mixnets which
best fit our requirements.  For example, while Mixmaster provides the
necessary transport requirements and security, the system has high
latency and is somewhat unreliable.  For future development of the Free
Haven system, we might prefer to implement our own TCP/IP-based mixnet
that supports reply blocks or persistent IDs. 


\section{Threat Model}

A cryptographic system is only as secure as its weakest link.  If an
adversary can compromise any part of the system or protocol, then the
entire system or protocol is effectively compromised.

The Free Haven threat model should identify all security concerns
inherent to the system's design.  Once an adequate threat model is
developed, we can better describe the protocol's strengths and
weaknesses. Non-protocol attacks are possible:  physical, social,
governmental, and legal attacks might seek to undermine trust in the
security of the system, discourage use of the servnet, shut down servnet
nodes, or perform similar such actions.  However, we shall only consider
the adversaries that directly attack the system's design from a computer
and network security standpoing.  We have identified four main types of
adversaries: 

\begin{itemize}
\item {\bf Servnet Node Adversaries:}  These adversaries act as a node on
the servnet, but behave contrary to normal specification.  Such nodes
may attempt to gain trust quickly before falsely accusing other nodes
of bad behavior, counterfeit receipts, access and modify information
describing stored shares, corrupt shares or make them unrecoverable,
or perform marking or flooding attacks to learn other node identities. 
\item {\bf Adversaries within the Mixnet:}  Mixnet adversaries can
monitor traffic within the mixnet, recovering and rereading mixnet
messages, launching replay or flood attacks to learn identities,
perform timing or traffic size analysis to determine routing
information, or attack nodes to increase mixnet lossyness.
\item {\bf Adversaries between the Servnet Nodes and Mixnet:} These
adversaries exist at the last-hop, where single-layered ciphertext or
plaintext is being sent from the mixnet to the servnet recipient.
Normal attacks on the crypto primitives may be used, as well as 
marking last-hop routes to try to determine senders.
\item {\bf Rogue Users:} Adversaries can exist outside of the normal
system, and may cause harm through denial of service attacks on servnet
nodes via requesting or publishing files, or attacks on the mixnet via
packet flooding.
\end{itemize}

As described, there are many different ways an adversary may attack
the system: cryptoanalysis, denial of service, traffic analysis,
man-in-the-middle attacks, corruption of the trust system, and
more. Specifically, an adversary might have one or more of the
following objectives:  

\begin{itemize}
\item Destroy a sufficient number of shares to make a file
unrecoverable. 
\item Identify the source or destination of a broadcast or
point-to-point message.
\item Identify a file's publisher or retrievers.
\item Determine a node's private key for counfeiting signatures or
decrypting private transmissions.
\item Corrupt the trust network such that adversaries can perform sufficient
damage to the system, by either falsely gaining trust or
successfully accusing well-behaved nodes.
\item Slow or stop mixnet communication in order to halt messages
between nodes.
\end{itemize}

An accomplishment of any of these goals can undermine trust
in the security of the system.  Users will be less motivated to run
servnet nodes.  A loss of this trust is particularly dangerous:
volunteerism is a necessary part of the Free Haven design, in order to 
establish globally-distributed servnet nodes. 

The importance of a threat model is clear.  The model attempts to
systematically identify all areas of possible weakness and attack.
Then, it can be used to show how our design is sufficiently robust
and reliable, such that these factors to do not pose any real security
threat.

\section{Conclusion}

The Free Haven project seeks to deploy a system of servers that allow
for the anonymous publishing, storage, and retrieval of documents.
Stronger anonymity is due both to communication and file storage
protocols.  My work concerns developing the communications module for
anonymous broadcasts and point-to-point messages between nodes.  We
shall also formalize the system's trust model to describe protocol
security.  This work is being performed collaboratively with developers
of the trust network, the haven module for share storage, file
publishing, and file retrieval, and the user interface.  The ultimate
completion of these modules would provide adequate functionality for an
initial Free Haven release.

\section{Acknowledgements}

I would like to thank a number of people for their assistance with this
project.  First, Roger Dingledine for the formulation of the Free Haven
Project and allowing me the opportunity to research and design certain
aspects of the project.  Next, David Molnar for his close collaboration
in developing the mixnet API, studying means for anonymous and efficient
communication, and describing the trust model. Finally, Professor Ron
Rivest for his role as my project advisor and his suggestions for the
overall Free Haven design. 

\end{document}