$Id: pynchon-spec.txt 104 2007-04-24 12:47:49Z rabbi $
                  Pynchon Gate Protocol Draft Specification

                 (Spec draft by Nick Mathewson with 
                  contributions from Len Sassaman;
            original design by Bram Cohen and Len Sassaman)


Table of Contents

   1.       Overview
   1.1.     Notation and terminology
   1.2.     Cryptographic functions
   2.       Encrypting and packaging messages
   2.1.     Key maintenance
   2.2.     Packaging messages
   2.2.1.   Message synopses
   2.3.     Managing pending messages
   2.4.     Decrypting packaged messages
   3.       The bucket pool
   3.1.     Bucket pool formats
   3.2.     Constructing the bucket pool
   3.3.     Distributing the bucket pool
   4.       Retrieving the bucket pool
   4.1.     PIR protocol
   4.1.1.   Error codes
   4.1.2.   Optimizing the distributor implementation
   4.2.     Retrieving a bucket
   4.3.     Retrieving a cycle's worth of messages
   5.       Account administration
   6.       System information


1. Overview

   The Pynchon Gate Protocol (PynGP) uses Private Information Retrieval to
   provide strong pseudonymous email delivery.  The design works (roughly)
   as follows:

       ----------     ----------
       I Sender I ... I Sender I
       ----------     ----------
            I  email    I
            V           V
       ----------------------
       I Nymserver/Collator I<--------------------
       ----------------------                    I Administrative
           I bucket pool  I                      I messages
           V              V                      I
    ---------------     ---------------          I
    I Distributor I ... I Distributor I          I
    ---------------     ---------------          I
          I  PIR buckets I                       I
          V              V                       I
          ----------------                       I
          I   Recipient  I------------------------
          ----------------

   Messages arrive at a nymserver, addressed to pseudonymous identities.  The
   nymserver encrypts the messages as they arrive, and stores them until the
   end of the current fixed "cycle".  Once a cycle has elapsed, the nymserver
   collates the messages into a "bucket pool," and relays the entire bucket
   pool for the cycle to a set of distributors.

   To retrieve her mail, the recipient retrieves a description of each
   cycle's bucket pool, and performs a PIR operation with several
   distributors to retrieve "her" buckets.  If she performs the operation
   correctly, and if any of the distributors she chooses are honest, then the
   distributors cannot learn which buckets she has downloaded.

   The recipient can administer her account by sending "control messages" to
   the nymserver through an anonymizing layer such as the Type III remailer
   network.

1.1. Notation and terminology

   H(x) -- A cryptographic digest of a string x.
   x|y  -- The string x concatenated with the string y.
   LEN(x) -- The length of a string x.
   R(n) -- n octets of random data.
   INT(v,n) -- The nonnegative integer "v", expressed as a n-octet value
       in network (big-endian) order.
   PRNG(s,n) -- A cryptographically strong pseudorandom sequence of n octets,
       generated using the seed s.
   Z(n) -- A sequence of n zero octets.
   ENC(s,k) -- Encrypt the string s using a stream cipher, with the key k.
   DEC(s,k) -- Decrypt the string s using a stream cipher, with the key k.
   COMPRESS(s) -- Compress the string s using the "deflate" algorithm and the
       zlib message format as defined in RFCs 1950 and 1951.

   "abc" -- A three-byte string, containing the ASCII values for the
        characters "a", "b", and "c".  Unless otherwise noted, strings are
        _not_ NUL-terminated.

   [61 62 63] -- A three-byte string, containing bytes with the hexadecimal
        values 61, 62, and 63.

   x[a:b] -- A substring of the string x, starting on the a'th character of
        x, and ending with the b-1'th character of x.  (Thus, x[0:LEN(x)] =
        x.)

   HASH_LEN -- The length of the output of our cryptographic digest function.
   SEED_LEN -- The length of the seed for our PRNG.
   KEY_LEN -- The length of the keys used by our encryption functions.

   K -- A client-selected security parameter determining how many
        distributors the client will connect to.

   (We require that HASH_LEN >= KEY_LEN and HASH_LEN >= SEED_LEN)

1.2. Cryptographic functions

   To implement PynGP, we recommend AES-128 in counter mode for encryption
   (so that ENC == DEC); SHA-256 for our hash function; and ENC(Z(n),s) as
   our PRNG.

2. Encrypting and packaging messages

2.1. Key maintenance

   For every recipient, the nymserver stores a long-term public key, and a
   short-term shared secret.  Shared secrets are set at account creation
   time, and can be reset by recipients later on.

   The shared secret is updated every cycle, such that, if S[i] is the
   shared secret in cycle i, then
      S[i+1] = H(S[i]|"NEXT CYCLE").

   From each S[i], the nymserver derives a set of sub-secrets for individual
   messages received that cycle.  The j'th sub-secret on day i is:
      SUBKEY(j+1,i) = H(SUBKEY(j,i) | "NEXT SECRET")
      SUBKEY(0,i) = H(S[i] | "NEXT SECRET")

   {Rationale: We use a separate chain of keys for each cycle so that it is
   easier for a user to resynchronize after missing a few cycles.}

   Also, the nymserver derives an opaque UserID for each recipient each
   cycle:
      UserID[i] = H(S[i] | "USER ID")

   Implementations SHOULD discard old secrets as soon as they are no longer
   needed.  Thus, at the start of each cycle i, a nymserver SHOULD derive
   S[i+1], UserID[i], and SUBKEY(0,i), and immediately discard S[i].  After
   using each SUBKEY(j,i), the nymserver SHOULD calculate
   SUBKEY(j+1,i) and discard SUBKEY(j,i).  After each cycle, the nymserver
   SHOULD discard the last SUBKEY, and UserID[i].

2.2. Packaging messages

   Every cycle, a user receives zero or more encrypted 'messages' from the
   nymserver.  These messages can be emails, padding, or other status
   messages.

   The messages in a cycle i are encrypted in order, with the j'th message
   received on i encrypted and identified using derivatives of SUBKEY(j,i) as
   follows:

         MsgID(j,i) = H(SUBKEY(j,i) | "MESSAGE ID")
         MsgKey(j,i) = H(SUBKEY(j,i) | "MESSAGE KEY")

   MsgKey(j,i) is used as a key to encrypt the message, and MsgID(j,i) is
   prepended to the encrypted message in order to identify it to the user
   (Message IDs are necessary because messages may be packaged out of order.)
   Encryption uses a stream cipher, so that encrypted message length is the
   same as plain-text length.  Since keys are not re-used, no IV is
   necessary.

   {Rationale: We derive MsgKey from SUBKEY rather than using SUBKEY
   directly, so that a user can store MsgKey(j,i) without worrying that a
   compromise will expose MsgKey(j+1,i).}

   [XXXX Should we just have MsgID(j,i) = H(MsgKey(j,i)) ? - NM ]

   The 0'th subkey is reserved for the INDEX message.  The 1'st subkey is
   reserved for the SUMMARY message.

   All messages begin with a single-octet 'TYPE' field.  The values for TYPE
   are:

         INDEX     [00]
         PAD_REST  [01]
         MAIL      [02]
         ACK       [03]
         SUMMARY   [04]
         ERROR     [FF]

   All message types except for PAD_REST use in the following format:

         TYPE     Message type    [1 octet]
         DATA     Message body    [LEN octets]
         HASH     Hash of message [HASH_LEN octets]

   HASH is equal to H(TYPE|DATA).  It is redundant.

   PAD_REST uses a variable length format:

         TYPE     PAD_REST type   [1 octet]
         PADDING  Random data     [Up to end of the current bucket]

   PAD_REST can only appear as the last of a user's messages in a given
   cycle; thus, it does not need a length field.

   Other message types have bodies as follows:

         TYPE: INDEX [00]
         DATA: A description of the contents of all messages in today's
               buckets.  It begins with the number of messages in today's
               buckets, encoded as a 4-byte big-endian integer.  Then,
               for each message, in includes:
                    ID   -- The MsgID for the message  [HASH_LEN octets]
                    LEN  -- The length of the encrypted message [4 octets]
         NOTES: There MUST be exactly one index message in each cycle's worth
                of buckets, and it MUST appear first.  The index message
                itself is not included in the count of messages.

                The index message is always encrypted with MsgKey(0,i).

         TYPE:  MAIL [02]
         DATA: One or more email messages, concatenated, then compressed.
               Each message M is prefixed with a length field, as
               INT(LEN(M),4)|M .
         NOTES: Nymservers SHOULD generate MAIL messages for incoming email
                as greedily as possible, to minimize the amount of time that
                unencrypted mail is stored.

         TYPE:  ACK [03]
         DATA:  C         Cookie         [HASH_LEN octets]
         NOTES: Acknowledges that the nymserver has received a control
                message from the user and processed it.  The 'cookie'
                must match the cookie sent in the control message being
                acknowledged.

         TYPE:  SUMMARY [04]
         DATA:  Message synopses for messages not included in this cycle's
                bucket.  See 2.2.1 below.

         TYPE:  ERROR [FF]
         DATA:  ERR      Error code     [2 octets]
                MSG      Message        [Variable]

2.2.1. Message synopses

   When it is not possible for the nymserver to package all of a user's
   pending mail, it instead packages as much as it can, and uses the rest of
   the user's space to include a synopsis for each undelivered message it
   received in that cycle.

   The synopsis for an email consists of one or more of the email's headers,
   in RFC822 format.  The following headers MUST be included in synopsis when
   present: "From", "To", "Cc", "In-Reply-To", "Message-ID", "Subject".
   Other headers MAY be included or omitted.  Nymservers MAY add additional
   headers for user convenience, for example by running SpamAssassin or
   similar tools to mark suspicious messages as probably spam.

   The nymservers SHOULD compress and encrypt each synopsis at the same time
   as it encrypts its corresponding messages.  To encrypt the synopsis for
   message j, received on cycle i, the nymserver uses the key: SynopKey(j,i)
   = H(SUBKEY(j,i) | "SYNOPSIS KEY")

   To pack multiple synopses into a SUMMARY message, each is first prefixed
   with its corresponding message ID, then a 4-octet integer size field.

2.3. Managing pending messages

   When a nymserver receives an email for a user, it encrypts that message
   and its synopsis as described in 2.2 and 2.2.1. above, and stores them,
   indexed by the user, by the message ID (described in 2.2) and by the cycle
   in which they were received.

   When building a user's buckets for a given cycle, the nymserver tries
   to include all the pending messages for that user.  If this is not
   possible, the nymserver instead includes:

      - As many pending messages as possible.
      - All ACK and ERR messages.
      - A SUMMARY message listing all pending messages *not* included in
        the current cycle.

        [XXXX Should we list everything that's pending, or should we list
        everything that's pending and new? -NM]

   When choosing which messages to include, the nymserver SHOULD by default
   include oldest first, but MAY allow users to configure other preferences,
   and MUST allow users to specifically override its choice of messages by
   requesting particular messages to be delivered or deleted (see section 5
   below).

   The nymserver SHOULD delete pending messages that it has not been able
   to deliver for a long time, and MAY delete messages to force disk quota
   compliance.

2.4. Decrypting packaged messages

   Upon receiving their messages for a cycle, clients decrypt the messages
   as follows.

       1.  First, decrypt the INDEX message with MsgKey(0,i).  Use the
           INDEX message to split the bucket into messages; associate each
           message with its message ID.

       2.  For each message ID in the INDEX message, check to see whether we
           have stored that message ID from a previous cycle.  If so, use the
           corresponding message key to decrypt the message.

       3.  See whether we have a message with MsgID(1,i).  If so, this is
           the SUMMARY message.  Decrypt it.

       4.  Generate MsgID(2,i) ...  MsgID(N,i) until a message ID is found
           which is not listed in the SUMMARY message or the INDEX message.
           Decrypt all messages listed in the INDEX message.  For each
           MsgID in the summary, store it and its corresponding key until
           the message is received in a later cycle, or until the user tells
           the nymserver to delete the message, or until the message is "too
           old"

3. The bucket pool

3.1. Bucket pool formats

   Users find and validate their buckets through a three-level indexing
   structure:

          ------------------------------
          I        Meta-index          I
          ------------------------------
           I                      I
           V                      V
      ----------------     ----------------
      I Index Bucket I ... I Index Bucket I
      ----------------     ----------------
       I ... I ... I ...
       I     V     V
       V
   ------------------   ------------------
   I Message bucket I-->I Message bucket I--> ...
   ------------------   ------------------

   For each cycle, all users download a common signed meta-index, from which
   they learn:
         1. Which index bucket points to their messages.
         2. The hash of that index bucket.

   Users retrieve the correct index bucket via PIR, and learn:
         1. Which message bucket(s) contain their messages
         2. The hash of the first such message bucket.

   Finally, users retrieve their message buckets via PIR.  Each message
   bucket contains:
         1. A portion of the user's packaged messages.
         2. A hash of the user's next message bucket (if this message bucket
            is not the user's last).

   Metadata:

      V      Protocol version        [2 octets]
      NSID   Nymserver ID            [HASH_LEN octets]
      CYC    Cycle number            [4 octets]
      BS     Bucket size             [4 octets]
      MLen   Length of metaindex     [4 octets]
      MI     Metaindex               [MLen octets]
      SLen   Length of signature     [2 octets]
      SIG    Signature               [SLen octets]

   The protocol version for this version of the spec is 0.  The cycle number
   is the index of cycle for which this metadata applies; cycle numbers
   should begin at 0 and increase by 1 for each cycle.

   SIG is an RSA signature of the hash of the metadata up through MI, using
   OAEP padding and PKCS1 encoding.

   Metaindex entry

      UID    First UserID in index bucket [HASH_LEN octets]
      IDX_H  Hash of index bucket         [HASH_LEN octets]

   Index entry

      UID    UserID                               [HASH_LEN octets]
      M_IDX  Index of user's first message bucket [4 octets]
      M_H    Hash of user's first message bucket  [HASH_LEN octets]

   Message bucket

      H      Hash of next message bucket          [HASH_LEN octets]
      DATA   Packaged message data                [BS-HASH_LEN octets]

3.2. Constructing the bucket pool

   At the end of every cycle, the collator takes each recipient's packaged
   messages for that cycle and assembles them into buckets, as follows:

      1. Choose a size in bytes for the cycle's buckets.  Call this size
         BUCKET_SIZE.

      2. Sort all users who will receive messages in this cycle by their
         UserIDs.  Let USER[i] = the i'th such user.

         Let N be the number of users who will receive any messages in this
         cycle.

      3. Determine the number N_INDEX of "index buckets" that
         will be needed.  This will be equal to

            N_INDEX = CEIL( (N * INDEX_ENTRY_LEN) / USERS_PER_BUCKET )

         where
            USERS_PER_BUCKET = FLOOR(BUCKET_SIZE / INDEX_ENTRY_LEN)

      4. For each user USER[i], determine the number of buckets needed to
         hold all of U's packaged messages.  This will be equal to:

            N_BUCKETS[i] = CEIL( VOLUME[i] / (BUCKET_SIZE - HASH_LEN) )

         where VOLUME[i] is the total length of User[i]'s packaged messages.

         Let FIRST_MSG_BUCKET[i] = N_INDEX + SUM{j=0..i-1} N_BUCKETS[j]

         Let TOT_MSG_BUCKETS be the sum of N_BUCKETS[i] for all U.

      6. Construct the message buckets.

         For each element of USER[i], in reverse order:

             Concatenate the packaged messages of USER[i] into a string M.

             Let PAD_LEN = BUCKET_SIZE-HASH_LEN *
                           CEIL( LEN(M) / (BUCKET_SIZE-HASH_LEN) ) - LEN(M)

             Append a PAD_REST message to M, of length PAD_LEN.

             Divide M into N_BUCKETS[i] strings of length BUCKET_SIZE-HASH_LEN

             For j = N_BUCKETS[i]-1 down to 0:

                Let INDEX = FIRST_MSG_BUCKET[i] + j

                Let BUCKET[INDEX] = H(BUCKET[INDEX+1]) | S[j]

      7. Construct the index buckets.

         For j = 0 to N_INDEX-1,

             Let FIRST_USER[j] = j*(USERS_PER_BUCKET)
             Let LAST_USER[j] = (j+1)*(USERS_PER_BUCKET) - 1

             Let B = ""

             For i = FIRST_USER[j] through LAST_USER[j],

                 Let IdxEnt = UserID[USER[i]]            |
                              INT(FIRST_MSG_BUCKET[i],4) |
                              H(BUCKET[FIRST_MSG_BUCKET[i]])

                 Let B = B | IdxEnt

             Let BUCKET[j] = B.

      8. Construct the meta-index.

         Let MI = ""

         For j = 0 to N_INDEX-1:

            Let MetaIdxEnt = UserID[FIRST_USER[j]] | H(BUCKET[j])

            Let MI = MI | MetaIdxEnt

         The meta-index is equal to the final value of MI.


   Note: in practice, collators should run steps 6, 7, and 8 in parallel,
   so that they only need to process each message bucket once.

3.3. Distributing the bucket pool

   Every distributor needs a copy of the bucket pool.  To get it, at the
   end of each cycle, the distributors download it via Bittorrent.

   {XXXX Specify how they know what URL to use; how they know when it's
    ready, etc.}

   After downloading the pool, distributors check that all of its hashes
   match the values computed above.

4. Retrieving the bucket pool

4.1. PIR protocol

   The protocol's transport layer is TLS: users negotiate a TLS connection
   with a given distributor, and then relay PIR messages as described below.

   Implementations SHOULD prefer cipher-suites that support ephemeral keys,
   and MUST NOT support weak TLS cipher-suites.

   Distributors MUST provide a two-level certificate chain with their TLS
   handshake.  One certificate MUST be a self-signed certificate for their
   long-term public key, and the other certificate should be signed by the
   first one.  The connection MUST use the private key of the second
   certificate.  Distributors SHOULD rotate these private keys regularly.

   Clients MUST validate these certificates when they first connect to each
   distributor, and MUST close any connection to a distributor whose
   certificates are non-conformant.  Clients MAY cache the results of
   certificate validation.

   [XXXX Say something about TLS sessions.]

   [XXXX Do we want to do away with SHORT_PIR_REQUESTS?]

   The PIR message format is as follows:

       TYPE   [1 octet]
       LEN    [4 octets]
       DATA   [Len octets]
       HASH   [HASH_LEN octets]

   HASH is equal to H(TYPE | LEN | DATA).

   The format of 'DATA' is dependent on 'TYPE'.  Recognized values of TYPE
   are:

       0 -- VERSION
       1 -- SHORT_PIR_REQUEST
       2 -- LONG_PIR_REQUEST
       3 -- PIR_RESPONSE
       4 -- GET_METADATA
       5 -- METADATA
     255 -- ERROR

   The corresponding formats of DATA are:

       TYPE: VERSION [0]
       DATA: Sequence of
                 VER  Protocol version  [2 octets]
                 VER  Protocol version  [2 octets]
                   ...

       TYPE: SHORT_PIR_REQUEST [1]
       DATA: NYMID   Nymserver ID    [HASH_LEN octets]
             CYC     Cycle number    [4 octets]
             SEED    PRNG seed       [SEED_LEN octets]

       TYPE: LONG_PIR_REQUEST [2]
       DATA: NYMID   Nymserver ID    [HASH_LEN octets]
             CYC     Cycle number    [4 octets]
             MASK    Bitmask         [CEIL(N_BUCKETS / 8) octets]

       TYPE: PIR_RESONSE [3]
       DATA: XB      Xor-ed buckets  [BUCKET_SIZE octets]

       TYPE: GET_METADATA [4]
       DATA: NYMID   Nymserver ID    [HASH_LEN octets]
             CYC     Cycle number    [4 octets]

       TYPE: METADATA [5]
       DATA: M       Metadata        [variable length]

       TYPE: ERROR [255]
       DATA: CODE    Error code      [2 octets]
             MSG     Message         [variable length]


   The protocol exchange proceeds as follows.

      1.  The first message MUST be sent by the client, and must be a VERSION
          message listing all the protocol versions that the client
          understands.  (Currently, the only known version is 0.XXXXX)  
          The server MUST reply with a VERSION message choosing a single
          one of those versions, or with an ERROR message (code
          'BAD_VERSION').

          Once the client receives the VERSION message from the server, it
          then sends one or more messages to the server.  The server replies
          to each of these messages in sequence.  The client need not wait
          for a reply before sending its next message.

      2.  If the client sends a GET_METADATA message, the server tries to
          find the signed metadata for the identified nymserver and cycle.
          If the metadata is found, the server answers with a METADATA
          message containing the metadata.

          If the server does not recognize the nymserver ID, it answers with
          an ERROR message (code 'BAD_NYMSERVER').  If it does recognize the
          nymserver, but it no longer has the data for the requested cycle,
          or it has not yet retrieved the data for the requested cycle, it
          answers with an ERROR message (code 'CYCLE_EXPIRED' or
          'CYCLE_NOT_YET') respectively.

      3.  If the client sends a LONG_PIR_REQUEST message, the server first
          tries to find the bucket pool for the requested nymserver/cycle
          pair.  If it cannot, it answers with an ERROR message as in [2]
          above.

          Assuming the bucket pool is found, the server checks whether
          Len(MASK) == CEIL(N_BUCKETS/8).  If not, it answers with an ERROR
          message (code 'BAD_MASK_LEN').

          Finally, the server treats MASK as sequence of bits, running from
          the MSB of the first byte of MASK to the LSB of the last byte of
          mask.  Let Bit[i] be the i'th such bit.  The server computes a PIR
          response as follows:

              Let XB = Z(BUCKET_SIZE).

              For i = 0 .. N_BUCKETS-1:
                 If Bit[i] is 1:
                    Let XB = XB XOR BUCKET[i]

          The server then answers the client with a PIR_RESPONSE message
          whose body is XB.

      4.  If the client sends a SHORT_PIR_REQUEST message with a given SEED,
          the server computes

               MASK = PRNG(SEED, CEIL(N_BUCKETS/8))

          and responds as if the client had sent a LONG_PIR_REQUEST message
          with the computed mask.

4.1.1. Error codes

   The values for error codes are given below.

   BAD_VERSION    [00 00]  (No version was recognized)
   BAD_NYMSERVER  [00 01]  (The specified nymserver was not recognized)
   CYCLE_EXPIRED  [00 02]  (The requested cycle's data is no longer retained)
   CYCLE_NOT_YET  [00 03]  (The requested cycle's data is not yet retrieved)
   BAD_MASK_LEN   [00 04]  (The provided mask was too long or too short)

   OTHER          [FF FF]  (Something else happened.)

4.1.2. Optimizing the distributor implementation.

   To avoid thrashing disk and memory caches, distributors should consider
   handling all pending PIR requests in parallel, as follows:

   Repeat:
      For i = 0 .. N_BUCKETS-1:
        For j = 1 .. N_PENDING_REQUESTS:
          If Request[j] has Bit[i]=1, then:
             Let XB[j] = XB[j] XOR BUCKET[i]
          If End_bucket[j] == i:
             Send XB[j] as a response to Request[j], and remove Request[j]
             from the pending list.

        If any new request has arrived:
          Put it at some index r.
          Let XB[r] = Z(BUCKET_SIZE).
          Let End_bucket[r] = i.

4.2. Retrieving a bucket

   To download a specific bucket B from cycle CYC of nymserver NYMID, clients
   should behave as follows: XXXX

      1. Connect to K chosen distributors and negotiate protocol versions, if
         not already connected.
      2. Generate 2 sets of K-1 random seeds ALPHA_SEED[1..K-1], 
         ETA_SEED[1..K-1].
      3. Let MASK = ALPHA_SEED[1] XOR ALPHA_SEED[2] ... XOR ALPHA_SEED[K-1]
      4. Flip the B'th bit of MASK.
      5. Generate ETA_MASK as a random string of bitlength idential to 
         MASK.
      6. Permute the chosen distributors into a randomly selected order.

         Send the first K-1 distributors SHORT_PIR_REQUEST messages, each
         with a different one of the K-1 elements of both ALPHA_SEED and 
         ETA_SEED, randomizing the which set's element is chosen to be sent
         first..

         Send the last distributor a LONG_PIR_REQUEST message with MASK for
	 both sets, randomizing which set's MASK is sent first.

      7. Cache the values of each element of ETA_SET and the corresponding
         distributor to which it was sent. 

      8. Compute the XOR of the ALPHA_SET responses.  This is the value of 
         the bucket. If this fails, see the section on Byzantine server 
         detection (Section 4.4).

4.3. Retrieving a cycle's worth of messages.

   To download _all_ of its messages for a given cycle CYC of nymserver
   NYMID, clients should behave as follows:

      1. Connect to K randomly chosen distributors.

      2. Ask a randomly chosen distributor for the metadata for NYMID,CYC.
         Make sure that the metadata is signed correctly, and has the correct
         values for NYMID and CYC.

      3. Compute our UserID for the CYC.  Call this "U".

      4. Find the largest I such that the I'th element of the metaindex has
         UID <= U.

      5. Use PIR to retrieve index bucket I.  Verify that the hash of this
         bucket is equal to IDX_H in the I'th element of the metaindex.

      6. Find the largest J such that the J'th element of the index bucket
         has UID <= U.

         If the UID at J is U, then we have messages today.  Otherwise, we
         don't, but we need to retrieve messages anyway to avoid leaking our
         identity.

         Set M_IDX and M_H from the selected index entry.

      7. Use PIR to retrieve MAX_BUCKETS buckets starting at M_IDX.

         Verify that, for each downloaded BUCKET[j]
             M_H = HASH(BUCKET[j])                      {if j == M_IDX}
             BUCKET[j-1][0:HASH_LEN] == H(BUCKET[j])    {otherwise}

         (Clients MUST verify all buckets, even buckets for messages they
         don't want, so that compromised distributors can't learn whether the
         client really wanted MAX_BUCKETS buckets or not.)

      8. Unpack the messages in the bucket.

//   If a client retrieves a bucket with an in-correct hash, it must have
//   received an incorrect PIR response from at least one distributer.  The
//   client then re-downloads the offending bucket, as follows:
       {XXXX how exactly do we reattempt a bucket?  Is it better to try a
        completely different set of servers?  Or to try replacing a just a
        couple of the distributors in the current PIR set?}

       {XXXX Do the Byzantine detection, and correct for it. However, I think
        that round is a wash, and you should either punt to the next one,
        or re-download with a new set of servers, minus the Byzantine ones.
        Thoughts? --LS.}

4.4. Byzantine server detection.

   {XXXX We need to introduce the validator in the architecture section.}
   
   If a client retrieves a bucket with an incorrect hash, it must have
   received an incorrect PIR response from at least one distributor. The
   client can then attempt to identify which distributor or distributors
   provided the incorrect response as follows:

     1. Connect to the validator.
     2. Submit ETA_SEED[1..K-1] and ETA_MASK. 
     3. Compare the results from the validator with the previously cached 
        distributor responses for the elements of ETA_SET. If mis-matched, 
        flag the offending distributor(s) as having returned invalid 
        responses.
     [XXXX 4. Add the Byzantine distributor to a list of servers never to
      use? After some number of repeat offenses? Should we send it random
      junk instead of an ALPHA_SET? What about notification to the system
      as a whole? --LS.]

     [What about performance issues on the validator? --LS.]

5. Account administration

   {XXXX Write me.  Clients need a way to set preferences, create accounts,
   request messages to be included, request message to be deleted.}

6. System information

   {XXXX Write me.  This section needs to include:
      - A way for clients to learn distributor identities and locations
      - A way 
   }