outline for a paper on accountability and MIXes ----------------------------------------------- Goals: 1) Show current and perceived problems in the remailer network which stem from a lack of "accountability." 2) Define "accountability" for MIXes. Define ways of saying whether a MIX network is efficient or "works well." 3) Describe the state of the art for accountability in MIXes. 4) Argue that the state of the art is not very good. Show solutions which "work better." I. Intro II. Goal 1 -- Current and Perceived Problems in remailers Overall Note: We make the point that previously in MIXes, efficiency is neglected in favor of stronger anonymity. As Adam Back pointed out, this may be a misguided policy. If the system is not efficient or not reliable enough, the anonymity may be compromised because *too few people use it*. II.1 Public remailers are subject to spam attacks. These are typically deliberately aimed at shutting down the remailer by causing it to run afoul of ISP. This is a variant of a flooding attack. II.2 Remailers may be compromised or unreliable. The network is so unreliable that many messages never get through and have to be sent many times. Not a good idea for traffic analysis. Senders need some way of determining which remailers are reliable and which are not, so as to minimize the number of messages which need to be re-sent. There are currently several 'remailer stats' pages available, but it's not at all clear that they're reliable, fair, or secure against manipulation. Remailer stats pages: http://anon.efga.org/Remailers Has a pointer to 4 different stats pages : Frog, publius.net, gretchen, Green. http://www.publius.net/mixmaster-list.html http://www.tahina.priv.at/~cm/stats/ http://www.neuropa.net/~gretchen/rlist2.html http://generalprotectionfault.net/green/ (not working right now) See also the "geographic location page" http://generalprotectionfault.net/orange/remap.htm II.3 "Selective" denial of service. http://www.eff.org/pub/Privacy/Anonymity/1999_09_DoS_remail_vuln.html As stated, the attack requires compromising or interposing between many different remailers and the outside world. This may not be feasible. An easier attack would be to fool with the statistics given out by the efga.org or other servers; make sure that their test packets go astray. II.4 Remailer network transient; permanent nodes are rare. People pop up, run a remailer, realize that it's not worth the hassle and costs $$$, then disappear. This is a case for redeemable micropayments. III. Goal 2 -- Define "accountability" for remailers and what it means for an accountability system to "work." III.1 Remailer accountability can be considered in two parts. 1) The remailer operator must be accountable to the sender of a message. If the remailer fails to deliver a message, the sender needs to know that. Danger: if he concludes that a dropped message means he should cease using that remailer, then this opens the door for an adversary to run a 100% reliable remailer and try to reduce the reliability of the others. 2) The sender must be accountable to the remailer operator. The sender must not be able to send junk messages it doesn't "care about" through the remailer. At least not enough to flood the remailer. III.2 An accountability system "works" if it maximizes the "efficiency" of the remailer system. First sketch: Consider efficiency in terms of a sender Alice A and many recipients Bob, with each Bob denoted by B_i. Define the random variables Y = Alice's message gets through the mixnet B = takes on the integer values i for the B_i A notion of "efficiency" of a remailer system for Alice is E[Y]. We can calculate E[Y] by calulating E[E[Y|B]]. So if we can compute P(Alice's message gets through mixnet when sent to B_i) then we can compute this efficiency. We use the law of total expectation together with the probability that Alice sends to any particular B_i. E[Y] = f_B(0) E[Y|B = 0] + f_B(1) E[Y|B = 1] + ... where f_B(i) is the density function of B. f_B(i) us what the probability is of Alice sending to B_i. If the adversary adds new player(s) B_evil, then Alice doesn't know about them at first and won't send messages to them. f_B(evil) = 0 So their values of E[Y|B = evil] do not affect the efficiency at all. Is this a reasonable model, or is it too recursively defined? We can use a notion of efficiency to show what efficiency would be like in a "state of nature" without any reputation or accountability mechanisms whatsover. Then we can evaluate a mechanism for goodness based on how much better it is than the "state of nature." For a simple case, consider this simplified model: There are n nodes N_i 1 <= i <= n. Each node is corrupt with probability p. There is a single sender Alice and m Bobs B_i 1 <= i <= m In a "state of nature", each B_i picks a reply block of k nodes uniformly at random. The block is "bad" if any one of the servers in it is bad. (I think this means this is a binomial random variable with parameters k and p). Say a message never gets through a bad block. Now we know exactly what E[Y|B = i] is for each B_i. If we then know f_B(b), we can compute E[Y]. I should just assume a uniform f_B(i) = 1/m and calculate this; maybe later today. In general, though, we may be able to ask these things about "efficiencies" * are they local or global? * Do they "make sense" - Can an adversary drive them to a particular value without "doing any work"? - Do they capture what we actually want to measure/improve? * Are they easy to calculate? * Others? which may give us some guidance on what to consider as efficiencies. IV. Goal 3 -- Describe the state of the art. IV.1 Levien's original uptime and reliability list. IV.2 Modern day descendants (Frog, anon.efga.org) IV.3 What ZKS is doing V. Goal 4 -- Argue state of the art can be improved. Suggest improvements. V.1 Lack of micropayments and lack of incentive to run a node. Models for micropayment use in mix-net systems: Consider Alice sending some message to Bob, through some number of intermediate nodes (both the number and identity of these nodes known only to her, modulo each node knowing predecessor and successor.) V.1.1. End-to-end model: The simplest approach is to make Alice send Bob some form of payment and not worry about what happens to any intermediaries. The intermediate nodes lack any protection from attack. Bob might even be fictitious; Alice can attack any number of intermediate peers by routing her queries through them, using up their bandwidth, or attempting to perform active traffic analysis. Consider mix-nets such as Mixmaster, which keep a pool of outgoing messages to prevent timing attacks. Only when this threshold is passed will the remailer choose one message at random and send it out. We wish to prevent adversary from being able to flood this pool with (t-1) messages. This problem is precisely one of our motivations for using micropayments! V.1.2. Pairwise model: Throw micropayments into every transaction between every pair of nodes: one long route can be modeled as a number of pairwise transactions. This might appear to protect each recipient of payments, but is also fundamentally flawed. Using fungible micropayments, each peer earns one unit from its predecessor, then spends one unit with its successor. Assuming equal costs throughout the network, Alice is the only net debtor and Bob the only net creditor. But if a single, malicious operator is in charge of both Alice and Bob, these two peers have managed to extract work from the intermediate nodes without paying -- a more-subtle DoS or flooding attack! Using non-fungible micropayments, Alice remains a net debtor, but so are all intermediate peers. Alice can make use of greater computational resources (centralized or distributed) to flood intermediate peers with POWs. Being properly-behaving nodes, these peers attempt to make good on the micropayment exchange, and start churning out POWs for the next hop in the protocol. Eventually Alice can exhaust the resources of a whole set of smaller, honest peers. V.1.3. Amortized pairwise model: Taking what we learned about the risks of the previous model, we design a more sophisticated one that amortizes Alice's cost throughout the network route by iteratively decreasing the cost of transactions as they move through the system. In the case of non-fungible POWs, we still lose. First of all, Alice can still make use of greater wealth, economies of scale, distributed computing, etc. in order to attack intermediate nodes. While the load decreases as it moves though the system, peers X, Y, and Z still need to devote some of their own resources; they may be unable to afford that load. For fungible payments, this model appears more promising. Intermediate nodes end up as net creditors: their resources are paid for by the cut they take from Alice's initial lump-sum payment. This model has another large weakness from a security point-of-view: we leak information regarding the route-length. If we use amortized payments, the distribution of wealth from Alice's initial lump-sum payment is some monotonically (or at least linearly) decreasing function. Intermediate peers can extrapolate how many hops are in the route, as well as their relative position in the chain. V.1.4. All points model: We hypothesize that the only model to offer adequate accountability and resource protection is an all-points model: Alice pays each peer engaged in the protocol, intermediate and endpoint alike. Obviously, we wish to use micropayments which do not leak any information about Alice: systems built on hash-collisions or other cryptographic puzzles, systems using blinded public key encryption, systems using proofs-of-knowledge, etc. There are several existing considerations for forward-only mixnets. First of all, we lose our capacity to perform on-demand or interactive payments. "On depand" payments are according to the client puzzle model, in which micropayments are only required if the node detects a possible connection depletion attack. Interactive processes cannot be carried out adequately through the mixnet. Therefore, we suggest that payments should *always* accompany messages, and need to be of a non-interactive variety. To stop double-spending, the scheme must use either a centralized bank server model (for instance, Chaumian ecash), use offline fraud-tracing (for instance, Brands' system, which allows non-interactive payment through out-of-band recipient-specific nonce), or have recipient-specific information encoded in the payment (for instance, hash cash). Any recipient-specific information should further be hidden from view, so as to protect an eavesdropper from being able to piece together the route by looking at the micropayments. Ideally, recipient-hiding cryptosystems [David Hopwood. hopwood@zetnet.co.uk. Recipient-Hiding Blinded Public-Key Encryption. Draft Manuscript. ] help ensure that the act of encrypting the micropayment does not itself leak information about to whom the data is encrypted. Payment schemes such as Brands' system don't actually transmit the encrypted "coin" itself, but rather a proof-of-knowledge payment transcript. V.2 Possibility of corrupted uptime services. Also, do they really help us make better decisions about where to send messages? Outline how reputation systems in general will address this. Tallying reputations for pseudonyms allows * Improved reliability for messages * Improved efficiency for overall remailer system while maintaining sufficient pseudonymity of users and nodes. Tradeoffs. Possible models: * Have a few people doing the reliability checks and building public reputation lists from them. Still relatively centralized and vulnerable to influence. But much simpler to design/implement/analyze. * Any mixnet user can 'rate' any mixnet node. More 'fair' and decentralized. But much more complex -- pseudospoofing and outright lying can make the system fall apart? (This is a problem familiar to many anonymous systems.) * Using 'maximum flow' model (eg Advogato.org) to limit damage done by pseudospoofing. Developing a scoring system -- how to aggregrate ratings into scores: * Ways of verifying transactions, to require rating credibility * Ways of collecting ratings -- retain anonymity but also remain verifiable * 'Local' reputation: allow each user to weight the reputation calculation based on preferences * Privacy and information leaks with publishing scores -- does it introduce any new traffic analysis attacks? * Reputation of the reputation lists? Can we prove that we included each rating in our score? Automating the use of these reputation scores in mixnet client software. VI. Conclusion and future work