Reputation

Roger Dingledine, Michael J Freedman, David Molnar, David Parkes, Paul Syverson
April 2003

Section 1: Introduction

Reputation is a tool to predict behavior based on past actions and characteristics. We use reputation regularly in our daily lives --- reputations of individuals, as when we choose a physician; groups, as when we decide that individuals above a certain age can be trusted to purchase alcohol; and collective entities, as when we decide whether Ford is a company that will sell us good cars.

Reputation is based on linkability. When we can link actions to an identity, and link actions by that identity to other actions by that same identity, then we can begin to make predictions about the identity's future actions. We distinguish between the problem of verification (establishing identity and tying it to a real-world entity) and the problem of reputation (judging the behavior of a given identity). This chapter deals mainly with the latter, but we also tackle some of the problems with verification that impact reputation.

The concept of reputation, and the associated themes of building and assessing reputations, are not in themselves new ideas. Reputation, often signaled via a brand such as the New York Times, is an important concept in environments in which there is asymmetric information. Newspapers are a good example -- a reader must pay for the good before evaluating the good, and the seller and publisher likely have better information about the quality of the product. Without the ability to brand a product the newspaper market would likely spiral towards Akerlof's "Market for Lemons", with high quality products unable to differentiate themselves from low quality products and only low quality products surviving in the long-term. Reputation provides what Axelrod refers to as "the shadow of the future", with participants in a system considering the effect of current misbehavior on future interactions, even if future interactions are with different people. Reputation systems (systems that attempt to attach reputation to an identity and make an ongoing assessment of that reputation) promise to "unsqueeze the bitter lemon" (Resnick et al. 2000) by automating word-of-mouth reputation for electronic networks.

While reputation itself is not new, we now have the ability to provide electronic intermediaries, or reputation systems, that efficiently collect, aggregate, and disseminate reputation information. The ability to automate traditional word-of-mouth reputation usefully complements the new opportunities provided by electronics markets such as eBay. Well-functioning reputation systems enable efficient electronic markets, allowing geographically dispersed and pseudonymous buyers and sellers to trade goods, often in one-time transactions and with payments made before the physical delivery of the goods. Reputation systems are used on news and commentary sites such as Slashdot to help readers to filter and prioritize information from multiple, often unmoderated, sources. Dellarocas (2002) notes that the appeal of electronic reputation systems is that they can facilitate cooperation without a need for costly enforcement; the cheap collection and dissemination of feedback information can replace the roles of litigation and enforcement in preventing the abuse of information asymmetries.

We are all familiar with reputations in the physical world, but using the same approaches online introduces some pitfalls. Many of our social norms were established when we were connected to a relatively small number of people, but the Internet connects to millions of people. Because our preexisting notions of how to build trust do not apply as well to this more dynamic world, traditional reputation approaches allow some new attacks. Still, the internet provides us with a staggering amount of raw data about preferences, behavior patterns, and other details, if only we could rely on it. To build a good reputation system, we must find ways to use these data without naively believing in their quality. In particular, online reputation systems must consider several issues:

How broad is the context we're trying to build a reputation for? Are we trying to predict how well a given person will perform at a very specific task (eg delivering a letter to the post office), or are we trying to predict his performance at a broad range of tasks such as delivering letters, running a company, and babysitting our kids?
What's at stake? Are we talking about predicting whether he'll put a stamp on his envelope, or whether he'll betray our government's secrets?
Can we tie people to their online identities, or is it possible for people to create many online identities that are not obviously related to each other?
Assuming we've solved the issue of tying identities to real people, how can we measure reputation?
Finally, what algorithms can let us make accurate predictions based on these reputation values?

Unfortunately, the field of reputation research is still young, and there are no clear technological solutions (now or on the horizon) robust enough for most government transactions. In this chapter we lay out the problems in reputation theory and discuss the lessons we can learn from existing models. Section 2 presents some fielded reputation systems. Section 3 generalizes some of the issues and presents some of the current roadblocks in current reputation systems, and Section 4 presents some more roadblocks for the specific context of pseudonymous systems.

In section 5, we make some simplifying assumptions to develop the notion of an "economy of reputation." Reputation can be earned, spent, and traded --- after all, people listen to somebody who has a good reputation when he recommends somebody else. Furthermore, reputation entails costs: if you make it hard to provide reputation information, people won't bother, whereas if you reward them in some way they'll give more information. Reputation systems motivate people in ways similar to monetary economies: as a simple example, just as the fear of losing money can inspire someone to take preventative measures, so can a fear of loss of reputation. And finally, people can try to game the system by planting false reputation, or buying and selling reputation.

Section 6 finishes with some conclusions and further thoughts.

Section 2: Fielded reputation systems

Who will moderate the moderators: Slashdot

The news site Slashdot.org is a very popular news service that attracts a particular kind of "Slashdot reader" -- lots of them. Each and every Slashdot reader is capable of posting comments on Slashdot news stories, and sometimes it seems like each and every one actually does. Reputations based on this interaction can help a user figure out which of the many comments are worth reading.

To help readers wade through the resulting mass of comments, Slashdot has a moderation system for postings. Certain users of the system are picked to become moderators. Moderators can assign scores to postings and posters. These scores are then aggregated and can be used to tweak a user's view of the posts depending on a user's preferences. For example, a user can request to see no posts rated lower than 2.

The Slashdot moderation system is one existing example of a partially automated reputation system. Ratings are entered by hand, using trusted human moderators, but then these ratings are collected, aggregated, and displayed in an automatic fashion.

Although moderation on Slashdot serves the needs of many of its readers, there are many complaints that a posting was rated too high or too low. It is probably the best that can be done without trying to maintain reputations for moderators themselves.

Reputations worth real money: eBay

The eBay feedback system is another example of a reputation service in practice. Because buyers and sellers on eBay usually have never met each other, neither has much reason to believe that the other will follow through on their part of the deal. They need to decide whether or not to trust each other.

To help them make the decision, eBay collects feedback about each eBay participant in the form of ratings and comments. After a trade, eBay users are encouraged to post feedback about the trade and rate their trading partner. Good trades, in which the buyer and seller promptly exchange money for item, yield good feedback for both parties. Bad trades, in which one party fails to come through, yield bad feedback which goes into the eBay record. All this feedback can be seen by someone considering a trade.

The idea is that such information will lead to more good trades and fewer bad trades --- which translates directly into more and better business for eBay. This isn't always the case in practice, but it is often enough to give eBay a reputation of its own as the preeminent web auction site.

Codifying reputation on a wide scale: The PGP web of trust

One of the first applications to handle reputations in an automated fashion on a genuinely large scale was the "web of trust" system introduced in Phil Zimmermann's Pretty Good Privacy (PGP). This was also the first program to bring public key cryptography to the masses (see the Crypto chapter for more details on public key crypto).

With public key cryptography comes the key certification problem, a type of reputation issue. Reputations are necessary because there is no way to tell from the key alone which public key belongs to which person.

For example, suppose Alice would like people to be able to send encrypted messages to her. She creates a key and posts it with the name "Alice." Unbeknownst to her, Carol has also made up a key with the name "Alice" and posted it in the same place. When Bob wants to send a message to Alice, which key does he choose? This happens in real life; as an extreme example, the name "Bill Gates" is currently associated with dozens of different PGP keys available from popular PGP key servers.

So the key certification problem in PGP (and other public key services) consists of verifying that a particular public key really does belong to the entity to whom it "should" belong. PGP uses a system called a web of trust to combat this problem. Alice's key may have one or more certifications that say "Such and such person believes that this key belongs to Alice." These certifications exist because Alice knows these people personally; they have established to their satisfaction that Alice really does own this key. Thus Alice's key builds up a reputation as being the right key to use when talking to Alice. Carol's fake "Alice" key has no such certifications, because it was made up on the spot.

When Bob looks at the key, his copy of PGP can assign it a trust level based on how many of the certifications are made by people he knows. The higher the trust level, the more confidence Bob can have in using the key. But because there's a limit to how many people Alice and Bob can know, in reality Bob's software will look for broader connections, such as a "certification chain" that is less than, say, 4 hops long, or how many independent paths through the web go through at most 4 people.

There are still a number of tricky issues that make the PGP web of trust concept hard to use securely: for example, what exactly did Bob mean when he certified Charlie's key, and does Charlie mean the same thing when he certifies David's key? But the key point to remember here is that the web of trust depends on reputation to extend trust to new parties.

Attack-resistant trust metrics: Google

In the 1990's, web search engines downloaded and catalogued millions of web pages. Engines responded to queries based on simple metrics, such as whether the page included the search phrase, and how many times a page included the search phrase.

Over time, people began "attacking" these search engines to steer visitors to their respective pages. People created pages with many common key words: if a page contained enough attractive words, search engines would be more likely to return it to users searching for that phrase. The attack was improved by creating many copies of a given page, thus increasing the likelihood that it would be returned.

In response to these attacks, and in hopes of developing more accurate search engines, researchers began evaluating alternative trust metrics for search engines. In 1998, Stanford researchers opened a new search engine, called Google, that uses reputation technology similar to projects such as IBM's "Clever" project. Google's fundamental innovation is to determine a page's usefulness or page rank based on which pages link to it.

Google's trust algorithm can be visualized as follows: Begin on the front page of a very popular website (e.g., Yahoo, CNN, Slashdot, etc.). Now randomly follow links from the front page for a few minutes. The page you end up on (it may still be somewhere within the original website, or you may be somewhere else on the WWW) receives a small amount of reputation. Repeat this operation millions of times. Over time, pages which are commonly linked to will emerge as having higher reputation because they are "more connected" in the web. Pages on the fringes of the WWW will have lower reputation because few other pages point at them. Sites which are linked from many different reputable websites will be ranked higher, as will sites linked from extremely popular websites (e.g. a direct link from the main Amazon page to an offsite page).

In actuality, Google's page rank computation is more efficient than the random walk approach outlined above, and Google's actual algorithm is complicated by its ability to consider factors such as whether the words on a page are relevant to the words in the query. Though complex, Google's reputation-based scheme results in far better results to search queries than other search engines.

Attack-resistant trust metrics: Advogato

Raph Levien's "Advogato" trust metric applies this same approach to the PGP web of trust problem. One problem with the basic web of trust algorithm is that trust decisions that Alice might make, such as, "If he's within 4 hops of me then I trust him," leave her open to attacks. Imagine David manages to get enough certifications that his key is within 3 hops of Alice. Then, if he creates a new identity (key) and certifies it with his "David" key, Alice will trust the new identity. In fact, he can create as many new identities as he wants, causing Alice to trust more and more new "people." This attack is similar to the search engine attack above, where a website author creates many different web pages to increase the chance of drawing users.

Advogato's trust metric aims to withstand this attack. Its operation is similar to Google's above: start with a few trusted people in the web of trust (called seeds), and walk out randomly along the web of trust. People frequently walked to are considered reputable. In the scenario above, David's key would be a trust bottleneck, meaning that the total amount of trust that David's alter egos can gain is limited by the number of certifications that David can trick out of honest users. That is, the amount of trust that David can pass on to his new nodes is bounded by the amount of trust that is passed to David himself.

One of the interesting features of these trust metrics is that the seed doesn't always have to be the same. In theory, google could let each user specify a website or websites which should be the "starting point" for reputation calculations; similarly Advogato could let a user ask "If I'm the seed, then which nodes count as reputable?"

Section 3: General issues with reputation systems

Attacks and adversaries

Two questions determine how we evaluate the security of reputation systems. First, what's the context for the reputation system -- what's at stake? Second, who are the adversaries? The capabilities of potential adversaries and the extent to which they can damage or influence the system dictate how much energy should be spent on security.

In mid-2000, a group of people engaged in eBay auctions and behaved well. As a result, their trust ratings went up. Once their trust ratings were sufficiently high to engage in high-value deals, the group suddenly "turned evil and cashed out." That is, they used their reputations to start auctions for high-priced items, received payment for those items, and then disappeared, leaving dozens of eBay users holding the bag.

If a corporation planning a $50 million transaction bases its decisions on website that computes and publishes reputations for companies, it could be worth many millions of dollars to influence the system so that a particular vendor is chosen. Indeed, there are quite a few potential adversaries for a company tracking online reputations for large businesses. A dishonest vendor might want to forge or use bribes to create good feedback to raise his resulting reputation. In addition to wanting good reputations, vendors might like their competitors' reputations to appear low. Exchanges -- online marketplaces that try to bring together vendors to make transactions more convenient -- would like their vendors' reputations to appear higher than those of vendors that do business on other exchanges. Vendors with low reputations -- or those with an interest in people being kept in the dark -- would like reputations to appear unusably random. Dishonest users might like the reputations of the vendors that they use to be inaccurate, so that their competitors will have inaccurate information.

Perhaps the simplest attack that can be made against a reputation system is called shilling. This term is often used to refer to submitting fake bids in an auction, but it can be considered in a broader context of submitting fake or misleading ratings. In particular, a person might submit positive ratings for one of her friends (positive shilling) or negative ratings for her competition (negative shilling). Either of these ideas introduces more subtle attacks, such as negatively rating a friend or positively rating a competitor to try to trick others into believing that competitors are trying to cheat.

Shilling is a very straightforward attack, but many systems are vulnerable to it. A very simple example is the AOL Instant Messenger system. You can click to claim that a given user is abusing the system. Since there is no support for detecting multiple comments from the same person, a series of repeated negative votes will exceed the threshold required to kick the user off the system for bad behavior, effectively denying him service. Even in a more sophisticated system that detects multiple comments by the same person, an attacker could mount the same attack by assuming many different identities.

Vulnerabilities from overly simple scoring systems are not limited to "toy" systems like Instant Messenger. Indeed, eBay's reputation system, used naively, suffers from a similar problem. If the reputation score for an individual were the good ratings minus the bad ratings, then a vendor who has performed dozens of transactions, and cheats on only 1 out of every 4 customers, will have a steadily rising reputation; whereas a vendor who is completely honest but has only done 10 transactions will be displayed as less reputable. A vendor could make a good profit (and build a strong reputation!) by being honest for several small transactions and then being dishonest for a single large transaction. Fortunately, this attack does not work against eBay -- ratings from users include a textual description of what happened, and in practice even a few vocally unhappy customers mean that a vendor's reputation is ruined.

More issues to consider

In some situations, such as verifying voting age, a fine-grained reputation measurement is not necessary -- simply indicating who seems to be sufficient or insufficient is good enough.

Also, in a lot of domains, it is very difficult to collect enough information to provide a comprehensive view of each entity's behavior. It might be difficult to collect information about entities because the volume of transactions is very low, as we see today in large online business markets.

But there's a deeper issue than just whether there are transactions, or whether these transactions are trackable. More generally: does there exist some sort of proof (a receipt or other evidence) that the rater and ratee have actually interacted?

Being able to prove the existence of transactions reduces problems on a wide variety of fronts. For instance, it makes it more difficult to forge large numbers of entities or transactions. Such verification would reduce the potential we described in the previous section for attacks on AOL Instant Messenger. Similarly, eBay users currently are able to directly purchase a high reputation by giving eBay a cut of a dozen false transactions which they claim to have performed with their friends. With transaction verification, they would be required to go through the extra step of actually shipping goods back and forth.

Proof of transaction provides the basis for Amazon.com's simple referral system, "Customers who bought this book also bought..." It is hard to imagine that someone would spend money on a book just to affect this system. It happens, however. For instance, a publishing company was able to identify the several dozen bookstores across America that are used as sample points for the New York Times bestseller list; they purchased thousands of copies of their author's book at these bookstores, skyrocketing the score of that book in the charts. [David D. Kirkpatrick, "Book Agent's Buying Fuels Concern on Influencing Best-Seller Lists," New York Times Abstracts, 08/23/2000, Section C, p. 1, col. 2, c. 2000, New York Times Company.]

In some domains, it is to most raters' perceived advantage that everyone agree with the rater. This is how chain letters, Amway, and Ponzi schemes get their shills: they establish a system in which customers are motivated to recruit other customers. Similarly, if a vendor offered to discount past purchases if enough future customers buy the same product, it would be hard to get honest ratings for that vendor. This applies to all kinds of investments; once you own an investment, it is not in your interest to rate it negatively so long as it holds any value at all.

Transferability of reputations

Another important issue in reputation is tranferability. In some contexts, transferability is inherent. Group reputation is only useful if it is based on behavior of members of a group and can be assigned to most members of that group, as in legal drinking age determination. Even here there can be many factors, for example there may be good evidence that a reputation is strongly associated with a group, but the reputation might only apply to some small percentage of the group. Alternatively, a reputation might only accurately reflect individuals while in the group, which turns out to be highly dynamic. These are some of the difficulties faced in combining reputation inputs in order to meaningfully answer a reputation question.

Of course in the context of a single national identifier, much of this collapses. A simple reputation economy seems easier to manage, hence stronger than one involving multiple identities, pseudonyms, or full anonymity. However, this simplicity is not without drawbacks and other complexities. Individuals may currently have multiple legitimate online identities associated with distinct merchants, when communicating with friends, when engaged in their occupations, when interacting with different parts of various governments and jurisdictions. Combining all of these identities creates a number of problems. Aside from the obvious privacy concerns, it inherently increases vulnerability to identity theft. In the face of a single national identifier, identity theft becomes both a larger threat from more widespread exposure and a more devastating threat because more is at stake in a single identity. And, because we would need to bootstrap a single national identifier from the current system of many identities per person, we still need to deal with the issue of transferability of reputation from one identity to another. Thus we must consider multiple identities if only temporarily.

Ubiquitous anonymity is not a simple answer either. Currently, there are no reputation systems that work anonymously (much as cash functions anonymously in economic systems) and remain safe. Theoretically it might be possible to develop such a reputation system; this is an active area of a research currently. The security, key management, and credential systems that would be necessary to make this feasible are no more in place for this purpose as yet than for any other.

Digital credentials are one means of storing and managing reputation. That is, credentials can be used to sum up reputation. We are already familiar with such usage in the physical world: reputation as a longstanding resident, as gainfully employed, as a responsible driver are often established by means of credentials. Digital credentials are in principle no different. In practice there are many issues, some of which we have already mentioned. The management of these credentials is outside the scope of this work. Here we consider how to assess and formulate what a credential attests to. Credentials figure in only to the extent that they are used in determining and managing reputation. Reputation systems are more readily thought of as those systems that attempt to attach reputation to an identity and make an ongoing assessment of that reputation.

With reputation, besides the possibility of transferring reputation between multiple identities of the same individual we must consider the possibility of transferring a single reputation between multiple individuals. This may result in a reputation that correctly attaches to no one person. For example, consider roles such as CTO for a company, or an agent in charge of handling a given contract for the company. These identities must be able to be transferred smoothly to another person --- or shared. Whether or not this is acceptable would depend on what the reputation is intended to do. Of course even deciding when two individuals are the same, or more generally, when two actions are by the same individual can be quite difficult online. A good reputation system should either provide adequate counters to the above identity attacks such as pseudospoofing, or provide useful information when such answers cannot be adequately given.

Section 4: Reputation systems in a pseudonymous environment

Here we focus on the additional problem of operating from some middle point in the spectrum of identification. Instead of a single global permanent identifier, individuals may expose some pseudonym identifier to the system, to which sets of attributes or credentials are bound. So, the basic problem from this scenario changes: Given a set of statements not linked directly to real people, how can we believe them, i.e., trust that they both valid and complete?

Of course, one obvious solution to this problem is the introduction of centralized, trusted authorities to authenticate individuals when they register a pseudonym with the system. Cryptographic techniques known as blinding may be used to prevent the authority from linking entities to pseudonyms while still ensuring that an entity can only present one pseudonym to some system environment at any given time. Such a task is certainly not free or fool-proof (consider Verisign's weakness if their root key is compromised), and such security often raises an individual's barrier to entry.

In this section we describe some limitations and open problems of reputation systems in a pseudonym-based setting. In particular, whether systems are centralized or distributed (peer-to-peer) has a lot of impact on how easy it is to handle reputations online.

Verifying the uniqueness of identities

One fundamental limitation of reputation systems in online settings is the difficulty of ensuring that each entity only presents a single identity to the system.

Classic authentication considers whether a presented identity corresponds to the expected entity, but not whether the authenticated identities are distinct from one another. In a virtual setting, a single person may create and control multiple distinct online identities. In some situations, a honest person certainly should be able to have multiple pseudonyms that are used for different aspects of the person's online life (consider credit cards, library cards, student badges, hospital IDs, etc). However, within the context of a specific reputation environment, a dishonest person may establish multiple, seemingly distinct, pseudonyms that all secretly collaborate with each other to attack the system.

Therefore, each entity should be represented with equal weight, rather than being able to inflate his importance. To do this, we must bound the number of pseudonyms that an adversary may control (see also the above discussion about Google and Advogato). In many cases, e.g., within a specific category of reputation, we seek to establish a one-to-one correspondence between entity and pseudonym. This "pseudospoofing" problem is the hardest issue facing the security of reputations systems in large-scale, decentralized environments.

One technique in a distributed setting to bound the number of pseudonyms that an adversary controls involves requiring "proofs of work" for participating.

To make reputation attestations, users must perform time-consuming operations. For example, reviews at online communities (which go toward the reputation of products or services) generally take the form of written descriptions of users' experiences, as opposed to merely assigning some numeric score. To prevent the bulk submission of scores, one may take a more computational approach by requiring users' machines to solve computationally-difficult problems to create pseudonyms or make statements to the system. Other online systems may attempt to establish that users are actually humans rather than automated scripts, such as the graphical "reverse Turing tests" employed by Yahoo Mail, where the user must read a distorted word from the screen, or describe what word characterizes a given picture (see www.captcha.net).

However, while these approaches may bound the number of pseudonyms or attestations, they certainly cannot ensure the one-to-one correspondence that we want: the participants in large-scale systems are simply too heterogeneous. Individuals are willing to commit different amounts of their times, and computers run at greatly differing speeds or bandwidth.

Validating statements

Validation means making sure that the statement was actually made by the person to whom it is attributed. In a system that prevents pseudospoofing by authenticating pseudonyms as distinct, the practice of validating statements is easy. For example, in web portals that require users to login -- i.e., a centralized storage and authentication facility -- posted statements are implicitly valid and bound to pseudonyms. In more diffuse peer-to-peer environments, digital signatures can similarly guarantee the authenticity of statements -- although the problem of verifying large sets of signatures may require more complicated system engineering.

One implication of shilling attacks is that a user cannot trust attested statements carte blanche. Colluding identities can cooperatively raise each others' reputations, or degrade the reputations of their competition. Therefore, a reputation system must consider the credibility of the identities making statements. The PGP Web of Trust and Advogato's Trust Metric are both systems that explore the automated transitivity of trust.

Ensuring reports are complete

When centralized, trusted storage or aggregation entities are assured, users posting statements under their respective pseudonyms know that their rating will be included in their target's report or aggregated totals. For example, Ebay ensures that they expose all the feedback on a given user. Furthermore, they can limit feedback to those identities performing a transaction, i.e., the winner of an auction, to limit shilling attacks.

A distributed system, on the other hand, has a difficult time ensuring that reports or rating aggregations are complete. Participants must broadcast their statements to all users. However, sending individual messages to every participant does not scale, and Internet multicast is not a viable option in its current deployment. There are cryptographic techniques to ensure secure broadcast, but they are too expensive both in terms of computation and bandwidth.

Bootstrapping the system and users' initial reactions

Lastly, reputation-based trust must have some method to bootstrap operation. After a system starts but before sufficient ratings have been collected, how do you make decisions? In noncommercial domains, it may be fine to list some entities and declare no knowledge or preference. In others, it may be more reasonable to list only entities for which a sufficiently certain score is known. Initial ratings could be collected by user surveys or polls relating only to known out-of-band information, such as if the entity exists in some alternate real-world domain. As the user continues to participate, the system can collect more feedback based on transactions, interactions, etc.

Bootstrapping is a different issue in a centralized authentication system than in a decentralized system. In a decentralized system, each user develops his own picture of the universe: he builds his trust of each entity based on his own evidence of past performance and on referrals from other trusted parties. Every new user effectively joins the system "at the beginning," and the process of building a profile for new users is an ongoing process throughout the entire lifetime of the system. In a centralized environment, on the other hand, ratings are accumulated across many different transactions and over long periods of time. New users trust the centralized repository to provide information about times and transactions that happened before the user joined the system.

Section 5: Economics of Reputation

Despite the many theoretical problems and pitfalls with online reputation systems in a pseudonymous context, we can make significant progress with reputation system design from simplified models. This section explores such simplified reputation systems in an economic context.

From an economic perspective, the first question is whether a reputation system promotes good behavior by participants, for example truthful reporting of quality by sellers in an electronic market or good-faith comments by participants in an electronic discussion forum. Some other interesting challenges in the design of well-functioning reputation systems include: (a) eliciting the right amount of feedback at the right time; (b) eliciting truthful feedback; and (c) preventing strategic manipulation by participants. Recent years have seen some interesting theoretical analysis, that addresses different methods to tackle each of these challenges. Although the analysis is often performed for quite stylized models, e.g. with simplified assumptions about what agents can do or how decisions are made, we can nevertheless draw some useful conclusions. In particular, social and economic incentives influence design criteria in ways that a purely technological discussion of reputation systems and pseudonymous identities don't consider.

A number of studies have considered eBay-style reputation systems, in which ratings are positive, negative or neutral, and information is provided in aggregate form (a tally of the amount of feedback in each category over the past 6 months). Simple stylized models allow us to consider the incentives for a seller to misreport the quality of a good, or a buyer to provide feedback based on differences between reported quality and actual quality.

A basic question, addressed by Dellarocas (2001), is whether eBay-style reputation mechanisms can be well-functioning (stable), such that sellers adopt a consistent strategy of (mis)reporting quality levels of goods for sale, and truthful reporting, such that the assessment of quality by buyers was equal to the true quality of a good. Assuming truthful feedback, in which buyers provide negative feedback if and only if the actual quality is sufficiently less than the reported quality, the analysis demonstrates that binary reputation mechanisms can be well-functioning and provide incentives for sellers to resist the temptations of "misrepresentation and sloth" (Miller et al. 2002). Recently, Dellarocas (2002) has considered the role of the type of information that is disseminated by reputation systems based on feedback information, and demonstrated that it is sufficient to provide eBay-style aggregate information instead of a full feedback history. The model is again stylized, with cooperation by buyers in providing truthful and complete feedback.

One common theme is the interaction between reputation systems and concepts of pseudonymity, which are important in environments in which fluid identities are either necessary for reasons of privacy and free discourse, or unavoidable for technological or economic reasons. Friedman and Resnick (2001) consider the social cost of cheap pseudonyms, and note that participants that can costlessly adopt a new pseudonym should rationally do so whenever their reputation falls below that of a newcomer to a system. In this sense, pseudonyms incur a social cost because all else being equal it would be more efficient to trust than distrust strangers in a world in which most people are trustworthy. As one solution, Friedman and Resnick propose that participants can choose to adopt {\em once-in-a-lifetime} (1L) pseudonyms for particular arenas, such as product classes on eBay or discussion threads on Slashdot. These 1L pseudonyms allow a participant to commit to maintaining a single identity and can be implemented with a trusted intermediary and the standard encryption technique of blind signatures.

Turning to the problem of providing incentives for reputation systems to effectively elicit the right amount of feedback, and at the right time and with the right level of detail, this can be considered within the standard framework of social-choice theory that seeks to implement good system-wide outcomes in systems with self-interested participants. Avery et al. (1999) consider the problem of eliciting the right amount of feedback about a product of a fixed quality, such as a new restaurant or a new Broadway show. Noting that the value of information in early feedback is higher than in late feedback because it can improve the decisions of more individuals, payment schemes are proposed to provide incentives for a socially-optimal quantity and sequencing of evaluations. Early evaluators are paid to provide information and later evaluators pay to balance the budget, to mitigate the tendency for an underprovision of evaluations and internalize the system-wide value-of-information provided by early feedback. This "market for evaluations" continues to assume full and honest evaluations.

Recently, Miller et al. (2002) suggest the use of proper scoring rules to elicit honest and truthful feedback from participants in a reputation system. The basic idea is to make payments based on how well feedback predicts the feedback from later participants. Thus, in addition to collecting, aggregating, and distributing information, the reputation system also provides rewards and imposes penalties based on the feedback provided by participants. In the setting of an electronic market, feedback information provided about a particular seller by a buyer is used by the intermediary to infer posterior distributional information about the future feedback on that seller. The buyer then receives a payment that is computed, according to a proper scoring rule, so that the expected payment is maximized when the buyer provides truthful feedback and maximizes the predictive accuracy of the posterior distribution and honest reporting is a Nash equilibrium. Extensions are also proposed to induce the right level of effort in providing feedback information, and to compensate for costly reporting of information. One problem that remains is that there are many Nash equilibrium of the proposed payment scheme, with collusive reports and "herding" style behavior (Bikchandani et al. 1989) another possible outcome.

In summary, useful reputation mechanisms cannot and should not be designed without regards to the economics of the problem, but rather through a careful consideration of both the computational and incentive aspects of well-functioning reputation systems. Design considerations include: the type of feedback information that is collected (e.g. binary, continuous, etc.); the form of information feedback that is distributed (e.g. aggregated, complete history, time-windowed, etc.); the role of the reputation system in providing incentives for the provision of honest and accurate feedback information, for example via payment schemes; the interaction between psuedonymity and the economics of reputation; and the robustness of a system against strategic manipulation by participants.

Section 6: Brief conclusion

Reputation is a complex but powerful tool to predict an entity's behavior online. Effective reputation systems must use algorithms suited to the situation ("how much is at stake? who wants to attack the system?"), and they must also consider social and economic incentives.

We've gained experience from a variety of fielded systems, ranging from auction sites like eBay that track reputation of their users for being good at paying on time or delivering the claimed goods, to web search engines designed to determine the most suitable web page for a query while simultaneously preventing people from easily influencing which page is returned. While the field of reputation system research is still quite young, the idea of making predictions based on observed behavior is intuitive and flexible. Reputation must play a role in any identity management solution.

References

P. Resnick R. Zeckhauser, E. Friedman, K. Kuwabara.
Reputation Systems: Facilitating Trust in
Internet Transactions.   Comm'ACM. 40(3) 56-58, 2000.

C. Dellarocas, "Analyzing the Economic Efficiency of eBay-like Online
Reputation Reporting Mechanisms", in Proc. 3rd ACM Conf. on Electronic
Commerce (EC'01), 2001.

C. Dellarocas, Efficiency and Robustness of Mediated Online Feedback
Mechanisms: The Case of eBay.  SITE'02.  (working paper).

E. Friedman & P. Resnick. The Social Cost of Cheap Pseudonyms.
J. Economics and Management Strategy 10(2) 173-199, 2001.

George A Akerlof. The Market for Lemons: Quality Uncertainty and the
Market Mechanism. Quarterly Journal of Economics. 84:488-500, 1970.

C. Avery, P. Resnick, R. Zeckhauser (99). The Market for Evaluations.
American Economic Review 89(3): 564-584.

Nolan Miller, Paul Resnick and Richard Zeckhauser. Eliciting Honest
Feedback in Electronic Markets. August, 2002. SITE'02.

R. Axelrod. The Evolution of Cooperation. 1984. New York: Basic
Books.

S. Bikchandani, D. Hirshleifer, and I. Welch. A Theory of Fads,
Fashion, Custom and Cultural Change as Informational
Cascades. J. Pol. Econ., 100(5) 992-1026.