📝 Enhance Impact Evaluators with comprehensive mechanism toolkit and insights

+34 -27

1 changed file

Impact Evaluators.md

···
  # Impact Evaluators

- Impact Evaluators are frameworks for [[Coordination|coordinating]] work and aligning [[Incentives]] in complex [[Systems]]. They provide mechanisms for retrospectively evaluating and rewarding contributions based on actual impact, helping solve coordination problems in [[Public Goods Funding]], research evaluation, and decentralized systems.
+ Impact Evaluators are frameworks for [[Coordination|coordinating]] work and aligning [[Incentives]] in complex [[Systems]]. They provide mechanisms for retrospectively evaluating and rewarding contributions based on impact, helping solve coordination problems in [[Public Goods Funding]].

- It's hard to do [[Public Goods Funding]], open-source software, research, etc. that don't have a clear, immediate financial return, especially high-risk/high-reward projects.
+ It's hard to do [[Public Goods Funding]], open-source software, research, etc. that don't have a clear, immediate financial return, especially high-risk/high-reward projects. Traditional funding often fails here. Instead of just giving money upfront (prospectively), Impact Evaluators create systems that look back at what work was actually done and what impact it actually had (retrospectively). It's much easier to judge the impact in a retrospective way!

- Traditional funding often fails here. Instead of just giving money upfront (prospectively), Impact Evaluators create systems that look back at what work was actually done and what impact it actually had (retrospectively). It's much easier to judge the impact in a retrospective way!
+ ## Notes

  - The goal is to **create strong incentives for people/teams to work on valuable, uncertain things** by promising a reward if they succeed in creating demonstrable impact.
- - Impact Evaluators work well on concrete things that you can turn into measurable stuff. They are powerful and might overfit.
- - When the goal is not exactly aligned, they can be harmful. E.g: Bitcoin wasn't created to maximize the energy consumption.
- - Allow Community Feedback Mechanisms. Implement robust feedback systems that allow participants to report and address concerns about the integrity of the metrics or behaviors in the community. This feedback can be used to refine and improve the system continuously.
+ - Impact Evaluators work well on concrete things that you can turn into measurable stuff.
+ - They are powerful things and will overfit. When the goal is not well aligned, they can be harmful. E.g: Bitcoin wasn't created to maximize the energy consumption. An Impact Evaluators might become an Externalities Maximizers.
+ - **Community Feedback Mechanism**. Implement robust feedback systems that allow participants to report and address concerns about the integrity of the metrics or behaviors in the community. Use the feedback to refine and improve the system.
  - Designing IEs has the side effect of making impact more legible, decomposed into specific properties, which can be represented by specific metrics.
  - Something like l2beat as a leaderboard
  - IEs should [make "making the next L2beat" a permissionless process](https://vitalik.eth.limo/general/2024/09/28/alignment.html) for the space. Independent entities should arise to evaluate how projects met the IE criteria.
  - Do more to make different aspects of alignment legible, while not centralizing in one single "watcher", we can make the concept much more effective, and fair and inclusive in the way that the Ethereum ecosystem strives to be.
- - Impact Evaluators need to be (permissionless) forkable. Anyone could setup a fork with their own pool?
+ - Impact Evaluators need to be (permissionless) forkable.
+ - Anyone should be able to [fork the evaluation system with their own criteria](https://vitalik.eth.limo/general/2024/09/28/alignment.html), preventing capture and enabling experimentation.
  - Start local and a small community and grow from there.
  - Impact evaluation should be done by the community at the local level. They should define their own metrics and evaluation criteria.
  - IEs should start small (community) and simple. Iterate as fast as possible with a learning feedback loop (there isn't a community one in deepfunding)!
···
  - Power asymmetries - Large suppliers face millions of individual consumers with take-it-or-leave-it contracts
  - Externalities - Individual flourishing depends on community wellbeing, but contracts remain individualized
  - Information asymmetries - Suppliers control the metrics and optimize for growth rather than user outcomes
+ - When collecting data, [pairwise comparisons and rankings are more reliable than absolute scoring](https://anishathalye.com/designing-a-better-judging-system/). Humans excel at relative judgments, but struggle with absolute judgments.
  - Legibility is key. E.g: How much did X contribute to Ethereum? The goal is to transform alignment from an exclusive social game into a merit-based system with clear, measurable criteria.
  - Support organizations like L2beat to track project alignment
  - Let projects compete on measurable criteria rather than connections
···
  - Privacy pools that exclude provably malicious actors
  - Multiple independent "dashboard organizations" preventing capture
  - They should be flexible as it's hard to predict ways the evaluation metrics will be gamed.
+ - [The simpler a mechanism, the less space for hidden privilege](https://vitalik.eth.limo/general/2020/09/11/coordination.html). Fewer parameters mean more resistance to corruption and overfit and more people engaging
+ - Demonstrably fair and impartial to all participants (open source and publicly verifiable execution), with no hidden biases or privileged interests
+ - Don't write specific people or outcomes into the mechanism (e.g: using multiple accounts)
  - An allocation mechanism can be seen as a measurement process, with the goal being the reduction of uncertainty concerning present beliefs about the future. An effective process will gather and leverage as much information as possible while maximizing the signal-to-noise ratio of that information — aims which are often at odds.
  - In the digital world, we can apply several techniques to the same input and evaluate the potential impacts. E.g: Simulate different voting systems and see which one fits the best with the current views. This is a case for the system to have a final mechanism that acts as a layer for human to express preferences.
  - [Every community and institutions wants to see a better, more responsive and dynamic provision of public goods within them, usually lack information about which goods have the greatest value and know quite a bit about social structure internally which would allow them to police the way GitCoin has in the domains it knows](https://gov.gitcoin.co/t/a-vision-for-a-pluralistic-civilizational-scale-infrastructure-for-funding-public-goods/9503/11).
  - IE's helps a community with more data and information to make better decisions.
  - Open Data Platforms for the community to gather better data and make better decisions.
  - Can open data be rewarded with an IE? What does a block reward mean there?
+ - Legible Impact Attribution. Make contributions and their value visible. [Transform vague notions of "alignment" into measurable criteria](https://vitalik.eth.limo/general/2024/09/28/alignment.html) that projects can compete on.
  - Seeing like a State blinds you to the realities that are complex. Need a way to evolve the metric to be anti-Goodhart's.
  - Not even anti-goodharts. Research says the best thing to do is to give all money to vaccine distribution, ...
  - **Embrace plurality over perfection**. [No single mechanism can satisfy all desirable properties](https://en.wikipedia.org/wiki/Arrow%27s_impossibility_theorem) (efficiency, fairness, incentive compatibility, budget balance). Different contexts need different trade-offs.
···
  ## Principles

  - Retrospective Reward for Verifiable Impact
- - Judge work by its actual impact, not promised outcomes. This reduces speculation and gaming while rewarding genuine value creation
- - Legible Impact Attribution
- - Make contributions and their value visible. [Transform vague notions of "alignment" into measurable criteria](https://vitalik.eth.limo/general/2024/09/28/alignment.html) that projects can compete on
  - Credible Neutrality Through Transparent and Simple Mechanisms
- - [The simpler a mechanism, the less space for hidden privilege](https://vitalik.eth.limo/general/2020/09/11/coordination.html). Fewer parameters mean more resistance to corruption and overfit and more people engaging
- - Demonstrably fair and impartial to all participants (open source and publicly verifiable execution), with no hidden biases or privileged interests
- - Don't write specific people or outcomes into the mechanism (multiple accounts)
  - Comparative Truth-Seeking Over Absolute Metrics
- - [Pairwise comparisons and rankings are more reliable than absolute scoring](https://anishathalye.com/designing-a-better-judging-system/). Humans excel at relative judgments
  - Anti-Goodhart Resilience
  - Permissionless Scalability
- - Anyone should be able to [fork the evaluation system with their own criteria](https://vitalik.eth.limo/general/2024/09/28/alignment.html), preventing capture and enabling experimentation.
  - Plurality-Aware Preference Aggregation
  - Collusion-Resistant Architecture
  - Credible Exit and Forkability
···
  - Machine Learning
  - Voting Theory
  - Process Control Theory
- - LLM Evals
+ - Large Language Models Evaluation

  ## Mechanism Toolkit

- - **Staking and slashing**. Require deposits that get burned for misbehavior. Simple but requires upfront capital.
- - **Pairwise comparison engines**. Convert human judgments into weights using [Elo ratings or Bradley-Terry models](https://www.keiruaprod.fr/blog/2021/06/02/elo-vs-bradley-terry-model.html).
- - **Unprovable-vote schemes (MACI)**. Use zero-knowledge and key-revocation games so ballots can't be sold or coerced.
+ - **Staking and Slashing**. Require deposits that get burned for misbehavior. Simple but requires upfront capital.
+ - **Pairwise Comparison Engines**. Convert human judgments into weights using [Elo ratings or Bradley-Terry models](https://www.keiruaprod.fr/blog/2021/06/02/elo-vs-bradley-terry-model.html).
+ - **Unprovable Vote Schemes (MACI)**. Use zero-knowledge and key-revocation games so ballots can't be sold or coerced.
  - **Collusion-safe games**. Rely on identity-free incentives (PoW-like) or security-deposit futarchy where bad coordination is personally costly.
- - **Robust unique-human identity**. Multifactor "proof-of-personhood" that cannot be credibly rented, blocking sybil farms and bribe pools.
- - **Fork-and-exit rights**. Make systems easy to split so minority users can counter-coordinate against cartels.
- - **Quadratic mechanisms**. [Funding](https://vitalik.eth.limo/general/2019/12/07/quadratic.html) and voting that make influence proportional to square root of resources, reducing plutocracy.
- - **Prediction and decision markets (futarchy)**. ["Vote values, bet beliefs"](https://medium.com/ethereum-optimism/retroactive-public-goods-funding-33c9b7d00f0c) - conditional markets choose policies that maximize agreed-upon metrics.
- - **Distilled-human-judgement markets**. A jury scores a small sample, open AI/human traders supply full answers, rewards fit; scales expertise cheaply.
- - **Deep funding credit-graphs**. Donations or protocol issuance auto-flow along weighted dependency edges set by the mechanism, rewarding upstream contributors.
+ - **Fork-and-exit**. Make systems easy to split so minority users can counter-coordinate against cartels.
+ - **Quadratic Mechanisms**. [Funding](https://vitalik.eth.limo/general/2019/12/07/quadratic.html) and voting that make influence proportional to square root of resources, reducing plutocracy.
+ - **Prediction and Decision Markets (Futarchy)**. ["Vote values, bet beliefs"](https://medium.com/ethereum-optimism/retroactive-public-goods-funding-33c9b7d00f0c) - conditional markets choose policies that maximize agreed-upon metrics.
+ - **Distilled-Human-Judgement Markets**. A jury scores a small sample, open AI/human traders supply full answers, rewards fit; scales expertise cheaply.
  - **Engine-and-steering-wheel pattern**. Open competition of AI "engines" acts under a simple, credibly-neutral rule-set set and audited/reinforced by humans.
  - **Research Augmented Bonding Curves (ABCs) / Curation Markets**. Automated market makers that route fees to upstream dependencies based on usage.
- - **Information-elicitation without verification**. [Peer-prediction mechanisms](https://jonathanwarden.com/information-elicitation-mechanisms/), [Bayesian Truth Serum](https://www.science.org/doi/10.1126/science.1102081), and other techniques to get truthful data from subjective evaluation.
- - **Token-curated registries (TCRs)**. Stakeholders deposit tokens to curate lists; challengers and voters decide on inclusions, with slashing/redistribution to discourage bad entries.
+ - **Information-Elicitation without Verification**. [Peer-prediction mechanisms](https://jonathanwarden.com/information-elicitation-mechanisms/), [Bayesian Truth Serum](https://www.science.org/doi/10.1126/science.1102081), and other techniques to get truthful data from subjective evaluation.
+ - **Token-Curated Registries (TCRs)**. Stakeholders deposit tokens to curate lists; challengers and voters decide on inclusions, with slashing/redistribution to discourage bad entries.
  - **Deliberative protocols**. [Structured discussion processes](https://jonathanwarden.com/deliberative-consensus-protocols/) that surface information before voting.
+ - **Harberger Taxes/COST (Common Ownership Self-assessed Tax)** - Entities self-assess value and pay tax on it, but must sell at that price if someone wants to buy. Useful for allocating scarce positions/rights in evaluation systems.
+ - **Dominant Assurance Contracts** - Entrepreneur provides refund + bonus if funding threshold isn't met, solving the assurance problem in public goods funding more elegantly than traditional crowdfunding.
+ - **Conviction Voting** - Preferences gain strength over time rather than snapshot voting. Voters continuously express preferences and conviction builds, reducing governance attacks.
+ - **Retroactive Oracles** - Designated future evaluators whose preferences are predicted by current markets. Separates the "who decides" from "what they'll value" questions.
+ - **Sortition/Random Selection** - Randomly selected evaluation committees from qualified pools. Reduces corruption and strategic behavior while maintaining statistical representativeness.
+ - **Optimistic Mechanisms** - Actions are allowed by default but can be challenged within a time window. Reduces friction for honest actors while maintaining security.
+ - **Vickrey-Clarke-Groves (VCG) Mechanisms** - Generalized truthful mechanisms where participants pay the externality they impose on others. Could be adapted for impact evaluation.
+ - **Streaming/Continuous Funding** - Instead of discrete rounds, continuous flows based on current evaluation state. Reduces volatility and gaming of evaluation periods.
+ - **Liquid Democracy** - Delegation of evaluation power to trusted experts, revocable at any time. Balances expertise with democratic control.
+ - **Threshold Cryptography/Secret Sharing** - For private evaluation scores that only become public when aggregated. Prevents anchoring and collusion during evaluation.
+ - **Augmented Bonding Curves with Vesting** - Time-locked rewards that vest based on continued positive evaluation over time, aligning long-term incentives

  ## Ideas

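The Quadratic Mechanisms entry in the Mechanism Toolkit describes funding where influence grows with the square root of resources. A minimal Python sketch of that rule for quadratic funding follows. It is not part of this PR: the function name `quadratic_match`, the input shape (project name mapped to a list of individual contribution amounts), and the choice to scale matches down pro rata when the pool is too small are all assumptions made for illustration.

```python
from math import sqrt


def quadratic_match(contributions, matching_pool):
    """Split a matching pool across projects with the quadratic funding rule.

    contributions: dict mapping a (hypothetical) project name to a list of
        individual contribution amounts, one entry per contributor.
    matching_pool: total subsidy available for matching.
    """
    raw = {}
    for project, amounts in contributions.items():
        direct = sum(amounts)
        # Ideal funding level is (sum of square roots)^2; the subsidy is
        # whatever that level adds on top of the direct contributions.
        ideal = sum(sqrt(a) for a in amounts) ** 2
        raw[project] = max(0.0, ideal - direct)  # guard against float noise

    raw_total = sum(raw.values())
    if raw_total == 0:
        return {project: 0.0 for project in raw}

    # Assumption: when the pool cannot cover every ideal subsidy, scale all
    # subsidies down by the same factor; never pay more than the ideal.
    scale = min(1.0, matching_pool / raw_total)
    return {project: subsidy * scale for project, subsidy in raw.items()}


# Both projects raised 4 units directly, but one has four small donors
# and the other has a single large donor.
example = {"many_small": [1, 1, 1, 1], "one_large": [4]}
print(quadratic_match(example, matching_pool=6))
# prints {'many_small': 6.0, 'one_large': 0.0}
```

The example shows the anti-plutocratic effect named in the toolkit entry: equal totals, but only the broadly supported project attracts a match. The sketch ignores sybil resistance, which is why real deployments would pair it with the identity and collusion mechanisms discussed elsewhere in the note.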
