# Pascal’s Muggle: Infinitesimal Priors and Strong Evidence

**Followup to:** Pascal’s Mugging: Tiny Probabilities of Vast Utilities, The Pascal’s Wager Fallacy Fallacy, Being Half-Rational About Pascal’s Wager Is Even Worse

**Short form:** Pascal’s Muggle

*tl;dr: If you assign superexponentially infinitesimal probability to claims of large impacts, then apparently you should ignore the possibility of a large impact even after seeing huge amounts of evidence. If a poorly-dressed street person offers to save 10^{10^{100}} lives (a googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^{-10^{100}} probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky. For the same reason, any evidence you encounter showing that the human species could create a sufficiently large number of descendants—no matter how normal the corresponding laws of physics appear to be, or how well-designed the experiments which told you about them—must be rejected out of hand. There is a possible reply to this objection using Robin Hanson’s anthropic adjustment against the probability of large impacts, and in this case you will treat a Pascal’s Mugger as having decision-theoretic importance exactly proportional to the Bayesian strength of evidence they present you, without quantitative dependence on the number of lives they claim to save. This, however, corresponds to an odd mental state which some, such as myself, would find unsatisfactory. In the end, however, I cannot see any better candidate for a prior than having a leverage penalty plus a complexity penalty on the prior probability of scenarios.*

In late 2007 I coined the term “Pascal’s Mugging” to describe a problem which seemed to me to arise when combining conventional decision theory and conventional epistemology in the obvious way. On conventional epistemology, the prior probability of hypotheses diminishes exponentially with their complexity; if it would take 20 bits to specify a hypothesis, then its prior probability receives a 2^{-20} penalty factor, and it will require evidence with a likelihood ratio of 1,048,576:1—evidence which we are 1,048,576 times more likely to see if the theory is true than if it is false—to make us assign it around 50-50 credibility. (This isn’t as hard as it sounds. Flip a coin 20 times and note down the exact sequence of heads and tails. You now believe in a state of affairs you would have assigned a million-to-one probability beforehand—namely, that the coin would produce the exact sequence HTHHHHTHTTH… or whatever—after experiencing sensory data which are more than a million times more probable if that fact is true than if it is false.) The problem is that although this kind of prior probability penalty may seem very strict at first, it’s easy to construct physical scenarios that grow in size vastly faster than they grow in complexity.
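The arithmetic here can be checked directly. A minimal sketch in Python (my illustration; the 20-bit figure and likelihood ratio are from the paragraph above):

```python
# A 20-bit hypothesis gets a 2^-20 complexity-penalized prior; evidence with
# a matching likelihood ratio restores it to roughly even odds.
bits = 20
prior = 2.0 ** -bits                 # ~9.5e-7
likelihood_ratio = 2.0 ** bits       # 1,048,576 : 1

# Bayes in odds form: posterior odds = prior odds * likelihood ratio.
prior_odds = prior / (1.0 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1.0 + posterior_odds)

print(int(likelihood_ratio))   # 1048576
print(round(posterior, 3))     # 0.5, back to about 50-50 credibility
```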

I originally illustrated this using Pascal’s Mugger: A poorly dressed street person says “I’m actually a Matrix Lord running this world as a computer simulation, along with many others—the universe above this one has laws of physics which allow me easy access to vast amounts of computing power. Just for fun, I’ll make you an offer—you give me five dollars, and I’ll use my Matrix Lord powers to save 3↑↑↑↑3 people inside my simulations from dying and let them live long and happy lives” where ↑ is Knuth’s up-arrow notation. This was originally posted in 2007, when I was a bit more naive about what kind of mathematical notation you can throw into a random blog post without creating a stumbling block. (E.g.: On several occasions now, I’ve seen someone on the Internet approximate the number of dust specks from this scenario as being a “billion”, since any incomprehensibly large number equals a billion.) Let’s try an easier (and *way* smaller) number instead, and suppose that Pascal’s Mugger offers to save a googolplex lives, where a googol is 10^{100} (a 1 followed by a hundred zeroes) and a googolplex is 10 to the googol power, so 10^{10^{100}} or 10^{10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000} lives saved if you pay Pascal’s Mugger five dollars, if the offer is honest.

If Pascal’s Mugger had only offered to save a mere *googol *lives (10^{100}), we could perhaps reply that although the notion of a Matrix Lord may sound simple to say in English, if we actually try to imagine all the machinery involved, it works out to a substantial amount of computational complexity. (Similarly, Thor is a worse explanation for lightning bolts than the laws of physics because, among other points, an anthropomorphic deity is more complex than calculus in *formal* terms—it would take a larger computer program to simulate Thor as a complete mind, than to simulate Maxwell’s Equations—even though in mere human words Thor sounds much easier to explain.) To imagine this scenario in formal detail, we might have to write out the laws of the higher universe the Mugger supposedly comes from, the Matrix Lord’s state of mind leading them to make that offer, and so on. And so (we reply) when mere verbal English has been translated into a formal hypothesis, the Kolmogorov complexity of this hypothesis is more than 332 bits—it would take more than 332 ones and zeroes to specify—where 2^{-332} ~ 10^{-100}. Therefore (we conclude) the net expected value of the Mugger’s offer is still tiny, once its prior improbability is taken into account.

But once Pascal’s Mugger offers to save a *googolplex* lives—offers us a scenario whose value is constructed by twice-repeated exponentiation—we seem to run into some difficulty using this answer. Can we really claim that the complexity of this scenario is on the order of a googol bits—that to formally write out the hypothesis would take one hundred billion billion times more bits than there are atoms in the observable universe?

And a tiny, paltry number like a googolplex is only the beginning of computationally simple numbers that are unimaginably huge. Exponentiation is defined as repeated multiplication: If you see a number like 3^{5}, it tells you to multiply five 3s together: 3×3×3×3×3 = 243. Suppose we write 3^{5} as 3↑5, so that a single arrow ↑ stands for exponentiation, and let the double arrow ↑↑ stand for repeated exponentiation, or *tetration*. Thus 3↑↑3 would stand for 3↑(3↑3) or 3^{3^{3}} = 3^{27} = 7,625,597,484,987. Tetration is also written as follows: ^{3}3 = 3↑↑3. Thus ^{4}2 = 2^{2^{2^{2}}} = 2^{2^{4}} = 2^{16} = 65,536. Then pentation, or repeated tetration, would be written with 3↑↑↑3 = ^{^{3}3}3 = ^{7,625,597,484,987}3 = 3^{3^{…^{3}}} where the … summarizes an exponential tower of 3s seven trillion layers high.
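Knuth’s up-arrow hierarchy is easy to write down recursively. A small Python sketch (it reproduces the values computed above; anything like 3↑↑↑3 itself is far too large to actually evaluate):

```python
def up_arrow(a, n, b):
    """Knuth's a ↑^n b: n = 1 is exponentiation, n = 2 tetration, and so on."""
    if n == 1:
        return a ** b
    result = a
    for _ in range(b - 1):               # fold b copies of a, right to left
        result = up_arrow(a, n - 1, result)
    return result

print(up_arrow(3, 1, 5))   # 243
print(up_arrow(3, 2, 3))   # 7625597484987  (3↑↑3)
print(up_arrow(2, 2, 4))   # 65536          (tetration: ⁴2)
```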

But 3↑↑↑3 is still quite simple computationally—we could describe a small Turing machine which computes it—so a hypothesis involving 3↑↑↑3 should not therefore get a large complexity penalty, if we’re penalizing hypotheses by algorithmic complexity.

I had originally intended the scenario of Pascal’s Mugging to point up what seemed like a basic problem with combining conventional epistemology with conventional decision theory: Conventional epistemology says to penalize hypotheses by an exponential factor of computational complexity. This seems pretty strict in everyday life: “What? For a mere 20 bits I am to be called a million times less probable?” But for stranger hypotheses about things like Matrix Lords, the size of the hypothetical universe can blow up enormously faster than the exponential of its complexity. This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^{-100} and less) of scenarios where our lightest action affected 3↑↑4 people… which would *in turn* be dominated by even *more* remote probabilities of affecting 3↑↑5 people…

This problem is worse than just giving five dollars to Pascal’s Mugger—our expected utilities don’t converge at all! Conventional epistemology tells us to sum over the predictions of all hypotheses weighted by their computational complexity and evidential fit. This works fine with *epistemic* probabilities and sensory predictions because no hypothesis can predict more than probability 1 or less than probability 0 for a sensory experience. As hypotheses get more and more complex, their contributed predictions have tinier and tinier weights, and the sum converges quickly. But decision theory tells us to calculate expected *utility* by summing the utility of each possible outcome, times the probability of that outcome conditional on our action. If hypothetical utilities can grow faster than hypothetical probability diminishes, the contribution of an average term in the series will keep increasing, and this sum will never converge—not if we try to do it the same way we got our epistemic predictions, by summing over complexity-weighted possibilities. (See also this similar-but-different paper by Peter de Blanc.)
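A toy model of the divergence (my illustration, not from the text): index hypotheses by complexity k and weight each by 2^{-k}. Bounded sensory predictions keep the epistemic sum finite, but a utility that grows doubly-exponentially in k makes every successive expected-utility term larger than the last:

```python
ks = range(1, 12)

# Epistemic side: each hypothesis predicts a sensory probability of at most 1,
# so the complexity-weighted sum is bounded by the sum of 2^-k, which is < 1.
prediction_sum = sum(2.0 ** -k * 1.0 for k in ks)

# Decision side: if hypothesis k stakes a utility of 2^(2^k), the weighted
# term is 2^(2^k - k), which grows without bound, so the sum never converges.
utility_terms = [2 ** (2 ** k - k) for k in ks]

print(prediction_sum < 1.0)                                          # True
print(all(b > a for a, b in zip(utility_terms, utility_terms[1:])))  # True
```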

Unfortunately I failed to make it clear in my original writeup that this was where the problem came from, and that it was general to situations beyond the Mugger. Nick Bostrom’s writeup of Pascal’s Mugging for a philosophy journal used a Mugger offering a quintillion days of happiness, where a quintillion is merely 1,000,000,000,000,000,000 = 10^{18}. It takes at least two exponentiations to outrun a singly-exponential complexity penalty; I would be willing to assign a probability of less than 1 in 10^{18} to a random person being a Matrix Lord. You may not have to invoke 3↑↑↑3 to cause problems, but you’ve got to use something like 10^{10^{100}}—double exponentiation or better. Manipulating ordinary hypotheses about the ordinary physical universe, taken at face value, which just contains 10^{80} atoms within range of our telescopes, should not lead us into such difficulties.

(And then the phrase “Pascal’s Mugging” got *completely* bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, *regardless of whether it’s supposed to have a low probability.* This is enough to make me regret having ever invented the term “Pascal’s Mugging” in the first place; for further thoughts on this, see The Pascal’s Wager Fallacy Fallacy (just because the stakes are high does not mean the probabilities are low, and Pascal’s Wager is fallacious because of the low probability, not the high stakes!) and Being Half-Rational About Pascal’s Wager Is Even Worse. Again, when dealing with issues merely the size of the apparent universe, on the order of 10^{80}—for *small* large numbers—we do *not* run into the sort of decision-theoretic problems I originally meant to single out by the concept of “Pascal’s Mugging”. My rough intuitive stance on x-risk charity is that if you are one of the tiny fraction of all sentient beings who happened to be born here on Earth before the intelligence explosion, when the existence of the whole vast intergalactic future depends on what we do now, you should expect to find yourself surrounded by a *smorgasbord* of opportunities to affect small large numbers of sentient beings. There is then no reason to worry about tiny probabilities of having a large impact when we can expect to find medium-sized opportunities of having a large impact, so long as we restrict ourselves to impacts no larger than the size of the known universe.)

One proposal which has been floated for dealing with Pascal’s Mugger in the decision-theoretic sense is to penalize hypotheses that let you affect a large number of people, in proportion to the number of people affected—what we could call perhaps a “leverage penalty” instead of a “complexity penalty”.

Unfortunately this potentially leads us into a different problem, that of *Pascal’s Muggle.*

Suppose a poorly-dressed street person asks you for five dollars in exchange for doing a googolplex’s worth of good using his Matrix Lord powers.

“Well,” you reply, “I think it very improbable that I would be able to affect so many people through my own, personal actions—who am I to have such a great impact upon events? Indeed, I think the probability is somewhere around one over googolplex, maybe a bit less. So no, I won’t pay five dollars—it is unthinkably improbable that I could do so much good!”

“I see,” says the Mugger.

A wind begins to blow about the alley, whipping the Mugger’s loose clothes about him as they shift from ill-fitting shirt and jeans into robes of infinite blackness, within whose depths tiny galaxies and stranger things seem to twinkle. In the sky above, a gap edged by blue fire opens with a horrendous tearing sound—you can hear people on the nearby street yelling in sudden shock and terror, implying that they can see it too—and displays the image of the Mugger himself, wearing the same robes that now adorn his body, seated before a keyboard and a monitor.

“That’s not actually me,” the Mugger says, “just a conceptual representation, but I don’t want to drive you insane. Now give me those five dollars, and I’ll save a googolplex lives, just as promised. It’s easy enough for me, given the computing power my home universe offers. As for why I’m doing this, there’s an ancient debate in philosophy among my people—something about how we ought to sum our expected utilities—and I mean to use the video of this event to make a point at the next decision theory conference I attend. Now will you give me the five dollars, or not?”

“Mm… no,” you reply.

“*No?*” says the Mugger. “I understood earlier when you didn’t want to give a random street person five dollars based on a wild story with no evidence behind it. But now I’ve offered you evidence.”

“Unfortunately, you haven’t offered me *enough* evidence,” you explain.

“Really?” says the Mugger. “I’ve opened up a fiery portal in the sky, and that’s not enough to persuade you? What do I have to do, then? Rearrange the planets in your solar system, and wait for the observatories to confirm the fact? I suppose I could also explain the true laws of physics in the higher universe in more detail, and let you play around a bit with the computer program that encodes all the universes containing the googolplex people I would save if you gave me the five dollars—”

“Sorry,” you say, shaking your head firmly, “there’s just no *way* you can convince me that I’m in a position to affect a googolplex people, because the prior probability of that is one over googolplex. If you wanted to convince me of some fact of merely 2^{-100} prior probability, a mere nonillion to one—like that a coin would come up heads and tails in some particular pattern of a hundred coinflips—then you could just show me 100 bits of evidence, which is within easy reach of my brain’s sensory bandwidth. I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so—though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus—would easily give me enough data to conclude that this nonillion-to-one possibility was true. But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can’t present me with a sensory experience containing a googol bits. Indeed, you can’t *ever* present a mortal like me with evidence that has a likelihood ratio of a googolplex to one—evidence I’m a googolplex times more likely to encounter if the hypothesis is true, than if it’s false—because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way.”
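The bandwidth arithmetic in this reply is easy to verify; a quick sketch (figures taken from the dialogue):

```python
from math import log2

# 100 bits of evidence covers a 2^-100 prior: under a millisecond of sensory
# data at the ~1 megabit/s that survives the lateral geniculate nucleus.
seconds_for_100_bits = 100 / 1e6
print(seconds_for_100_bits)             # 0.0001

# A 1-in-googolplex prior would instead need about log2(10^(10^100)) bits,
# roughly 3.3 x 10^100, which no mortal sensory channel can carry.
bits_for_googolplex = 1e100 * log2(10)
print(bits_for_googolplex > 1e100)      # True
```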

“So no matter what evidence I show you,” the Mugger says—as the blue fire goes on crackling in the torn sky above, and screams and desperate prayers continue from the street beyond—“you can’t ever notice that you’re in a position to help a googolplex people.”

“Right!” you say. “I can believe that you’re a Matrix Lord. I mean, I’m not a *total* Muggle, I’m psychologically capable of responding in *some* fashion to that giant hole in the sky. But it’s just completely forbidden for me to assign any significant probability whatsoever that you will actually save a googolplex people after I give you five dollars. You’re lying, and I am absolutely, absolutely, absolutely confident of that.”

“So you weren’t *just* invoking the leverage penalty as a plausible-sounding way of getting out of paying me the five dollars earlier,” the Mugger says thoughtfully. “I mean, I’d understand if that was just a rationalization of your discomfort at forking over five dollars for what seemed like a tiny probability, when I hadn’t done my duty to present you with a corresponding amount of evidence before demanding payment. But you… you’re acting like an AI would if it was actually programmed with a leverage penalty on hypotheses!”

“Exactly,” you say. “I’m forbidden *a priori* to believe I can ever do that much good.”

“Why?” the Mugger says curiously. “I mean, all I have to do is press this button here and a googolplex lives will be saved.” The figure within the blazing portal above points to a green button on the console before it.

“Like I said,” you explain again, “the prior probability is just too infinitesimal for the massive evidence you’re showing me to overcome it—”

The Mugger shrugs, and vanishes in a puff of purple mist.

The portal in the sky above closes, taking with it the console and the green button.

(The screams go on from the street outside.)

A few days later, you’re sitting in your office at the physics institute where you work, when one of your colleagues bursts in through your door, seeming highly excited. “I’ve got it!” she cries. “I’ve figured out that whole dark energy thing! Look, these simple equations retrodict it exactly, there’s no way that could be a coincidence!”

At first you’re also excited, but as you pore over the equations, your face configures itself into a frown. “No...” you say slowly. “These equations may look extremely simple so far as computational complexity goes—and they do exactly fit the petabytes of evidence our telescopes have gathered so far—but I’m afraid they’re far too improbable to ever believe.”

“What?” she says. “Why?”

“Well,” you say reasonably, “if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we’d be able to create around a googolplex people that way. But that would mean that we, here on Earth, are in a position to affect a googolplex people—since, if we blow ourselves up via a nanotechnological war or *(cough)* make certain other errors, those googolplex people will never come into existence. The prior probability of us being in a position to impact a googolplex people is on the order of one over googolplex, so your equations must be wrong.”

“Hmm...” she says. “I hadn’t thought of that. But what if these equations are right, and yet somehow, everything I do is exactly balanced, down to the googolth decimal point or so, with respect to how it impacts the chance of modern-day Earth participating in a chain of events that leads to creating an intergalactic civilization?”

“How would *that* work?” you say. “There’s only seven billion people on today’s Earth—there’s probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever—so even before analyzing your exact position, it seems like your leverage on future affairs couldn’t reasonably be less than a one in ten trillion part of the future or so.”

“But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 10^{10^{100} - 13},” she explains, “and I’m not allowed to believe that no matter how much evidence you show me.”

This problem may not be as bad as it looks; with some further reasoning, the leverage penalty may lead to more sensible behavior than depicted above.

Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a *unique* position to affect many others (which is why I called it a leverage penalty). At most 10 out of 3↑↑↑3 people can ever be in a position to be “solely responsible” for the fate of 3↑↑↑3 people if “solely responsible” is taken to imply a causal chain that goes through no more than 10 people’s decisions; i.e. at most 10 people can ever be solely_{10} responsible for any given event. Or if “fate” is taken to be a sufficiently ultimate fate that there’s at most 10 other decisions of similar magnitude that could cumulate to determine someone’s outcome utility to within ±50%, then any given person could have their fate_{10} determined on at most 10 occasions. We would surely agree, while assigning priors at the dawn of reasoning, that an agent randomly selected from the pool of all agents in Reality has at most a 100/X chance of being able to be solely_{10} responsible for the fate_{10} of X people. Any reasoning we do about universes, their complexity, sensory experiences, and so on, should maintain this net balance. You can even strip out the part about agents and carry out the reasoning on pure causal nodes; the chance of a randomly selected causal node being in a unique_{100} position on a causal graph with respect to 3↑↑↑3 other nodes ought to be at most 100/3↑↑↑3 for finite causal graphs. (As for infinite causal graphs, well, if problems arise *only* when introducing infinity, maybe it’s infinity that has the problem.)
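As a minimal formalization (my sketch, not Hanson’s exact proposal), the leverage penalty caps the prior on being one of at most k agents solely responsible for the fate of X people at k/X, multiplied into the usual complexity penalty:

```python
def leverage_penalty(people_affected, k=10):
    """Prior bound on being one of at most k agents solely_k responsible
    for an outcome touching `people_affected` people."""
    return min(1.0, k / people_affected)

def penalized_prior(complexity_bits, people_affected, k=10):
    """Complexity penalty times leverage penalty."""
    return 2.0 ** -complexity_bits * leverage_penalty(people_affected, k)

print(leverage_penalty(1000.0))          # 0.01
print(leverage_penalty(1e80, k=1e11))    # ~1e-69
```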

Suppose we apply the Hansonian leverage penalty to the face-value scenario of our own universe, in which there are apparently no aliens and the galaxies we can reach in the future contain on the order of 10^{80} atoms; which, if the intelligence explosion goes well, might be transformed into on the very loose order of… let’s ignore a lot of intermediate calculations and just call it the equivalent of 10^{80} centuries of life. (The neurons in your brain perform lots of operations; you don’t get only one computing operation per element, because you’re powered by the Sun over time. The universe contains a lot more negentropy than just 10^{80} bits due to things like the gravitational potential energy that can be extracted from mass. Plus we should take into account reversible computing. But of course it also takes more than one computing operation to implement a century of life. So I’m just going to xerox the number 10^{80} for use in these calculations, since it’s not supposed to be the main focus.)

Wouldn’t it be terribly odd to find ourselves—where by ‘ourselves’ I mean the hundred billion humans who have ever lived on Earth, for no more than a century or so apiece—solely_{100,000,000,000} responsible for the fate_{10} of around 10^{80} units of life? Isn’t the prior probability of this somewhere around 10^{-68}?

Yes, according to the leverage penalty. But a prior probability of 10^{-68} is not an insurmountable epistemological barrier. If you’re taking things at face value, 10^{-68} is just 226 bits of evidence or thereabouts, and your eyes are sending you a megabit per second. Becoming convinced that *you,* yes *you* are an Earthling is epistemically doable; you just need to see a stream of sensory experiences which is 10^{68} times more probable if you are an Earthling than if you are someone else. If we take everything at face value, then there could be around 10^{80} centuries of life over the history of the universe, and only 10^{11} of those centuries will be lived by creatures who discover themselves occupying organic bodies. Taking everything at face value, the sensory experiences of your life are unique to Earthlings and should immediately convince you that you’re an Earthling—just looking around the room you occupy will provide you with sensory experiences that plausibly belong to only 10^{11} out of 10^{80} life-centuries.
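The conversion used here, with the post’s own numbers: a prior of 10^{-68} costs about -log2(10^{-68}) bits of evidence, which a megabit-per-second sensory stream covers almost instantly.

```python
from math import log2

bits_needed = 68 * log2(10)       # ~225.9, i.e. "226 bits or thereabouts"
seconds_at_1_mbit = bits_needed / 1e6

print(round(bits_needed))         # 226
print(seconds_at_1_mbit < 0.001)  # True: a fraction of a millisecond
```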

If we *don’t* take everything at face value, then there might be such things as ancestor simulations, and it might be that your experience of looking around the room is something that happens in 10^{20} ancestor simulations for every time that it happens in ‘base level’ reality. In this case your probable leverage on the future is diluted (though it may be large even post-dilution). But this is not something that the Hansonian leverage penalty *forces* you to believe—not when the putative stakes are still as small as 10^{80}. Conceptually, the Hansonian leverage penalty doesn’t interact much with the Simulation Hypothesis (SH) at all. If you don’t believe SH, then you think that the experiences of creatures like yours are rare in the universe and hence present strong, convincing evidence for you occupying the leverage-privileged position of an Earthling—much stronger evidence than its prior improbability. (There are some separate anthropic issues here about whether or not this is *itself* evidence for SH, but I don’t think that question is intrinsic to leverage penalties per se.)

A key point here is that even if you accept a Hanson-style leverage penalty, it doesn’t have to manifest as an inescapable commandment of modesty. You need not refuse to believe (in your deep and irrevocable humility) that you could be someone as special as an Ancient Earthling. Even if Earthlings matter in the universe—even if we occupy a unique position to affect the future of galaxies—it is still possible to encounter pretty convincing evidence that you’re an Earthling. Universes the size of 10^{80} do not pose problems to conventional decision-theoretic reasoning, or to conventional epistemology.

Things play out similarly if—still taking everything at face value—you’re wondering about the chance that you could be special even for an Earthling, because you might be one of say 10^{4} people in the history of the universe who contribute a major amount to an x-risk reduction project which ends up actually saving the galaxies. The vast majority of the improbability here is just in being an Earthling in the first place! Thus most of the clever arguments for not taking this high-impact possibility at face value would also tell you not to take being an Earthling at face value, since Earthlings as a whole are much more unique within the total temporal history of the universe than you are supposing yourself to be unique among Earthlings. But given ¬SH, the prior improbability of being an Earthling can be overcome by a few megabits of sensory experience from looking around the room and querying your memories—it’s not like 10^{80} is enough future beings that the number of agents randomly hallucinating similar experiences outweighs the number of real Earthlings. Similarly, if you don’t think lots of Earthlings are hallucinating the experience of going to a donation page and clicking on the Paypal button for an x-risk charity, that sensory experience can easily serve to distinguish you as one of 10^{4} people donating to an x-risk philanthropy.

Yes, there are various clever-sounding lines of argument which involve *not* taking things at face value—“Ah, but maybe you should consider yourself as an indistinguishable part of this here large reference class of deluded people who think they’re important.” Which I consider to be a bad idea because it renders you a permanent Muggle by putting you into an inescapable reference class of self-deluded people and then dismissing all your further thoughts as insufficient evidence because you *could* just be deluding yourself further about whether these are good arguments. Nor do I believe the world can only be saved by good people who are incapable of distinguishing themselves from a large class of crackpots, all of whom have no choice but to continue based on the tiny probability that they are not crackpots. (For more on this see Being Half-Rational About Pascal’s Wager Is Even Worse.) In this case you are a Pascal’s Muggle not because you’ve explicitly assigned a probability like one over googolplex, but because you took an improbability like 10^{-6} at unquestioning face value and then cleverly questioned all the evidence which could’ve overcome that prior improbability, and so, in practice, you can never climb out of the epistemological sinkhole. By the same token, you should conclude that you are just self-deluded about being an Earthling since real Earthlings are so rare and privileged in their leverage.

In general, leverage penalties don’t translate into advice about modesty or that you’re just deluding yourself—they just say that to be rationally coherent, your picture of the universe has to imply that your sensory experiences are at least as rare as the corresponding magnitude of your leverage.

Which brings us back to Pascal’s Mugger, in the original alleyway version. The Hansonian leverage penalty seems to imply that to be coherent, *either* you believe that your sensory experiences are *really actually* 1 in a googolplex—that only 1 in a googolplex beings experiences what you’re experiencing—or else you really *can’t* take the situation at face value.

Suppose the Mugger is telling the truth, and a googolplex other people are being simulated. Then there are at least a googolplex people in the universe. Perhaps some of them are hallucinating a situation similar to this one by sheer chance? Rather than telling you flatly that you can’t have a large impact, the Hansonian leverage penalty implies a coherence requirement on how uniquely you think your sensory experiences identify the position you believe yourself to occupy. When it comes to believing you’re one of 10^{11} Earthlings who can impact 10^{80} other life-centuries, you need to think your sensory experiences are unique to Earthlings—identify Earthlings with a likelihood ratio on the order of 10^{69}. This is quite achievable, if we take the evidence at face value. But when it comes to improbability on the order of 1/3↑↑↑3, the prior improbability *is *inescapable—your sensory experiences *can’t *possibly be that unique—which is assumed to be appropriate because almost-everyone who ever believes they’ll be in a position to help 3↑↑↑3 people *will in fact* be hallucinating. Boltzmann brains should be much more common than people in a unique position to affect 3↑↑↑3 others, at least if the causal graphs are finite.

Furthermore—although I didn’t realize this part until recently—applying Bayesian updates from that starting point may partially avert the Pascal’s Muggle effect:

Mugger: “Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.”

You: “Nope.”

Mugger: “Why not? It’s a really large impact.”

You: “Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people.”

Mugger: “Oh, is that really the probability that you assign? Behold!”

*(A gap opens in the sky, edged with blue fire.)*

Mugger: “Now what do you think, eh?”

You: “Well… I can’t actually say this observation has a likelihood ratio of 3↑↑↑3 to 1. No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 10^{10^{26}} to 1 at the *absurdly most*, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability. I’d probably start to be dominated by Boltzmann brains or other exotic minds well before then.”

Mugger: “So you’re not convinced.”

You: “Indeed not. The probability that you’re telling the truth is so tiny that God couldn’t find it with an electron microscope. Here’s the five dollars.”

Mugger: “Done! You’ve saved 3↑↑↑3 lives! Congratulations, you’re never going to top that, your peak life accomplishment will now always lie in your past. But why’d you give me the five dollars if you think I’m lying?”

You: “Well, because the evidence you *did* present me with had a likelihood ratio of at least a billion to one—I would’ve assigned less than 10^{-9} prior probability of seeing this when I woke up this morning—so in accordance with Bayes’s Theorem I promoted the probability from 1/3↑↑↑3 to at least 10^{9}/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars.”

I confess that I find this line of reasoning a bit suspicious—it seems overly clever. But on the level of intuitive virtues of rationality, it does seem less stupid than the original Pascal’s Muggle; this muggee is at least *behaviorally* reacting to the evidence. In fact, they’re reacting in a way exactly proportional to the evidence—they would’ve assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.
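The "exactly proportional to the evidence" point can be made concrete with a toy calculation. In this sketch (the helper name and the small-prior approximation posterior ≈ likelihood ratio × prior are my illustrative choices, not the post's), a leverage-penalty prior of 1/N on affecting N people is cancelled exactly by the claimed payoff N, so the expected value of paying up depends only on the strength of the evidence:

```python
from fractions import Fraction

def expected_lives_saved(likelihood_ratio, n_lives):
    """Leverage-penalty prior 1/n_lives, approximate Bayes update
    (valid while likelihood_ratio * prior << 1), times the claimed payoff."""
    prior = Fraction(1, n_lives)
    posterior = likelihood_ratio * prior
    return posterior * n_lives

# A billion-to-one surprise yields a billion expected lives, whether the
# Mugger claims a googol or a vastly larger number of beneficiaries:
assert expected_lives_saved(10**9, 10**100) == 10**9
assert expected_lives_saved(10**9, 10**1000) == 10**9
```

Holding the evidence fixed at a billion to one, substituting 3↑↑↑4 for 3↑↑↑3 changes nothing, which is exactly the proportionality described in the paragraph above.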

(Anyone who tries to apply the lessons here to actual x-risk reduction charities (which I think is probably a bad idea), keep in mind that the vast majority of the improbable-position-of-leverage in any x-risk reduction effort comes from being an Earthling in a position to affect the future of a hundred billion galaxies, and that sensory evidence for being an Earthling is what gives you most of your belief that your actions can have an outsized impact.)

So why not just run with this—why not just declare the decision-theoretic problem resolved, if we have a rule that seems to give reasonable behavioral answers in practice? Why not just go ahead and program that rule into an AI?

Well… I still feel a bit nervous about the idea that Pascal’s Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^{9}/3↑↑↑3 that it’s doing any good.

I think that my own reaction in a similar situation would be along these lines instead:

Mugger: “Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.”

Me: “Nope.”

Mugger: “So then, you think the probability I’m telling the truth is on the order of 1/3↑↑↑3?”

Me: “Yeah… that probably *has* to follow. I don’t see any way around that revealed belief, given that I’m not actually giving you the five dollars. I’ve heard some people try to claim silly things like, the probability that you’re telling the truth is counterbalanced by the probability that you’ll kill 3↑↑↑3 people instead, or something else with a conveniently equal and opposite utility. But there’s no way that things would balance out *exactly* in practice, if there was no *a priori* mathematical requirement that they balance. Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, *exactly* balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would “save” 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place. So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something bad to them. The only way this hypothesis can’t dominate my actions—really, the only way my expected utility sums can converge at all—is if I assign probability on the order of 1/3↑↑↑3 or less. I don’t see any way of escaping that part.”

Mugger: “But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever? Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements *of any kind* one after another, and be wrong, on average, about once?”

Me: “Nope.”

Mugger: “So give me five dollars!”

Me: “Nope.”

Mugger: “Why not?”

Me: “Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can’t be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1. There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it—”

Mugger: “Then behold!”

*(A gap opens in the sky, edged with blue fire.)*

Mugger: “Now what do you think, eh?”

Me *(staring up at the sky):* ”...whoa.” *(Pause.)* “You turned into a cat.”

Mugger: “What?”

Me: “Private joke. Okay, I think I’m going to have to rethink a *lot* of things. But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it. Oh, and here’s the five dollars, can I pay an extra twenty and make some other requests?”

*(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)*

Mugger: “Now, in this scenario we’ve just imagined, you were taking my case seriously, right? But the evidence there couldn’t have had a likelihood ratio of more than 10^{10^{26}} to 1, and probably much less. So by the method of imaginary updates, you must assign probability at least 10^{-10^{26}} to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars—”

Me: “Nope.”

Mugger: “How can you possibly say that? You’re not being logically coherent!”

Me: “I agree that I’m not being logically coherent, but I think that’s acceptable in this case.”

Mugger: “This ought to be good. Since when are rationalists allowed to deliberately be logically incoherent?”

Me: “Since we don’t have infinite computing power—”

Mugger: “That sounds like a fully general excuse if I ever heard one.”

Me: “No, this is a *specific* consequence of bounded computing power. Let me start with a simpler example. Suppose I believe in a set of mathematical axioms. Since I don’t have infinite computing power, I won’t be able to know all the deductive consequences of those axioms. And *that* means I will necessarily fall prey to the conjunction fallacy, in the sense that you’ll present me with a theorem X that is a deductive consequence of my axioms, but which I don’t know to be a deductive consequence of my axioms, and you’ll ask me to assign a probability to X, and I’ll assign it 50% probability or something. Then you present me with a brilliant lemma Y, which clearly seems like a likely consequence of my mathematical axioms, and which also seems to imply X—once I see Y, the connection from my axioms to X, via Y, becomes obvious. So I assign P(X&Y) = 90%, or something like that. Well, that’s the conjunction fallacy—I assigned P(X&Y) > P(X). The thing is, if you *then* ask me P(X), after I’ve seen Y, I’ll reply that P(X) is 91% or at any rate something higher than P(X&Y). I’ll have changed my mind about what my prior beliefs logically imply, because I’m not logically omniscient, even if that looks like assigning probabilities *over time* which are incoherent in the Bayesian sense.”

Mugger: “And how does this work out to my not getting five dollars?”

Me: “In the scenario you’re asking me to imagine, you present me with evidence which I currently think Just Plain Shouldn’t Happen. And if that actually *does* happen, the sensible way for me to react is by questioning my prior assumptions and the reasoning which led me to assign such a low probability. One way that I handle my lack of logical omniscience—my finite, error-prone reasoning capabilities—is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1. But if I actually see strong evidence for something I previously thought was super-improbable, I don’t just do a Bayesian update; I also question whether I was right to assign such a tiny probability in the first place—whether it was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10^{-100} for some fact distinguished enough to be written down in advance, and yet encounter strong evidence, say 10^{10} to 1, that the thing has actually happened. If something like that happens, you don’t do a Bayesian update to a posterior of 10^{-90}. Instead you question both whether the evidence might be weaker than it seems, *and* whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old. Now, this may mean that I end up doing some non-Bayesian updates: I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say ‘Guess I was wrong about that quadrillion to one thing’ rather than being a Muggle about it. And then I shut up and listen to what *you* have to say about how to estimate probabilities, because on my worldview, I wasn’t *expecting* to see you turn into a cat.
But for me to make a super-update like that—reflecting a posterior belief that I was logically incorrect about the prior probability—you have to really actually show me the evidence, you can’t just ask me to imagine it. This is something that only logically incoherent agents ever say, but that’s all right because I’m not logically omniscient.”
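The "ten billion days old" remark is itself a small calculation, sketched here under the dialogue's numbers (a prior of 10^{-100} and observed evidence of 10^{10} to 1; the variable names are mine). The total probability of ever observing such evidence is bounded near 10^{-10} per look, so a well-calibrated agent expects on the order of 10^{10} looks, one per day, before it happens:

```python
prior = 1e-100   # P(H): the suspiciously tiny prior probability
ratio = 1e10     # observed likelihood ratio P(E|H) / P(E|not-H)

# P(E) = P(E|H)P(H) + P(E|not-H)P(not-H).  Since P(E|H) <= 1, the definition
# of the likelihood ratio gives P(E|not-H) <= 1/ratio, so:
p_evidence = prior + 1 / ratio   # ~1e-10, dominated by the false-positive term

expected_days = 1 / p_evidence   # expected wait at one observation per day
assert 9.9e9 < expected_days <= 1.0e10   # roughly ten billion days
```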

At some point, we’re going to have to build some sort of actual prior into, you know, some sort of actual self-improving AI.

(Scary thought, right?)

So far as I can presently see, the logic requiring some sort of leverage penalty—not just so that we don’t pay $5 to Pascal’s Mugger, but also so that our expected utility sums converge at all—seems clear enough that I can’t yet see a good alternative to it (feel welcome to suggest one), and Robin Hanson’s rationale is by far the best I’ve heard.

In fact, what we actually need is more like a combined leverage-and-complexity penalty, to avoid scenarios like this:

Mugger: “Give me $5 and I’ll save 3↑↑↑3 people.”

You: “I assign probability exactly 1/3↑↑↑3 to that.”

Mugger: “So that’s one life saved for $5, on average. That’s a pretty good bargain, right?”

You: “Not by comparison with x-risk reduction charities. But I also like to do good on a smaller scale now and then. How about a penny? Would you be willing to save 3↑↑↑3/500 lives for a penny?”

Mugger: “Eh, fine.”

You: “Well, the probability of that is 500/3↑↑↑3, so here’s a penny!” *(Goes on way, whistling cheerfully.)*

Adding a complexity penalty *and* a leverage penalty is necessary, not just to avert this exact scenario, but so that we don’t get an infinite expected utility sum over a 1/3↑↑↑3 probability of saving 3↑↑↑3 lives, 1/(3↑↑↑3 + 1) probability of saving 3↑↑↑3 + 1 lives, and so on. If we combine the standard complexity penalty with a leverage penalty, the whole thing should converge.
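The convergence claim can be illustrated numerically. In the sketch below, the 2^{-prefix-code-length} prior is a crude stand-in for the Kolmogorov complexity penalty (my illustrative proxy, not anything from the post): with the leverage penalty alone, every hypothesis "save n lives at probability 1/n" contributes exactly one expected life and the partial sums grow without bound, while the combined penalty keeps them bounded:

```python
from fractions import Fraction

def leverage_only(n_max):
    # Each term is probability 1/n times payoff n: exactly one expected life,
    # so the sum over hypotheses diverges linearly in n_max.
    return sum(Fraction(1, n) * n for n in range(1, n_max + 1))

def leverage_plus_complexity(n_max):
    # Crude complexity proxy: a prefix-free code for n of length
    # bit_length(n) + 2 * bit_length(bit_length(n)) bits, so priors sum below 1.
    def prior(n):
        k = n.bit_length()
        return 2.0 ** -(k + 2 * k.bit_length())
    return sum(prior(n) * (1.0 / n) * n for n in range(1, n_max + 1))

assert leverage_only(10**4) == 10**4             # unbounded partial sums
assert leverage_plus_complexity(10**6) < 1.0     # bounded partial sums
```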

Probability penalties are epistemic features—they affect what we believe, not just what we do. Maps, ideally, correspond to territories. Is there any territory that this complexity+leverage penalty can correspond to—any state of a single reality which would make these the true frequencies? Or is it only interpretable as pure uncertainty over realities, with there being no single reality that could correspond to it? To put it another way, the complexity penalty and the leverage penalty seem unrelated, so perhaps they’re mutually inconsistent; can we show that the union of these two theories has a model?

As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be *divided* among all the causal elements within that universe—if a universe contains 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe. (As always, the term “magical reality fluid” reflects an attempt to demarcate a philosophical area where I feel quite confused, and try to use correspondingly blatantly wrong terminology so that I do not mistake my reasoning about my confusion for a solution.) This setup is not entirely implausible because the Born probabilities in our own universe look like they might behave like this sort of magical-reality-fluid—quantum amplitude flowing between configurations in a way that preserves the total amount of realness while dividing it between worlds—and perhaps every other part of the multiverse must necessarily work the same way for some reason. It seems worth noting that part of what’s motivating this version of the ‘territory’ is that our sum over all real things, weighted by reality-fluid, can then converge. In other words, the reason why complexity+leverage works in decision theory is that the union of the two theories has a model in which the total multiverse contains an amount of reality-fluid that can sum to 1 rather than being infinite. (Though we need to suppose that either (a) only programs with a finite number of causal nodes exist, or (b) programs can divide finite reality-fluid among an infinite number of nodes via some measure that gives every experience-moment a well-defined relative amount of reality-fluid. Again, see caveats about basic philosophical confusion—perhaps our map needs this property over its uncertainty but the territory doesn’t have to work the same way, etcetera.)

If an AI’s overall architecture is also such as to enable it to carry out the “You turned into a cat” effect—where if the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified—then at the moment I can’t think of anything else to add in.

In other words: This is my best current idea for how a prior, e.g. as used in an AI, could yield decision-theoretic convergence over explosively large possible worlds.

However, I would still call this a semi-open FAI problem (edit: wide-open) because it seems quite plausible that somebody is going to kick holes in the overall view I’ve just presented, or come up with a better solution, possibly within an hour of my posting this—the proposal is both recent and weak even by my standards. I’m also worried about whether it turns out to imply anything crazy on anthropic problems. Over to you, readers.


I don’t like to be a bearer of bad news here, but it ought to be stated. This whole leverage ratio idea is very obviously an intelligent kludge / patch / workaround, because you have two base-level theories that either don’t work together or don’t work individually.

You already know that something doesn’t work. That’s what the original post was about and that’s what this post tries to address. But this is a clunky inelegant patch, that’s fine for a project or a website, but given belief in the rest of your writings on AI, this is high stakes. At those stakes saying “we know it doesn’t work, but we patched the bugs we found” is not acceptable.

The combination of your best guess at picking the right decision theory and your best guess at epistemology produces absurd conclusions. Note that you already know this. This knowledge, which you already have, motivated this post.

The next step is to identify which is wrong, the decision theory or the epistemology. After that you need to find something that’s not wrong to replace it. That sucks, it’s probably extremely hard, and it probably sets you back to square one on multiple points. But you can’t know that one of your foundations is wrong and just keep going. Once you know you are wrong you need to act consistently with that.

I’m not sure that the kludge works anyway, since there are still some “high impact” scenarios which don’t get kludged out. Let’s imagine the mugger’s pitch is as follows. “I am the Lord of the Matrix, and guess what—you’re in it! I’m in the process of running a huge number of simulations of human civilization, in series, and in each run of the simulation I am making a very special offer to some carefully selected people within it. If you are prepared to hand over $5 to me, I will kindly prevent one dust speck from entering the eye of one person in each of the next googolplex simulations that I run! Doesn’t that sound like a great offer?”

Now, rather naturally, you’re going to tell him to get lost. And in the worlds where there really is a Matrix Lord, and he’s telling the truth, the approached subjects almost always tell him to get lost as well (the Lord is careful in whom he approaches), which means that googolplexes of preventable dust specks hit googolplexes of eyes. Each rejection of the offer causes a lower total utility than would be obtained from accepting it. And if those worlds have a measure > 1/googolplex, there is on the face of it a net loss in expected utility. More likely, we’re just going to get non-convergent expected utilities again.

The general issue is that the causal structure of the hypothetical world is highly linear. A reasonable proportion of nodes (perhaps 1 in a billion) do indeed have the ability to affect a colossal number of other nodes in such a world. So the high utility outcome doesn’t get suppressed by a locational penalty.

I’d be more worried about that if I couldn’t (apparently) visualize what a corresponding Tegmark Level IV universe looks like. If the union of two theories has a model, they can’t be mutually inconsistent. Whether this corresponding multiverse is plausible is a different problem.

Why is decision/probability theory allowed to constrain the space of “physical” models? It seems that the proper theory should not depend on metaphysical assumptions.

If they are starting to require uncertain metaphysical assumptions, I think that counts as “not working together”.

Metaphysical assumptions are one thing: this one involves normative assumptions. There is zero reason to think we evolved values that can make any sense at all of saving 3^^^3 people. The software we shipped with cannot take numbers like that in its domain. That we can think up thought experiments that confuse our ethical intuitions is already incredibly likely. Coming up with kludgey methods to make decisions that give intuitively correct answers to the thought experiments while preserving normal normative reasoning and then—from there—concluding something about what the universe must be like is a really odd epistemic position to take.

I’m not familiar with any certain metaphysical assumptions. And the constraint here is along the lines of “things converge” where it is at least *plausible* that reality has to converge too. (Small edit made to final paragraphs to reflect this.)

That’s the part that starts grating on me. Especially when Eliezer mentions Tegmark Level IV with a straight face. I assume that I do not grok his meaning in fullness. If he means what I think he means, it would be a great disappointment.

shminux,

It’s just a fact that you endorse a very different theory of “reality” than Eliezer. Why disguise your reasonable disagreement with him by claiming that you don’t understand him?

You talk like you don’t notice when highly-qualified-physicist shminux is talking and when average-armchair-philosopher shminux is talking.

Which is annoying to me in particular because physicist shminux knows a lot more than I, and I should pay attention to what he says in order to be less wrong, while philosopher shminux is not entitled to the same weight. So I’d like some markers of which one is talking.

I thought I was pretty clear re the “markers of which one is talking”. But let me recap.

Eliezer has thought about metaethics, decision theories and AI design for much much longer time and much much more seriously than I have. I can see that when I read what he writes about the issues I have not even thought of. While I cannot tell if it is correct, I can certainly tell that there is a fair amount of learning I still have to do if I wanted to be interesting. This is the same feeling I used to get (and still get on occasion) when talking with an expert in, say, General Relativity, before I learned the subject in sufficient depth. Now that I have some expertise in the area, I see the situation from the other side, as well. I can often recognize a standard amateurish argument before the person making it has finished. I often know exactly what implicit false premises lead to this argument, because I had been there myself. If I am lucky, I can successfully point out the problematic assumptions to the amateur in question, provided I can simplify it to the proper level. If so, the reaction I get is “that’s so cool… so deep… I’ll go and ponder it, Thank you, Master!”, the same thing I used to feel when hearing an expert answer my amateurish questions.

As far as Eliezer’s area of expertise is concerned, I am on the wrong side of the gulf. Thus I am happy to learn what I can from him in this area and be gratified if my humble suggestions prove useful on occasion.

I am much more skeptical about his forays into Quantum Mechanics, Relativity and some other areas of physics I have more than passing familiarity with. I do not get the feeling that what he says is “deep”, and only occasionally that it is “interesting”. Hence I am happy to discount his musings about MWI as amateurish.

There is this grey area between the two, which could be thought of as philosophy of science. While I am far from an expert in the area, I have put in a fair amount of effort to understand what the leading edge is. What I find is warring camps of hand-waving “experts” with few interesting insights and no way to convince the rival school of anything. These interesting insights mostly happen in something more properly called math, linguistics or cognitive science, not philosophy proper. There is no feeling of awe you get from listening to a true expert in a certain field. Expert physicists who venture into philosophy, like Tegmark and Page, quickly lose their aura of expertise and seem mere mortals with little or no advantage over other amateurs.

When Eliezer talks about something metaphysical related to MWI and Tegmark IV, or any kind of anthropics, I suspect that he is out of his depth, because he sounds as such. However, knowing that he is an expert in a somewhat related area makes me think that I may well have missed something important, and so I give him the benefit of a doubt and try to figure out what I may have missed. If the only difference is that I “endorse a very different theory of “reality” than Eliezer”, and if this is indeed only the matter of endorsement, and there is no way to tell experimentally who is right, now or in the far future, then his “theory of reality” becomes much less relevant to me and therefore much less interesting. Oh, and here I don’t mean realism vs instrumentalism, I mean falsifiable models of the “real external world”, as opposed to anything Everett-like or Barbour-like.

Even if the field X is confused, to confidently dismiss subtheory Y you must know something confidently about Y from within this confusion, such as that Y is inconsistent or nonreductionist or something. I often occupy this mental state myself but I’m aware that it’s ‘arrogant’ and setting myself above everyone in field X who does think Y is plausible—for example, I am arrogant with respect to respected but elderly physicists who think single-world interpretations of QM are plausible, or anyone who thinks our confusion about the ultimate nature of reality can keep the God subtheory in the running. Our admitted confusion does not permit that particular answer to remain plausible.

I don’t think anyone I take seriously would deny that the field of anthropics / magical-reality-fluid is confused. What do you think you know about all computable processes, or all logical theories with models, existing, which makes that obviously impermitted? In case it’s not clear, I wasn’t endorsing Tegmark Level IV as the obvious truth the way I consider MWI obvious, nor yet endorsing it at all, rather I was pointing out that with some further specification a version of T4 *could* provide a model in which frequencies would go as the probabilities assigned by the complexity+leverage penalty, which would not necessarily make it true. It is not clear to me what epistemic state you could occupy from which this would justly disappoint you in me, unless you considered T4 obviously forbidden even from within our confusion. And of course I’m fine with your being arrogant about that, so long as you realize you’re being arrogant and so long as you have the epistemological firepower to back it up.

Maybe I was unclear. I don’t dismiss Y=TL4 as wrong, I ignore it as untestable and therefore useless for justifying anything interesting, like how an AI ought to deal with tiny probabilities of enormous utilities. I agree that I am “arrogant” here, in the sense that I discount an opinion of a smart and popular MIT prof as misguided. The postulate “mathematical existence = physical existence” raises a category error exception for me, as one is, in your words, logic, the other is physics. In fact, I don’t understand why privilege math to begin with. Maybe the universe indeed does not run on math (man, I still chuckle every time I recall that omake). Maybe the trouble we have with understanding the world is that we rely on math too much (sorry, getting too Chopra here). Maybe the matrix lord was a sloppy programmer whose bugs and self-contradictory assumptions manifest themselves to us as black hole singularities, which are hidden from view only because the code maintainers did a passable job of acting on the QA reports. There are many ideas which are just as pretty and just as unjustifiable as TL4. I don’t pretend to fully grok the “complexity+leverage penalty” idea, except to say that your dark energy example makes me think less of it, as it seems to rely on considerations I find dubious (that any model with the potential of affecting gazillions of people in the far future if accurate is extremely unlikely despite being the currently best map available). Is it arrogant? Probably. Is it wrong? Not unless you prove the alternative right.

He’s not saying that the leverage penalty might be correct because we might live in a certain type of Tegmark IV, he’s saying that the fact that the leverage penalty *would be* correct if we did live in Tegmark IV + some other assumptions shows (a) that it is a consistent decision procedure and¹ (b) it is the sort of decision procedure that emerges reasonably naturally and is thus a more reasonable hypothesis than if we didn’t know it comes up naturally like that.

It is possible that it is hard to communicate here since Eliezer is making analogies to model theory, and I would assume that you are not familiar with model theory.

¹ The word ‘and’ isn’t really correct here. It’s very likely that EY means one of (a) and (b), and possibly both.

(Yep. More a than b, it still feels pretty unnatural to me.)

Huh. This whole exchange makes me more certain that I am missing something crucial, but reading and dissecting it repeatedly does not seem to help. And apparently it’s not the issue of not knowing enough math. I guess the mental block I can’t get over is “why TL4?”. Or maybe “what other mental constructs could one use in place of TL4 to make a similar argument?”

Maybe paper-machine or someone else on #lesswrong will be able to clarify this.

Have you got one?

Not sure why you are asking, but yes, I pointed some out 5 levels up. They clearly have a complexity penalty, but I am not sure how much vs TL4. At least I know that the “sloppy programmer” construct is finite (though possibly circular). I am not sure how to even begin to estimate the Kolmogorov complexity of “everything mathematically possible exists physically”. What Turing machine would output all possible mathematical structures?

“Loop infinitely, incrementing `count` from 1: [Let `steps` be `count`. Iterate all legal programs until `steps` = 0 into `prog`: [Load submachine state from “cache tape”. Execute one step of `prog`, writing output to “output tape”. Save machine state onto “cache tape”. Decrement `steps`.] ]”

The output of every program is found on the output tape (albeit at intervals). I’m sure one could design the Turing machine so that it reordered the output tape with every piece of data written so that they’re in order too, if you want that. Or make it copypaste the entire output so far to the end of the tape, so that every number of evaluation steps for every Turing machine has its own tape location. Seemed a little wasteful though.
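The dovetailing construction in the comment above can be sketched in ordinary code. In this illustrative Python version (generators stand in for enumerated Turing machines, and a list plays the “cache tape” of suspended machine states; all names are mine), each admitted program is stepped once per round, so every program’s outputs eventually appear on the shared output tape:

```python
from itertools import count, islice

def dovetail(program_makers):
    """Interleave the execution of ever more programs, one step each per round."""
    started = []                       # "cache tape": suspended program states
    for round_no in count(1):          # outer loop: increment `count`
        if round_no <= len(program_makers):
            started.append(program_makers[round_no - 1]())  # admit next program
        for i, prog in enumerate(started):
            yield i, next(prog)        # one step of prog; write to "output tape"

def squares():                         # toy stand-in for an enumerated program
    n = 0
    while True:
        yield n * n
        n += 1

def powers_of_two():                   # another toy non-halting program
    p = 1
    while True:
        yield p
        p *= 2

tape = list(islice(dovetail([squares, powers_of_two]), 10))
# Both programs' outputs appear interleaved on the tape, at intervals.
```

A real dovetailer over all legal programs would also need to enumerate program texts and tolerate programs that halt or crash; those details are omitted in this sketch.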

edit: THANK YOU GWERN. This is indeed what I was thinking of :D

Hey, don’t look at me. I’m with you on “Existence of T4 is untestable therefore boring.”

You are right, I am out of my depth math-wise. Maybe that’s why I can’t see the relevance of an untestable theory to AI design.

It seems to be the *problem* that is relevant to AI design. How does an expected-utility-maximising agent handle edge cases and infinitesimals, given logical uncertainty and bounded capabilities? If you get that wrong, then Rocks Fall and Everyone Dies. The relevance of any given theory of how such things can be modelled is then based on either suitability for use in an AI design (or conceivably the implications if an AI constructed and used said model).

(Also yep.)

TL4, or at least (TL4+some measure theory that gives calculable and sensible answers), is not entirely unfalsifiable. For instance, it predicts that a random observer (you) should live in a very “big” universe. Since we have plausible reasons to believe TL0-TL3 (or at least, I think we do), and I have a very hard time imagining specific laws of physics that give “bigger” causal webs than you get from TL0-TL3, that gives me some weak evidence for TL4; it could have been falsified but wasn’t.

It seems plausible that that’s the only evidence we’ll ever get regarding TL4. If so, I’m not sure that either of the terms “testable” or “untestable” apply. “Testable” means “susceptible to reproducible experiment”; “untestable” means “unsusceptible to experiment”; so what do you call something in between, which is susceptible only to limited and irreproducible evidence? Quasitestable?

Of course, you could still perhaps say “I ignore it as only quasitestable and therefore useless for justifying anything interesting”.

TL4 seems testable by asking what a ‘randomly chosen’ observer would expect to see. In fact, the simplest version seems falsified by the lack of observed discontinuities in physics (of the ‘clothes turn into a crocodile’ type).

Variants of TL4 that might hold seem untestable right now. But we could see them as ideas or directions for groping towards a theory, rather than complete hypotheses. Or it might happen that when we understand anthropics better, we’ll see an obvious test. (Or the original hypothesis might turn out to work, but I strongly doubt that.)

’Splain yo’self.

See my reply to TimS.

Mugger: Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.

Me: I’m not sure about that.

Mugger: So then, you think the probability I’m telling the truth is on the order of 1/3↑↑↑3?

Me: Actually no. I’m just not sure I care about your 3↑↑↑3 simulated people as much as you think I do.

Mugger: “This should be good.”

Me: There’s only something like n=10^10 neurons in a human brain, and the number of possible states of a human brain is exponential in n. This is stupidly tiny compared to 3↑↑↑3, so most of the lives you’re saving will be heavily duplicated. I’m not really sure that I care about duplicates that much.

Mugger: Well I didn’t say they would all be humans. Haven’t you read enough Sci-Fi to know that you should care about all possible sentient life?

Me: Of course. But the same sort of reasoning implies that, either there are a lot of duplicates, or else most of the people you are talking about are incomprehensibly large, since there aren’t that many small Turing machines to go around. And it’s not at all obvious to me that you can describe arbitrarily large minds whose existence I should care about without using up a lot of complexity. More generally, I can’t see any way to describe worlds which I care about to a degree that vastly outgrows their complexity. My values are complicated.
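For readers who want to poke at the magnitudes in this exchange, here is a direct implementation of Knuth’s up-arrow notation (a hedged sketch: only the very smallest cases are computable, since 3↑↑↑3 is a tower of 3↑↑3 ≈ 7.6 trillion threes and cannot be evaluated by anything):

```python
def up_arrow(a, n, b):
    """Knuth's up-arrow a ↑^n b: n=1 is ordinary exponentiation,
    n=2 is tetration (a tower of b copies of a), and so on, via the
    recursion a ↑^n b = a ↑^(n-1) (a ↑^n (b-1))."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

# 3↑↑3 = 3^(3^27) is already a tower: 3^27 = 7625597484987 digits-in-exponent
tower = up_arrow(3, 2, 3)   # 3^27 = 7625597484987
# 2↑↑↑3 = 2↑↑(2↑↑2) = 2↑↑4 = 2^2^2^2 = 65536 is the largest "triple arrow"
# value that fits comfortably in memory; 3↑↑↑3 does not, by a vast margin.
small_triple = up_arrow(2, 3, 3)
```

Even the full state count of a brain, around 2^(10^10), has only a few billion digits; that is negligible next to 3↑↑↑3, which is the point of the duplication argument above.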

Bostrom would probably try to argue that you do. See Bostrom (2006).

Am I crazy, or does Bostrom’s argument in that paper fall flat almost immediately, based on a bad moral argument?

His first, and seemingly most compelling, argument for Duplication over Unification is that, assuming an infinite universe, it’s certain (with probability 1) that there is already an identical portion of the universe where you’re torturing the person in front of you. Given Unification, it’s meaningless to distinguish between that portion and this portion, given their physical identicalness, so torturing the person is morally blameless, as you’re not increasing the number of unique observers being tortured. Duplication makes the two instances of the person distinct due to their differing spatial locations, even if every other physical and mental aspect is identical, so torturing is still adding to the suffering in the universe.

However, you can flip this over trivially and come to a terrible conclusion. If Duplication is true, you merely have to simulate a person until they experience a moment of pure hedonic bliss, in some ethically correct manner that everyone agrees is morally good to experience and enjoy. Then, copy the fragment of the simulation covering the experiencing of that emotion, and duplicate it endlessly. Each duplicate is distinct, and so you’re increasing the amount of joy in the universe every time you make a copy. It would be a net win, in fact, if you killed every human and replaced the earth with a computer doing nothing but running copies of that one person experiencing a moment of bliss. Unification takes care of this, by noting that duplicating someone adds, at most, a single bit of information to the universe, so spamming the universe with copies of the happy moment counts either the same as the single experience, or at most a trivial amount more.

Am I thinking wrong here?

True just if your *summum bonum* is exactly an aggregate of moments of happiness experienced. I take the position that it is not.

I don’t think one even has to resort to a position like “only one copy counts”.

True, but that’s then striking more at the heart of Bostrom’s argument, rather than my counter-argument, which was just flipping Bostrom around. (Unless your summum malum is significantly different, such that duplicate tortures and duplicate good-things-equivalent-to-torture-in-emotional-effect still sum differently?)

I’d argue that the torture portion is not identical to the not-torture portion and that the difference is caused by at least one event in the common prior history of both portions of the universe where they diverged. Unification only makes counterfactual worlds real; it does not cause every agent to experience every counterfactual world. Agents are differentiated by the choices they make and agents who perform torture are not the same agents as those who abstain from torture. The difference can be made arbitrarily small, for instance by choosing an agent with a 50% probability of committing torture based on the outcome of a quantum coin flip, but the moral question in that case is why an agent would choose to become 50% likely to commit torture in the first place. Some counterfactual agents will choose to become 50% likely to commit torture, but they will be very different than the agents who are 1% likely to commit torture.

I think you’re interpreting Bostrom slightly wrong. You seem to be reading his argument (or perhaps just my short distillation of it) as arguing that you’re not currently torturing someone, but there’s an identical section of the universe elsewhere where you *are* torturing someone, so you might as well start torturing now. As you note, that’s contradictory—if you’re not currently torturing, then your section of the universe must not be identical to the section where the you-copy *is* torturing.

Instead, assume that you are currently torturing someone. Bostrom’s argument is that you’re not making the universe worse, because there’s a you-copy which is torturing an identical person elsewhere in the universe. At most one of your copies is capable of taking blame for this; the rest are just running the same calculations “a second time”, so to say. (Or at least, that’s what he’s arguing that Unification would say, and using this as a reason to reject it and turn to Duplication, so each copy is morally culpable for causing new suffering.)

I think it not unlikely that if we have a successful intelligence explosion and subsequently discover a way to build something 4^^^^4-sized, then we will figure out a way to grow into it, one step at a time. This 4^^^^4-sized supertranshuman mind then should be able to discriminate “interesting” from “boring” 3^^^3-sized things. If you could convince the 4^^^^4-sized thing to write down a list of all nonboring 3^^^3-sized things in its spare time, then you would have a formal way to say what an “interesting 3^^^3-sized thing” is, with description length (the description length of humanity = the description length of our actual universe) + (the additional description length to give humanity access to a 4^^^^4-sized computer—which isn’t much because access to a universal Turing machine would do the job and more).

Thus, I don’t think that it needs a 3^^^3-sized description length to pick out interesting 3^^^3-sized minds.

Me: Actually no. I’m just not sure I care about your 3↑↑↑3 simulated people as much as you think I do.

Mugger: So then, you think the probability that you should care as much about my 3↑↑↑3 simulated people as I thought you did is on the order of 1/3↑↑↑3?

After thinking about it a bit more I decided that I actually do care about simulated people almost exactly as the mugger thought I did.

Didn’t you feel sad when Yoona-939 was terminated, or wish all happiness for Sonmi-451?

All the other Yoona-939s were fine, right? And that Yoona-939 was terminated quickly enough to prevent divergence, wasn’t she?

(my point is, you’re making it seem like you’re breaking the degeneracy by labeling them. But their being identical is deep)

But now she’s… you know… now she’s…

(wipes away tears) *slightly less real.*

You hit pretty strong diminishing returns on existence once you’ve hit the ‘at least one copy’ point.

Clones aren’t duplicates. They may have started out as duplicates but they were not by the time the reader is introduced to them.

Benja’s method is better and more clearly right, but here’s another interesting one. Start from me now. At every future moment when there are two possible valuable next experiences for me, make two copies of me, have the first experience one and the second the other. Allow me to grow if it’s valuable. Continue branching and growing until 3^^^3 beings have been generated.

“The kind of mind I would like to grow into if I had 3^^^3 years”

I agree with most of this. I think it is plausible that the value of a scenario is in some sense upper-bounded by its description length, so that we need on the order of googolplex bits to describe a googolplex of value.

We can separately ask if this solves the problem. One may want a theory which solves the problem regardless of utility function; or, aiming lower, one may be satisfied to find a class of utility functions which seem to capture human intuition well enough.

Upper-bounding utility by description complexity doesn’t actually capture the intuition, since a simple universe could give rise to many complex minds.

This post has not at all misunderstood my suggestion from long ago, though I don’t think I thought about it very much at the time. I agree with the thrust of the post that a leverage factor seems to deal with the basic problem, though of course I’m also somewhat expecting more scenarios to be proposed to upset the apparent resolution soon.

Hm, a linear “leverage penalty” sounds an awful lot like adding the complexity of locating you out of the pool of possibilities to the total complexity.

Thing 2: consider the case of the *other* people on that street when the Pascal’s Muggle-ing happens. Suppose they could overhear what is being said. Since they have no leverage of their own, are they free to assign a high probability to the muggle helping 3^^^3 people? Do a few of them start forward to interfere, only to be held back by the cooler heads who realize that all who interfere will suddenly have the probability of success reduced by a factor of 3^^^3?

This is indeed a good argument for viewing the leverage penalty as a special case of a locational penalty (which I think is more or less what Hanson proposed to begin with).

Suppose we had a planet of 3^^^3 people (their universe has novel physical laws). There is a planet-wide lottery. Catherine wins. There was a 1/3^^^3 chance of this happening. The lotto representative comes up to her and asks her to hand over her ID card for verification.

All over the planet, as a fun prank, a small proportion of people have been dressing up as lotto representatives and running away with peoples’ ID cards. This is very rare—only one person in 3^^3 does this today.

If the lottery prize is 3^^3 times better than getting your ID card stolen, should Catherine trust the lotto official? No, because there are 3^^^3/3^^3 pranksters, and only 1 real official, and 3^^^3/3^^3 is 3^^(3^^3 − 3), which is a whole lot of pranksters. She hangs on to her card, and doesn’t get the prize. Maybe if the reward were 3^^^3 times greater than the penalty, we could finally get some lottery winners to actually collect their winnings.
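The arithmetic above can be checked with a toy Bayesian version using small stand-in numbers (the values of `N`, `p`, and `prize` below are assumptions for illustration; 3^^^3 and 3^^3 are not representable):

```python
from fractions import Fraction

N = 10 ** 9   # planet population (stands in for 3^^^3)
p = 10 ** 3   # one person in p runs the ID-card prank today (stands in for 3^^3)
prize = p     # prize is p times better than the cost of a stolen ID card

pranksters = N // p                         # 10**6 pranksters planet-wide
odds_real = Fraction(1, 1 + pranksters)     # one genuine lotto official

# Expected value of handing over the card (stolen card costs 1 unit):
ev = odds_real * prize - (1 - odds_real) * 1
# ev < 0: pranksters outnumber the real official by far more than the
# prize ratio can compensate for, so the winner should keep her card.
```

With these numbers the posterior odds are 1 in 1,000,001 and the expected value is negative, mirroring the ratio argument in the comment.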

All of which is to say, I don’t think there’s any locational penalty—the crowd near the muggle should have exactly the same probability assignments as her, just as the crowd near Catherine has the same probability assignments as her about whether this is a prankster or the real official. I think the penalty is the ratio of lotto officials to pranksters (conditional on a hypothesis like “the lottery has taken place”). If the hypothesis is clever, though, it could probably evade this penalty (hypothesize a smaller population with a reward of 3^^^3 years of utility-satisfaction, maybe, or 3^^^3 *new* people created), and so what intuitively seems like a defense against pascal’s mugging may not be.

Really? I was going to say that the argument need not mention the muggle at all, since the mugger is also one person among 3^^^3.

A simplified version of the argument here:

1. The utility function isn’t up for grabs.
2. Therefore, we need unbounded utility.
3. Oops! If we allow unbounded utility, we can get non-convergence in our expectation.
4. Since we’ve already established that the utility function is not up for grabs, let’s try and modify the probability to fix this!

My response to this is that the probability distribution is *even less* up for grabs. The utility, at least, is explicitly there to reflect our preferences. If we see that a utility function is causing our agent to take the wrong actions, then it makes sense to change it to better reflect the actions we wish our agent to take.

The probability distribution, on the other hand, is a map that should reflect the territory as well as possible! It should not be modified on account of badly-behaved utility computations.

This may be taken as an argument in favor of modifying the utility function; Sniffnoy makes a case for bounded utility in another comment.

It could alternatively be taken as a case for modifying the decision procedure. Perhaps neither the probability nor the utility are “up for grabs”, but how we use them should be modified.

One (somewhat crazy) option is to take the median expectation rather than the mean expectation: we judge actions by computing the lowest utility score that we have 50% chance of making or beating, rather than by computing the average. This makes the computation insensitive to extreme (high or low) outcomes with small probabilities. Unfortunately, it also makes the computation insensitive to extreme (high or low) options with 49% probabilities: it would prefer a gamble with a 49% probability of utility −3^^^3 and 51% probability of utility +1, to a gamble with 51% probability of utility 0, and 49% probability of +3^^^3.
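The median-expectation rule just described can be sketched in a few lines (`median_expectation` and the two gambles are illustrative names, with 10^100 standing in for 3^^^3):

```python
def median_expectation(gamble):
    """Score a gamble by the lowest utility u such that the probability
    of getting u or better is at least 50%. `gamble` is a list of
    (probability, utility) pairs summing to 1."""
    cum = 0.0
    for prob, utility in sorted(gamble, key=lambda pu: pu[1], reverse=True):
        cum += prob
        if cum >= 0.5:
            return utility

BIG = 10 ** 100  # stands in for 3^^^3

gamble_a = [(0.49, -BIG), (0.51, 1)]   # 49% chance of -BIG, 51% of +1
gamble_b = [(0.51, 0), (0.49, BIG)]    # 51% chance of 0, 49% of +BIG
# The rule scores A at 1 and B at 0, so it prefers A despite A's
# catastrophic downside and B's enormous upside -- the flaw noted above.
```

Running this confirms the pathology in the comment: the median is blind to the 49% tails in both directions.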

But perhaps there are more well-motivated alternatives.

If the agent defines its utility indirectly in terms of designer’s preference, a disagreement in evaluation of a decision by agent’s utility function and designer’s preference doesn’t easily indicate that designer’s evaluation is more accurate, and if it’s not, then the designer should defer to the agent’s judgment instead of adjusting its utility.

Similarly, if the agent is good at building its map, it might have a better map than the designer, so a disagreement is not easily resolved in favor of the designer. On the other hand, there can be a bug in agent’s world modeling code in which case it should be fixed! And similarly, if there is a bug in agent’s indirect utility definition, it too should be fixed. The arguments seem analogous to me, so why would preference be more easily debugged than world model?

Really? In practice I have a great deal of uncertainty about both my utility function and my probability estimates. Accurate probability estimates require the ability to accurately model the world, and this seems incredibly hard in general. It’s not at all clear to me that instrumental rationality means trusting your current probability estimates if you have reason to believe that future evidence will drastically change them or that they’re corrupted for some other reason (even an otherwise flawlessly designed AI has to worry about cosmic rays flipping the bits in its memory or, Omega forbid, its source code).

I am definitely not saying “trust your current probability estimates”.

What I’m saying is that probability should reflect reality as closely as possible, whereas utility should reflect preferences as closely as possible.

Modifying the preference function in an ad-hoc way to get the right behavior is a bad idea, but modifying our expectation about how reality actually might be is even worse. The probability function should be modified exclusively in response to considerations about how reality might be. The utility function should be modified exclusively in response to considerations about our preferences.

I have a problem with calling this a “semi-open FAI problem”, because even if Eliezer’s proposed solution turns out to be correct, it’s still a wide open problem to develop arguments that can allow us to be confident enough in it to incorporate it into an FAI design. This would be true even if nobody can see any holes in it or have any better ideas, and doubly true given that some FAI researchers consider a different approach (which assumes that there is no such thing as “reality-fluid”, that everything in the multiverse just exists and as a matter of preference we do not / can not care about all parts of it in equal measure, #4 in this post) to be at least as plausible as Eliezer’s current approach.

You’re right. Edited.

In my view, we could make act-based agents without answering this or any similar questions. So I’m much less interested in answering them then I used to be. (There are possible approaches that do have to answer all of these questions, but at this point they seem very much less promising to me.)

We’ve briefly discussed this issue in the abstract, but I’m curious to get your take in a concrete case. Does that seem right to you? Do you think that we need to understand issues like this one, and have confidence in that understanding, prior to building powerful AI systems?

FAI designs that require high confidence solutions to many philosophical problems also do not seem very promising to me at this point. I endorse looking for alternative approaches.

I agree that act-based agents seem to require fewer high confidence solutions to philosophical problems. My main concern with act-based agents is that these designs will be in competition with fully autonomous AGIs (either alternative designs, or act-based agents that evolve into full autonomy due to inadequate care of their owners/users) to colonize the universe. The dependence on humans and lack of full autonomy in act-based agents seem likely to cause a significant weakness in at least one crucial area of this competition, such as general speed/efficiency/creativity, warfare (conventional, cyber, psychological, biological, nano, etc.), cooperation/coordination, self-improvement, and space travel. So even if these agents turn out to be “safe”, I’m not optimistic that we “win” in the long run.

My own idea is to aim for FAI designs that can correct their philosophical errors, autonomously, the same way that we humans can. Ideally, we’d fully understand how humans reason about philosophical problems and how philosophy normatively *ought* to be done before programming or teaching that to an AI. But realistically, due to time pressure, we might have to settle for something suboptimal like teaching through examples of human philosophical reasoning. Of course there’s lots of ways for this kind of AI to go wrong as well, so I also consider it to be a long shot.

Let me ask you a related question. Suppose act-based designs are as successful as you expect them to be. We still need to understand issues like the one described in Eliezer’s post (or solve the meta-problem of understanding philosophical reasoning) *at some point*, right? When do you think that will be? In other words, how much time do you think successfully creating act-based agents buys us?

It’s not so much that I have confidence in these approaches, but that I think (1) they are the most natural to explore at the moment, and (2) issues that seem like they can be cleanly avoided for these approaches seem less likely to be fundamental obstructions in general.

Whenever such issues bear directly on our decision-making in such a way that making errors would be really bad. For example, when we encounter a situation where we face a small probability of a very large payoff, then it matters how well we understand the particular tradeoff at hand. The goal / best case is that the development of AI doesn’t depend on sorting out these kinds of considerations for its own sake, only insofar as the AI has to actually make critical choices that depend on these considerations.

I wrote a little bit about efficiency here. I don’t see why an approval-directed agent would be at a serious disadvantage compared to an RL agent (though I do see why an imitation learner would be at a disadvantage by default, and why an approval-directed agent may be unsatisfying from a safety perspective for non-philosophical reasons).

Ideally you would synthesize data in advance in order to operate without access to counterfactual human feedback at runtime—it’s not clear if this is possible, but it seems at least plausible. But it’s also not clear to me it is necessary, as long as we can tolerate very modest (<1%) overhead from oversight.

Of course if such a period goes on long enough then it will be a problem, but that is a slow-burning problem that a superintelligent civilization can address at its leisure. In terms of technical solutions, anything we can think of now will easily be thought of in this future scenario. It seems like the only thing we really lose is the option of technological relinquishment or serious slow-down, which don’t look very attractive/feasible at the moment.

Isn’t a crucial consideration here how soon after the development of AI they will be faced with such choices? If the answer is “soon” then it seems that we should try to solve the problems ahead of time or try to delay AI. What’s your estimate? And what do you think the first such choices will be?

I think that we are facing some issues all of the time (e.g. some of these questions probably bear on “how much should we prioritize fast technological development?” or “how concerned should we be with physics disasters?” or so on), but that it will be a long time before we face really big expected costs from getting these wrong. My best guess is that we will get to do many-centuries-of-current-humanity worth of thinking before we really need to get any of these questions right.

I don’t have a clear sense of what the first choices will be. My view is largely coming from not seeing any serious candidates for critical choices.

Anything to do with expansion into space looks like it will be very far away in subjective time (though perhaps not far in calendar time). Maybe there is some stuff with simulations, or value drift, but neither of those look very big in expectation for now. Maybe all of these issues together make a 5% difference in expectation over the next few hundred subjective years? (Though this is a pretty unstable estimate.)

How did you arrive at the conclusion that we’re not facing big expected costs with these questions? It seems to me that for example the construction of large nuclear arsenals and lack of sufficient safeguards against nuclear war has already caused a large expected cost, and may have been based on one or more incorrect philosophical understandings (e.g., to the question of, what is the right amount of concern for distant strangers and future people). Similarly with “how much should we prioritize fast technological development?” But this is just from intuition since I don’t really know how to compute expected costs when the uncertainties involved have a large moral or normative component.

Do you expect technological development to have plateaued by then (i.e., AIs will have invented essentially all technologies feasible in this universe)? If so, do you think there won’t be any technologies among them that would let some group of people/AIs unilaterally alter the future of the universe according to their understanding of what is normative? (For example, intentionally or accidentally destroy civilization, or win a decisive war against the rest of the world.) Or do you think something like a world government will have been created to control the use of such technologies?

There are lots of things we don’t know, and my default presumption is for errors to be non-astronomically-costly, until there are arguments otherwise.

I agree that philosophical problems have some stronger claim to causing astronomical damage, and so I am more scared of philosophical errors than e.g. our lack of effective public policy, our weak coordination mechanisms, global warming, the dismal state of computer security.

But I don’t see really strong arguments for philosophical errors causing great damage, and so I’m skeptical that we are facing big expected costs (big compared to the biggest costs we can identify and intervene on, amongst them AI safety).

That is, there seems to be a pretty good case that AI may be built soon, and that we lack the understanding to build AI systems that do what we want, that we will nevertheless build AI systems to help us get what we want in the short term, and that in the long run this will radically reduce the value of the universe. The cases for philosophical errors causing damage are overall much more speculative, have lower stakes, and are less urgent.

I agree that philosophical progress would very slightly decrease the probability of nuclear trouble, but this looks like a very small effect. (Orders of magnitude smaller than the effects from say increased global peace and stability, which I’d probably list as a higher priority right now than resolving philosophical uncertainty.) It’s possible we disagree about the mechanics of this particular situation.

No. I think that 200 years of subjective time probably amounts to 5-10 more doublings of the economy, and that technological change is a plausible reason that philosophical error would eventually become catastrophic.

I said “best guess” but this really is a pretty wild guess about the relevant timescales.

As with the special case of nuclear weapons, I think that philosophical error is a relatively small input into world-destruction.

I don’t expect this to cause philosophical errors to become catastrophic. I guess the concern is that the war will be won by someone who doesn’t much care about the future, thereby increasing the probability that resources are controlled by someone who prefers not to undergo any further reflection? I’m willing to talk about this scenario more, but at face value the prospect of a decisive military victory wouldn’t bump philosophical error above AI risk as a concern for me.

I’m open to ending up with a more pessimistic view about the consequences of philosophical error, either by thinking through more possible scenarios in which it causes damage or by considering more abstract arguments.

But if I end up with a view more like yours, I don’t know if it would change my view on AI safety. It still feels like the AI control problem is a different issue which can be considered separately.

This seems like a *really useful* strategy!

Agreed—placeholders and kludges should *look* like placeholders and kludges. I became a happier programmer when I realised this, because up until then I was always conflicted about how much time I should spend making some unsatisfying piece of code look beautiful.

How does this style of reasoning work on something more like the original Pascal’s Wager problem?

Suppose a (to all appearances) perfectly ordinary person goes on TV and says “I am an avatar of the Dark Lords of the Matrix. Please send me $5. When I shut down the simulation in a few months, I will subject those who send me the money to [LARGE NUMBER] years of happiness, and those who do not to [LARGE NUMBER] years of pain”.

Here you can’t solve the problem by pointing out the very large numbers of people involved, because there aren’t very high numbers of people involved. Your probability should depend only on your probability that this is a simulation, your probability that the simulators would make a weird request like this, and your probability that this person’s specific weird request is likely to be it. None of these numbers help you get down to a 1/[LARGE NUMBER] level.

I’ve avoided saying 3^^^3, because maybe there’s some fundamental constraint on computing power that makes it impossible for simulators to simulate 3^^^3 years of happiness in any amount of time they might conceivably be willing to dedicate to the problem. But they might be able to simulate some number of years large enough to outweigh our prior against any given weird request coming from the Dark Lords of the Matrix.

(also, it seems less than 3^^^3-level certain that there’s no clever trick to get effectively infinite computing power or effectively infinite computing time, like the substrateless computation in Permutation City)

When we jump to the version involving causal nodes having Large leverage over other nodes in a graph, there aren’t Large numbers of distinct people involved, but there’s Large numbers of life-centuries involved and those moments of thought and life have to be instantiated by causal nodes.

Infinity makes my calculations break down and cry, at least at the moment.

Imagine someone makes the following claims:

- I’ve invented an immortality drug
- I’ve invented a near-light-speed spaceship
- The spaceship has really good life support/recycling
- The spaceship is self-repairing and draws power from interstellar hydrogen
- I’ve discovered the Universe will last at least another 3^^^3 years

Then they threaten, unless you give them $5, to kidnap you, give you the immortality drug, stick you in the spaceship, launch it at near-light speed, and have you stuck (presumably bound in an uncomfortable position) in the spaceship for the 3^^^3 years the universe will last.

(okay, there are lots of contingent features of the universe that will make this not work, but imagine something better. Pocket dimension, maybe?)

If their claims are true, then their threat seems credible even though it involves a large amount of suffering. Can you explain what you mean by life-centuries being instantiated by causal nodes, and how that makes the madman’s threat less credible?

If what he says is true, then there will be 3^^^3 years of life in the universe. Then, assuming this anthropic framework is correct, it’s very unlikely to find yourself at the beginning rather than at any other point in time, so this provides 3^^^3-sized evidence against this scenario.

I’m not entirely sure that the doomsday argument also applies to different time slices of the same person, given that Eliezer in 2013 remembers being Eliezer in 2012 but not vice versa.

Are you sure it wouldn’t be rational to pay up? I mean, if the guy looks like he could do that, then for $5 I’d rather not take chances. If you pay, and it turns out he didn’t have all that equipment for torture, you could just sue him and get that $5 back, since he defrauded you. If he starts making up rules about how you can never ever tell anyone else about this, or later check the validity of his claim, or he’ll kidnap you, you should, for game-theoretic reasons, not abide, since being the kind of agent that accepts those terms makes you a valid target for such frauds. The reasons for not abiding are the same as for one-boxing.

That requires a MTTF of 3^^^3 years, or a per-year probability of failure of roughly 1/3^^^3.

This implies that physical properties like the cosmological constant and the half-life of protons can be measured to a precision of roughly 1/3^^^3 relative error.

To me it seems like both of those claims have prior probability ~ 1/3^^^3. (How many spaceships would you have to build and how long would you have to test them to get an MTTF estimate as large as 3^^^3? How many measurements do you have to make to get the standard deviation below 1/3^^^3?)
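The measurement-count question has a standard back-of-the-envelope answer: the standard error of a mean falls as sigma/sqrt(n), so hitting a target relative error takes roughly its inverse square in samples. A minimal sketch (function name and unit-noise assumption are mine):

```python
from fractions import Fraction
import math

def measurements_needed(relative_error):
    # The standard error of a mean falls as sigma / sqrt(n), so
    # pushing it down to `relative_error` (taking unit noise)
    # requires n = (1 / relative_error)^2 samples.
    return math.ceil(1 / relative_error ** 2)

# One part in a million already demands ~10^12 measurements;
# one part in 3^^^3 is hopeless.
print(measurements_needed(Fraction(1, 10**6)))
```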

Say the being that suffers for 3^^^3 seconds is morally relevant but not in the same observer moment reference class as humans for some reason. (IIRC putting all possible observers in the same reference class leads to bizarre conclusions...? I can’t immediately re-derive why that would be.) But anyway it really seems that the magical causal juice is the important thing here, not the anthropic/experiential nature or lack thereof of the highly-causal nodes, in which case the anthropic solution isn’t quite hugging the real query.

The only reason that I have ever thought of is that our reference class should intuitively consist of only sentient beings, but that nonsentient beings should still be able to reason. Is this what you were thinking of? Whether it applies in a given context may depend on what exactly you mean by a reference class in that context.

If it can reason but isn’t sentient then it maybe doesn’t have “observer” moments, and maybe isn’t itself morally relevant—Eliezer seems to think that way anyway. I’ve been trying something like, maybe messing with the non-sentient observer has a 3^^^3 utilon effect on human utility somehow, but that seems psychologically-architecturally impossible for humans in a way that might end up being fundamental. (Like, you either have to make 3^^^3 humans, which defeats the purpose of the argument, or make a single human have a 3^^^3 times better life without lengthening it, which seems impossible.) Overall I’m having a really surprising amount of difficulty thinking up an example where you have a lot of causal importance but no anthropic counter-evidence.

Anyway, does “anthropic” even really have anything to do with qualia? The way people talk about it it clearly does, but I’m not sure it even shows up in the definition—a non-sentient optimizer could totally make anthropic updates. (That said I guess Hofstadter and other strange loop functionalists would disagree.) Have I just been wrongly assuming that everyone else was including “qualia” as fundamental to anthropics?

Yeah, this whole line of reasoning fails if you can get to 3^^^3 utilons without creating ~3^^^3 sentients to distribute them among.

I’m not sure what you mean. If you use an anthropic theory like what Eliezer is using here (e.g. SSA, UDASSA), then an amount of causal importance that is large compared to the rest of your reference class implies few similar members of the reference class, which is anthropic counter-evidence, so of course it would be impossible to think of an example. Even if nonsentients can contribute to utility, if I can create 3^^^3 utilons using nonsentients, then some other people probably can too, so I don’t have a lot of causal importance compared to them.

This is the contrapositive of the grandparent. I was saying that if we assume that the reference class is sentients, then nonsentients need to reason using different rules i.e. a different reference class. You are saying that if nonsentients should reason using the same rules, then the reference class cannot comprise only sentients. I actually agree with the latter much more strongly, and I only brought up the former because it seemed similar to the argument you were trying to remember.

There are really two separate questions here, that of how to reason anthropically and that of how magic reality-fluid is distributed. Confusing these is common, since the same sort of considerations affect both of them and since they are both badly understood, though I would say that due to UDT/ADT, we now understand the former much better, while acknowledging the possibility of unknown unknowns. (Our current state of knowledge where we confuse these actually feels a lot like people who have never learnt to separate the descriptive and the normative.)

The way Eliezer presented things in the post, it is not entirely clear which of the two he meant to be responsible for the leverage penalty. It seems like he meant for it to be an epistemic consideration due to anthropic reasoning, but this seems obviously wrong given UDT. In the Tegmark IV model that he describes, the leverage penalty is caused by reality-fluid, but it seems like he only intended that as an analogy. It seems a lot more probable to me though, and it is possible that Eliezer would express uncertainty as to whether the leverage penalty is actually caused by reality-fluid, so that it is a bit more than an analogy. There is also a third mathematically equivalent possibility where the leverage penalty is about values, and we just care less about individual people when there are more of them, but Eliezer obviously does not hold that view.

A comment: it is not clear to me that Eliezer is intending to use SSA or UDASSA here. The “magic reality fluid” measure looks more like SIA, but with a prior based on Levin complexity rather than Kolmogorov complexity—see my comment here. Or—in an equivalent formulation—he’s using Kolmogorov + SSA but with an extremely broad “reference class” (the class of all causal nodes, most of which aren’t observers in any anthropic sense). This is still not UDASSA.

To get something like UDASSA, we shouldn’t distribute the weight 2^-#p of each program p uniformly among its execution steps. Instead we should consider using another program q to pick out an execution step or a sequence of steps (i.e. a sub-program s) from p, and then give the combination q,p a weight 2^-(#p+#q). This means each sub-program s will get a total prior weight of

Sum over {p, q : q(p) = s and s is a sub-program of p} of 2^-(#p + #q).

When updating on your evidence E, consider the class S(E) of all sub-programs which correspond to an AI program having that evidence, and normalize. The posterior probability that you are in a particular universe p’ then becomes proportional to

Sum over {q : q(p’) is a sub-program of p’ and a member of S(E)} of 2^-(#p’ + #q).

This looks rather different to what I discussed in my other comment, and it maybe handles anthropic problems a bit better. I can’t see there is any shift either towards very big universes (no presumptuous philosopher) or towards dense computronium universes, where we are simulations. There does appear to be a Great Filter or “Doomsday” shift, since it is still a form of SSA, but this is mitigated by the consideration that we may be part of a reference class (program q) which preferentially selects pre-AI biological observers, as opposed to any old observers.

I agree with this; the ‘e.g.’ was meant to point toward the most similar theories that have names, not pin down exactly what Eliezer is doing here. I thought that it would be better to refer to the class of similar theories here, since there is enough uncertainty that we don’t really have details.

Just thought of something:

How sure are we that P(there are N people) is not at least as small as 1/N for sufficiently large N, even without a leverage penalty? The OP seems to be arguing that the complexity penalty on the prior is insufficient to generate this low probability, since it doesn’t take much additional complexity to generate scenarios with arbitrarily more people. Yet it seems to me that after some sufficiently large number, P(there are N people) must drop faster than 1/N. This is because our prior must be normalized. That is:

Sum over all non-negative integers N of P(there are N people) = 1.

If there was some integer M such that for all n > M, P(there are n people) >= 1/n, the above sum would not converge. If we are to have a normalized prior, there must be a faster-than-1/N falloff to the function P(there are N people).

In fact, if one demands that my priors indicate that the expected number of people in the universe/multiverse is finite, then my priors must diminish faster than 1/N^2 (so that the sum of N·P(there are N people) converges).

TL;DR: If your priors are such that the probability of there being 3^^^3 people is not smaller than 1/(3^^^3), then you don’t have a normalized distribution of priors. If your priors are such that the probability of there being 3^^^3 people is not smaller than 1/((3^^^3)^2), then your expected number of people in the multiverse is divergent/infinite.
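The convergence claims above are easy to check numerically: partial sums of the harmonic series keep growing, while a 1/N^2 falloff settles toward a finite limit. A small sketch (helper name is mine):

```python
import math

def partial_sum(f, n_terms):
    # Partial sum f(1) + f(2) + ... + f(n_terms).
    return sum(f(n) for n in range(1, n_terms + 1))

# A prior with P(N people) >= 1/N cannot be normalized: the harmonic
# series keeps growing (like log n) without bound.
for n in (10**2, 10**4, 10**6):
    print(n, partial_sum(lambda k: 1.0 / k, n))

# A 1/N^2 falloff, by contrast, converges (to pi^2/6), so it can be
# rescaled into a proper probability distribution.
print(partial_sum(lambda k: 1.0 / k**2, 10**6), math.pi**2 / 6)
```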

Hm. Technically for EU differentials to converge we only need that the number of people we expectedly affect sums to something finite, but having a finite expected number of people existing in the multiverse would certainly accomplish that.

The problem is that the Solomonoff prior picks out 3^^^3 as much more likely than most of the numbers of the same magnitude, because it has much lower Kolmogorov complexity.

I’m not familiar with Kolmogorov complexity, but isn’t the apparent simplicity of 3^^^3 just an artifact of what notation we happen to have invented? I mean, “^^^” is not really a basic operation in arithmetic. We have a nice compact way of describing what steps are needed to get from a number we intuitively grok, 3, to 3^^^3, but I’m not sure it’s safe to say that makes it simple in any significant way. For one thing, what would make 3 a simple number in the first place?

In the nicest possible way, shouldn’t you have stopped right there? Shouldn’t the appearance of this unfamiliar and formidable-looking word have told you that I wasn’t appealing to some intuitive notion of complexity, but to a particular formalisation that you would need to be familiar with to challenge? If instead of commenting you’d Googled that term, you would have found the Wikipedia article that answered this and your next question.

As a rough estimate of the complexity of a number, you can take the number of lines of the shortest program that would compute the number from basic operations. More formally, substitute states of a Turing machine for lines of a program.

But what numbers are you allowed to start with on the computation? Why can’t I say that, for example, 12,345,346,437,682,315,436 is one of the numbers I can do computation from (as a starting point), and thus it has extremely small complexity?

You could say this—doing so would be like describing your own language in which things involving 12,345,346,437,682,315,436 can be expressed concisely.

So Kolmogorov complexity is somewhat language-dependent. However, given two languages in which you can describe numbers, you can compute a constant such that the complexity of any number is off by at most that constant between the two languages. (The constant is more or less the complexity of describing one language in the other). So things aren’t actually too bad.

But if we’re just talking about Turing machines, we presumably express numbers in binary, in which case writing “3” can be done very easily, and all you need to do to specify 3^^^3 is to make a Turing machine computing ^^^.
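The “short program” point can be made concrete: a few lines suffice to define Knuth’s up-arrow, and hence to describe 3^^^3 (a sketch; the function name is mine):

```python
def up_arrow(a, n, b):
    """Knuth's up-arrow operation: a ^...^ b with n arrows.
    One arrow is ordinary exponentiation; each additional arrow
    iterates the operation below it b times."""
    if n == 1:
        return a ** b
    result = 1
    for _ in range(b):
        result = up_arrow(a, n - 1, result)
    return result

print(up_arrow(3, 1, 3))  # 3^3 = 27
print(up_arrow(2, 2, 3))  # 2^(2^2) = 16
# up_arrow(3, 3, 3) -- i.e. 3^^^3 itself -- is far too large to
# evaluate, yet this short function is a complete description of it.
```

That the number cannot be evaluated is exactly the point: its description length, not its magnitude, is what the complexity prior penalizes.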

But can’t this constant itself be arbitrarily large when talking about arbitrary numbers? (Of course, for any specific number, it is limited in size.)

Well… Given any number N, you can in principle invent a programming language where the program `do_it` outputs N.

The constant depends on the two languages, but not on the number. As army1987 points out, if you pick the number first, and then make up languages, then the difference can be arbitrarily large. (You could go in the other direction as well: if your language specifies that no number less than 3^^^3 can be entered as a constant, then it would probably take approximately log(3^^^3) bits to specify even small numbers like 1 or 2.)

But if you pick the languages first, then you can compute a constant based on the languages, such that for all numbers, the optimal description lengths in the two languages differ by at most that constant.

The context in which this comes up here generally requires something like: “there’s a way to compare the complexity of numbers which always produces the same results independent of language, except in a finite set of cases. Since that set is finite and my argument doesn’t depend on any specific number, I can always base my argument on a case that’s not in that set.”

If that’s how you’re using it, then you don’t get to pick the languages first.

You do get to pick the languages first because there is a large but finite (say no more than 10^6) set of reasonable languages-modulo-trivial-details that could form the basis for such a measurement.

This is an awful lot of words to expend to notice that

(1) Social interactions need to be modeled in a game-theoretic setting, not straightforward expected payoff

(2) Distributions of expected values matter. (Hint: p(N) = 1/N is a really bad model as it doesn’t converge).

(3) Utility functions are neither linear nor symmetric. (Hint: extinction is not symmetric with doubling the population.)

(4) We don’t actually have an agreed-upon utility function anyway; big numbers plus a not-well-agreed-on fuzzy notion is a great way to produce counterintuitive results. The details don’t really matter; as fuzzy approaches infinity, you get nonintuitiveness.

It’s much more valuable to address some of these imperfections in the setup of the problem than continuing to wade through the logic with bad assumptions in hand.

Friendly neighborhood Matrix Lord checking in!

I’d like to apologize for the behavior of my friend in the hypothetical. He likes to make illusory promises. You should realize that regardless of what he may tell you, his choice of whether to hit the green button is independent of your choice of what to do with your $5. He may hit the green button and save 3↑↑↑3 lives, or he may not, at his whim. Your $5 can not be reliably expected to influence his decision in any way you can predict.

You are no doubt accustomed to thinking about enforceable contracts between parties, since those are a staple of your game theoretic literature as well as your storytelling traditions. Often, your literature omits the requisite preconditions for a binding contract since they are implicit or taken for granted in typical cases. Matrix Lords are highly atypical counterparties, however, and it would be a mistake to carry over those assumptions merely because his statements resemble the syntactic form of an offer between humans.

Did my Matrix Lord friend (who you just met a few minutes ago!) volunteer to have his green save-the-multitudes button and your $5 placed under the control of a mutually trustworthy third party escrow agent who will reliably uphold the stated bargain?

Alternately, if my Matrix Lord friend breaches his contract with you, is someone Even More Powerful standing by to forcibly remedy the non-performance?

Absent either of the above conditions, is my Matrix Lord friend participating in an iterated trading game wherein cheating on today’s deal will subject him to less attractive terms on future deals, such that the net present value of his future earnings would be diminished by more than the amount he can steal from you today?

Since none of these three criteria seem to apply, there is no deal to be made here. The power asymmetry enables him to do whatever he feels like regardless of your actions, and he is just toying with you! Do you really think your $5 means anything to him? He’ll spend it making 3↑↑↑3 paperclips for all you know.

Your $5 will not exert any predictable causal influence on the fate of the hypothetical 3↑↑↑3 Matrix Lord hostages. Decision theory doesn’t even begin to apply.

You should stick to taking boxes from Omega; at least she has an established reputation for paying out as promised.

Caveat emptor, the boxes she gave me always were empty!

I don’t at all think that this is central to the problem, but I do think you’re equating “bits” of sensory data with “bits” of evidence far too easily. There is no law of probability theory that forbids you from assigning probability 1/3^^^3 to the next bit in your input stream being a zero—so as far as probability theory is concerned, there is nothing wrong with receiving only one input bit and as a result ending up believing a hypothesis that you assigned probability 1/3^^^3 before.

Similarly, probability theory allows you to assign prior probability 1/3^^^3 to seeing the blue hole in the sky, and therefore believing the mugger after seeing it happen anyway. This may not be a good thing to do on other principles, but probability theory does not forbid it.

ETA: In particular, if you feel between a rock and a bad place in terms of possible solutions to Pascal’s Muggle, then you can at least consider assigning probabilities this way even if it doesn’t normally seem like a good idea.

True, but it seems crazy to be that certain about what you’ll see. It doesn’t seem that unlikely to hallucinate that happening. It doesn’t seem that unlikely for all the photons and phonons to just happen to converge in some pattern that makes it look and sound exactly like a Matrix Lord.
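Probability theory indeed permits this: in odds form, a single observation carrying an extreme likelihood ratio can overturn an extreme prior. A minimal sketch with tractable numbers (2^-100 standing in for 1/3^^^3):

```python
from fractions import Fraction

def posterior(prior, likelihood_ratio):
    # Bayes' rule in odds form: posterior odds = prior odds * LR.
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# A 2^-100 prior is overturned by one observation that is 2^101
# times likelier under the hypothesis than under its negation:
p = posterior(Fraction(1, 2**100), Fraction(2**101))
print(p > Fraction(1, 2))  # True
```

The dispute above is not about this arithmetic but about whether any real sensory channel can ever carry a likelihood ratio that extreme.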

You’re basically assuming that your sensory equipment is vastly more reliable than you have evidence to believe, just because you want to make sure that if you get a positive, you won’t just assume it’s a false positive.

Actually, there is such a law. You cannot reasonably start, when you are born into this world, naked, without any sensory experiences, expecting that the next bit you experience is much more likely to be 1 rather than 0. If you encounter one hundred zillion bits and they are all 1, you still wouldn’t assign 1/3^^^3 probability to the next bit you see being 0, if you’re rational enough.

Of course, this is muddied by the fact that you’re not born into this world without priors and all kinds of stuff that weighs on your shoulders. Evolution has done billions of years’ worth of R&D on your priors, to get them straight. However, the gap these evolution-set priors would have to cross to get even close to that absurd 1/3^^^3… It’s a theoretical possibility that’s by no stretch a realistic one.

Related: Would an AI conclude it’s likely to be a Boltzmann brain? ;)

Everyone’s a Boltzmann brain to some degree.

Or even if the AI experienced an intelligence explosion, the danger is that it would not believe it had really become so important, because the prior odds of being the most important thing that will probably ever exist are so low.

Edit: The AI could note that it uses a lot more computing power than any other sentient, and so give itself an anthropic weight much greater than 1.

With respect to this being a “danger,” don’t Boltzmann brains have a decision-theoretic weight of zero?

Why zero? If you came to believe there was a 99.99999% chance you are currently dreaming, wouldn’t it affect your choices?

Two quick thoughts:

Any two theories can be made compatible by allowing for some additional correction factor (e.g. a “leverage penalty”) designed to make them compatible. As such, all the work rests with “is the leverage penalty justified?”

For said justification, there has to be some sort of justifiable territory-level reasoning, including “does it carve reality at its joints?” and “is this the world we live in?”.

The problem I see with the leverage penalty is that there is no Bayesian updating way that will get you to such a low prior. It’s the mirror of “can never process enough bits to get away from such a low prior”, namely “can never process enough bits to get to assigning such a low prior” (the blade cuts both ways).

The reason for that is in part that the entire level of confidence you have in the governing laws of physics, and the causal structure and dependency graphs and such, is predicated on the sensory bitstream of your previous life—no more; it’s a strictly upper bound. You can gain confidence that the prior probability of affecting a googolplex people is that low only by using the lifetime bitstream you have accumulated—but then the trap shuts: just as you can’t get out of such a low prior, you cannot use any confidence you gained in the current system by way of your lifetime sensory input to get to such a low prior. You can be very sure you can’t affect that many, based on your understanding of how causal nodes are interconnected, but you can’t be that sure (since you base your understanding on a comparatively much smaller number of bits of evidence). It’s a prior ex machina, with little more justification than just saying “I don’t deal with numbers that large/small in my decision making”.

You probably shouldn’t let super-exponentials into your probability assignments, but you also shouldn’t let super-exponentials into the range of your utility function. I’m really not a fan of having a discontinuous bound anywhere, but I think it’s important to acknowledge that when you throw a trip-up (^^^) into the mix, important assumptions start breaking down all over the place. The VNM independence assumption no longer looks convincing, or straightforward. Normally my preferences in a Tegmark-style multiverse would reflect a linear combination of my preferences for its subcomponents; but throw a 3^^^3 in the mix, and this is no longer the case, so suddenly you have to introduce new distinctions between logical uncertainty and at least one type of reality fluid.

My short-term hack for Pascal’s Muggle is to recognize that my consequentialism module is just throwing exceptions, and fall back on math-free pattern matching, including low-weighted deontological and virtue-ethical values that I’ve kept around for just such an occasion. I am very unhappy with this answer, but the long-term solution seems to require fully figuring out how I value different kinds of reality fluid.

Is it just me, or is everyone here overly concerned with coming up with patches for this specific case and not the more general problem? If utilities can grow vastly larger than the prior probability of the situation that contains them, then an expected utility system will become almost useless: it will act on situations with probabilities as tiny as can possibly be represented in that system, since the math would vastly outweigh the expected utility from acting on anything else.

I’ve heard people come up with apparent resolutions to this problem, like counterbalancing every possible situation with an equally low-probability situation that has vast negative utility. There are a lot of problems with this, though. What if the utilities don’t exactly counterbalance? An extra bit to represent a negative utility, for example, might add to the complexity and therefore the prior probability. Or even a tiny amount of evidence for one scenario over the other would completely upset it.

And even if that isn’t the case, your utility function might not have a negative range. Maybe you only value the number of paperclips in the universe. The worst that can happen is you end up in a universe with no paperclips. You can’t have negative paperclips, so the lowest utility you can have is 0. Or maybe your positive and negative values don’t exactly match up. Fear is a better motivator than reward, for example. The fear of having people suffer may have more negative utility than the opposite scenario of just as many people living happy lives or something (and since they are both different scenarios with more differences than a single number, they would have different prior probabilities to begin with).

Resolutions that involve tweaking the probability of different events is just cheating since the probability shouldn’t change if the universe hasn’t. It’s how you act on those probabilities that we should be concerned about. And changing the utility function is pretty much cheating too. You can make all sorts of arbitrary tweaks that would solve the problem, like having a maximum utility or something. But if you really found out you lived in a universe where 3^^^3 lives existed (perhaps aliens have been breeding extensively, or we really do live in a simulation, etc), are you just supposed to stop caring about all life since it exceeds your maximum amount of caring?

I apologize if I’m only reiterating arguments that have already been gone over. But it’s concerning to me that people are focusing on extremely sketchy patches to a specific case of this problem, and not the more general problem, that any expected utility function becomes apparently worthless in a probabilistic universe like ours.

EDIT: I think I might have a solution to the problem and posted it here.

The idea is that it’d be great to have a formalism where they do by construction.

Also, when there’s no third party, it’s not distinct enough from Pascal’s Wager to demand extra terminology that focuses on the third party, such as “Pascal’s Mugging”. If it is just an agent doing contemplations by itself, that’s the agent making a wager on its hypotheses, not getting mugged by someone.

I’ll just go ahead and use “Pascal Scam” to describe a situation where an undistinguished agent promises an unusually huge payoff, and the mark erroneously gives in due to some combination of bad priors and bad utility evaluation. The common errors seem to be: 1: omitting the consequence of keeping the money for a more distinguished agent; 2: assigning too high a prior; 3: when picking between approaches, ignoring the huge cost of acting in a manner which encourages disinformation. All those errors act in favour of the scammer (and some are optional), while non-erroneous processing would assign huge negative utility to paying up even given high priors.

There is no real way of doing that without changing your probability function or your utility function. However you can’t change those. The real problem is with the expected utility function and I don’t see any way of fixing it, though perhaps I missed something.

Any agent subject to Pascal’s Mugging would fall prey to this problem first, and it would be far worse. While the mugger is giving his scenario, the agent could imagine an even more unlikely scenario, say one where the mugger actually gives him 3^^^^^^3 units of utility if he does some arbitrary task, instead of 3^^^3. This possibility immediately gets so much utility that it far outweighs anything the mugger has to say after that. Then the agent may imagine an even more unlikely scenario where it gets 3^^^^^^^^^^3 units of utility, and so on.

I don’t really know what an agent would do if the expected utility of any action approached infinity. Perhaps it would generally work out, as some things would approach infinity faster than others. I admit I didn’t consider that. But I don’t know if that would necessarily be the case. Even if it is, it seems “wrong” for expected utilities of everything to be infinite and only tiny probabilities to matter for anything. And if so, then it would work out for the Pascal’s Mugging scenario too, I think.

Last time I checked, priors were fairly subjective even here. We don’t know what the best way to assign priors is. Things like “Solomonoff induction” depend on an arbitrary choice of machine.

Nope, people who end up 419-scammed or waste a lot of money investing in someone like Randel L Mills or Andrea Rossi live through their lives OK until they read a harmful string in a harmful set of circumstances (a bunch of other believers around, for example).

Priors are indeed up for grabs, but a set of priors about the universe ought to be consistent with itself, no? A set of priors based only on complexity may indeed not be the best set of priors—that’s what all the discussions about “leverage penalties” and the like are about, enhancing Solomonoff induction with something extra. But what you seem to suggest is a set of priors about the universe that are designed for the express purpose of making human utility calculations balance out? Wouldn’t such a set of priors require the anthropomorphization of the universe, and effectively mean sacrificing all sense of epistemic rationality?

The best “priors” about the universe are 1 for the universe right around you, and 0 for everything else. Other priors are a compromise, an engineering decision.

What I am thinking is that there is a considerably better way to assign priors which we do not know of yet—a way which will assign equal probabilities to each side of a die if it has no reason to prefer one over the other—a way that corresponds to symmetries in the evidence.

We don’t know that there will still be the same problem when we have a non-stupid way to assign priors (especially as the non-stupid way ought to be considerably more symmetric). And it may be that some value systems are intrinsically incoherent. Suppose you wanted to maximize blerg without knowing what blerg even really is. That wouldn’t be possible; you can’t maximize something without having a measure of it. But I could still tell you I’d give you 3^^^^3 blergs for a dollar, without either of us knowing what blerg is supposed to be or whether 3^^^^3 blergs even make sense (if a blerg is a unique good book of up to 1000 pages, it doesn’t, because duplicates aren’t blerg).

True, but the goal of a probability function is to represent the actual probability of an event happening as closely as possible. The map should correspond to the territory. If your map is good, you shouldn’t change it unless you observe actual changes in the territory.

I don’t know if those things have such extremes in low probability vs high utility to be called pascal’s mugging. But even so, the human brain doesn’t operate on anything like Solomonoff induction, Bayesian probability theory, or expected utility maximization.

The actual probability is either 0 or 1 (either happens or doesn’t happen). Values in-between quantify ignorance and partial knowledge (e.g. when you have no reason to prefer one side of the die to the other), or, at times, are chosen very arbitrarily (what is the probability that a physics theory is “correct”).

New names for the same things are kind of annoying, to be honest, especially ill-chosen ones… If it happens by your own contemplation, I’d call it Pascal’s Wager. Mugging implies someone making threats; scam is more general and can involve promises of reward. Either way the key is the high-payoff proposition wreaking havoc, either through its prior probability being too high, other propositions having been omitted, or the like.

People are still agents, though.

Yes, but the goal is to assign the outcome that will actually happen as high a probability as possible, using whatever information we have. The fact that some outcomes result in ridiculously huge utility gains does not imply anything about how likely they are to happen, so there is no reason that should be taken into account (unless it actually does, in which case it should).

Pascal’s mugging was an absurd scenario with absurd rewards that approach infinity. What you are talking about is just normal everyday scams. Most scams do not promise such huge rewards or have such low probabilities (if you didn’t know any better it is feasible that someone could have an awesome invention or need your help with transaction fees.)

And the problem with scams is that people overestimate their probability. If they were to consider how many emails in the world are actually from Nigerian princes vs. scammers, or how many people promise awesome inventions without any proof they will actually work, they would reconsider. In Pascal’s Mugging, you fall for it even after having considered the probability of it happening in detail.

Your probability estimate could be absolutely correct. Maybe 1 out of a trillion times a person meets someone claiming to be a Matrix Lord, that person is actually telling the truth. And they still end up getting scammed, so that the one-in-a-trillion counterfactual version of themselves gets the infinite reward.

They are agents, but they aren’t subject to this specific problem because we don’t really use expected utility maximization. At best maybe some kind of poor approximation of it. But it is a problem for building AIs or any kind of computer system that makes decisions based on probabilities.

I think you’re considering a different problem than Pascal’s Mugging, if you’re taking it as a given that the probabilities are indeed 1 in a trillion (or for that matter 1 in 10). The original problem doesn’t make such an assumption.

What you have in mind, the case of definitely known probabilities, seems to me more like the Lifespan Dilemma, where e.g. “an unbounded utility on lifespan implies willingness to trade an 80% probability of living some large number of years for a 1/(3^^^3) probability of living some sufficiently longer lifespan”.

The wiki page on it seems to suggest that this is the problem.

Also this, which is pretty concerning.

I’m curious what you think the problem with Pascal’s Mugging is, though. That you can’t easily estimate the probability of such a situation? Well, that is true of anything and isn’t really unique to Pascal’s Mugging. But we can still approximate probabilities. That is a necessary evil of living in a probabilistic world without the ability to do perfect Bayesian updates on all available information, or to have unbiased priors.

I abhor using unnecessary novel jargon.

Bad math being internally bad, that’s the problem. Nothing to do with any worlds, real or imaginary, just a case of internally bad math—utilities are undefined, it is undefined if you pay up or not, the actions chosen are undefined. Akin to maximizing blerg without any definition of what blerg even is—maximizing “expected utility” without having defined it.

Speed prior works, for example (it breaks some of de Blanc’s assumptions—namely, the probability is not bounded from below by any computable function of the length of the hypothesis).

Call it undefined if you like, but I’d still prefer 3^^^3 people not suffer. It would be pretty weird to argue that human lives decay in utility based on how many there are. If you found out that the universe was bigger than you thought, that there really were far more humans in the universe somehow, would you just stop caring about human life?

It would also be pretty hard to argue that at least some small amount of money isn’t worth giving in order to save a human life, or that a small amount of money isn’t worth a small probability of saving enough lives to make up for that improbability.

Well, suppose there’s mind uploads, and one mind upload is very worried about himself so he runs himself multiply redundant with 5 exact copies. Should this upload be a minor utility monster?

3^^^3 is far more than there are possible people.

Bounded doesn’t mean it just hits a cap and stays there. Also, if you scale down all the utilities that you can affect, it changes nothing about actions (another confusion—mapping the utility to how much one cares).

And yes, there are definitely cases where money is worth a small probability of saving lives, and everyone agrees on such—e.g. if we find out that an asteroid has a certain chance of hitting Earth, we’d give money to space agencies, even when the chance is rather minute (we’d not give money to cold fusion crackpots, though). There’s nothing fundamentally wrong with spending a bit to avert a small probability of something terrible happening. The problem arises when the probability is overestimated, when the consequences are poorly evaluated, and so on. It is actively harmful, for example, to encourage boys to cry wolf needlessly. I think people sort of feel innately that if they are giving money away—losing—some giant fairness fairy is going to make the result more likely good than bad for everyone. The world doesn’t work like this; all those naive folks who jump at the opportunity to give money to someone promising to save the world, no matter how ignorant, uneducated, or crackpotty that person is in the fields where correctness can be checked at all, are increasing risk, not decreasing it.

Maybe not as weird as all that. Given a forced choice between killing A and B where I know nothing about them, I flip a coin; but add the knowledge that A is a duplicate of C and B is not a duplicate of anyone, and I choose A quite easily. I conclude from this that I value unique human lives quite a lot more than I value non-unique human lives. As others have pointed out, the number of unique human lives is finite, and the number of lives I consider worth living necessarily even lower, so the more people there are living lives worth living, the less unique any individual is, and therefore the less I value any individual life. (Insofar as my values are consistent, anyway. Which of course they aren’t, but this whole “let’s pretend” game of utility calculation that we enjoy playing depends on treating them as though they were.)

There is no evidence for the actual existence of neatly walled-off and unupdateable utility functions or probability functions.

Utility and probability functions are not perfect or neatly walled off. But that doesn’t mean you should change them to fix a problem with your expected utility function. The goal of a probability function is to represent the actual probability of an event happening as closely as possible. And the goal of a utility function is to represent what states you would prefer the universe to be in. It shouldn’t change unless you’ve actually changed your preferences.

There’s plenty of evidence of people changing their preferences over significant periods of time: it would be weird not to. And I am well aware that the theory of stable utility functions is standardly patched up with a further theory of terminal values, for which there is also no direct evidence.

Of course people can change their preferences. But if your preferences are not consistent you will likely end up in situations that are less preferable than if you had the same preferences the entire time. It also makes you a potential money pump.

What? Terminal values are not a patch for utility functions. It’s basically another word that means the same thing, what state you would prefer the world to end up in. And how can there be evidence for a decision theory?

Well, I’ve certainly seen discussions here in which the observed inconsistency among our professed values is treated as a non-problem on the grounds that those are mere *instrumental* values, and our *terminal* values are presumed to be more consistent than that. Insofar as stable utility functions depend on consistent values, it’s not unreasonable to describe such discussions as positing consistent terminal values in order to support a belief in stable utility functions.

Well, how is this different from changing our preferences to utility functions to fix problems with our naive preferences?

I don’t know what you mean. All I’m saying is that you shouldn’t change your preferences because of a problem with your expected utility function. Your preferences are just what you want. Utility functions are just a mathematical way of expressing that.

Human preferences don’t naturally satisfy the VNM axioms, thus by expressing them as a utility function you’ve already changed them.

I don’t see why our preferences can’t be expressed by a utility function even as they are. The only reason it wouldn’t work out is if there were circular preferences, and I don’t think most people’s preferences would turn out to be truly circular if they were to think about the specific occurrence and decide what they really preferred.

Though mapping out which outcomes are more preferred than others is not enough to assign them an actual utility; you’d somehow have to guess how *much* more preferable one outcome is to another, quantitatively. But even then I think most people could, if they thought about it enough. The problem is that our utility functions are complex and we don’t really know what they are, not that they don’t exist.

Or they might violate the independence axiom, but in any case what do you mean by “think about the specific occurrence and decide what they really preferred”, since the result of such thinking is likely to depend on the exact order they thought about things in.

Nick Beckstead’s finished but as-yet unpublished dissertation has much to say on this topic. Here is Beckstead’s summary of chapters 6 and 7 of his dissertation:

Ex ante, when the AI assigns infinitesimal probability to the real thing, and meaningful probability to “hallucination/my sensors are being fed false information,” why doesn’t it self-modify/self-bind to treat future apparent cat transformations as hallucinations?

I don’t buy this. Consider the following combination of features of the world and account of anthropic reasoning (brought up by various commenters in previous discussions), which is at least very improbable in light of its specific features and what we know about physics and cosmology, but not cosmically so.

- A world small enough not to contain ludicrous numbers of Boltzmann brains (or Boltzmann machinery)
- Where it is possible to create hypercomputers through complex artificial means
- Where hypercomputers are used to compute arbitrarily many happy life-years of animals, or humanlike beings with epistemic environments clearly distinct from our own (YOU ARE IN A HYPERCOMPUTER SIMULATION tags floating in front of their eyes)
- And the hypercomputed beings are not less real or valuable because of their numbers and long addresses

Treating this as infinitesimally likely, and then jumping to measurable probability on receipt of (what?) evidence about hypercomputers being possible, etc, seems pretty unreasonable to me.

The behavior you want could be approximated with a bounded utility function that assigned some weight to achieving big payoffs/achieving a significant portion (on one of several scales) of possible big payoffs/etc. In the absence of evidence that the big payoffs are possible, the bounded utility gain is multiplied by low probability and you won’t make big sacrifices for it, but in the face of lots of evidence, and if you have satisfied other terms in your utility function pretty well, big payoffs could become a larger focus.

Basically, I think such a bounded utility function could better track the emotional responses driving your intuitions about what an AI should do in various situations than jury-rigging the prior. And if you don’t want to track those responses then be careful of those intuitions and look to empirical stabilizing assumptions.
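To make the shape of this concrete, here is a minimal sketch of one way a saturating bounded-utility term could behave; every constant, probability, and payoff size below is invented purely for illustration, not taken from the post:

```python
# Sketch of a bounded utility term for big payoffs (all numbers invented).
# Utility from a payoff of size x saturates: u(x) = x / (x + SCALE),
# which never exceeds 1 no matter how large x gets.

SCALE = 1e6

def payoff_term(x):
    # Saturating (bounded) utility assigned to a payoff of size x.
    return x / (x + SCALE)

def expected_gain(prob_big, big_payoff, cost):
    # Expected bounded utility of paying `cost` for a chance at the payoff,
    # with the cost measured on the same saturating scale.
    return prob_big * payoff_term(big_payoff) - payoff_term(cost)

# Without evidence, the probability of the Matrix-Lord scenario is tiny,
# and the bounded payoff term (at most 1) cannot compensate:
print(expected_gain(1e-20, 1e30, 5.0) < 0)   # don't pay the $5

# After strong evidence (sky splits open), the probability is appreciable
# and the same bounded term now dominates the small cost:
print(expected_gain(0.1, 1e30, 5.0) > 0)     # the payoff becomes a focus
```

The point of the sketch is only that boundedness makes the tradeoff evidence-driven: the payoff term cannot grow without limit, so only the probability side can move the decision.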

It seems reasonable to me because on the stated assumptions—the floating tags seen by vast numbers of other beings but not yourself—you’ve managed to generate sensory data with a vast likelihood ratio. The vast update is as reasonable as this vast ratio, no more, no less.

The problem is that you seem to be introducing one dubious piece to deal with another. Why is the hypothesis that those bullet points hold infinitesimally unlikely rather than very unlikely in the first place?

I think the bullet points as a whole are “very unlikely” (the universe as a whole has some Kolmogorov complexity, or equivalent complexity of logical axioms, which determines this); within that universe your being one of the non-hypercomputed sentients is infinitesimally unlikely, and then there’s a vast update when you don’t see the tag. How would you reason in this situation?

OK, but if you’re willing to buy all that, then the expected payoff in some kind of stuff for almost any action (setting aside opportunity costs and empirical stabilizing assumptions) is also going to be cosmically large, since you have some prior probability on conditions like those in the bullet pointed list blocking the leverage considerations.

Hm. That does sound like a problem. I hadn’t considered the problem of finite axioms giving you unboundedly large likelihood ratios over your exact situation. It seems like this ought to violate the Hansonian principle somehow, but I’m not sure how to articulate it...

Maybe not seeing the tag updates against the probability that you’re in a universe where non-tags are such a tiny fraction of existence, but this sounds like it also ought to replicate Doomsday type arguments and such? Hm.

Really? People have been raising this (worlds with big payoffs and in which your observations are not correspondingly common) from the very beginning. E.g. in the comments of your original Pascal’s Mugging post in 2007, Michael Vassar raised the point:

and you replied:

Wei Dai and Rolf Nelson discussed the issue further in the comments there, and from different angles. And it is the obvious pattern-completion for “this argument gives me nigh-infinite certainty given its assumptions—now do I have nigh-infinite certainty in the assumptions?” i.e. Probing the Improbable issues. This is how I explained the unbounded payoffs issue to Steven Kaas when he asked for feedback on earlier drafts of his recent post about expected value and extreme payoffs (note how he talks about our uncertainty re anthropics and the other conditions required for Hanson’s anthropic argument to go through).

Hanson endorses SIA. So he would multiply the possible worlds by the number of copies of his observations therein. A world with 3^^^3 copies of him would get a 3^^^3 anthropic update. A world with only one copy of his observations that can affect 3^^^^3 creatures with different observations would get no such probability boost.

Or if one was a one-boxer on Newcomb one might think of the utility of ordinary payoffs in the first world as multiplied by the 3^^^3 copies who get them.

Just gonna jot down some thoughts here. First a layout of the problem.

Expected utility is the product of two numbers: the probability of the event times the utility generated by the event.

Traditionally speaking, when the event is claimed to affect 3^^^3 people, the utility generated is on the order of 3^^^3

Traditionally speaking, there’s nothing about the 3^^^3 people that requires a super-exponentially large extension to the complexity of the system (the universe/multiverse/etc.). So the probability of the event does *not* scale like 1/(3^^^3).

Thus the expected payoff becomes enormous, and you should pay the dude $5.

If you actually follow this, you’ll be mugged by random strangers offering to save 3^^^3 people or whatever super-exponential numbers they can come up with.

In order to avoid being mugged, your suggestion is to apply a scale penalty (leverage penalty) to the *probability*. You then notice that this has some very strange effects on your epistemology—you become incapable of ever believing the $5 will actually help no matter how much evidence you’re given, even though evidence can make the expected payoff large. You then respond to *this* problem with what appears to be an excuse to be illogical and/or non-Bayesian at times (due to finite computing power).

It seems to me that an alternative would be to rescale the *utility* value, instead of the probability. This way, you wouldn’t run into any epistemic issues anywhere, because you aren’t messing with the epistemics.

I’m not proposing we rescale Utility(save X people) by a factor 1/X, as that would make Utility(save X people) = Utility(save 1 person) all the time, which is obviously problematic. Rather, my idea is to make Utility a *per capita* quantity. That way, when the random hobo tells you he’ll save 3^^^3 people, he’s making a claim that requires there to be at least 3^^^3 people to save. If this does turn out to be true, keeping your Utility as a per capita quantity will require a rescaling on the order of 1/(3^^^3) to account for the now-much-larger population. This gives you a small expected payoff without requiring problematically small prior probabilities.

It seems we humans may already do a rescaling of this kind anyway. We tend to value rare things more than we would if they were common, tend to protect an endangered species more than we would if it weren’t endangered, and so on. But I’ll be honest and say that I haven’t really thought the consequences of this utility rescaling through very much. It just seems that if you need to rescale a product of two numbers and rescaling one of the numbers causes problems, we may as well try rescaling the other and see where it leads.

Any thoughts?
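For what it’s worth, the arithmetic of the per-capita proposal can be sketched with stand-in numbers. Here 10^100 plays the role of 3^^^3 (which no computer can represent), and the prior reflects only a bare complexity penalty, with no leverage penalty:

```python
# Sketch of the per-capita rescaling idea (all numbers are illustrative
# stand-ins: 10**100 plays the role of 3^^^3, and the prior reflects a
# 100-bit complexity penalty only, nowhere near 1/N).
N = 10**100            # lives the mugger claims to save
prior = 2.0**-100      # complexity penalty alone

# Naive expected utility: one util per life saved.
naive_eu = prior * N   # astronomically large -> pay the $5

# Per-capita utility: value saved lives as a fraction of the total
# population. If the claim is true, the population must be at least N.
population_if_true = N
per_capita_eu = prior * (N / population_if_true)  # = prior * 1

print(naive_eu > 10**50)      # the naive calculation explodes
print(per_capita_eu < 1e-20)  # the rescaled payoff stays small
```

The rescaling of order 1/N enters through the population term rather than through the prior, which is the whole point of the proposal.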

This reminds me a lot of Levin’s universal search algorithm, and the associated Levin complexity.

To formalize, I think you will want to assign each program p, of length #p, a prior weight 2^-#p (as in usual Solomonoff induction), and then divide that weight among the execution steps of the program (each execution step corresponding to some sort of causal node). So if program p executes for t steps before stopping, then each individual step gets a prior weight 2^-#p/t. The connection to universal search is as follows: imagine dovetailing all possible programs on one big computer, giving each program p a share 2^-#p of all the execution steps. (If a program stops, then start it again, so that the computer doesn’t have idle steps.) In the limit, the computer will spend a proportion 2^-#p/t of its resources executing each particular step of p, so this is an intuitive sense of the step’s prior “weight”.

You’ll then want to condition on your evidence to get a posterior distribution. Most steps of most programs won’t in any sense correspond to an intelligent observer (or AI program) having your evidence, E, but some of them will. Let nE(p) be the number of steps in a program p which so-correspond (for a lot of programs nE(p) will be zero) and then program p will get posterior weight proportional to 2^-#p x (nE(p) / t). Normalize, and that gives you the posterior probability you are in a universe executed by a program p.
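The weighting scheme just described can be sketched with toy numbers; the program lengths, step counts, and evidence-matching counts below are all invented for illustration:

```python
from fractions import Fraction

# Toy version of the proposed measure. Each "program" has a length in
# bits (#p), a total number of execution steps t, and a count n_E of
# steps corresponding to an observer with evidence E. (All invented.)
programs = {
    "p1": {"bits": 10, "steps": 1000,  "n_E": 3},
    "p2": {"bits": 12, "steps": 50,    "n_E": 1},
    "p3": {"bits": 8,  "steps": 10**6, "n_E": 0},
}

def posterior_weights(programs):
    # Unnormalized weight: 2^-#p * (n_E / t), as described above.
    raw = {
        name: Fraction(1, 2**p["bits"]) * Fraction(p["n_E"], p["steps"])
        for name, p in programs.items()
    }
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

post = posterior_weights(programs)
print(post["p3"] == 0)          # no steps match E -> zero posterior weight
print(sum(post.values()) == 1)  # weights normalize exactly
```

Note how the short program p3 gets zero posterior weight despite its small complexity penalty, and how p2 beats p1 because a larger *fraction* of its steps match the evidence, illustrating the density effect discussed below.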

You asked if there are any anthropic problems with this measure. I can think of a few:

Should “giant” observers (corresponding to lots of execution steps) count for more weight than “midget” observers (corresponding to fewer steps)? They do in this measure, which seems a bit counter-intuitive.

The posterior will tend to focus weight on programs which have a high proportion (nE(p) / t) of their execution steps corresponding to observers like you. If you take your observations at face value (i.e. you are not in a simulation), then this leads to the same sort of “Great Filter” issues that Katja Grace noticed with the SIA. There is a shift towards universes which have a high density of habitable planets, occupied by observers like us, but where very few or none of those observers ever expand off their home worlds to become super-advanced civilizations, since if they did they would take the executions steps away from observers like us.

There also seems to be a good reason in this measure NOT to take your observations at face value. The term nE(p) / t will tend to be maximized in universes very unlike ours: ones which are built of dense “computronium” running lots of different observer simulations, and you’re one of them. Our own universe is very “sparse” in comparison (very few execution steps corresponding to observers).

Even if you deal with simulations, there appears to be a “cyclic history” problem. The density nE(p)/t will tend to be maximized if civilizations last for a long time (large number of observers) but go through periodic “resets”, wiping out all traces of the prior cycles (thus leading to lots of observers in a state like ours). Maybe there is some sort of AI guardian in the universe which interrupts civilizations before they create their own (rival) AIs, but is not so unfriendly as to wipe them out altogether. So it just knocks them back to the stone age from time to time. That seems highly unlikely a priori, but it does get magnified a lot in posterior probability.

On the plus side, note that there is no particular reason in this measure to expect you are in a very big universe or multiverse, so this defuses the “presumptuous philosopher” objection (as well as some technical problems if the weight is dominated by infinite universes). Large universes will tend to correspond to many copies of you (high nE(p)) but also to a large number of execution steps t. What matters is the density of observers (hence the computronium problem) rather than the total size.

I think the simpler solution is just to use a bounded utility function. There are several things suggesting we do this, and I really don’t see any reason to not do so, instead of going through contortions to make unbounded utility work.

Consider the paper of Peter de Blanc that you link—it doesn’t say a computable utility function won’t have convergent utilities, but rather that it will iff said function is bounded. (At least, in the restricted context defined there, though it seems fairly general.) You could try to escape the conditions of the theorem, or you could just conclude that utility functions should be bounded.

Let’s go back and ask the question of why we’re using probabilities and utilities in the first place. Is it because of Savage’s Theorem? But the utility function output by Savage’s Theorem is always bounded.

OK, maybe we don’t accept Savage’s axiom 7, which is what forces utility functions to be bounded. But then we can only be sure that comparing expected utilities is the right thing to do for finite gambles, not for infinite ones, so talking about sums converging or not—well, it’s something that shouldn’t even come up. Or alternatively, if we do encounter a situation with infinitely many choices, each of differing utility, we simply don’t know what to do.

Maybe we’re not basing this on Savage’s theorem at all—maybe we simply take probability for granted (or just take for granted that it should be a real number and ground it in something like Cox’s theorem—after all, like Savage’s theorem, Cox’s theorem only requires that probability be finitely additive) and are then deriving utility from the VNM theorem. The VNM theorem doesn’t prohibit unbounded utilities. But the VNM theorem once again only tells us how to handle finite gambles—it doesn’t tell us that infinite gambles should also be handled via expected utility.

OK, well, maybe we don’t care about the particular grounding—we’re just going to use probability and utility because it’s the best framework we know, and we’ll make the probability countably additive and use expected utility in all cases. Hey, why not, seems natural, right? (In that case, the AI may want to eventually reconsider whether probability and utility really are the best framework to use, if it is capable of doing so.) But even if we throw all that out, we still have the problem de Blanc raises. And, um, all the other problems that have been raised with unbounded utility. (And if we’re just using probability and utility to make things nice, well, we should probably use bounded utility to make things nicer.)

I really don’t see any particular reason utility has to be unbounded either. Eliezer Yudkowsky seems to keep using this assumption that utility should be unbounded, or just not necessarily bounded, but I’ve yet to see any justification for this. I can find one discussion where, when the question of bounded utility functions came up, Eliezer responded, “[To avert a certain problem] the bound would also have to be substantially less than 3^^^^3.”—but this indicates a misunderstanding of the idea of utility, because utility functions can be arbitrarily (positively) rescaled or recentered. Individual utility “numbers” are not meaningful; only ratios of utility differences. If a utility function is bounded, you can assume the bounds are 0 and 1. Talk about the value of the bound is as meaningless as anything else using absolute utility numbers; they’re not amounts of fun or something.
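The rescaling point can be checked directly: a positive affine transformation of a utility function never changes which gamble maximizes expected utility, so absolute utility numbers (including the numeric value of any bound) carry no meaning. The toy gambles below are invented for illustration:

```python
# VNM utilities convey the same preferences after any positive affine
# transform u -> a*u + b (a > 0), so only ratios of utility differences
# are meaningful. (Toy gambles invented for illustration.)

def expected_utility(gamble, u):
    # A gamble is a list of (probability, outcome) pairs.
    return sum(p * u(x) for p, x in gamble)

g1 = [(0.5, 0), (0.5, 2)]  # coin flip between outcomes 0 and 2
g2 = [(1.0, 1)]            # outcome 1 for certain

u = lambda x: x**2              # some utility function
v = lambda x: 7 * u(x) - 100    # positive affine transform of u

best_under_u = max((g1, g2), key=lambda g: expected_utility(g, u))
best_under_v = max((g1, g2), key=lambda g: expected_utility(g, v))
print(best_under_u is best_under_v)  # same choice under either scaling
```

In particular, a utility function bounded by [0, 1] and one bounded by [0, 3^^^^3] can represent exactly the same preferences, which is why talk of the bound needing to be “less than 3^^^^3” doesn’t parse.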

Sure, if you’re taking a total-utilitarian viewpoint, then your (decision-theoretic) utility function has to be unbounded, because you’re summing a quantity over an arbitrarily large set. (I mean, I guess physical limitations impose a bound, but they’re not logical limitations, so we want to be able to assign values to situations where they don’t hold.) (As opposed to the individual “utility” functions that you’re summing, which is a different sort of “utility” that isn’t actually well-defined at present.) But total utilitarianism—or utilitarianism in general—is on much shakier ground than decision-theoretic utility functions and what we can do with them or prove about them. To insist that utility be unbounded based on total utilitarianism (or any form of utilitarianism) while ignoring the solid things we can say seems backwards.

Not everything has to scale linearly, after all. There seems to be this idea out there that utility must be unbounded because there are constants C_1 and C_2 such that adding to the world of person of “utility” (in the utilitarian sense) C_1 must increase your utility (in the decision-theoretic sense) by C_2, but this doesn’t need to be so. This to me seems a lot like insisting “Well, no matter how fast I’m going, I can always toss a baseball forward in my direction at 1 foot per second relative to me; so it will be going 1 foot per second faster than me, so the set of possible speeds is unbounded.” As it turns out, the set of possible speeds is bounded, velocities don’t add linearly, and if you toss a baseball forward in your direction at 1 foot per second relative to you, it will not be going 1 foot per second faster.
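The velocity analogy is concrete enough to compute: under the relativistic addition law (u + v)/(1 + uv/c²), repeatedly adding a fixed positive increment never reaches c. The increment and iteration count below are arbitrary illustrative choices:

```python
# Relativistic velocity addition: speeds combine as (u+v)/(1 + u*v/c^2),
# so repeatedly "tossing the baseball forward" adds a positive amount
# each time yet never exceeds c. Work in units where c = 1.
c = 1.0

def add_velocity(u, v):
    return (u + v) / (1 + u * v / c**2)

speed = 0.0
for _ in range(1000):
    speed = add_velocity(speed, 0.01)  # keep adding "1 foot per second"

print(speed < c)     # bounded despite unboundedly many additions
print(speed > 0.99)  # yet it keeps increasing toward c
```

Each addition really does increase the speed, and the set of reachable speeds is still bounded: that is exactly the structure being claimed for bounded utility.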

My own intuition is more in line with earthwormchuck163′s comment—I doubt I would be that joyous about making that many more people when so many are going to be duplicates or near-duplicates of one another. But even if you don’t agree with this, things don’t have to add linearly, and utilities don’t have to be unbounded.

I think he was assuming a natural scale. After all, you can just pick some everyday-sized utility difference to use as your unit, and measure everything on that scale. It wouldn’t really matter what utility difference you pick as long as it is a natural size, since multiplying by 3^^^3 is easily enough for the argument to go through.

Using a bounded utility function is the what you do if and only if your preferences happen to be bounded in that way. The utility function is not up for grabs. You don’t change the utility function because it makes decision making more convenient (well, unless you have done a lot of homework).

As it happens, I don’t make (hypothetical) decisions as if I assigned linear value to person-lives. That is because, as best as I can tell, my actual preference really does assign less value to the 3^^^3rd person-life created than to the 5th person-life. However, someone who does care just as much about each additional person would be making an error if they acted as if they had a bounded utility function.

Your argument proves a bit too much, I think. I could equally well reply, “Using a utility function is what you do if and only if your preferences are described by a utility function. Terminal values are not up for grabs. You don’t reduce your terminal values to a utility function just because it makes decision making more convenient.”

The fact of the matter is that our preferences are not naturally described by a utility function; so if we’ve agreed that the AI should use a utility function, well, there must be some reason for that other than “it’s a correct description of our preferences”, i.e., we’ve agreed that such reasons are worth consideration. And I don’t see any such reason that doesn’t also immediately suggest we should use a bounded utility function (at least, if we want to be able to consider infinite gambles).

So I’m having trouble believing that your position is consistent. If you said we should do away with utility functions entirely to better model human terminal values, that would make sense. But why would you throw out the bounded part, and then keep the utility function part? I’m having trouble seeing any line of reasoning that would support both of those simultaneously. (Well unless you want to throw out infinite gambles, which does seems like a consistent position. Note, though, that in that case we also don’t have to do contortions like in this post.)

Edit: Added notes about finite vs. infinite gambles.

If you were to generalise, it would have to be to something like “only if your preferences can be represented without loss as a utility function”. Even then there are exceptions. However, the intricacies of resolving complex and internally inconsistent agents seem rather orthogonal to the issue of how a given agent would behave in the counterfactual scenario presented.

Meanwhile, I evaluate your solution to this problem (throw away the utility function and replace it with a different one) to be equivalent to, when encountering Newcomb’s Problem, choosing the response “Self modify into a paperclip maximiser, just for the hell of it, then choose whichever box choice maximises paperclips”. That it seems to be persuasive to readers makes this thread all too surreal for me. Tapping out before candidness causes difficulties.

It’s not clear to me what distinction you are attempting to draw between “Can be described by a utility function” and “can be represented without loss as a utility function”. I don’t think any such distinction can sensibly be drawn. They seem to simply say the same thing.

I’d ask you to explain, but, well, I guess you’re not going to.

I’m not throwing out the utility function and replacing it with a different one, because there is no utility function. What there is is a bunch of preferences that don’t satisfy Savage’s axioms (or the VNM axioms or whichever formulation you prefer) and as such cannot actually be described by a utility function. Again—everything you’ve said works perfectly well as an argument against utility functions generally. (“You’re tossing out human preferences and using a utility function? So, what, when presented with Newcomb’s problem, you self-modify into a paperclipper and then pick the paperclip-maximizing box?”)

Perhaps I should explain in more detail how I’m thinking about this.

We want to implement an AI, and we want it to be rational in certain senses—i.e. obey certain axioms—while still implementing human values. Human preferences don’t satisfy these axioms. We could just give it human preferences and not worry about the intransitivity and the dynamic inconsistencies and such, or, we could force it a bit.

So we imagine that we have some (as yet unknown) procedure that takes a general set of preferences and converts it to one satisfying certain requirements (specific to the procedure). Obviously something is lost in the process. Are we OK with this? I don’t know. I’m not making a claim either way about this. But you are going to lose something if you apply this procedure.

OK, so we feed in a set of preferences and we get out one satisfying our requirements. What are our requirements? If they’re Savage’s axioms, we get out something that can be described by a utility function, and a bounded one at that. If they’re Savage’s axioms without axiom 7, or (if we take probability as a primitive) the VNM axioms, then we get out something that for finite gambles can be described by a utility function (not necessarily bounded), but which cannot necessarily be easily described for infinite gambles.

If I’m understanding you correctly, you’re reading me as suggesting a two-step process: First we take human values and force them into a utility function, then take that utility function and force it to be bounded. I am not suggesting that. Rather, I am saying, we take human values and force them to satisfy certain properties, and the result can then necessarily be described by a bounded utility function.

People on this site seem to often just assume that being rational *means* using a utility function, not remembering that a utility function is just how we describe sets of preferences satisfying certain axioms. It’s not whether you use a utility function that’s important; it’s questions like: are your preferences transitive? Do they obey the sure-thing principle? And so forth. Now, sure, the only way to obey all those requirements *is* to use a utility function, but it’s important to keep the reason in mind.

If we require the output of our procedure to obey Savage’s axioms, it can be described by a bounded utility function. That’s just a fact. If we leave out axiom 7 (or use the VNM axioms), then it can kind of be described by a utility function—for finite gambles it can be described by a utility function, and it’s not clear what happens for infinite gambles.

So do you include axiom 7 or no? (Well, OK, you might just use a different set of requirements entirely, but let’s assume it’s one of these two sets of requirements for now.) If yes, the output of your procedure will be a bounded utility function, and you don’t run into these problems with nonconvergence. If no, you also don’t run into these problems with nonconvergence—the procedure is required to output a coherent set of preferences, after all!—but for a different reason: because the set of preferences it outputs can only be modeled by a utility function for finite gambles. So if you start taking infinite weighted sums of utilities, the result doesn’t necessarily tell you anything about which one to choose.

So at no point should you be taking infinite sums with an unbounded utility function, because there is no underlying reason to do so. The only reason to do so that I can see is that, for your requirements, you’ve simply declared, “We’re going to require that the output of the procedure can be described by a utility function (including for infinite gambles).” But that’s just a silly set of requirements. As I said above—it’s not failing to use a utility function we should be avoiding; it’s the actual problems this causes we should be avoiding. Declaring at the outset we’re going to use a utility function, instead of that we want to avoid particular problems, is silly. I don’t see why you’d want to run human values through such a poorly motivated procedure.

So again, I’m not claiming you want to run your values through the machine and force them into a bounded utility function; but rather just that, if you want to run them through this one machine, you will get a bounded utility function; and if instead you run them through this other machine, you will get a utility function, kind of, but it won’t necessarily be valid for infinite gambles. Eliezer seems to want to run human values through the machine. Which one will he disprefer less? Well, he always seems to assume that comparing the expected utilities of infinite gambles is a valid operation, so I’m inferring he’d prefer the first one, and that one only outputs bounded utility functions. Maybe I’m wrong. But in that case he should stop assuming that comparing the expected utilities of infinite gambles is a valid operation.

You still get a probability function without Savage’s P6 and P7, you just don’t get a utility function with codomain the reals, and you don’t get expectations over infinite outcome spaces. If we add real-valued probabilities, for example by assuming Savage’s P6′, you even get finite expectations, assuming I haven’t made an error.

True.

That said, given some statement P about my preferences, such as “I assign linear value to person-lives,” such that P being true makes decision-making inconvenient, if I currently have C confidence in P then depending on C it may be more worthwhile to devote my time to gathering additional evidence for and against P than to developing a decision procedure that works in the inconvenient case.

On the other hand, if I keep gathering evidence about P until I conclude that P is false and then stop, that also has an obvious associated failure mode.

But that’s essentially already the case. Just consider the bound to be 3^^^^3 utilons, or even an illimited number of them. Those are not infinite, but still allow all the situations and arguments made above.

Paradoxes of infinity weren’t the issue in this case.

Again, individual utility numbers are not meaningful.

I’m not sure which “situations and arguments” you’re saying this still allows. It doesn’t allow the divergent sum that started all this.

I get the sense you’re starting from the position that rejecting the Mugging is correct, and then looking for reasons to support that predetermined conclusion. Doesn’t this attitude seem *dangerous*? I mean, in the hypothetical world where accepting the Mugging is actually the right thing to do, wouldn’t this sort of analysis reject it *anyway*? (This is a feature of debates about Pascal’s Mugging in general, not just this post in particular.)

That’s just how it is when you reason about reason; Neurath’s boat must be repaired while on the open sea. In this case, our instincts strongly suggest that what the decision theory seems to say we should do must be wrong, and we have to turn to the rest of our abilities and beliefs to adjudicate between them.

Well, besides that thing about wanting expected utilities to converge, from a rationalist-virtue perspective it seems relatively less dangerous to start from a position of someone rejecting something with no priors or evidence in favor of it, and relatively more dangerous to start from a position of rejecting something that has strong priors or evidence.

It seems to me like the whistler is saying that the probability of saving knuth people for $5 is exactly 1/knuth after updating for the Matrix Lord’s claim, not before the claim, which seems surprising. Also, it’s not clear that we need to make an FAI resistant to very very unlikely scenarios.

I’m a lot more worried about making an FAI behave correctly if it encounters a scenario which *we thought* was very, very unlikely.

Also, if the AI spreads widely and is around for a long time, it will eventually run into very unlikely scenarios. Not 1/3^^^3 unlikely, but pretty unlikely.

I enjoyed this really a lot, and while I don’t have anything insightful to add, I gave five bucks to MIRI to encourage more of this sort of thing.

(By “this sort of thing” I mean detailed descriptions of the actual problems you are working on as regards FAI research. I gather that you consider a lot of it too dangerous to describe in public, but then I don’t get to enjoy reading about it. So I would like to encourage you sharing some of the fun problems sometimes. This one was fun.)

Not ‘a lot’, and present-day non-sharing imperatives are driven by an (obvious) strategy to accumulate a long-term advantage for FAI projects over AGI projects, which is impossible if all lines of research are shared at all points when they are not yet imminently dangerous. No present-day knowledge is imminently dangerous AFAIK.

Do you believe this to be possible? In modern times, with high mobility of information *and people*, I have strong doubts that a gnostic approach would work. You can hide small, specific, contained “trade secrets”; you can’t hide a large body of knowledge that needs to be actively developed.

I can’t help but remember HPJEV talking about plausible deniability and how that relates to you telling people whether there is dangerous knowledge out there.

Thanks for the clarification!

I thought this was an engaging, well-written summary targeted to the general audience, and I’d like to encourage more articles along these lines. So as a follow-up question: How much income for MIRI would it take, per article, for the beneficial effects of sharing non-dangerous research to outweigh the negatives?

(Gah, the editor in me WINCES at that sentence. Is it clear enough or should I re-write? I’m asking how much I-slash-we should kick in per article to make the whole thing generally worth your while.)

Given how many underpaid science writers are out there, I’d have to say that ~50k/year would probably do it for a pretty good one, especially given the ‘good cause’ bonus to happiness that any qualified individual would understand and value. But is even 1k/week in donations realistic? What are the page view numbers? I’d pay $5 for a good article on a valuable topic; how many others would as well? I suspect the numbers don’t add up, but I don’t even have an order-of-magnitude estimate on current or potential readers, so I can’t myself say.

You need not only a good science writer, but one who either already groks the problem, or can be made to do so with a quick explanation.

Furthermore, they need to have the above qualifications without being capable of doing primary research on the problem (this is the issue with Eliezer—he would certainly be capable of doing it, but his time is better spent elsewhere.)

Well, $100K/year would probably pay someone to write things up full time, if only we had the right candidate to hire for it—I’m not sure we do. The issue is almost never danger; it’s just that writing stuff up is hard.

Apropos the above conversation: Do you know Annalee Newitz? (Of io9). If not, would you like to? I think you guys would get on like a house on fire.

I can certainly see that people who can both understand these issues and write them up for a general audience would be rare. Working in your favor is the fact that writers in general are terribly underpaid, and a lot of smart tech journalists have been laid off in recent years. (I used to be the news editor for Dr. Dobb’s Journal, and although I am not looking for a job right now, I have contacts who could probably fill the position for you.)

But I did some back-of-the-envelope calculations and it doesn’t seem like this effort would pay for itself. I doubt you have enough questions like this to cover a daily article, and for a weekly one you’d need to take in over $2K in donations (counting taxes) to cover your writer’s salary. And that seems...unlikely.

Sad! But I get it.

I would love to have a conversation about this. Is the “tad” here hyperbole or do you actually have something mostly worked out that you just don’t want to post? On a first reading (and admittedly without much serious thought—it’s been a long day), it seems to me that this is where the real heavy lifting has to be done. I’m always worried that I’m missing something, but I don’t see how to evaluate the proposal without knowing how the super-updates are carried out.

*Really* interesting, though.

The hyperbole one. I wasn’t intending the primary focus of this post to be on the notion of a super-update—I’m not sure if *that* part needs to make it into AIs, though it seems to me to be partially responsible for my humanlike foibles in the Horrible LHC Inconsistency. I agree that this notion is actually *very* underspecified, but so is almost all of bounded logical uncertainty.

Using “a tad” to mean “very” is understatement, not hyperbole.

One could call it *hypobole*.

Specifically, litotes.

If someone suggests to me that they have the ability to save 3^^^3 lives, and I assign this a 1/3^^^3 probability, and then they open a gap in the sky at billions to one odds, I would conclude that it is still extremely unlikely that they can save 3^^^3 lives. However, it is possible that their original statement is false and yet it would be worth giving them five dollars because they would save a billion lives. Of course, this would require further assumptions on whether people are likely to do things that they have not said they would do, but are weaker versions of things they did say they would do but are not capable of.

Also, I would assign lower probabilities when they claim they could save more people, for reasons that have nothing to do with complexity. For instance, “the more powerful a being is, the less likely he would be interested in five dollars” or “a fraudster would wish to specify a large number to increase the chance that his fraud succeeds when used on ordinary utility maximizers, so the larger the number, the greater the comparative likelihood that the person is fraudulent”.
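The arithmetic behind the first paragraph is easy to make explicit in odds form. Here is a quick sketch (all numbers are my own illustrative stand-ins; 3^^^3 is far too large to represent, so a prior of 10^-100 stands in for a prior of 1/3^^^3):

```python
def posterior_log10_odds(prior_log10_odds, log10_likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio,
    which is simple addition in log space."""
    return prior_log10_odds + log10_likelihood_ratio

# The sky-gap demonstration: suppose the observation is ~10^9 times likelier
# if the claimant has real powers than if they don't (illustrative figure).
evidence = 9.0

# "Can save 3^^^3 lives": stand-in prior odds of 10^-100.
print(posterior_log10_odds(-100.0, evidence))  # -91.0: still negligible

# "Will merely save a billion lives": prior odds of, say, 10^-12.
print(posterior_log10_odds(-12.0, evidence))   # -3.0: now worth $5 of attention
```

The point the sketch illustrates: billions-to-one evidence moves every hypothesis by the same nine orders of magnitude, which rescues the modest claim while leaving the astronomical one effectively at zero.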

1) Sometimes what you may actually be seeing is disagreement on whether the hypothesis has a low probability.

2) Some of the arguments against Pascal’s Wager and Pascal’s Mugging don’t depend on the probability. For instance, Pascal’s Wager has the “worshipping the wrong god” problem—what if there’s a god who prefers that he not be worshipped and damns worshippers to Hell? Even if there’s a 99% chance of a god existing, this is still a legitimate objection (unless you want to say there’s a 99% chance *specifically* of one type of god).

3) In some cases, it may be technically true that there is no low probability involved, but there may be some other small number that the size of the benefit is multiplied by. For instance, most people discount events that happen far in the future. A highly beneficial event that happens far in the future would have its benefit multiplied by a very small number when considering discounting.

Of course in cases 2 and 3 that is not technically Pascal’s mugging by the original definition, but I would suggest the definition should be extended to include such cases. Even if not, they should at least be called something that acknowledges the similarity, like “Pascal-like muggings”.
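On the discounting point, the multiplier really does get small enough to play the role that a tiny probability plays elsewhere. A minimal sketch, with purely illustrative numbers of my own choosing:

```python
import math

def present_value(benefit, years, annual_rate=0.02):
    """Exponential discounting: PV = benefit * e^(-rate * years)."""
    return benefit * math.exp(-annual_rate * years)

# A benefit of 10^12 utilons delivered 10,000 years from now:
print(present_value(1e12, 10_000))  # ~1.4e-75: the discount factor annihilates it

# The same benefit delivered in 50 years is barely dented:
print(present_value(1e12, 50))      # ~3.7e11
```

So even with probability 1, an exponential discounter treats a huge far-future payoff much the way an undiscounted reasoner treats a payoff with astronomically low probability.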

1) It’s been applied to *cryonic preservation*, fer crying out loud. It’s reasonable to suspect that the probability of that working is low, but anyone who says with current evidence that the probability is *beyond astronomically low* is being too silly to take seriously.

The benefit of cryonic preservation isn’t astronomically high, though, so you don’t need a probability that is beyond astronomically low. First of all, even an infinitely long life after being revived only has a finite present value, and possibly a very low one, because of discounting. Second, the benefit from cryonics is the benefit you’d gain from being revived after being cryonically preserved, *minus* the benefit that you’d gain from being revived after *not* being cryonically preserved. (A really advanced society might be able to simulate us. If simulations count as us, simulating us counts as reviving us without the need for cryonic preservation.)

I do not think that you have gotten Luke’s point. He was addressing your point #1, not trying to make a substantive argument in favor of cryonics.

I don’t think that either Pascal’s Wager or Pascal’s Mugging requires a probability that is astronomically low. It just requires that the size of the purported benefit be large enough that it overwhelms the low probability of the event.

No, otherwise taking good but long-shot bets would be a case of Pascal’s Mugging.

It needs to involve a breakdown in the math, because you’re basically trying to evaluate infinity/infinity.

Any similarities are arguments for giving them a maximally *different* name to avoid confusion, not a similar one. Would the English language really be better if rubies were called diyermands?

Chemistry would not be improved by providing completely different names to chlorate and perchlorate (e.g. chlorate and sneblobs). Also, I think English might be better if rubies were called diyermands. If all of the gemstones were named something that followed a scheme similar to diamonds, that might be an improvement.

I disagree. Communication can be noisy, and if a bit of noise replaces a word with a word in a totally different semantic class the error can be recovered, whereas if it replaces it with a word in the similar class it can’t. See the last paragraph in myl’s comment to this comment.

Humans have the luxury of neither perfect learning nor perfect recall. In general, I find that my ability to learn and to recall words is much more limiting than noisy communication channels. I think that there are other sources of redundancy in human communication that make noise less of an issue. For example, if I’m not sure whether someone said “chlorate” or “perchlorate”, often the ambiguity would be obvious, such as when it is clear that they had mumbled so I wasn’t quite sure what they said. In the case of the written word, chemistry and context provide a model for things, which acts as a layer of redundancy, similar to the language model described in the post you linked to.

It would take me at least twice as long to memorize random/unique alternatives to hypochlorite, chlorite, chlorate, perchlorate, multiplied by all the other oxyanion series. It would take me many times as long to memorize unique names for every acetyl compound, although I obviously acknowledge that Chemistry is the best case scenario for my argument and worst case scenario for yours. In the case of philosophy, I still think there are advantages to learning and recall for similar things to be named similarly. Even in the case of “Pascal’s mugging” vs. “Pascal’s wager”, I believe that it is easier to recall and thus easier to have cognition about in part because of the naming connection between the two, despite the fact that these are two different things.

Note that I am not saying I am in favor of calling any particular thing “Pascal-like muggings,” which draws an explicit similarity between the two. All I’m saying is that choosing a “maximally different name to avoid confusion” strikes me as being less than ideal, and that if you called it a Jiro’s mugging or something, that would be more than enough semantic distance between the ideas.

Okay, that’s actually a good example. This caused me to re-think my position. After thinking, I’m still not sure that the analogy is actually valid, though.

In chemistry, we have a systematic naming scheme. Systematic naming schemes are good, because they let us guess word meanings without having to learn them. In a difficult field which most people learn only as adults, if at all, this is a very good thing. I’m no chemist, but if I had to guess, I’d guess that the words chlorate and perchlorate do cause confusion sometimes, but that this price is overall worth paying for a systematic naming scheme.

For gemstones, we do not currently have a systematic naming scheme. I’m not entirely sure that bringing one in would be good; there aren’t so many common gemstones that we’re likely to forget them, and frankly, if it ain’t broke don’t fix it. But I’m not sure it would be bad either.

What would not be good would be to simply rename rubies to diyermands without changing anything else. This would not only result in misunderstandings, but generate the false impression that rubies and diamonds have something special in common as distinct from sapphires and emeralds (I apologise for my ignorance if this is in fact the case).

But at least in the case of gemstones we do not *already* have a serious problem; I do not know of any major epistemic failures floating around to do with the diamond-ruby distinction.

In the case of Pascal’s mugging, we have a complete epistemic disaster: a very specific, very useful term has been turned into a useless bloated red-giant word, laden with piles of negative connotations and no actual meaning beyond ‘offer of lots of utility that I need an excuse to ignore’.

I know of almost nobody who has serious problems noticing the similarities between these situations, but tons of people seem not to realise there are any differences. The priority with terminology must be to separate the meanings and make it absolutely clear that these are not the same thing and need not be treated in the same way. Giving them similar names is nearly the worst thing that could be done, second only to leaving the situation as it is.

If you were to propose a systematic terminology for decision-theoretic dilemmas, that would be a different matter. I think I would still disagree with you: the field is young and we don’t have a good enough picture of the space of possible problems, and a systematic scheme risks reducing our ability to think beyond it.

But that is not what is being suggested, what is being suggested is creating an ad-hoc confusion generator by making deliberately similar terms for different situations.

This might all be rationalisation, but that’s my best guess for why the situations feel different to me.

I agree with your analysis regarding the difference between systematic naming systems and merely similar naming. That said, the justification for more clearly separating Pascal’s mugging and this other unnamed situation does strike me as a political decision or rationalization. If the real world impact of people’s misunderstanding were beneficial for the AI friendly cause, I doubt if anyone here would be making much ado about it. I would be in favor of renaming moissanite to diamand if this would help avert our ongoing malinvestment in clear glittery rocks to the tune of billions of dollars and numerous lives, so political reasons can perhaps be justified in some situations.

I would agree that it is to some extent political. I don’t think it’s very dark-artsy, though, because it seems to be a case of getting rid of an anti-FAI misunderstanding rather than creating a pro-FAI misunderstanding.

I suspect it would be. The first time you encounter the word “ruby”, you have only context to go off of. But if the word sounded like “diamond”, then you could also make a tentative guess that the referent is similar.

Do you really think this!? I admit to being extremely surprised to find anyone saying this.

If rubies were called diyermands it seems to me that people wouldn’t guess what it was when they heard it, they would simply guess that they had misheard ‘diamond’, especially since it would almost certainly be a context where that was plausible, most people would probably still have to have the word explained to them.

Furthermore, once we had the definition, we would be endlessly mixing them up, given that they come up in exactly the same context. Words are used many times, but only need to be learned once, so getting the former unambiguous is far more important.

The word ‘ruby’ exists *primarily* to distinguish rubies from things like diamonds; you can usually guess that they’re not cows from context. Replacing it with diyermand causes it to fail at its main purpose.

EDIT:

To give an example from my own field, in maths we have the terms ‘compact’ and ‘sequentially compact’ for types of topological space. The meanings are similar but not the same, you can find spaces satisfying one but not the other, but most ‘nice’ spaces have both or neither.

If your theory is correct, this situation is good, because it will allow people to form a plausible guess at what ‘compact’ means if they already know ‘sequentially compact’ (this is almost always the order in which a student meets them). Indeed, they do always form a plausible guess, and that guess is ‘the two terms mean the same thing’. This guess seems so plausible that they never question it and go off believing the wrong thing. In my case this lasted about 6 months before someone undeluded me; even when I learned the real definition of compactness, I assumed the two were provably equivalent.

Had their names been totally different, I would have actually asked what it meant when I first heard it, and would never have had any misunderstandings, and several others I know would have avoided them as well. This seems unambiguously better.

Hm, that’s a good point, I’ve changed my opinion about this case.

When I wrote my comment, I was thinking primarily of words that share a common prefix or suffix, which tends to imply that they refer to things that share the same category but are not the same thing. “English” and “Spanish”, for example.

But yeah, “diyer” is too close to “die” to be easily distinguishable. Maybe “rubemond”?

I could see the argument for that, provided we also had saphmonds, emmonds etc… Otherwise you run the risk of claiming a special connection that doesn’t exist.

We would also need to find a different word for almonds.

That argument is isomorphic to the one discussed in the post here:

Essentially, it’s hard to argue that the probabilities you assign should be balanced so exactly, and thus (if you’re an altruist) Pascal’s Wager exhorts you either to devote your entire existence to proselytizing for some god, or proselytizing for atheism, depending on which type of deity seems to you to have the slightest edge in probability (maybe with some weighting for the awesomeness of their heavens and awfulness of their hells).

So that’s why you still need a mathematical/epistemic/decision-theoretic reason to reject Pascal’s Wager and Mugger.

What you have is a divergent sum whose sign will depend on the order of summation, so maybe some sort of re-normalization can be applied to make it balance itself out in the absence of evidence.

Actually, there is no order of summation in which the sum will converge, since the terms get arbitrarily large. The theorem you are thinking of applies to conditionally convergent series, not all divergent series.
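For anyone who hasn’t seen the Riemann rearrangement theorem in action, it’s easy to demonstrate numerically. A quick sketch: the alternating harmonic series sums to ln 2, but taking one positive term per two negative terms (the same terms, reordered) drives the sum to (ln 2)/2:

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ...  ->  ln 2."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """One positive term followed by two negative terms per block:
    1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...  ->  (ln 2) / 2."""
    total = 0.0
    for b in range(1, n_blocks + 1):
        total += 1 / (2 * b - 1)   # next unused positive term
        total -= 1 / (4 * b - 2)   # next two unused negative terms
        total -= 1 / (4 * b)
    return total

print(alternating_harmonic(10**6))  # ~0.693147  (ln 2)
print(rearranged(10**6))            # ~0.346574  (ln 2 / 2)
# Same terms, different order, different sum -- possible only because the
# series is conditionally convergent. If the terms grow without bound, as
# with the mugger's utilities, every ordering diverges; no rearrangement helps.
```

This is why the rearrangement trick can’t rescue the mugger’s sum: the theorem needs terms tending to zero, which the utilities here spectacularly fail to do.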

Strictly speaking, you don’t always need the sums to converge. To choose between two actions you merely need the sign of the difference between the utilities of the two actions, which you can represent with a divergent sum. The issue is that it is not clear how to order such a sum, or whether its sign is even meaningful in any way.

Without discussing the merits of your proposal, this is something that clearly falls under “mathematical/epistemic/decision-theoretic reason to reject Pascal’s Wager and Mugger”, so I don’t understand why you left that comment here.

Has the following reply to Pascal’s Mugging been discussed on LessWrong?

Almost any ordinary good thing you could do has some positive expected downstream effects.

These positive expected downstream effects include lots of things like, “Humanity has slightly higher probability of doing awesome thing X in the far future.” Possible values of X include: create 3^^^^3 great lives or create infinite value through some presently unknown method, and stuff like, in a scenario where the future would have been really awesome, it’s one part in 10^30 better.

Given all the possible values of X whose probability is raised by doing ordinary good things, the expected value of doing any ordinary good thing is higher than the expected value of paying the mugger.

Therefore, almost any ordinary good thing you could do is better than paying the mugger. [I take it this is the conclusion we want.]

The most obvious complaint I can think of for this response is that it doesn’t solve selfish versions of Pascal’s Mugging very well, and may need to be combined with other tools in that case. But I don’t remember people talking about this and I don’t currently see what’s wrong with this as a response to the altruistic version of Pascal’s Mugging. (I don’t mean to suggest I would be very surprised if someone quickly and convincingly shoots this down.)

The obvious problem with this is that your utility is *not defined* if you are willing to accept muggings, so you can’t use the framework of expected utility maximization at all. The point of the mugger is just to illustrate this; I don’t think anyone thinks you should actually pay them (after all, you might encounter a more generous mugger tomorrow, or any number of more realistic opportunities to do astronomical amounts of good...).

Part of the issue is that I am coming at this problem from a different perspective than maybe you or Eliezer is. I believe that paying the mugger is basically worthless in the sense that doing almost any old good thing is better than paying the mugger. I would like to have a satisfying explanation of this. In contrast, Eliezer is interested in reconciling a view about complexity priors with a view about utility functions, and the mugger is an illustration of the conflict.

I do not have a proposed reconciliation of complexity priors and unbounded utility functions. Instead, the above comment is recommended as an explanation of why paying the mugger is basically worthless in comparison with ordinary things you could do. So this hypothesis would say that if you set up your priors and your utility function in a reasonable way, the expected utility of the downstream effects of ordinary good actions would greatly exceed the expected utility of paying the mugger.

Even if you decided that the expected utility framework somehow breaks down in cases like this, I think various related claims would still be plausible. E.g., rather than saying that doing ordinary good things has higher expected utility, it would be plausible that doing ordinary good things is “better relative to your uncertainty” than paying the mugger.

On a different note, another thing I find unsatisfying about the downstream effects reply is that it doesn’t seem to match up with why ordinary people think it is dumb to pay the mugger. The ultimate reason I think it is dumb to pay the mugger is strongly related to why ordinary people think it is dumb to pay the mugger, and I would like to be able to thoroughly understand the most plausible common-sense explanation of why paying the mugger is dumb. The proposed relationship between ordinary actions and their distant effects seems too far off from why common sense would say that paying the mugger is dumb. I guess this is ultimately pretty close to one of Nick Bostrom’s complaints about empirical stabilizing assumptions.

I think we are all in agreement with this (modulo the fact that all of the expected values end up being infinite and so we can’t compare in the normal way; if you e.g. proposed a cap of 3^^^^^^^3 on utilities, then you certainly wouldn’t pay the mugger).

It seems very likely to me that ordinary people are best modeled as having bounded utility functions, which would explain the puzzle.

So it seems like there are two issues:

You would never pay the mugger in any case, because other actions are better.

If you object to the fact that the only thing you care about is a very small probability of an incredibly good outcome, then that’s basically the definition of having a bounded utility function.

And then there is the third issue Eliezer is dealing with, where he wants to be able to have an unbounded utility function even if that doesn’t describe anyone’s preferences (since it seems like boundedness is an unfortunate restriction to randomly impose on your preferences for technical reasons), and formally it’s not clear how to do that. At the end of the post he seems to suggest giving up on that though.

Obviously, to really put the idea of people having bounded utility functions to the test, you have to forget about it solving problems of small probabilities and incredibly good outcomes and focus on the most unintuitive consequences of it. For one, having a bounded utility function means caring arbitrarily little about differences between the goodness of different sufficiently good outcomes. And all the outcomes could be certain, too. You could come up with all kinds of thought experiments involving purchasing huge numbers of years of happy life or some other good for a few cents. You know all of this, so I wonder why you don’t talk about it.
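To make that unintuitive consequence quantitative, here is a toy sketch. The bounded utility u(x) = x/(x + K) is my own stand-in, not anyone’s actual proposal; it just exhibits the generic behavior of any bounded function:

```python
def bounded_utility(years_of_happy_life, scale=100.0):
    """A toy bounded utility: increasing, but approaching 1 as the good grows."""
    return years_of_happy_life / (years_of_happy_life + scale)

# Certain outcomes, no probabilities involved:
gap_huge = bounded_utility(1e12) - bounded_utility(1e6)
print(gap_huge)  # ~1e-4: a million-fold improvement counts for almost nothing

# Compare: the gap between 100 and 200 years of happy life:
gap_small = bounded_utility(200) - bounded_utility(100)
print(gap_small)  # ~0.167: far larger than the million-fold gap above
```

So a bounded agent will pay far more to go from 100 to 200 years than to go from a million to a trillion, even with certainty on both sides, which is exactly the kind of thought experiment the paragraph above gestures at.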

Also I believe that Eliezer thinks that an unbounded utility function describes at least his preferences. I remember he made a comment about caring about new happy years of life no matter how many he’d already been granted.

(I haven’t read most of the discussion in this thread or might just be missing something so this might be irrelevant.)

As far as I know the strongest version of this argument is Benja’s, here (which incidentally seems to deserve many more upvotes than it got).

Benja’s scenario isn’t a problem for normal people though, who are not reflectively consistent and whose preferences manifestly change over time.

Beyond that, it seems like people’s preferences regarding the lifespan dilemma are somewhat confusing and probably inconsistent, much like their preferences regarding the repugnant conclusion. But that seems mostly orthogonal to Pascal’s Mugging, and the basic point stands—having unbounded utility *by definition* means you are willing to accept negligible chances of sufficiently good outcomes against probability nearly 1 of any fixed bad outcome, so if you object to the latter you are just objecting to unbounded utility.

I agree I was being uncharitable towards Eliezer. But it is true that at the end of this post he was suggesting giving up on unbounded utility, and that everyone in this crowd seems to ultimately take that route.

Sorry, I didn’t mean to suggest otherwise. The “different perspective” part was supposed to be about the “in contrast” part.

I agree with yli that this has other unfortunate consequences. And, like Holden, I find it unfortunate to have to say that saving N lives with probability 1/N is worse than saving 1 life with probability 1. I also recognize that the things I would like to say about this collection of cases are inconsistent with each other. It’s a puzzle. I have written about this puzzle at reasonable length in my dissertation. I tend to think that bounded utility functions are the best consistent solution I know of, but that continuing to operate with inconsistent preferences (in a tasteful way) may be better in practice.

It’s in Nick Bostrom’s Infinite Ethics paper, which has been discussed repeatedly here, and has been floating around in various versions since 2003. He uses the term “empirical stabilizing assumption.”

I bring this up routinely in such discussions because of the misleading intuitions you elicit by using an example like a mugging that sets off many “no-go heuristics” that track chances of payoffs, large or small. But just because ordinary things may have a higher chance of producing huge payoffs than paying off a Pascal’s Mugger (who doesn’t do demonstrations), doesn’t mean your activities will be completely unchanged by taking huge payoffs into account.

Maybe the answer to this reply is that if there is a downstream multiplier for ordinary good accomplished, there is also a downstream multiplier for good accomplished by the mugger in the scenario where he is telling the truth. And multiplying each by a constant doesn’t change the bottom line.

Why on earth would you expect the downstream utilities to exactly cancel the mugging utility?

The hypothesis is not that they exactly cancel the mugging utility, but that the downstream utilities exceed the mugging utility. I was actually thinking that these downstream effects would be much greater than paying the mugger.

That’s probably true in many cases, but the “mugger” scenario is really designed to test our limits. If 3^^^3 doesn’t work, then probably 3^^^^3 will. To be logically coherent, there has to be some crossover point, where the mugger provides exactly enough evidence to decide that yes, it’s worth paying the $5, despite our astoundingly low priors.

The proposed priors have one of two problems:

1. You can get mugged too easily, by your mugger simply being sophisticated enough to pick a high enough number to overwhelm your prior.

2. We’ve got a prior that is highly resistant to mugging, but which unfortunately is also resistant to being convinced by evidence. If there is any positive probability that we really could encounter a Matrix Lord able to do what they claim, who would offer some kind of Pascal’s-Mugging-like deal, then there should be *some* amount of evidence that would convince us to take the deal. We would like the amount of necessary evidence to be within the bounds of what our brains can receive and update on in a lifetime, but that is not necessarily the case for the priors which we know will avoid specious muggings.

I’m not actually certain that a prior exists which doesn’t have one of these two problems.

I also agree with Eliezer’s general principle that when we see convincing evidence of things that we previously considered effectively impossible (prior of 10^-googol or so), then we need to update the whole map on which that prior was based, not just the specific point. When you watch a person turn into a small cat, either your own sense data, or pretty much your whole map of how things work, must come into question. You can’t just say “Oh, people can turn into cats.” and move on as if that doesn’t affect almost everything you previously thought you knew about how the world worked.

It’s much more likely, based on what I know right now, that I am having an unusually convincing dream or hallucination than that people can turn into cats. And if I manage to collect enough evidence to actually make my probability of “people can turn into cats” higher than “my sensory data is not reliable”, then the whole framework of physics, chemistry, biology, and basic experience which caused me to assign such a low probability to “people can turn into cats” in the first place has to be reconsidered.

The probability that humans will eventually be capable of creating x utility, given that the mugger is capable of creating x utility, probably converges to some constant as x goes to infinity. (Of course, this still isn’t a solution, as expected utility still doesn’t converge.)

That assumes that the number is independent of the prior. I wouldn’t make that assumption.

One point I don’t see mentioned here that may be important is that *someone is saying this to you*.

I encounter lots of people. Each of them has lots of thoughts. Most of those thoughts, they do not express to me (for which I am grateful). How do they decide which thoughts to express? To a first approximation, they express thoughts which are likely, important and/or amusing. Therefore, when I hear a thought that is highly important or amusing, I expect it had less of a likelihood barrier to being expressed, and assign it a proportionally lower probability.

Note that this doesn’t apply to arguments in general—only to ones that other people say to me.

This is probably obvious, but if this problem persisted, a Pascal’s-Mugging-vulnerable AI would immediately get mugged even without external offers or influence. The mere possibility, however remote, of a certain sequence of characters unlocking a hypothetical control console that could access a super-Turing model of computation able to influence (insert sufficiently high number) amounts of matter/energy would suffice. If an AI had to decide how long to keep uttering strange tentative passcodes in the hope of unlocking some higher level of physics, it would get mugged by the shadow of a Matrix Lord every time.

It sounds like what you’re describing is something that Iain Banks calls an “Out of Context Problem”—it doesn’t seem like a ‘leverage penalty’ is the proper way to conceptualize what you’re applying, as much as a ‘privilege penalty’.

In other words, when the sky suddenly opens up and blue fire pours out, the entire context for your previous set of priors needs to be re-evaluated—and the very question of “should I give this man $5” exists on a foundation of those now-devalued priors.

Is there a formalized tree or mesh model for Bayesian probabilities? Because I think that might be fruitful.

There’s something very counterintuitive about the notion that Pascal’s Muggle is perfectly rational. But I think we need to do a lot more intuition-pump research before we’ll have finished picking apart where that counterintuitiveness comes from. I take it your suggestion is that Pascal’s Muggle seems unreasonable because he’s overly confident in his own logical consistency and ability to construct priors that accurately reflect his credence levels. But he also seems unreasonable because he doesn’t take into account that the likeliest explanations for the Hole In The Sky datum either trivialize the loss from forking over $5 (e.g., ‘It’s All A Dream’) or provide much more credible generalized reasons to fork over the $5 (e.g., ‘He Really Is A Matrix Lord, So You Should Do What He Seems To Want You To Do Even If Not For The Reasons He Suggests’). Your response to the Hole In The Sky seems more safe and pragmatic because it leaves open that the decision might be made for those reasons, whereas the other two muggees were explicitly concerned only with whether the Lord’s claims were generically right or generically wrong.

Noting these complications doesn’t help solve the underlying problem, but it does suggest that the intuitively right answer may be overdetermined, complicating the task of isolating our relevant intuitions from our irrelevant ones.

One scheme with the properties you want is Wei Dai’s UDASSA, e.g. see here. I think UDASSA is by far the best formal theory we have to date, although I’m under no delusions about how well it captures all of our intuitions (I’m also under no delusions about how consistent our intuitions are, so I’m resigned to accepting a scheme that doesn’t capture them).

I think it would be more fair to call this allocation of measure part of my preferences, instead of “magical reality fluid.” Thinking that your preferences are objective facts about the world seems like one of the oldest errors in the book, which is only *possibly* justified in this case because we are still confused about the hard problem of consciousness.

As other commenters have observed, it seems clear that you should never actually believe that the mugger can influence the lives of 3^^^^3 other folks and will do so at your suggestion, whether or not you’ve made any special “leverage adjustment.” Nevertheless, even though you never believe that you have such influence, you would still need to pass to some bounded utility function if you want to use the normal framework of expected utility maximization, since you need to compare the goodness of whole worlds. Either that, or you would need to make quite significant modifications to your decision theory.

A note—it looks like what Eliezer is suggesting here is *not* the same as UDASSA. See my analysis here—and endoself’s reply—and here.

The big difference is that UDASSA won’t impose the same locational penalty on nodes in extreme situations, since the measure is shared unequally between nodes. There are programs q of relatively short length that can select out such extreme nodes (parties getting genuine offers from Matrix Lords with the power of 3^^^3) and so give them much higher relative weight than 1/3^^^3. Combine this with an unbounded utility, and the mugger problem is still there (as is the divergence in expected utility).

I agree that what Eliezer described is not exactly UDASSA. At first I thought it was just like UDASSA but with a speed prior, but now I see that that’s wrong. I suspect it ends up being within a constant factor of UDASSA, just by considering universes with tiny little demons that go around duplicating all of the observers a bunch of times.

If you are using UDT, the role of UDASSA (or any anthropic theory) is in the definition of the utility function. We define a measure over observers, so that we can say how good a state of affairs is (by looking at the total goodness under that measure). In the case of UDASSA the utility is guaranteed to be bounded, because our measure is a probability measure. Similarly, there doesn’t seem to be a mugging issue.

As lukeprog says here, this really needs to be written up. It’s not clear to me that just because the measure over observers (or observer moments) sums to one then the expected utility is bounded.

Here’s a stab. Let’s use s to denote a sub-program of a universe program p, following the notation of my other comment. Each s gets a weight w(s) under UDASSA, and we normalize to ensure Sum{s} w(s) = 1.

Then, presumably, an expected utility looks like E(U) = Sum{s} U(s) w(s), and this is clearly bounded provided the utility U(s) for each observer moment s is bounded (and U(s) = 0 for any sub-program which isn’t an “observer moment”).
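For the first claim, the boundedness step can be spelled out, writing B for an assumed bound with |U(s)| ≤ B for every observer moment s:

```latex
\[
|E(U)| \;=\; \Big|\sum_{s} U(s)\, w(s)\Big| \;\le\; \sum_{s} |U(s)|\, w(s)
\;\le\; B \sum_{s} w(s) \;=\; B .
\]
```

So bounded per-moment utility plus a normalized measure does give bounded expected utility; the open question is whether U(s) itself is bounded.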

But why is U(s) bounded? It doesn’t seem obvious to me (perhaps observer moments can be arbitrarily blissful, rather than saturating at some state of pure bliss). Also, what happens if U bears no relationship to experiences/observer moments, but just counts the number of paperclips in the universe p? That’s not going to be bounded, is it?

I agree it would be nice if things were better written up; right now there is the description I linked and Hal Finney’s.

If individual moments can be arbitrarily good, then I agree you have unbounded utilities again.

If you count the number of paperclips you would again get into trouble; the analogous thing to do would be to count the measure of paperclips.

Yeah, I like this solution too. It doesn’t have to be based on the universal distribution, any distribution will work. You must have some way of distributing your single unit of care across all creatures in the multiverse. What matters is not the large number of creatures affected by the mugger, but their total weight according to your care function, which is less than 1 no matter what outlandish numbers the mugger comes up with. The “leverage penalty” is just the measure of your care for not losing $5, which is probably more than 1/3^^^^3.

Who might have the time, desire, and ability to write up UDASSA clearly, if MIRI provides them with resources?

What if the mugger says he will give you a single moment of pleasure that is 3^^^3 times more intense than a standard good experience? Wouldn’t the leverage penalty not apply and thus make the probability of the mugger telling the truth much higher?

I think the real reason the mugger shouldn’t be given money is that people are more likely to be able to attain 3^^^3 utils by donating the five dollars to an existential risk-reducing charity. Even though the current universe presumably couldn’t support 3^^^3 utils, there is a chance of being able to create or travel to vast numbers of other universes, and I think this chance is greater than the chance of the mugger being honest.

Am I missing something? These points seem too obvious to miss, so I’m assigning a fairly large probability to me either being confused or that these were already mentioned.

I don’t think you can give me a moment of pleasure that intense without using 3^^^3 worth of atoms on which to run my brain, and I think the leverage penalty still applies then. You definitely can’t give me a moment of worthwhile happiness that intense without 3^^^3 units of background computation.

The article said the leverage penalty “[penalizes] hypotheses that let you affect a large number of people, in proportion to the number of people affected.” If this is *all* the leverage penalty does, then it doesn’t matter whether it takes 3^^^3 atoms or units of computation, because atoms and computations aren’t people.

That said, the article doesn’t precisely define what the leverage penalty is, so there could be something I’m missing. So, what exactly *is* the leverage penalty? Does it penalize how many units of computation, rather than people, you can affect? This sounds much less arbitrary than the vague definition of “person,” and much easier to define: simply divide the prior of a hypothesis by the number of bits flipped by your actions under it, and then normalize.

You’re absolutely right. I’m not sure how I missed or forgot about reading that.

“Indeed, you can’t ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one—evidence I’m a googolplex times more likely to encounter if the hypothesis is true, than if it’s false—because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way.”

On the other hand, if I am dreaming, or drugged, or crazy, then it DOESN’T MATTER what I decide to do in this situation. I will still be trapped in my dream or delusion, and I won’t actually be five dollars poorer because you and I aren’t really here. So I may as well discount all probability lines in which the evidence I’m seeing isn’t a valid representation of an underlying reality. Here’s your $5.

Are you sure? I would expect that it’s possible to recover from that, and some actions would make you more likely to recover than others.

If all of my experiences are dreaming/drugged/crazy/etc. experiences then what decision I make only matters if I value having one set of dreaming/drugged/crazy experiences over a different set of such experiences.

The thing is, I sure do seem to value having one set of experiences over another. So if all of my experiences are dreaming/drugged/crazy/etc. experiences then it seems I do value having one set of such experiences over a different set of such experiences.

So, given that, do I choose the dreaming/drugged/crazy/etc. experience of giving you $5 (and whatever consequences that has)? Or of refusing to give you $5 (and whatever consequences *that* has)? Or something else?

But that would destroy your ability to deal with optical illusions and misdirection.

Perhaps I should say …in which I can’t reasonably expect to GET evidence entangled with an underlying reality.

Random thoughts here, not highly confident in their correctness.

Why is the leverage penalty seen as something that needs to be added? Isn’t it just the obviously correct way to do probability?

Suppose I want to calculate the probability that a race of aliens will descend from the skies and randomly declare me Overlord of Earth some time in the next year. To do this, I naturally go to Delphi to talk to the Oracle of Perfect Priors, and she tells me that the chance of aliens descending from the skies and declaring an Overlord of Earth in the next year is 0.0000007%.

If I then declare this to be my probability of becoming Overlord of Earth in an alien-backed coup, this is obviously wrong. Clearly I should multiply it by the probability that the aliens pick me, given that the aliens are doing this. There are about 7 billion people on Earth, and updating on the existence of Overlord-declaring aliens doesn’t have much effect on that estimate, so my probability of being picked is about 1 in 7 billion, meaning my probability of being overlorded is about 0.0000000000000001%. Taking the former estimate rather than the latter is simply wrong.

Pascal’s mugging is a similar situation, only this time when we update on the mugger telling the truth, we radically change our estimate of the number of people who were ‘in the lottery’, all the way up to 3^^^^3. We then multiply 1/3^^^^3 by the probability that we live in a universe where Pascal’s muggings occur (which should be very small but not super-exponentially small). This gives you the leverage penalty straight away, no need to think about Tegmark multiverses. We were simply mistaken to not include it in the first place.
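The Overlord arithmetic in the example above can be checked numerically; this is just a toy sketch using the comment’s own numbers:

```python
# Toy check of the Overlord arithmetic above (numbers are the comment's own).
p_coup = 0.0000007 / 100        # P(aliens declare *some* Overlord this year) = 0.0000007%
n_people = 7_000_000_000        # candidates the aliens might pick from
p_me = p_coup / n_people        # P(the Overlord is me): the "leverage penalty"
print(f"{p_me:.1e}")            # prints 1.0e-18, i.e. 0.0000000000000001%
```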

How does this work with Clippy (the only paperclipper in known existence) being tempted with 3^^^^3 paperclips?

That’s part of why I dislike Robin Hanson’s original solution. That the tempting/blackmailing offer involves 3^^^^3 other people, and that you are also a person, should be merely incidental to one *particular illustration* of the problem of Pascal’s Mugging—and as such it can’t be part of a solution to the core problem.

To replace this with something like “causal nodes”, as Eliezer mentions, might perhaps solve the problem. But I wish that we started talking about Clippy and his paperclips instead, so that the original illustration of the problem, which involves incidental symmetries, doesn’t mislead us into a “solution” overreliant on symmetries.

Clippy has some sort of prior over the number of paperclips that could possibly exist. Let this number be P. Conditioned on each value of P, Clippy evaluates the utility of the offer and the probability that it comes true.

In particular, for P < 3^^^^3, the conditional probability that the offer of 3^^^^3 paperclips is legit is 0. If some large number of paperclips exists, e.g. P = 2*3^^^^3, the offer might actually be viable with non-negligible probability, while its utility would be given by 3^^^^3/P. Note that this is always at most 1.

However, unless Clippy lives in a very strange universe, it thinks that P >= 3^^^^3 is very unlikely. So the expected utility will be bounded by Pr[P >= 3^^^^3] and will end up being very small.
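The argument about Clippy can be sketched with small stand-in numbers; everything here is hypothetical (the prior over P, the conditional chance the offer is legit, and N standing in for 3^^^^3):

```python
# Sketch of the Clippy argument with small stand-in numbers.
# N plays the role of 3^^^^3; the prior over P and the conditional
# chance the offer is legit are made-up illustrative values.
N = 10**6
prior = {10**3: 0.9, 10**6: 0.099, 2 * 10**6: 0.001}  # hypothetical prior over P

expected_utility = 0.0
for P, pr in prior.items():
    if P < N:
        continue        # offer can't be legit: conditional probability is 0
    p_legit = 0.5       # assumed non-negligible conditional chance
    utility = N / P     # at most 1 by construction
    expected_utility += pr * p_legit * utility

# The expected utility is bounded by Pr[P >= N], which the prior makes small:
bound = sum(pr for P, pr in prior.items() if P >= N)
assert expected_utility <= bound
```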

First thought, I’m not at all sure that it does. Pascal’s mugging may still be a problem. This doesn’t seem to contradict what I said about the leverage penalty being the only correct approach, rather than a ‘fix’ of some kind, in the first case. Worryingly, if you are correct it may also not be a ‘fix’ in the sense of not actually fixing anything.

I notice I’m currently confused about whether the ‘causal nodes’ patch is justified by the same argument. I will think about it and hopefully find an answer.

This sounds a little bit like it might depend on the choice of SSA vs. SIA.

Okay, that makes sense. In that case, though, where’s the problem? Claims in the form of “not only is X a true event, with details A, B, C, …, but also it’s the greatest event by metric M that has ever happened” should have low enough probability that a human writing it down specifically in advance as a hypothesis to consider, without being prompted by some specific evidence, is doing really badly epistemologically.

Also, I’m confused about the relationship to MWI.

Many of the conspiracy theories generated have some significant overlap (i.e. are not mutually exclusive), so one shouldn’t expect the sum of their probabilities to be less than 1. It’s permitted for P(Cube A is red) + P(Sphere X is blue) to be greater than 1.

Edit: formatting fixed. Thanks, wedrifid.

My response to the mugger:

You claim to be able to simulate 3^^^^3 unique minds.

It takes log(3^^^^3) bits just to count that many things, so my absolute upper bound on the prior for an agent capable of doing this is 1/3^^^^3.

My brain is unable to process enough evidence to overcome this, so unless you can use your matrix powers to give me access to sufficient computing power to change my mind, get lost.

My response to the scientist:

Why yes, you do have sufficient evidence to overturn our current model of the universe, and if your model is sufficiently accurate, the computational capacity of the universe is vastly larger than we thought.

Let’s try building a computer based on your model and see if it works.

Try an additional linebreak before the first bullet point.

Why does that prior follow from the counting difficulty?

I was thinking that using (length of program) + (memory required to run program) as a penalty makes more sense to me than (length of program) + (size of impact). I am assuming that any program that can simulate X minds must be able to handle numbers the size of X, so it would need more than log(X) bits of memory, which makes the prior less than 2^-log(X).

I wouldn’t be overly surprised if there were some other situation that breaks this idea too, but I was just posting the first thing that came to mind when I read this.
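A minimal sketch of the penalty proposed above, with hypothetical numbers; the point is just that the memory term alone caps the prior at 1/X for any program claiming to simulate X minds:

```python
import math

# Sketch of the proposed penalty: (program length) + (memory required to run it).
# A program simulating X minds needs at least log2(X) bits of working memory,
# so memory alone contributes a prior factor of at most 2**(-log2(X)) = 1/X.
def prior_upper_bound(program_length_bits: float, minds_simulated: int) -> float:
    memory_bits = math.log2(minds_simulated)  # lower bound on memory needed
    return 2.0 ** -(program_length_bits + memory_bits)

# Even a zero-length program claiming to simulate X minds gets prior <= 1/X
# (X chosen as a power of two so the floating-point arithmetic is exact):
X = 2**30
assert prior_upper_bound(0, X) <= 1 / X
```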

You’re trying to italicize those long statements? It’s possible that you need to get rid of the spaces around the asterisks.

But you’re probably better off just using quote boxes with “>” instead.

This system does seem to lead to the odd effect that you would probably be more willing to pay Pascal’s Mugger to save 10^10^100 people than to save 10^10^101 people, since the leverage penalties make them about equal, but the latter has a higher complexity cost. In fact, the leverage penalty effectively means that you cannot distinguish among events whose utilities are larger than you can match with an appropriate amount of evidence.

It’s not that odd. If someone asked to borrow ten dollars, and said he’d pay you back tomorrow, would you believe him? What if he said he’d pay back $20? $100? $1000000? All the money in the world?

At some point, the probability goes down faster than the price goes up. That’s why you can’t just get a loan and keep raising the interest to make up for the fact that you probably won’t ever pay it back.
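A toy model of the loan analogy, under an assumed (purely illustrative) exponential fall-off in the chance of repayment:

```python
# Toy model of the loan analogy. Assume (for illustration only) that the
# chance of actually being repaid x dollars falls off as 2**-(x/10): past some
# point the probability drops faster than the promised payoff grows, so the
# expected value of ever-larger promises heads to zero.
def expected_repayment(x: float) -> float:
    return x * 2.0 ** -(x / 10.0)

assert expected_repayment(20) <= 2 * expected_repayment(10)    # doubling the promise doesn't double the value
assert expected_repayment(1_000_000) < expected_repayment(10)  # an absurd promise is worth less than a modest one
```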

Is there any particular reason an AI wouldn’t be able to self-modify with regards to its prior/algorithm for deciding prior probabilities? A basic Solomonoff prior should include a non-negligible chance that it itself isn’t perfect for finding priors, if I’m not mistaken. That doesn’t answer the question as such, but it isn’t obvious to me that it’s necessary to answer this one to develop a Friendly AI.

You are mistaken. A prior isn’t something that can be mistaken per se. The closest it can get is assigning a low probability to something that is true. However, any prior system will say that the probability it gives of something being true is exactly equal to the probability of it being true, therefore it is well-calibrated. It will occasionally give low probabilities for things that are true, but only to the extent that unlikely things sometimes happen.

The difference between this and average utilitarianism is that we divide the probability by the hypothesis size, rather than dividing the utility by that size. The closeness of the two seems a bit surprising.

This bothers me because it seems like frequentist anthropic reasoning similar to the Doomsday argument. I’m not saying I know what the correct version should be, but assuming that we can use a uniform distribution and get nice results feels like the same mistake as the principle of indifference (and more sophisticated variations that often worked surprisingly well as an epistemic theory for finite cases). Things like Solomonoff distributions are more flexible...

The problem goes away if we try to employ a universal distribution for the reality fluid, rather than a uniform one. (This does not necessarily make that a good idea.)

If we try to use universal-distribution reality-fluid instead, we would expect to continue to see the same sort of distribution we had seen in the past: we would believe that *we* went down a path where the reality fluid concentrated into the Born probabilities, but other quantum paths which would be very improbable according to the Born probabilities may get high probability from some other rule.

Just to jump in here—the solution to the doomsday argument is that it is a low-information argument in a high-information situation. Basically, once you know you’re the 10 billionth zorblax, your prior should indeed put you in the middle of the group of zorblaxes, for 20 billion total, no matter what a zorblax is. This is correct and makes sense. The trouble comes if you open your eyes, collect additional data, like population growth patterns, and then *never use any of that to update the prior*. When people put population growth patterns and the doomsday prior together in the same calculation for the “doomsday date,” that’s just blatantly having data but not updating on it.

There is likely a broader-scoped discussion on this topic that I haven’t read, so please point me to such a thread if my comment is addressed—but it seems to me that there is a simpler resolution to this issue (as well as an obvious limitation to this way of thinking), namely that there’s an almost immediate stage (in the context of highly abstract hypotheticals) where probability assessment breaks down completely.

For example, there are an uncountably-infinite number of different parent universes we could have. There are even an uncountably-infinite number of possible laws of physics that could govern our universe. And it’s literally impossible to have all these scenarios “possible” in the sense of a well-defined measure, simply because if you want an uncountable sum of real numbers to add up to 1, only countably many terms can be nonzero.
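The countability claim in the last sentence follows from a standard argument: for each n, at most n of the terms can exceed 1/n (otherwise the sum would exceed 1), and the set of nonzero terms is the countable union of these finite sets:

```latex
\[
\{\, i \in I : p_i > 0 \,\} \;=\; \bigcup_{n=1}^{\infty} \{\, i \in I : p_i > 1/n \,\},
\qquad
\big|\{\, i \in I : p_i > 1/n \,\}\big| \;\le\; n .
\]
```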

This is highly related to the axiomatic problem of cause and effect, a famous example being the question “why is there something rather than nothing”—you have to have an axiomatic foundation before you can make calculations, but the sheer act of adopting that foundation excludes a lot of very interesting material. In this case, if you want to make probabilistic expectations, you need a solid axiomatic framework to stipulate how calculations are made.

Just like with the laws of physics, this framework should agree with empirically-derived probabilities, but just like physics there will be seemingly-well-formulated questions that the current laws cannot address. In cases like hobos who make claims to special powers, the framework may be ill-equipped to make a definitive prediction. More generally, it will have a scope that is limited of mathematical necessity, and many hypotheses about spirituality, religion, and other universes, where we would want to assign positive but marginal probabilities, will likely be completely outside its light cone.

A few thoughts:

I haven’t strongly considered my prior on being able to save 3^^^3 people (more on this to follow). But regardless of what that prior is, if approached by somebody claiming to be a Matrix Lord who claims he can save 3^^^3 people, I’m not only faced with the problem of whether I ought to pay him the $5 - I’m also faced with the question of whether I ought to walk over to the next beggar on the street, and pay him $0.01 to save 3^^^3 people. Is this person 500 times more likely to be able to save 3^^^3 people? From the outset, not really. And giving money to random people has no prior probability of being more likely to save lives than anything else.

Now suppose that the said “Matrix Lord” opens the sky, splits the Red Sea, demonstrates his duplicator box on some fish and, sure, creates a humanoid Patronus. Now do I have more reason to believe that he is a Matrix Lord? Perhaps. Do I have reason to think that he will save 3^^^3 lives if I give him $5? I don’t see convincing reason to believe so, but I don’t see either view as problematic.

Obviously, once you’re not taking Hanson’s approach, there’s no problem with believing you’ve made a major discovery that can save an arbitrarily large number of lives.

But here’s where I noticed a bit of a problem in your analogy: In the dark matter case you say “if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we’d be able to create around a googolplex people that way.”

Well, obviously the odds here of creating exactly a googolplex people are no greater than one in a googolplex. Why? Because those back-of-the-envelope calculations are going to get us (at best, say) an interval from 0.5 × 10^(10^100) to 2 × 10^(10^100)—an interval containing more than a googolplex distinct integers. Hence, the odds of any specific one will be very low, but the sum might be very high. (This is worth contrasting with the single-integer case above, where presumably your probability of saving 3^^^3 + 1 people is no higher than it was before.)

Here’s the main problem I have with your solution:

“But if I actually see strong evidence for something I previously thought was super-improbable, I don’t just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place—whether it was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10^-100 for some fact distinguished enough to be written down, and yet encounter strong evidence, say 10^10 to 1, that the thing has actually happened.”

Sure you do. As you pointed out, dice rolls. The sequence of rolls in a game of Risk will do this for you, and you have strong reason to believe that you played a game of Risk and the dice landed as they did.

We do probability estimates because we lack information. Your example of a mathematical theorem is a good one: Theorem X is true or false from the get-go. But whenever you give me new information, even if that information is framed as a question, it makes sense for me to do a Bayesian update. That’s why a lot of so-called knowledge paradoxes are silly: if you ask me whether I know who the president is, I can answer with 99%+ probability that it’s Obama; if you then ask me whether Obama is still breathing, I have to update based on my consideration of what prompted the question. I’m not committing a fallacy by saying 95%; I’m doing a Bayesian update, as I should.

You’ll often find yourself updating your probabilities based on the knowledge that you were completely incorrect about something (even something mathematical) to begin with. That doesn’t mean you were wrong to assign the initial probabilities: You were assigning them based on your knowledge at the time. That’s how you assign probabilities.

In your case, you’re not even updating on an “unknown unknown”—that is, something you failed to consider even as a possibility—though that’s the reason you put all probabilities at less than 100%, because your knowledge is limited. You’re updating on something you considered before. And I see absolutely no reason to label this a special non-Bayesian type of update that somehow dodges the problem. I could be missing something, but I don’t see a coherent argument there.

As an aside, the repeated references to how people misunderstood previous posts are distracting to say the least. Couldn’t you just include a single link to Aaronson’s Large Numbers paper (or anything on up-arrow notation, I mention Aaronson’s paper because it’s fun)? After all, if you can’t understand tetration (and up), you’re not going to understand the article to begin with.

Honestly, at this point, I would strongly update in the direction that I am being deceived in some manner. Possibly I am dreaming, or drugged, or the person in front of me has some sort of perception-control device. I do not see any reason why someone who could open the sky, split the Red Sea, and so on, would need $5; and if he did, why not simply make it himself? Or sell the fish?

The only reasons I can imagine for a genuine Matrix Lord pulling this on me are very bad for me. Either he’s a sadist who likes people to suffer—in which case I’m doomed no matter what I do—or there’s something that he’s not telling me (perhaps doing what he says once surrenders my free will, allowing him to control me forever?), which implies that he believes that I would reject his demand if I knew the truth behind it, which strongly prompts me to reject his demand.

Or he’s insane, following no discernible rules, in which case the only thing to do is to try to evade notice (something I’ve clearly already failed at).

That your universe is controlled by a sadist doesn’t suggest that every possible action you could do is equivalent. Maybe all your possible fates are miserable, but some are far more miserable than others. More importantly, a being might be sadistic in some respects/situations but not in others.

I also have to assign a very, very low prior to anyone’s being able to figure out in 5 minutes what the Matrix Lord’s exact motivations are. Your options are too simplistic even to describe minds of human-level complexity, much less ones of the complexity required to design or oversee physics-breakingly large simulations.

I think indifference to our preferences (except as incidental to some other goal, e.g., paperclipping) is more likely than either sadism or beneficence. Only very small portions of the space of values focus on human-style suffering or joy. Even in hypotheticals that seem designed to play with human moral intuitions. Eliezer’s decision theory conference explanation makes as much sense as any.

You are right. However, I can see no way to decide which course of action is best (or least miserable). My own decision process becomes questionable in such a situation; I can’t imagine any strategy that is convincingly better than taking random actions.

When I say “doomed no matter what I do”, I do not mean doomed with certainty. I mean that I have a high probability of doom, for any given action, and I cannot find a way to minimise that probability through my own actions.

Thinking about this, I think that you are right. I still consider sadism more likely than beneficence, but I had been setting the prior for indifference too low. This implies that the Matrix Lord has preferences, but these preferences are unknown and possibly unknowable (perhaps he wants to maximise slood).

...

This makes the question of which action is best to take even more difficult to answer. I do not know anything about slood; I cannot, because it only exists outside the Matrix. The only source of information from outside the Matrix is the Matrix Lord. This implies that, before reaching any decision, I should spend a long time interviewing the Matrix Lord, in an attempt to be better able to model him.

Well, this Matrix Lord seems very interested in decision theory and utilitarianism. Sadistic or not, I expect such a being to respond more favorably to attempts to take the dilemmas he raised seriously than to an epistemic meltdown. Taking the guy at his word and trying to reason your way through the problem is likely to give him more useful data than attempts to rebel or go crazy, and if you’re useful then it’s less likely that he’ll punish you or pull the plug on your universe’s simulation.

It seems reasonably likely that this will lead to a response of ”...alright, I’ve got the data that I wanted, no need to keep this simulation running any longer...” and then pulling the plug on my universe. While it is true that this strategy is likely to lead to a happier Matrix Lord (especially if the data that I give him coincides with the data he expects), I’m not convinced that it leads to a longer existence for my universe.

That may be true too. It depends on the priors we have for generic superhuman agents’ reasons for keeping a simulation running (e.g., having some other science experiments planned, wanting to reward you for providing data...) vs. for shutting it down (e.g., vindictiveness, energy conservation, being interested only in one data point per simulation...).

We do have some data to work with here, since we have experience with the differential effects of power, intelligence, curiosity, etc. among humans. That data is only weakly applicable to such an exotic agent, but it does play a role, so our uncertainty isn’t absolute. My main point was that unusual situations like this don’t call for complete decision-theoretic despair; we still need to make choices, and we can still do so reasonably, though our confidence that the best decision is also a winning decision is greatly diminished.

Well, if I’m going to free-form speculate about the scenario, rather than use it to explore the question it was introduced to explore, the most likely explanation that occurs to me is that the entity is doing the Matrix Lord equivalent of free-form speculating… that is, it’s wondering “what would humans do, given this choice and that information?” And, it being a Matrix Lord, its act of wondering creates a human mind (in this case, mine) and gives it that choice and information.

Which makes it likely that I haven’t actually lived through most of the life I remember, and that I won’t continue to exist much longer than this interaction, and that most of what I think is in the world around me doesn’t actually exist.

That said, I’m not sure what use free-form speculating about such bizarre and underspecified scenarios really is, though I’ll admit it’s kind of fun.

It’s kind of fun. Isn’t that reason enough?

Looking at the original question—i.e. how to handle very large utilities with very small probability—I find that I have a mental safety net there. The safety net says that the situation is a lie. It does not matter how much utility is claimed, because anyone can state any arbitrarily large number, and a number has been chosen (in this case, by the Matrix Lord) in a specific attempt to overwhelm my utility function. The small probability is chosen (a) because I would not believe a larger probability and (b) so that I have no recourse when it fails to happen.

I am reluctant to fiddle with my mental safety nets because, well, they’re *safety nets*—they’re there for a reason. And in this case, the reason is that such a fantastically unlikely event is unlikely enough that it’s not likely to happen *ever*, to *anyone*. Not even once in the whole history of the universe. If I (out of all the hundreds of billions of people in all of history) do ever run across such a situation, then it’s so incredibly overwhelmingly more likely that I am being deceived that I’m far more likely to gain by immediately jumping to the conclusion of ‘deceit’ than by assuming that there’s any chance of this being true.

(nods) Sure. My reply there applies here as well.

Those aren’t “distinguished enough to be written down” before the game is played. I’ll edit to make this slightly clearer hopefully.

Is it reasonable to take this as evidence that we shouldn’t use expected utility computations, or not only expected utility computations, to guide our decisions?

If I understand the context, the reason we believed an entity, whether a human or an AI, ought to use expected utility as a practical decision-making strategy is that it would yield good results (a simple, general architecture for decision making). If there are fully general attacks (muggings) on all entities that use expected utility as a practical decision-making strategy, then perhaps we should revise the original hypothesis.

Utility as a theoretical construct is charming, but it does have to pay its way, just like anything else.

P.S. I think the reasoning from “bounded rationality exists” to “non-Bayesian mind changes exist” is good stuff. Perhaps we could call this “on seeing this, I become willing to revise my model” phenomenon something like “surprise”, and distinguish it from merely new information.

I’m pretty ignorant of quantum mechanics, but I gather there was a similar problem, in that the probability function for some path appeared to be dominated by an infinite number of infinitesimally-unlikely paths, and Feynman solved the problem by showing that those paths cancelled each other out.
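For anyone who wants to see the cancellation numerically, here is a toy sketch (not real quantum mechanics, just the stationary-phase idea: unit amplitudes with random phases cancel, while amplitudes with nearly equal phases reinforce):

```python
import cmath, math, random

random.seed(0)
n = 100_000

# Paths far from the stationary point: the action varies wildly,
# so the phases are effectively random on [0, 2*pi) and cancel.
wild = sum(cmath.exp(1j * random.uniform(0, 2 * math.pi)) for _ in range(n))

# Paths near the stationary point: the action is nearly constant,
# so the phases are aligned and the contributions add up.
stationary = sum(cmath.exp(1j * 0.01 * random.gauss(0, 1)) for _ in range(n))

print(abs(wild) / n)        # near 0: random phases cancel
print(abs(stationary) / n)  # near 1: aligned phases reinforce
```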

Relevant math, similar features in classical optics and Quantum Mechanics.

Why are you not a science fiction writer?

How confident are you of “Probability penalties are epistemic features—they affect what we believe, not just what we do. Maps, ideally, correspond to territories.”? That seems to me to be a strong heuristic, even a very very strong heuristic, but I don’t think it’s strong enough to carry the weight you’re placing on it here. I mean, more technically, the map corresponds to some relationship between the territory and the map-maker’s utility function, and nodes on a causal graph, which are, after all, probabilistic, and thus are features of maps, not of territories, are features of the map-maker’s utility function, not just summaries of evidence about the territory.

I suspect that this formalism mixes elements of division of magical reality fluid between maps with elements of division of magical reality fluid between territories.

...at what point are you overthinking this?

The link labelled “the prior probability of hypotheses diminishes exponentially with their complexity” is malformed.

Is there any justification for the leverage penalty? I understand that it would apply if there were a finite number of agents, but if there’s an infinite number of agents, couldn’t all agents have an effect on an arbitrarily larger number of other agents? Shouldn’t the prior probability instead be P(event A | n agents will be affected) = (1 / n) + P(there being infinite entities)? If this is the case, then it seems the leverage penalty won’t stop one from being mugged.

If our math has to handle infinities we have bigger problems. Unless we use measures, and then we have the same issue and seemingly forced solution as before. If we don’t use measures, things fail to add up the moment you imagine “infinity”.

Then this solution just assumes the probability of infinite people is 0. If this solution is based on premises that are probably false, then how is it a solution at all? I understand that infinity makes even bigger problems, so we should instead just call your solution a pseudo-solution-that’s-probably-false-but-is-still-the-best-one-we-have, and dedicate more effort to finding a real solution.

I should like to point out that if realness were not preserved, i.e., if some worlds at time t were more real than others, their inhabitants would have no way of discerning that fact.

Just a digression that has no bearing on the main point of the post:

The probability that we’re in a simulation, times the expected number of Matrix Lords per simulation at any one moment, divided by the population, should be a lower bound on that probability. I would think it would be at least 1 / population.

I expect far less than 1 Matrix Lord per simulated population. I expect the vast majority of simulations are within UFAIs trying to gain certain types of information through veridical simulation, no Matrix Lords there.

The usual analyses of Pascal’s Wager, like many lab experiments, privilege the hypothesis and don’t look for alternative hypotheses.

Why would anyone assume that the Mugger will do as he says? What do we know about the character of all-powerful beings? Why should they be truthful to us? If he knows he could save that many people, but refrains from doing so because you won’t give him five dollars, he is by human standards a psycho. If he’s a psycho, maybe he’ll kill all those people if I give him 5 dollars. That actually seems *more* likely behavior from such a dick.

The situation you are in isn’t the experimental hypothetical of knowing what the mugger will do depending on what your actions are. It’s a situation where you observe X, Y, and Z, and are free to make inferences from them. If he has the power, I infer the mugger is a sadistic dick who likes toying with creatures. I expect him to renege on the bet, and likely invert it. “Ha Ha! Yes, I saved those beings, knowing that each would go on to torture a zillion zillion others.”

This is a mistake theists make all the time. They think hypothesizing an all powerful being allows them to account for all mysteries, and assume that once the power is there, the privileged hypothesis will be fulfilled. But you get no increased probability of any event from hypothesizing power unless you also establish a prior on behavior. From the little I’ve seen of the mugger, if he has the power to do what he claims, he is malevolent. If he doesn’t have the power, he is impotent to deliver and deluded or dishonest besides. Either way, I have no expectation of gain by appealing to such a person.

Yes, privileging a hypothesis isn’t discussed in great detail, but the alternatives you mention in your post don’t resolve the dilemma. Even if you think that the probabilities of the “good” and “bad” alternatives balance each other out to the quadrillionth decimal point, the utilities you get in your calculation are astronomical. If you think there’s a 0.0000…1 (a quadrillion zeros) greater chance that the beggar will do good than harm, the expected utility of your $5 donation is inconceivably greater than a trillion years of happiness. If you think there’s at least a 0.0000…1 (a quadrillion zeros) chance that $5 will cause the mugger to act malevolently, your $5 donation is inconceivably worse than a trillion years of torture. Both of these expectations seem off.

You can’t just say “the probabilities balance out”. You have to explain why the probabilities balance out to a bignum number of decimal points.

Actually, I don’t. I say the probabilities are within my margin of error, which is a lot larger than 0.0000…1 (a quadrillion zeros). I can’t discern differences that small.

OK, but now decreasing your margin of error until you *can* make a determination is the most important ethical mission in history. Governments should spend billions of dollars to assemble the brightest teams to calculate which of your two options is better—more lives hang in the balance (in expectation) than would ever live if we colonized the universe with people the size of atoms.

Suppose a trustworthy Omega tells you “This is a once in a lifetime opportunity. I’m going to cure all residents of a country of all diseases, in a benevolent way (no ironic or evil catches). I’ll leave the country up to you. Give me $5 and the country will be Zimbabwe, or give me nothing and the country will be Tanzania. I’ll give you a couple of minutes to come up with a decision.” You would not think to yourself “Well, I’m not sure which is bigger. My estimates don’t differ by more than my margin of error, so I might as well save the $5 and go with Tanzania”. At least I hope that’s not how you’d make the decision.

Seems a lot like learning a proof of X. It shouldn’t surprise us that learning a proof of X increases your confidence in X. The mugger genie has little ground to accuse you of inconsistency for believing X more after learning a proof of it.

Granted the analogy isn’t exact; what is learned may fall well short of rigorous proof. You may have only learned a good argument for X. Since you assign only 90% posterior likelihood I presume that’s intended in your narrative.

Nevertheless, analogous reasoning seems to apply. The mugger genie has little ground to accuse you of inconsistency for believing X more after learning a good argument for it.

Continuing from what I said in my last comment about the more general problem with Expected Utility Maximizing, I think I might have a solution. I may be entirely wrong, so any criticism is welcome.

Instead of calculating Expected Utility, calculate the probability that an action will result in a higher utility than another action. Choose the one that is more likely to end up with a higher utility. For example, if giving Pascal’s mugger the money only has a one out of a trillionth chance of ending up with a higher utility than not giving him your money, you wouldn’t give it.

Now there is an apparent inconsistency with this system. If there is a lottery, and you have a 1⁄100 chance of winning, you would never buy a ticket—even if the reward is $200 and the cost of a ticket only $1, and indeed regardless of how big the reward is. However, if you are offered the chance to buy a lot of tickets all at once, you would do so, since the chance of winning becomes large enough to outweigh the chance of not winning.

However, I don’t think that this is a problem. If you expect to play the lottery a bunch of times in a row, then you will choose to buy the ticket, because making that choice in this one instance also means that you will make the same choice in every other instance. Then the probability of ending up with more money at the end of the day is higher.

So if you expect to play the lottery a lot, or do other things that have low chances of ending up with high utilities, you might participate in them. Then when all is done, you are more likely to end up with a higher utility than if you had not done so. However if you get in a situation with an absurdly low chance of winning, it doesn’t matter how large the reward is. You wouldn’t participate, unless you expect to end up in the same situation an absurdly large number of times.

This method is consistent, it seems to “work” in that most agents that follow it will end up with higher utilities than agents that don’t follow it, and Expected Utility is just a special case of it that only happens when you expect to end up in similar situations a lot. It also seems closer to how humans actually make decisions. So can anyone find something wrong with this?
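The proposed rule can be checked by simulation on the lottery example above; this is only a sketch, and the function names (`play_lottery`, `p_buying_beats_abstaining`) are mine, chosen for illustration:

```python
import random

random.seed(0)

def play_lottery(n_tickets):
    """Net dollars after buying n_tickets independent 1-in-100 tickets at $1 each,
    each paying $200 on a win."""
    wins = sum(random.random() < 0.01 for _ in range(n_tickets))
    return 200 * wins - n_tickets

def p_buying_beats_abstaining(n_tickets, trials=10_000):
    """Monte Carlo estimate of the probability that buying leaves you
    with more money than not buying (the quantity the proposed rule uses)."""
    return sum(play_lottery(n_tickets) > 0 for _ in range(trials)) / trials

# One ticket: expected value is +$1, but you win only ~1% of the time,
# so the probability rule refuses where expected utility accepts.
print(p_buying_beats_abstaining(1))    # ~0.01

# Many tickets at once: a profitable outcome is now likely, so the rule
# accepts, matching the comment's "buy a lot of tickets" case.
print(p_buying_beats_abstaining(500))  # well above 0.5
```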

So if I’m getting what you’re saying correctly, an agent following this rule would not sacrifice a single cent for a 49% chance to save a human life?

And on the other hand it could be tempted to a game where it’d have 51% chance of winning a cent, and 49% chance of being destroyed?

If the solution to the problem of infinitesimal probabilities is to effectively ignore every probability under 50%, that’s a solution that’s worse than the problem...

I stupidly didn’t consider that kind of situation for some reason… Back to the drawing board I guess.

Though to be fair it would still come out ahead 51% of the time, and in a real world application it would probably choose to spend the penny, since it would expect to make choices similarly in the future, and that would help it come out ahead an even higher percent of the time.

But yes, a 51% chance of losing a penny for nothing probably shouldn’t be worth more than a 49% chance at saving a life for a penny. However allowing a large enough reward to outweigh a small enough probability means the system will get stuck in situations where it is pretty much guaranteed to lose, on the slim, slim chance that it could get a huge reward.

Caring only about the percent of the time you “win” seemed like a more rational solution but I guess not.

Though another benefit of this system could be that you could have weird utility functions. Like a rule that says any outcome where one life is saved is worth more than any amount of money lost. Or Asimov’s three laws of robotics, which wouldn’t work under an Expected Utility function since it would only care about the first law. This is allowed because in the end all that matters is which outcomes you prefer to which other outcomes. You don’t have to turn utilities into numbers and do math on them.

Here’s a question: if we had the ability to input a sensory event with a likelihood ratio of 3^^^^3:1, would this whole problem be solved?

Assuming the rest of our cognitive capacity is improved commensurately, then yes, problem solved. Mind you, we would then be left with the problem of a Matrix Lord appearing and starting to talk about 3^^^^^3.

This seems like an exercise in scaling laws.

The odds of being a hero who saves 100 lives are less than 1% of the odds of being a hero who saves 1 life. So in the absence of good data about being a hero who saves 10^100 lives, we should assume that the odds are much, much less than 1/(10^100).

In other words, for certain claims, the size of the claim itself lowers the probability.

More pedestrian example: ISTR your odds of becoming a musician earning over $1 million a year are much, much less than 1% of your odds of becoming a musician who earns over $10,000 a year.

I don’t know of any set of axioms that imply that you should take expected utilities when considering infinite sets of possible outcomes that do not also imply that the utility function is bounded. If we think that our utility functions are unbounded and we want to use the Solomonoff prior, why are we still taking expectations?

(I suppose because we don’t know how else to aggregate the utilities over possible worlds. Last week, I tried to see how far I could get if I weakened a few of the usual assumptions. I couldn’t really get anywhere interesting because my axioms weren’t strong enough to tell you how to decide in many cases, even when the generalized probabilities and generalized utilities are known.)

Isn’t this more of social recognition of a scam?

While there are decision-theoretic issues with the Original Pascal’s Wager, one of the main problems is that it is a scam (“You can’t afford not to do it! It’s an offer you can’t refuse!”). It seems to me that you can construct plenty of arguments like you just did, and many people wouldn’t take you up on the offer because they’d recognize it as a scam. Once something has a high chance of being a scam (like taking the form of Pascal’s Wager), it won’t get much more of your attention until you lower the likelihood that it’s a scam. Is that a weird form of Confirmation Bias?

But nonetheless, couldn’t the AI just function in the same way as that? I would think it would need to learn how to identify what is a trick and what isn’t a trick. I would just try to think of it as a Bad Guy AI who is trying to manipulate the decision making algorithms of the Good Guy AI.

The concern here is that if I reject all offers that superficially pattern-match to this sort of scam, I run the risk of turning down valuable offers as well. (I’m reminded of a TV show decades ago where they had some guy dress like a bum and wander down the street offering people $20, and everyone ignored him.)

Of course, if I’m not smart enough to actually evaluate the situation, or don’t feel like spending the energy, then superficial pattern-matching and rejection is my safest strategy, as you suggest.

But the question of what analysis a sufficiently smart and attentive agent *could* do, in principle, to take advantage of rare valuable opportunities without being suckered by scam artists is often worth asking anyway.

But wouldn’t you just be suckered by sufficiently smart and attentive scam artists?

It depends on the nature of the analysis I’m doing.

I mean, sure, if the scam artist is smart enough to, for example, completely encapsulate my sensorium and provide me with an entirely simulated world that it updates in real time and perfect detail, then all bets are off… it can make me believe anything by manipulating the evidence I observe. (Similarly, if the scam artist is smart enough to directly manipulate my brain/mind.)

But if my reasoning is reliable and I actually have access to evidence about the real world, then the better I am at evaluating that evidence, the harder I am to scam about things relating to that evidence, even by a scam artist far smarter than me.

I disagree. All the scam artist has to know is your method of coming to your conclusions. Once he knows that, he can probably exploit you, depending on his cleverness (and then it becomes an arms race). If anything, trying to defend yourself from being manipulated in that way would probably be extremely difficult in and of itself. Either way, my initial guess is that your methodology would still be superficial pattern-matching, just a deeper, more complex level of it.

This seems to be what Eliezer is doing with all the various scenarios. He’s testing his methodology against different attacks and different scenarios. I’m just suggesting that you change your viewpoint to the Bad Guy’s. Rather than talk about your reliable reasoning, talk about the bad guy and how he can exploit your reasoning.

Fair enough. If I accept that guess as true, I agree with your conclusion.

I also agree that adopting the enemy’s perspective is an important—for humans, indispensable—part of strategic thinking.

I also think that the variant of the problem featuring an actual mugger is about scam recognition.

Suppose you get an unsolicited email claiming that a Nigerian prince wants to send you a Very Large Reward worth $Y. All you have to do is send him a cash advance of $5 first …

I analyze this as a straightforward two-player game tree via the usual minimax procedure. Player one goes first, and can either pay $5 or not. If player one chooses to pay, then player two goes second, and can either pay Very Large Reward $Y to player one, or he can run away with the cash in hand. Under the usual minimax assumptions, player 2 is obviously not going to pay out! Crucially, this analysis does not depend on the value for Y.

The analysis for Pascal’s mugger is equivalent. A decision procedure that needs to introduce ad hoc corrective factors based on the value of Y seems flawed to me. This type of situation should not require an unusual degree of mathematical sophistication to analyze.
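The backward induction above can be written out explicitly. This sketch assumes, as the minimax analysis does, that player 2 values only his own money, so delivering Y costs him Y:

```python
# Two-move game tree for the Nigerian-prince / mugger offer,
# solved by backward induction under the minimax assumption.
def mugger_move(y):
    """Player 2's choice after receiving the $5: deliver Y or keep the cash.
    A purely self-interested player keeps the cash for any positive Y."""
    payoff_if_keeps = 5
    payoff_if_pays = 5 - y
    return "pay out" if payoff_if_pays > payoff_if_keeps else "keep"

def my_move(y):
    """Player 1's choice: pay $5 only if player 2 would then actually deliver."""
    return "pay $5" if mugger_move(y) == "pay out" else "refuse"

# The conclusion does not depend on the value of Y, however large:
for y in [100, 10**100, 10**1000]:
    print(my_move(y))   # "refuse" every time
```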

When I list out the most relevant facts about this scenario, they include the following: (1) we received an unsolicited offer (2) from an unknown party from whom we won’t be able to seek redress if anything goes wrong (3) who can take our money and run without giving us anything verifiable in return.

That’s all we need to know. The value of Y doesn’t matter. If the mugger performs a cool and impressive magic trick we may want to tip him for his skillful street performance. We still shouldn’t expect him to pay out Y.

I generally learn a lot from the posts here, but in this case I think the reasoning in the post confuses rather than enlightens. When I look back on my own life experiences, there are certainly times when I got scammed. I understand that some in the Less Wrong community may also have fallen victim to scams or fraud in the past. I expect that many of us will likely be subject to disingenuous offers by unFriendly parties in the future. I respectfully suggest that knowing about common scams is a helpful part of a rationalist’s training. It may offer a large benefit relative to other investments.

If my analysis is flawed and/or I’ve missed the point of the exercise, I would appreciate learning why. Thanks!

When you say that player 2 “is obviously not going to pay out”, that’s an approximation. You don’t know that he’s not going to pay out. You know that he’s *very, very, very unlikely* to pay out. (For instance, there’s a very slim chance that he subscribes to a kind of honesty which leads him to do things he says he’ll do, and therefore doesn’t follow minimax.) But in Pascal’s Mugging, “very, very, very unlikely” works differently from “no chance at all”.

That does not matter. If you think it is a scam, then the size of the promised reward does not matter. 100? Googol? Googolplex? 3^^^3? Infinite? It just does not enter the calculations in the first place, since it is made up anyway.

Determining “is this a scam?” would probably have to rely on things other than the size of the reward. That avoids the whole “but there is no 1-in-3^^^3 probability because I say so” business.

There’s a *probability* of a scam; you’re not *certain* that it is a scam. The small probability that you are wrong about it being a scam is multiplied by the large amount.

What if the probability of it being a scam is a function of the amount offered?

There seems to be this idea on LW that the probability of it not being a scam can only decrease with the Kolmogorov complexity of the offer. If you accept this idea, then the probability being a function of the amount doesn’t help you.

If you accept that the probability can decrease faster than that, then of course that’s a solution.

I can’t come up with any reasons why that should be so.

I suppose that people who talk about Kolmogorov complexity in this setting are thinking of AIXI or some similar decision procedure.

Too bad that AIXI doesn’t work with unbounded utility, as expectations may diverge or become undefined.

I think this comes down to bounded computation.

With a human’s bounded computational resources, maybe assuming that it balances out is the best you can do. You have to make simplifying assumptions if you want to reason about numbers as large as 3↑↑↑3.

But we can see why the probability isn’t counterbalanced without having a visceral grasp on the quantities involved. There may be some uncertainty in our view that there aren’t enough counterbalancing forces we’ve taken into account, but in practice uncertainty almost always places you some nontrivial distance away from .5 credence. We still have to have credence levels, even about quantities that are very uncertain or beyond our ability to compute. Metauncertainty won’t drag your confidence to .5 unless your uncertainty disproportionately supports ‘I’m underestimating the likelihood that there are counterbalancing risks’ over ‘I’m overestimating the likelihood that there are counterbalancing risks’.

I considered this, and I’m not sure if I am considering the mugging from the right perspective.

For instance, in the case of a mugger who is willing to talk with you, even if the actual amount of evidence were mathematically indeterminate (say the amount is defined as “a finite number higher than any number that could fit in your brain” and the probability as “closer to 0 than any positive number you can fit in your brain that isn’t 0”), you might still attempt to figure out which direction talking with the mugger moved the evidence about him, and use that for decision making.

If, as you talk to him, the mugger provides more and more evidence that he is a Matrix Lord, you could say “Sure, here’s 5 dollars.”

Or if, as you talk to him, the mugger provides more and more evidence that he is a mugger, you could say “No, go away.”

(Note: I’m NOT saying the above is correct or incorrect yet! Among other things, you could also use the SPEED at which the mugger was giving you evidence as an aid to decision making. You might say yes to a Mugger who offers a million bits of evidence all at once, and no to a Mugger who offers evidence one bit at a time.)

However, in the case below, you can’t even do that—or you could attempt to, but with the worry that even talking about it is itself a decision:

Cruel Mugger: “Give me 5 dollars and I use my powers to save a shitload of lives. Do anything else, like talking about evidence or walking away, and they die.”

So, to consider the problem from the right perspective, should I be attempting to solve the Mugging, the Cruel Mugging, both separately, or both as if they are the same problem?

The scenario is already so outlandish that it seems unwarranted to assume that the mugger is telling the truth with more than 0.5 certainty. The motives of such a being to engage in this kind of prank, if it truly were in such a powerful position, would have to be very convoluted. Isn’t it at least as likely that the opposite will happen if I hand over the five dollars? Okay, I guess if that’s my answer, I’ll have to hand over the money if the mugger says “don’t give me five dollars!” Or do I?

Reversed stupidity is not intelligence. You are not so confused as to guess the opposite of what will happen more often than what will actually happen. All your confusion means is that it is almost as likely that the opposite will happen.

This is one of many reasons that the “discover novel physics that implies the ability to affect (really big number) lives” version of this thought experiment works better than the “encounter superhuman person who asserts the ability to affect (really big number) lives”. That said, if I’m looking for reasons for incredulity and prepared to stop thinking about the scenario once I’ve found them, I can find them easily enough in both cases.

Well, one of my responses to the superhuman scenario is that my prior depends on the number, so you can’t exceed my prior just by raising the number.

The reasons I gave for having my prior depend on the number don’t apply to the physics scenario, but there are new reasons that do. For instance, the human mind is not good at estimating or comprehending very small probabilities and very large numbers; if I had to pay $5 for research that had a very tiny probability of producing a breakthrough that would improve lives by a very large amount of utility, I would have little confidence in my ability to properly compute those numbers, and the more extreme the numbers, the less confidence I would have.

(And “I have no confidence” also means I don’t know how my own errors are distributed, so you can’t easily fix this up by factoring my confidence into the expected value calculation.)

Yes, agreed, a researcher saying “give me $5 to research technology with implausible payoff” is just some guy saying “give me $5 to use my implausible powers” with different paint and has many of the same problems.

The scenario I’m thinking of is “I have, after doing a bunch of research, discovered some novel physics which, given my understanding of it and the experimental data I’ve gathered, implies the ability to improve (really big number) lives,” which raises the possibility that I ought to reject the results of my own experiments and my own theorizing, because the conclusion is just so bloody implausible (at least when expressed in human terms; EY loses me when he starts talking about quantifying the implausibility of the conclusion in terms of bits of evidence and/or bits of sensory input and/or bits of cognitive state).

And in particular, the “you could just as easily harm (really big number) lives!” objection simply disappears in this case; it’s no more likely than anything else, and vanishes into unconsiderability when compared to “nothing terribly interesting will happen,” unless I posit that I actually do know what I’m doing.

Suppose you could conceive of what the future will be like if it were explained to you.

Are there more or fewer than a googolplex differentiable futures conceivable to you? If there are more, then selecting a specific one of those conceivable futures takes more bits than posited as possible. If fewer, then...?

Why does “Earthling” imply sufficient evidence for the rest of this (given a leverage adjustment)? Don’t we have independent reason to think otherwise, eg the Great Filter argument?

Mind you, the recent MIRI math paper and follow-up seem (on their face) to disprove some clever reasons for calling seed AGI actually impossible and thereby rejecting a scenario in which Earth will “affect the future of a hundred billion galaxies”. There may be a lesson there.

Typo:

No, it’s supposed to say that. 10^80 is earlier defined as a small large number.

Maybe “smallishly large”? That makes it clearer that you are saying “(small-kind-of-large)-kind-of number”, not “number that is small and large”

I missed that. It’s bad enough notation that I expect others to stumble over it, too.

I think it’s good bad notation.

So it looks like the Pascal’s mugger problem can be reduced to two problems that need to be solved anyway for an FAI: how to be optimally rational given a finite amount of computing resources, and how to assign probabilities for mathematical statements in a reasonable way.

Does that sound right?

I’m not sure I agree with that one—where does the question of anthropic priors fit in? The question is how to assign probabilities to physical statements in a reasonable way.

You may be aware of the use of negative probabilities in machine learning and quantum mechanics and, of course, economics. For the last, the existence of a Matrix Lord has such a large negative probability that it swamps his proffer (perhaps because it is altruistic?) and no money changes hands. In other words, there is nothing interesting here; it’s just that some types of decision theory haven’t incorporated negative probabilities yet. The reverse situation, Job’s complaint against God, is more interesting. It shows why variables with negative probabilities tend to disappear from discourse, to be replaced by the difference between two independent ‘normal’ variables; in this case Cosmic Justice is replaced by the I-Thou relationship of ‘God’ and ‘Man’.

Can you give me an example of something with negative probability?

I will offer you a bet: if it doesn’t happen, you have to give me a dollar, but if it does happen, you have to give me everything you own. I find it hard to believe that there’s anything where that’s considered good odds.

If it has such a large negative probability, wouldn’t you try to avoid ever giving someone five dollars, since they anti-might be a Matrix Lord, and you can’t risk a negative probability of them sparing 3^^^3 people?

Also, when you mention quantum mechanics, I think you’re confusing waveform density and probability density. The waveform can be any complex number, but the probability is proportional to the square of the magnitude of the waveform. If the waveform density is 1, −1, i, or -i, the probability of seeing the particle there is the same.

Quantum mechanics actually has led to some study of negative probabilities, though I’m not familiar with the details. I agree that they don’t come up in the standard sort of QM and that they don’t seem helpful here.

TL;DR: I don’t see why this is necessarily the case. What am I missing here?

Here is a summary of what I understand so far: A “correct” epistemology would satisfy our intuition that we should ignore the Pascal’s Mugger who doesn’t show any evidence, and pay the Matrix Lord, who snaps his fingers and shows his power.

The problem is that no matter how low a probability we assign to the mugger telling the truth, the mugger can name an arbitrarily large number of people to save, and thus make it worth it to pay him anyway. If we weigh the mugger’s claim at infinitesimally small, however, we won’t be sufficiently convinced by the Matrix Lord’s evidence.
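The arithmetic behind this summary can be made explicit: for any fixed, nonzero prior, the mugger can always name a number of people large enough that paying comes out ahead in expected value. A toy illustration (the prior, the one-utility-unit-per-life valuation, and the $5-as-5-utility cost are all made-up numbers for the example, not values from the post):

```python
# Toy expected-value comparison: any fixed prior p > 0 is eventually
# beaten by a sufficiently large claimed payoff N, because p * N grows
# without bound while the cost of paying stays fixed.

def pay_is_favored(prior, n_lives, cost_in_utility=5):
    """True if expected lives saved exceed the (utility) cost of paying."""
    return prior * n_lives > cost_in_utility

prior = 10 ** -100                             # an arbitrarily tiny fixed prior
assert not pay_is_favored(prior, 10 ** 50)     # small claim: ignore the mugger
assert pay_is_favored(prior, 10 ** 150)        # huge claim: EV says pay up
```

This is why the prior has to shrink with the claimed number (or be infinitesimal outright, which then blocks the Matrix Lord’s evidence) for the intuitive answer to come out.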

The matter is further complicated by the fact that the number of people Matrix Lord claims to save suggests a universe which is so complex that it gets a major complexity penalty.

Here is my attempt at a solution: consider the set of all possible universes.

Each possible universe has a probability, and they all add up to one. Since there are infinitely many possible universes, many of these universes have infinitesimally low probability. Bayes’ theorem adjusts the probability of each.

The Matrix Lord / person turning into a cat scenario is such that a universe which previously had an infinitesimally low probability now has a rather large likelihood.

What happens when a person turns into a cat?

All of the most likely hypotheses are suddenly eliminated, and everything changes.

Working through some examples to demonstrate that this is a solution: you have models U1, U2, U3, and so on. P(Un) is the probability that you live in universe n. Your current priors:

P(U1) = 60%

P(U2) = 30%

P(U3) = epsilon

P(U4) = delta

...and so on.

Mr. Matrix turns into a cat or something. Now our hypothesis space is as follows:

P(U1) = 0

P(U2) = 0

P(U3) = 5% (previously epsilon)

P(U4) = delta
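The update these numbers gesture at is ordinary Bayesian conditioning: the cat-transformation has probability zero under U1 and U2, so they are zeroed out, and the surviving hypotheses are rescaled to sum to one, letting a formerly epsilon-sized hypothesis absorb most of the mass. A minimal sketch, with made-up likelihood values (the 0.5s are illustrative assumptions, not taken from the comment):

```python
# Bayesian update when an observation rules out the formerly dominant
# hypotheses: multiply priors by likelihoods, then renormalize so the
# surviving hypotheses sum to 1.

def update(priors, likelihoods):
    """priors, likelihoods: dicts mapping hypothesis -> probability."""
    posterior = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

priors = {"U1": 0.6, "U2": 0.3, "U3": 0.05, "U4": 0.05}
# "Mr. Matrix turns into a cat": impossible under U1 and U2.
likelihoods = {"U1": 0.0, "U2": 0.0, "U3": 0.5, "U4": 0.5}
post = update(priors, likelihoods)
assert post["U1"] == 0.0
assert abs(post["U3"] - 0.5) < 1e-12  # epsilon-sized prior -> half the mass
```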

In essence, the utter elimination of all the remotely likely hypotheses suddenly makes several universes which were previously epsilon/delta/arbitrarily small in probability much more convincing.

Basically, if the scenario with the Matrix Lord happened to us, we ought to act in approximately the same way that the idealized “rational agent” would act if it were given no information whatsoever (so all prior probabilities are assigned using complexity alone), and then a voice from the sky suddenly specifies a hypothesis of arbitrarily high complexity from the space of possible universes and claims that it is true.

Come to think of it, you might even think of your current memories as playing the role of the “voice from the sky”. There is no meta-prior saying you should trust your memories, but you have nothing else. Similarly, when Mr. Matrix turned into a cat, he eliminated all your non-extremely-unlikely hypotheses, so you have nothing to go on but his word.

Eliezer:

Huh? You don’t need to conclude anything whose prior probability was “on the order of one over googolplex.” You just need to believe it enough that it out-competes the suggested actions of any of the other hypotheses... and nearly all the hypotheses which had, prior to the miraculous event, non-negligible likelihood just got falsified, so there is very little competition... Even if the probability of the Matrix Lord telling the truth is 1%, you’re still going to give him the five dollars, because there are infinitely many ways in which he could lie.

In fact, even if the universes in which the Matrix Lord is lying are all simpler than the one in which he is telling the truth, the actions proposed by the various kinds of lie-universes cancel each other out. (In one lie-universe, he actually saves only one person; in another, equally likely lie-universe, he actually kills one person; and so on.) When a rational agent makes the decision, it calculates the expected value of the intended action over every possible universe, weighted by probability.

By analogy: suppose I tell you I’m going to pick a random natural number, additionally tell you that there is a 1% chance that I pick “42”, and ask you to make a bet about which number comes up. You are going to bet “42”, because the chance that I pick any other particular number is arbitrarily small... you can even try giving larger numbers a complexity penalty; it won’t change the problem. Any evidence for any number that brings it up above “arbitrarily small” will do.

The analogy still holds. Just pretend that there is a 99% chance that you misheard me when I said “42”, and I might have said any other number. You still end up betting on 42.

“Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty).”

As I mentioned in a recent discussion post, I have difficulty accepting Robin’s solution as valid—for starters, it has the semblance of possibly working in the case of people who care about people, because that’s a case that seems as if it should be symmetrical, but how would it work for, e.g., a Clippy who is tempted with the creation of paperclips? There’s no symmetry here, because paperclips don’t think and Clippy knows paperclips don’t think.

And how would it work if the AI in question were asked to evaluate whether such a hypothetical offer should be accepted by a random individual or not? Robin’s anthropic solution says that the AI should judge that someone else ought hypothetically to take the offer, but it would judge the probabilities differently if it had to judge things in actual life. That sounds as if it ought to violate basic principles of rationality?

My effort to steelman Robin’s argument attempted to effectively replace “lives” with “structures of type X that the observer cares about and will be impacted”, and “unique position to affect” with “unique position of not directly observing”—hence the Law of Visible Impact.

I think this is captured by the notion that a causal node should only improbably occupy a unique position on a causal graph?

Yeah, that’s probably generalized enough that it works, though I suppose it didn’t really quite click for me at first because I was focusing on Robin’s “ability to affect” as corresponding to the term “unique position”, and I was instead thinking of “inability to perceive”—but that’s also a unique position, so I suppose the causal node version you mention covers that indeed. Thanks.

That’s not at all how validity of physical theories is evaluated. Not even a little bit.

By that logic, you would have to reject most current theories. For example, Relativity restricted the maximum speed of travel, thus revealing that countless future generations will not be able to reach the stars. Archimedes’s discovery of the buoyancy laws enabled future naval battles and ocean faring, impacting billions so far (which is not a googolplex, but the day is still young). The discovery of fission and fusion still has the potential to destroy all those potential future lives. Same with computer research.

The only thing that matters in physics is the old mundane “fits current data, makes valid predictions”. Or at least has the potential to make testable predictions some time down the road. The only time you might want to bleed (mis)anthropic considerations into physics is when you have no way of evaluating the predictive power of various models and need to decide which one is worth pursuing. But that is not physics, it’s decision theory.

Once you have a testable working theory, your anthropic considerations are irrelevant for evaluating its validity.

That’s perfectly credible since it implies a lack of leverage.

10^10 is not a significant factor compared to the sensory experience of seeing something float in a bathtub.

To build an AI one must be a tad more formal than this, and once you start trying to be formal, you will soon find that you need a prior.

Oh, I assumed that negative leverage is still leverage. Given that it might amount to an equivalent of killing a googolplex of people, assuming you equate never being born with killing.

I see. I cannot comment on anything AI-related with any confidence. I thought we were talking about evaluating the likelihood of a certain model in physics to be accurate. In that latter case anthropic considerations seem irrelevant.

It’s likely that anything around today has a huge impact on the state of the future universe. As I understood the article, the leverage penalty requires considering how unique your opportunity to have the impact is, too. Archimedes had a massive impact, but there have also been a massive number of people through history who would have had the chance to come up with the same theories had they not already been discovered, so you have to offset Archimedes’ leverage penalty by the fact that he wasn’t uniquely capable of having that leverage.

Neither was any other scientist in history ever, including the one in Eliezer’s dark energy example. Personally, I take a very dim view of applying anthropics to calculating probabilities of future events, and this is what Eliezer is doing.

Informally speaking, it seems like superexponential numbers of people shouldn’t be possible. If a person is some particular type of computation, and exactly identical copies of a person should only count once, then number of people is bounded by number of unique computations (exponential). It does not seem like the raw Kolmogorov complexity of the number will be the right complexity penalty if each person has to be a different computation.
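To make the scale gap in this argument concrete: any “one person per distinct computation” bound is merely exponential, while the up-arrow numbers in the thread are built by iterating exponentiation itself. A quick sketch of Knuth’s up-arrow notation (only the tiny cases are actually computable; 3↑↑↑3 is a power tower of 3s whose height is itself 3↑↑3 = 3^27 ≈ 7.6 × 10^12):

```python
# Knuth's up-arrow notation: a ↑^n b. One arrow is exponentiation;
# each extra arrow iterates the previous operation, which is why these
# numbers outrun any fixed exponential bound almost immediately.

def up_arrow(a, n, b):
    """Compute a ↑^n b for small arguments (blows up extremely fast)."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

assert up_arrow(3, 1, 3) == 27           # 3^3
assert up_arrow(3, 2, 3) == 3 ** 27      # 3↑↑3 = 7,625,597,484,987
```

So if the population is bounded by the number of distinct computations expressible in some fixed (even astronomically large) number of bits, that bound is of the form 2^k, and 3↑↑↑3 exceeds it for any physically plausible k — which is the commenter’s point about superexponential headcounts.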

I find it truly bizarre that nobody here seems to be taking MWI seriously. That is, it’s not 1 person handing over $5 or not, it’s all the branching possible futures of those possibilities. In other words, I hand over $5, then depending how my head radiates heat for the next second there are now many copies of me experiencing $5-less-ness.

How many? Well, answering that question may require a theory of magical reality fluid (or “measure”), but naively speaking it seems that it should be something more akin to 3^^^3 (or googolplex) than to 3^^^^3. So the problem may still exist; but this MWI issue certainly deserves consideration, and the fact that Eliezer didn’t apparently consider it makes me suspicious that he hasn’t thought as deeply about this as he claims. Even if throwing this additional factor of 3^^^3 into the mix doesn’t dissolve the problem entirely, it may well put it into the range where further arguments, such as earthwormchuck163’s “there aren’t 3^^^^3 different people”, could solve it.

Any reasonably useful decision theory ought to work in Newtonian worlds as well.

Damn right! I wish I could trade some of my karma for extra upvotes.

(This comment was originally written in response to shminux below, but it’s more directly addressing nshepperd’s point, so I’m moving it to here)

I understand that you’re arguing that a good decision theory should not rely on MWI. I accept that if you can build one without that reliance, you should; and, in that case, MWI is a red herring here.

But what if you can’t make a good decision theory that works the same with or without MWI? I think that in that case there are anthropic reasons that we should privilege MWI. That is:

The fact that the universe apparently exists, and is apparently consistent with MWI, seems to indicate that an MWI universe is at least possible.

If this universe happens to be “smaller than MWI” for some reason (for instance, we discover a better theory tomorrow; or, we’re actually inside a sim that’s faking it somehow), there is some probability that “MWI or larger” does actually exist somewhere else. (You can motivate this by various kinds of handwaving: from Tegmark-Level-4 philosophizing; to the question of how a smaller-than-MWI simulator could have decided that a pseudo-MWI sim would be interesting; and probably other arguments).

If intelligence exists in both “smaller than MWI” domains and “MWI or larger” domains, anthropic arguments strongly suggest that we should assume we’re in one of the latter.

(And to summarize, in direct response to nshepperd:)

That’s probably true. But it’s not a good excuse to ignore how things would change if you are in an MWI world, as we seem to be.

If your decision theory doesn’t work independently of whether MWI is true or not, then what do you use to decide if MWI is true?

And if your decision theory does allow for both possibilities (and even if MWI somehow solved Pascal’s Mugging, which I also disagree with), then you would still only win if you assign somewhere around 1 in 3^^^3 probability to MWI being false. On what grounds could you possibly make such a claim?

I’m not saying I have a decision theory at all. I’m saying that whatever your decision theory, MWI being true or not could in principle change the answers it gives.

And if there is some chance that MWI is true, and some chance that it is false, the MWI possibilities have a factor of ~3^^^3 in them. They dominate even if the chance of MWI is small, and far more so if the chance of it being false is small.

Wait, so you’re saying that if MWI is true, then keeping $5 is not only as good as, but outweighs saving 3^^^3 lives by a huge factor?

Does this also apply to regular muggers? You know, the gun-in-the-street, your-money-or-your-life kind? If not, what’s the difference?

No. I’m saying that if there’s (say) a 50% chance that MWI is true, then you can ignore the possibility that it isn’t; unless your decision theory somehow normalizes for the total quantity of people.

If you’ve decided MWI is true, and that measure is not conserved (i.e., as the universe splits, there’s more total reality fluid to go around), then keeping $5 means keeping $5 in something like 3^^^3 or a googolplex or something universes. If Omega or the Matrix Lord threatens to steal $5 from 3^^^3 people in individual, non-MWI sim-worlds, then that would … well, of course, not actually balance things out, because there’s a huge handwavy error in the exponent here, so one or the other is going to massively dominate, but you’d have to actually do some heavy calculation to try to figure out which side it is.

If there’s an ordinary mugger, then you have MWI going on (or not) independently of how you choose to respond, so it cancels out, and you can treat it as just a single instance.

But if Pascal’s Mugger decides to torture 3^^^3 people because you kept $5, he also does this in “something like 3^^^3 or a googolplex or something” universes. In other words, I don’t see why it doesn’t always cancel out.

I explicitly said that the mugger stealing $5 happens “in individual, non-MWI sim-worlds”. I believe that a given deterministic algorithm, even if it happens to be running in 3^^^3 identical copies, counts as an individual world. You can stir in quantum noise explicitly, which effectively becomes part of the algorithm and thus splits it into many separate sims, each with its own unique noise; but you can’t do that nearly fast enough to keep up with the quantum noise that’s being stirred into real physical humans.

Philosophy questions of what counts as a world aside, who told you that the mugger is running some algorithm (deterministic or otherwise)? How do you know the mugger doesn’t simply have 3^^^3 physical people stashed away somewhere, ready to torture, and prone to all the quantum branching that entails? How do you know you’re not just confused about the implications of quantum noise?

If there’s even a 1-in-a-googolplex chance you’re wrong about these things, then the disutility of the mugger’s threat is still proportional to the 3^^^3-tortured-people, just divided by a mere googolplex (I will be generous and say that if we assume you’re right, the disutility of the mugger’s threat is effectively zero). That still dominates every calculation you could make...

...and even if it didn’t, the mugger could just threaten 3^^^^^^^3 people instead. Any counter-argument that remains valid has to scale with the number of people threatened. Your argument does not so scale.

At this point, we’re mostly both working with different implicitly-modified versions of the original problem, and so if we really wanted to get anywhere we’d have to be a lot more specific.

My original point was that a factor of MWI in the original problem might be non-negligible, and should have been considered. I am acting as the Devil’s Concern Troll, a position which I claim is useful even though it bears a pretty low burden of proof. I do not deny that there are gaping holes in my argument as it relates to this post (though I think I am on significantly firmer ground if you were facing Galaxy Of Computronium Woman rather than Matrix Lord). But I think that if you look at what you yourself are arguing with the same skeptical eye, you’ll see that it is far from bulletproof.

Admit it: when you read my objection, you knew the conclusion (I am wrong) before you’d fully constructed the argument. That kind of goal-directed thinking is irreplaceable for bridging large gaps. But when it leads you to dismiss factors of 3^^^3 or a googolplex as petty matters, that’s mighty dangerous territory.

For instance, if MWI means someone like you is legion, and the anthropic argument means you are more likely to be that someone rather than a non-MWI simulated pseudo-copy thereof, then you do have a pertinent question to ask the Matrix Lord: “You’re asking me to give you $5, but what if some copies of me do and others don’t?” If it answers, for instance, “I’ve turned off MWI for the duration of this challenge”, then the anthropic improbability of the situation just skyrocketed; not by anything like enough to outweigh the 3^^^^3 threat, but easily by enough to outweigh the improbability that you’re just hallucinating this (or that you’re just a figment of the imagination of the Matrix Lord as it idly considers whether to pose this problem for real, to the real you).

Again: if you look for the weakest, or worse, the most poorly-expressed part of what I’m saying, you can easily knock it down. But it’s better if you steel-man it; I don’t see where the correct response could possibly be “Factor of 3^^^3? Hadn’t considered that exactly, but it’s probably irrelevant, let’s see how.”

On an even more general level, my larger point is that I find that multiplicity (both MWI and Tegmark level 4) is a fruitful inspiration for morals and decision theory; more fruitful, in my experience, than simulations, Omega, Matrix Lords, and GOCW. Note that MWI and TL4, like Omega and GOCW, don’t have to be true or falsifiable in order to be useful as inspiration. My experience includes thinking about these matters more than most, but certainly less than people like Eliezer. Take that as you will.

I think we’re talking past each other, and future discussion will not be productive, so I’m tapping out now.

(Moved my reply, too)

This contradicts the premise that MWI is untestable experimentally, and is only a Bayesian necessity, the point of view Eliezer seems to hold. Indeed, if an MWI-based DT suggests a different course of action than a single-world one, then you can test the accuracy of each and find out whether MWI is a good model of this world. If furthermore one can show that no single-world DT is as accurate as a many-world one, I will be convinced.

It is also consistent with Christianity and invisible pink unicorns, why do you prefer to be MWI-mugged rather than Christ-mugged or unicorn-mugged?

No it doesn’t. DT is about what you should do, especially when we’re invoking Omega and Matrix Lords and the like. Which DT is better is not empirically testable.

Yes, except that MWI is the best theory currently available to explain mountains of experimental evidence, while Christianity is empirically disproven (“Look, wine, not blood!”) and invisible pink unicorns (and invisible, pink versions of Christianity) are incoherent and unfalsifiable.

(Later edit: “best theory currently available to explain mountains of experimental evidence” describes QM in general, not MWI. I have a hard time imagining a version of QM that doesn’t include some form of MWI, though, as shminux points out downthread, the details are far from being settled. Certainly I don’t think that there’s a lot to be gained by comparing MWI to invisible pink unicorns. Both have a p value that is neither 0 nor 1, but the similarity pretty much ends there.)

You ought to notice your confusion by now.

What is your level of understanding QM? Consider reading this post.

Re DT: OK, I notice I am confused.

Re MWI: My understanding of QM is quite good for someone who has never done the actual math. I realize that there are others whose understanding is vastly better. However, this debate is not about the equations of QM per se, but about the measure theory that tells you how “real” the different parts of them are. That is also an area where I’m no more than an advanced amateur, but it is also an area in which nobody in this discussion has the hallmarks of an expert. Which is why we’re using terms like “reality fluid”.

And my violin skills are quite good for someone who has never done the actual playing.

Different parts of what? Of equations? They are all equally real: together they form mathematical models necessary to describe observed data.

Eliezer is probably the only one who uses that and the full term is “magical reality fluid” or something similar, named this way specifically to remind him that he is confused about it.

I have actually done the math for simple toy cases like Bell’s inequality. But yeah, you’re right, I’m no expert.

(Out of curiosity, are you?)

ψ

I have a related degree, if that’s what you are asking.

I have yet to see anyone write down anything more than a handwaving of this in MWI. Zurek’s ideas of einselection and envariance go some way toward showing why only the eigenstates survive when decoherence happens, and there is some experimental support for this, though the issue is far from settled.

Precisely; the issue is far from settled. That clearly doesn’t mean “any handwavy speculation is as good as any other” but it also doesn’t mean “speculation can be dismissed out of hand because we already understand this and you’re just wrong”.

Suppose 3^^^3 copies of you are generated in the first second after you decide. Each one will have $5 less as a result of your decision (for the sake of argument, let’s say your responsibility ends there). Let’s take a dollar as a utility unit, and say that by giving the Matrix Lord $5 you produce 5 × 3^^^3 disutility points across future worlds. But since everyone is producing copies at roughly the same rate (I think), any utility gained or lost is always multiplied by 3^^^3. This means that you can just cancel the 3^^^3 business out: for everyone you benefit, the positive utility points are also multiplied by 3^^^3, and so the result is the same.

Why was this downvoted? Because everyone knows that Matrix Lord simulations don’t actually follow MWI, they just seem to for the poor deluded scientists trapped inside? Sure, I know that. But I was just saying: what if they did? Riddle me that, downvoter person!

Seriously: I’ve now posted variants of this idea (that MWI means we are all legion, which makes threats/promises involving simulations significantly less scary/enticing) at least 5 or 6 times, between here and Quora. And it’s downvoted to oblivion every time. Now, obviously, this makes me question whether there’s something stupid about the idea. But though I’m generally acknowledged to be not a stupid guy, I can’t see the fatal flaw. It’s very tempting to think that you cats are all just too mainstream to see the light, man. That kind of thinking has to overcome a large self-servingness penalty, which is why I state it in ridiculous terms, but unless someone can talk me down here, I’m close to embracing it.

So: what is so very wrong about this thought? Aside from the fact that it embraces two premises which are too unconventional for non-LW’ers, but reaches a conclusion that’s too mainstream for LW’ers?

And please, don’t downvote this comment without responding. I’m happy to take the karma penalty if I learn something, but if all you get for being wrong is downvoted, that’s just a dead end. So, to sweeten the pot: I will upvote any even-minimally-thoughtful response to this comment or to the one above.

I didn’t downvote, but I couldn’t see what MWI actually changed about the problem. The simulations are also subject to MWI, so you’re multiplying both sides of the comparison by the same large number. Hmm. Unless the simulations are implemented on quantum computers, which would minimize the branching. It’s not clear to me that you can mimic the algorithm without having the same degree of total decoherence.

No, the simulations are not subject to MWI. I mean, we don’t know what “matrix lord physics” is, but we have his word that there are 3^^^^3 individuals inside those simulations, and presumably that’s after any MWI effects are factored in.

If instead of Matrix Lord, we were just facing Galaxy Of Computronium Woman, we’d be even better off. She can presumably shift any given bit of her galaxy between quantum and normal computation mode, but it doesn’t help her. If GOCW is in normal computation mode, her computations are deterministic and thus not multiplied by MWI. And if she’s in quantum mode, she only gets a multiplier proportional to an exponential of the number of qubits she’s using. In order to get the full multiplier that ordinary made-of-matter you are getting naturally, she has to simulate everything about the quantum wave function of every particle in you and your environment. We don’t know how efficient her algorithms are for doing so, but presumably it takes her more than a gram of computronium to simulate a gram of normal matter at that level of detail, and arguably much more. Obviously she can do hybrid quantum/conventional tricks, but there’s nothing about the hybridization itself that increases her multiplier.

So you’re saying, what if MWI is just a local phenomenon to our world, and doesn’t apply to these 3^^^^3 other simulations that the matrix lords are working with, because they aren’t quantum in the first place?

I agree that in the case of a mere galaxy of computronium, it’s much less credible that one can simulate an extremely high number of people complex enough that we wouldn’t be able to prove that we aren’t them. In the former case, we’ve got much less information.

Unlike Eliezer, I very publicly do not privilege MWI on this site, but let’s assume that it’s “true” for the sake of argument. How many (subtly different) copies of you got offered the same deal? No way to tell. How many accept or reject it? Who knows. If there are 3^...^^3 copies of you who accepted, then the matrix lord has a lot of money (assuming they care for money) to do what it promised. But what if there are only 3^^^3 (or some other conveniently “small” number) of you who accept? Then you are back to the original problem. Until you have a believable model of this “magical reality fluid”, adding MWI into the mix gives you nothing.

(Note: this comment now moved to respond to nshepperd above)

This contradicts the premise that MWI is untestable experimentally, and is only a Bayesian necessity, the point of view Eliezer seems to hold. Indeed, if an MWI-based DT suggests a different course of action than a single-world one, then you can test the accuracy of each and find out whether MWI is a good model of this world. If furthermore one can show that no single-world DT is as accurate as a many-world one, I will be convinced.

It is also consistent with Christianity and invisible pink unicorns; why do you prefer to be MWI-mugged rather than Christ-mugged or unicorn-mugged?

Isn’t the thought that even if only one Homunq is offered the deal and accepts, the next few seconds will generate [insert some large number] of worlds in which Homunq copies have $5 less because of that one original Homunq’s decision? I don’t think Homunq means to refer to preexisting other worlds (which couldn’t be affected by his actions), but to the worlds that will be generated just after his decision.

They aren’t generated. The one world would be split up among the resulting worlds. The magical reality fluid (a.k.a. square amplitude) is conserved.

I strongly disagree that you can make that assumption; see my comment on your larger explanation for why.

Okay, thanks. But I don’t know what magical reality fluid is, so I don’t really understand you.

Before I answer, I’d like to know how much you do understand, so I can answer at an appropriate level. Is this a ‘I don’t know what’s going on here’ question, or is it a statement that you understand the system well enough that the basics no longer are convincingly basic?

The former, mostly. I’ve read the sequences on this point and done a little side reading on my own, but I don’t understand the math and I have no real education in quantum physics. In other words, I would really appreciate an explanation, but I will also entirely understand if this is more work than you’re prepared to put in.

To condense to a near-absurd degree:

QM indicates that if you take any old state of the universe, you can split it up any way you feel like. Take any state, and you can split it up as a sum of 2 or more other states (A = B + C + D + E, say). If you then ‘run’ each of the parts separately (i.e. calculate what the future state would be, yielding B’, C’, D’, E’) and then combine the results by adding, it’s the same as if you ran the original (A’ = B’ + C’ + D’ + E’).
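A toy numerical sketch of that claim, not tied to any real physical system (the 4-dimensional state and the random unitary are made up purely for illustration): evolving the sum of the parts gives the same result as evolving the parts separately and adding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Split an arbitrary state A into four pieces: A = B + C + D + E.
parts = [rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(4)]
A = sum(parts)

# A random unitary U stands in for 'running' the state forward in time
# (the QR decomposition of a random complex matrix gives a unitary Q).
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(M)

# Run the whole state, and run the pieces separately, then add.
A_evolved = U @ A
sum_of_evolved_parts = sum(U @ p for p in parts)

# Linearity: the results agree (up to floating-point rounding).
assert np.allclose(A_evolved, sum_of_evolved_parts)
```

The assertion holds for any choice of split and any unitary, because matrix multiplication distributes over addition; that is all "linear theory" means here.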

This is because QM is a *linear* theory. You can add and subtract and rescale entire states and those operations pass right through into the outcomes.

This doesn’t mean that you won’t get any surprises if you make *predictions* based on just B, C, D, and E individually, then add those together. In general, with arbitrary B, C, D, and E, combining them can yield things that *just don’t happen* when you’d expect that they would based on the parts individually (and other things that happen more than you’d expect, to compensate).

Decoherence tells you how and when you can pick these B, C, D, and E so that you in fact won’t get any such surprises. That this is possible is how we can perceive a classical world made of the quantum world.

One tiny and in no way sufficient part of the technique of decoherence is to require that B, C, D, and E are all perpendicular to each other. What does that do? You can apply the Pythagorean theorem. Working with vectors in general, with A as the hypotenuse and B, C, D, and E as the perpendicular vector components, we get AA = BB + CC + DD + EE. (Try this with three vectors near the corner of a room: take a point suspended in air, drop a line to the floor, and construct a right triangle from that point to one of the walls. You’ll get AA = WW + ZZ; then split W into X and Y, for AA = XX + YY + ZZ.)
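To see the bookkeeping concretely, here is a toy check (made-up vectors and a random rotation, nothing physical about the numbers): decompose a vector into mutually perpendicular components and compare squared magnitudes before and after.

```python
import numpy as np

rng = np.random.default_rng(1)

# Four mutually perpendicular directions: the columns of Q from a QR
# decomposition are orthonormal.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
scales = [2.0, -1.5, 0.5, 3.0]
components = [s * Q[:, i] for i, s in enumerate(scales)]  # B, C, D, E

# A is the 'hypotenuse': the sum of the perpendicular components.
A = sum(components)

# Pythagoras in any number of dimensions: AA = BB + CC + DD + EE.
total_sq = np.dot(A, A)
assert np.isclose(total_sq, sum(np.dot(c, c) for c in components))

# And a rotation (orthogonal/unitary evolution) leaves that total
# squared magnitude unchanged.
U, _ = np.linalg.qr(rng.normal(size=(4, 4)))
A_rotated = U @ A
assert np.isclose(np.dot(A_rotated, A_rotated), total_sq)
```

The second assertion is the conservation property referred to downthread: unitary evolution can shuffle squared magnitude between components, but the total never changes.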

Anyway, what the Pythagorean theorem says is that if you take a vector and split it up into perpendicular components, one thing that *stays the same* is the sum of the squared magnitudes. And it turns out that if you do the math, the mathematical structure that works like *probability* in QM-with-decoherence is proportional to this squared magnitude. This is the basis of calling this squared magnitude ‘reality fluid’. It seems to be the measure of how much something actually happens—how real it is.

Thanks, that’s really quite helpful. I take it then that the problem with Homunq’s objection is that all the subsequent ‘worlds’ would have the same total reality fluid as the one in which he made the decision, and so the ‘splitting’ wouldn’t have any real effect on the total utility: $5 less for one person with reality R is the same disutility as $5 less for [a large number of] people with reality R/[large number]?

But maybe that’s not right. At the end, you talked about ‘how much reality fluid something has’ as being a matter of how much something happens. This makes sense as a way of talking about events, but what about substances? I gather that substances like people don’t see much play in the math of QM (and have no role in physics at all really), but in this case the questions seems relevant.

Your first paragraph is correct.

As for the second, well, substances are kind of made of colossal numbers of events in a convenient pattern such that it’s useful to talk about the pattern. Like, I’m not falling through my chair over and over and over again, and I anticipate this continuing to be the case… that and a bunch of other things lead me to think of the chair as substantial.

Right, but I’m not something that happens. The continuation of me into the next second might be something that happens, and so we might say that this continuation has more or less reality fluid, but I don’t know that the same can be said of *me* simpliciter. You might think that I am in fact something that happens, a series or pattern of events, but I think this is a claim that would at least need some working out: one implication of this claim is that it takes time (in the way a motion takes time) to be me. But this is off the QM (maybe off the scientific) path, and I should say I very much appreciate your time thus far. I can’t take it personally if you don’t want to join me in some armchair speculation.

Your thoughts are things that happen. Whatever’s doing those is you. I don’t see the problem.

But it seems problematic to say that I am my thoughts. I seem to persist in time despite changes in what I think, for example. A few days ago, I thought worlds were ‘generated’ on the MWI view. I now no longer think that. I’m different as a result, but I’m not a different person. I wasn’t destroyed, or remade. (I don’t mean this to be a point specifically about human personal identity; this should apply to animals and plants and maybe blocks of wood too.)

To reiterate my concern in the grandparent, if my thoughts are a process that takes time (as they seem to be), and I am my thoughts, then it takes time to be me. Being me would then be something interruptible, so that I could only get half way to being me. This is at least odd.

I don’t mean to suggest that this is a knock-down argument or anything; it’s not. It’s little more than an armchair objection on the basis of natural language. But it’s the sort of thing for which this theory should have an answer. We might just discover that the temporal persistence or identity of macroscopic objects is a physically incoherent idea (like identity based on having a certain set of atoms). But if we do discover something radical like that, we should have something to say to ward off the idea that we’ve just misunderstood the question or changed the topic. Again, thanks for your indulgence.

You are a 4-dimensional region of spacetime. What you normally call ‘you’ is a mutually-spacelike-separated cut of this 4-dimensional region, but the whole reason for calling this slice special is because of causal chains that have extent in time. For instance, your hand is considered yours because your brain can tell it what to do*. That causal chain takes time to roll out.

*If each of us had a partner and could control the other’s hands, the terms would probably soon switch so that ‘your hands’ are the pair on their body, not the pair on your own body.

Do you think there is a meaningful distinction to be drawn between the kinds of things I can talk about via mutually-spacelike cuts (like arrangements, shapes, trombones, maybe dogs) versus the kinds of things I cannot talk about via mutually-spacelike cuts, like the motion of a fastball, Beethoven’s Ode to Joy, or the life of a star? Processes that take time versus... I dunno, *things*?

I ask because natural language and my everyday experience of the world (unreliable or irrelevant though they may be to the question of physical reality) make a great deal of fuss over this distinction.

There is a distinction, and you just gave it—some things are defined by their processes, and some things are not. Imagine instantaneously reducing something to an arbitrarily low temperature and leaving it that way forever as a substitute for stopping time, and see if the thing still counts as the same thing (this rule of thumb is not guaranteed to apply in all cases).

A frozen human body is not a human. It’s the corpse of a now-defunct human (will stay this way forever, so no cryonic restoration). So, the life—a process—is part of the definition of ‘human’. BUT since it was done instantaneously you could say it’s a corpse with a particular terminal mental state.

A trombone or triangle that’s reduced to epsilon kelvins is just a cold trombone or triangle.

A computer remains a computer, but it ceases to have any role-based identities like ’www.lesswrong.com′ or 230.126.52.85 (to name a random IP address). But, like the corpse, you can say it has a memory state corresponding to such roles.

Very interesting answer, thank you. So, for those things not defined by processes, is it unproblematic to talk about their being more or less real in terms of reality fluid?

Well, we haven’t exactly nailed down the ultimate nature of this magical reality fluid, but I don’t think that whether you define an object by shape or process changes how the magical reality fluid concept applies.

Alright, thanks for your time, and for correcting me on the MWI point. I found this very interesting and helpful.

What’s this “me” thing? Your thoughts are most likely reducible to an arrangement of neurons, their connections and electric potentials and chemical processes (ion channels opening and closing, calcium and other ions going in and out of dendrites, electric potential rising and falling, electric impulses traveling back and forth, proteins and other substances being created, deposited and removed, etc.). Some of these processes are completely deterministic, others are chaotic, yet others are quantum-random (for example, ion channel opening and closing is due to quantum-mechanical tunneling effects). In that sense, your thoughts do take time, as it takes time for chemical and electrical effects to run their course. But what do you mean by “it takes time to be me”?

Let’s drop the talk of people, that’s too complicated. Really, I’m just asking about how ‘reality fluid’ talk gets applied to everyday things as opposed to ‘happenings’. The claim on the table is that everyday things (including people) *are* happenings, and I’m worried about that.

Suppose ‘being a combustion engine’ meant actually firing a piston and rotating the drive shaft 360 degrees. If that’s what it meant to be a combustion engine, then if I interrupted the action of the piston after it had only rotated the drive shaft 180 degrees, the thing before me wouldn’t be a combustion engine. At best it would be sort of halfway there. The reason being that on this account of combustion engines, it takes time to be a combustion engine (specifically, the time it takes for the drive shaft to rotate 360 degrees).

If we did talk about combustion engines this way, for example, it wouldn’t be possible to point to a combustion engine in a photograph. We could point to something that might be a sort of temporal part of a combustion engine, but a photograph (which shows us only a moment of time) couldn’t capture a combustion engine any more than it could capture a piece of music, or the rotation of a ball, or a free throw or anything that consists in being a kind of motion.

But, at least so far as I know, a combustion engine, unlike a motion, is not divisible into temporal parts. If all happenings take time and are divisible into temporal parts, and if combustion engines are not so divisible, then combustion engines are not happenings. If they’re not happenings, how does ‘reality fluid’ talk apply to them?

EDIT:

Really? That’s fascinating, I have to look that up.

A combustion engine is deterministic. The behavior of a combustion engine is defined by the underlying physics. If properly designed, tuned and started as prescribed, it will cause the drive shaft to rotate a number of turns. A complete specification of the engine is enough to predict what it will do. If you design something that gets stuck after half a turn, it’s not what most people would consider a proper combustion engine, despite outward appearances. If you want to use the term “reality fluid”, then its flow is determined by the initial conditions. You can call this flow “motion” if you like.

I think you think I’m saying something much more complicated than what I’m trying to say. Nothing I’m saying has anything to do with prediction, design, determinism, (not that I know of, anyway) and I’m certainly not saying that ‘reality fluid’ moves. By ‘motion’ I mean what happens when you throw a baseball.

The distinction I’m trying to draw is this: on the one hand, some things take time and have temporal parts (like a piece of music, a walk in the park, the life-cycle of a star, or the electrochemical processes in a neuron). Call these processes. These are opposed, on the other hand, to things which, so far as I can see, don’t have temporal parts, like a trombone, a dog, an internal combustion engine, or a star. Call these fubs (I don’t have a good name).

If reality fluid is a way of talking about decoherence, and decoherence talk always involves distinctions of time, then can we use reality fluid talk to talk about how real fubs are? We could if all fubs were reducible to processes. That would be a surprising result. Are all fubs reducible to processes? If so, is this an eliminative reduction (fundamentally, there are no fubs)? If not...well, if not I have some other, even weirder questions.

You seem to have a philosophical approach to this, while I prefer instrumental reductionism. If a collection of “fubs” plus the rules of their behavior predict what these fubs do at any point in time, why do you need to worry about some “temporal parts”? If you take an MP3 file and a music player and press “start”, you will have music playing. If this time stuff sounds mysterious, consider Eliezer’s timeless picture, where these fubs are slices of the flow. You can generalize it somewhat to quantum things, but there will be gaps (denied by handwaving MWIers, explicit in shut-up-and-calculate), hence the probabilistic nature of it.

We share the impression that the right answer will be a reductive, empirically grounded one. We might differ on the instrumentalism part: I really do want to know what the furniture of the universe is. I have no intended use for such knowledge, and its predictive power is not so important. So far as I understand instrumentalism, you might just reply that I’m barking up the wrong tree. But in case I’m not...

But let me ask this question again directly, because I think I need an answer to understand where you’re coming from: are fubs (everyday objects like tables and chairs and people, or if you like elementary particles or whatever) reducible to processes at some level of physical explanation? Or is the whole idea of a fub incoherent? Is the question somehow incoherent? Or would you guess that when we arrive at the right physical theory, it will include reference to both processes (like decoherence, motion, heating, etc.) and fubs?

Hmm, I’m not sure how to avoid repeating myself. I’ve already said, and so has Luke_A_Somers, that “fubs” are 3d spatial slices of 4d spacetime regions. If this statement does not make sense to you, we can try to dissect it further. Is there a particular part of it that is problematic?

Ah! I didn’t catch that. Thanks. Suppose a man-made satellite (Fubly 1) is released into (non-geosynchronous) orbit around the earth directly over Phoenix, Arizona. Each time it orbits the earth, it passes over Phoenix, and we can count its orbits this way. One orbit of Fubly 1 is extended in time in the sense that it takes one month (say) to get around the whole planet. In any time less than one month, the orbit is incomplete. So the orbit of Fubly 1 is temporally divisible in the sense that if I divide it in half, I get two things neither of which is an orbit of Fubly 1, but both of which are parts of an orbit of Fubly 1.

Now, Fubly 1 itself seems different. Suppose Fubly 1 only completes one orbit and then is destroyed. Supposing it’s assembled and then immediately released, the spatiotemporal region that is Fubly 1 and the spatiotemporal region that is the orbit of Fubly 1 have the same extension in time. If I divide the spatiotemporal region of the orbit in half, time-wise, I get two halves of an orbit. If I divide the spatiotemporal region of Fubly 1 itself, I don’t get two halves of a satellite. Fubly 1 can’t be divided time-wise in the way its orbit and its lifespan can. Does that make any sense? My question, in case it does, is this: ‘Is the distinction I’ve just made likely to be meaningful in the correct physics, or is this a mere artifact of intuition and natural language?’

It’s already the result of such a division. As for orbits and lifespans, they are not physical objects but rather logical abstractions, just like language is (as opposed to the air released from the mouth of the speaker and the pressure waves hitting the ear of the listener).

If you mean that Fubly 1 is a given 3d slice, can Fubly 1 persist through time? I mean that if we take two temporally different 3d slices (one at noon, the other at 1:00PM), would they be the same Fubly 1? I suppose if we were to call them ‘the same’ it would be in virtue of a sameness of their 3d properties, abstracted from their temporal positions.

I don’t know what sameness is, sorry. It’s not a definition I have encountered in physics, and SEP is silent on the issue, as well. I sort of understand it intuitively, but I am not sure how you formalize it. Maybe you can think about it in terms of the non-conservation of the coarse grained area around the evolved distribution function, similar to the way Eliezer discussed the Liouville theorem in his Quantum Sequence. Maybe similar areas correspond to more sameness, or something. But this is a wild speculation, I haven’t tried to work through this.

Well, thanks for discussing it, I appreciate the time you took. I’ll look over that sequence post.

Good explanation. But you’re assuming a theory in which “reality fluid” is conserved. To me, that seems obviously wrong (and thus even more obviously unproven). I mean, if that were true, my experiences would be getting rapidly and exponentially less real as time progresses and I decohere with more and more parts of the wave function.

I acknowledge that it is difficult to make probability work right in MWI. I have an intuitive understanding which feels as if it works to me, that does not conserve “reality fluid”; but I’m not so unwise as to imagine that a solid intuition is worth a hill of beans in these domains. But again, your theory where “reality fluid” is equal to squared amplitude seems to me probably provably wrong, and definitely not proven right. And it was not the assumption I was working under.

Well, yes, I’m assuming that QM is correct. That’s kind of the point: we’re talking about predictions of QM.

No… why do you think that you would be able to *feel* it? It seems to me rather like the argument that the Earth can’t be moving since we don’t feel a strong wind.

An important part of QM being a linear theory is that it is 100% independent of overall amplitude. Scale everything up or down by an arbitrary (finite nonzero) factor and all the bits on the inside work exactly the same. So, whether something likely happens or something unlikely happens, the only difference between those two outcomes is a matter of scale and whatever it was that happened differently.
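A tiny sketch of this scale-independence (illustrative numbers only, not anyone’s actual model): rescaling every amplitude by the same factor leaves all relative, normalized probabilities untouched.

```python
import numpy as np

# Some made-up amplitudes for three outcomes.
amplitudes = np.array([3 + 4j, 1 - 2j, 0.5j])

def normalized_probs(amps):
    # Born-rule weights: squared magnitudes, normalized to sum to 1.
    weights = np.abs(amps) ** 2
    return weights / weights.sum()

# Scale everything by an arbitrary finite nonzero factor...
scaled = (7 - 2j) * amplitudes

# ...and all the relative probabilities on the inside are unchanged.
assert np.allclose(normalized_probs(amplitudes), normalized_probs(scaled))
```

The factor drops out because |k·a|² = |k|²·|a|² and the normalization divides the common |k|² back out, which is why the overall amplitude is unobservable from the inside.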

QM has no “reality fluid”. The whole point of calling it “reality fluid” is to remind yourself that it’s standing in for some assumptions about measure theory which are fuzzy and unproven.

My own (equally fuzzy and unproven) notion about measure theory is that anything which has nonzero amplitude, exists. Yes, you can then ask why probabilistic predictions seem to work, while my measure theory would seem to suggest that everything should be 50/50 (“maybe it happens, maybe it doesn’t; that’s 50/50”). But I believe that there is some form of entropy in the wave function, and that probable outcomes are high-entropy outcomes. No, I obviously don’t have the math on this worked out; but neither do you on the “reality fluid”.

I could easily be wrong. So could you. Probably, we both are. Measure theory is not a solved problem.

QM may not have ‘reality fluid’, but the thing we’re tongue-in-cheek *calling* ‘reality fluid’ is conserved under QM!

Right, I should have been clearer. What I meant is that s/he is privileging one aspect of MWI from unimaginably many, and I simply pointed out another one just as valid, but one that s/he overlooked. Once you start speculating about the structure of Many Worlds, you can come up with as many points and counterpoints as you like, all on the same footing (of the same complexity).

I don’t think I had overlooked the point you brought up: I said “...naively speaking it seems that [MWI] should be something more akin to 3^^^3 (or googolplex) than to 3^^^^3. So the problem may still exist...”

As to the idea that everything is just a hopeless mess once you bring MWI into it: that may indeed be a reason that this entire discussion is irresolvable and pointless, or it may be that the “MWI” factors precisely balance out on either side of the argument; but there’s no reason to assume that either of those is true until you’ve explored the issue carefully.

As I said, I don’t think MWI leads to really large numbers of copies; back-of-the-envelope calculations suggest it should be “closer to” 3^^^3 or googolplex than to 3^^^^3. So yes: I tried to indicate that this idea does NOT solve the dilemma on its own. However, even if 3^^^^3 is so big as to make 3^^^3 look tiny, the latter is still not negligible, and deserves at least a mention. If Eliezer had mentioned it and dismissed it, I would have no objection. But I think it is notable that he did not.

For instance: Say that earthwormchuck163 is right and there are fewer than 3^^^^3 intelligent beings possible before you start to duplicate. For instance, say it’s (x^^^x)^y, and that due to MWI there are (x^^^x) copies of a regular human spawned per fortnight. So MWI is reducing the Matrix Lord’s threat from (x^^^x)^y to (x^^^x)^(y-1). That doesn’t seem like a big change; but if you suppose that only one of them is decisive for this particular Matrix Lord threat, you’ve just changed the cost/benefit ratio from order-of-1 to order-of-1/(x^^^x), which is a big shift.

I know that there are a number of possible objections to that specific argument. For instance, it’s relying on the symmetry of intelligence; if Matrix Lord were offering 3^^^^3 paperclips to clippy, it wouldn’t help figure out the clipperific thing to do. The intent is not to make a convincing argument, but simply to demonstrate that a factor on the order of x^^^x can in principle be significant, even when the threat is on the order of 3^^^^3.

Someone who reacts to a gap in the sky with “it’s most likely a hallucination” may, with incredibly low probability, encounter the described hypothetical where it is not a hallucination, and lose out. Yet this person would perform much more optimally when their drink got spiked with LSD or if they naturally developed an equivalent fault.

And of course the issue is that the maximum or even typical impact of the faulty belief processing described here could be far larger than $5 - the hypothesis could have required you to give away everything, to work harder than you normally would and give away the income, or worse, to kill someone. And if it is processed with disregard for the probability of a fault, such dangerous failure modes are rendered more likely.

This is true, but the real question here is how to fix a non-convergent utility calculation.

One of the points in the post was a dramatically non-Bayesian dismissal of updates on the possibility of hallucination. An agent of finite reliability faces a tradeoff between its behaviour under failure and its behaviour in unlikely circumstances.

With regards to fixing up probabilities, there is an issue that early in its life, an agent is uniquely positioned to influence its future. Every elderly agent goes through early life; while the probability of finding your atheist variation on the theme of an immaterial soul in the early-age agent is low, the probability that an agent will be making decisions at an early age is 1, and it’s not quite clear that we could use this low probability. (It may be more reasonable to assign low probability to an incredibly long lifespan, though, in a manner similar to the speed prior.)

What Eliezer is actually saying about this kind of hallucination:

The kind of ‘hallucination’ that is discussed in the posts is more about the issue of being forced to believe you are a Boltzmann brain, or a descendant human who is seamlessly hallucinating being an ‘ancestor’, before being able to believe that it is likely that there will be many humans in the future. This is an entirely different kind of issue.

I question whether keeping probabilities summing to one is a valid justification for acting as if the mugger being honest has a probability of roughly 1/3^^^3. Since we know that due to our imperfect reasoning, the probability is greater than 1/3^^^3, we know that the expected value of giving the mugger $5 is unimaginably large. Of course, acknowledging this fact causes our probabilities to sum to above one, but this seems like a small price to pay.

Edit: Could someone explain why I’ve lost points for this?

You lost points because nothing you said even begins to address the problem. You seem to be arguing that contradicting ourselves isn’t that bad, which might be defensible if we observed that some particular type of improper prior got good results in practice. (Though Eliezer would still argue against using it unless you’ve tried and failed to find a better way.) But here we want to know:

- whether or not we have a reason to act on bizarre claims like the mugger’s—which we presumably don’t if the argument for doing so is incoherent

- what principle we could use to *reject* the mugger’s unhelpful and intuitively ridiculous demand without causing problems elsewhere.

On a side note, we don’t care whether this seemingly crazy person is “honest”, but whether his claim is correct (or whether paying him has higher expected value than not).