Introducing Bayesian Regret

tl;dr bayesian regret == opportunity cost, but economists had better PR

I encountered Bayesian Regret in two specific and rather odd corners. Voting reformists who support Range Voting¹ use it to argue that, despite theoretical flaws, their process results in the most actual democracy². Pre-LLM AI engineers also talked about it, in the field called reinforcement learning.

When you are trying to convince people to stop using gamed voting systems with high spoil rates–you give up, because internet autists will scream something about “later no harm” and purity spiral until you’re just back to First Past The Post and everyone slamming between two heads of the same hydra.

When you are trying to teach computers to make decisions good: you have to actually do the math.

Rewards and Punishments

Everyone has an intuitive understanding of what it means to be rewarded: some event happens, some token of pleasure is dispensed. Punishment is much the same: some event happens, some token of pain is dispensed. Seeking out rewards and avoiding pain is basic neurological wiring. Then comes the field of behaviorism which argues you can elicit compliance from people by shaping the rewards and punishments in their environment.

Pavlov demonstrates on dogs that when the feedback cycle is extremely short those behaviors become autonomous. The bell signals pain, the steak signals tasty meat, but there is no nuance or complexity. Pavlov cannot affect long-term planning decisions.

Skinner³ demonstrates on birds that picking an arbitrary behavior and rewarding it will cause attractors to form near that behavior. If something does the behavior designated as to-be-rewarded then it gets a reward and slowly aligns towards doing it again. Skinner cannot affect a behavior that never happens to luck its way across the victory line: the method needs some existing traction to respond to.

Seligman notes that if penalty and reward don’t correlate to behaviors at all, you just get a complete collapse of the state space. Entropy claims all decisions, as doing nothing is the most energy-efficient way to be unable to affect anything.

Bayesian Regret

What people don’t have an intuitive understanding of is performance relative to what was genuinely possible. Here enters a concept called Bayesian regret. Bayesian regret asks “what was the best realistic or theoretical outcome?” Then it asks “how much worse did we do than that?”

Say we’re at a silly carnival game and the maximum number of balloons that you can pop to win prizes is 100. The theoretical maximum of any outcome would be to pop all 100 of those balloons. We might use a strategy of halfassedly throwing needles at them and pop only 20 balloons. In this case our regret would be scored as 80: the difference between our outcome and the best possible outcome.

Another strategy might be to simply walk up to each balloon and pop it with the needle directly. This is much easier to do and you pop all 100 balloons. In this case our regret is zero. We accomplished the best outcome possible.
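The carnival arithmetic is a single subtraction. Here is a minimal Python sketch of it; the function name `regret` and the variable names are my own labels for illustration, not terminology from any particular library.

```python
def regret(best_possible: int, achieved: int) -> int:
    """How far short of the best possible outcome did we land?"""
    return best_possible - achieved


BALLOONS_IN_ROOM = 100  # the best possible outcome in this room

# Strategy 1: halfassedly throwing needles pops 20 balloons.
throwing_regret = regret(BALLOONS_IN_ROOM, 20)

# Strategy 2: walking up and popping each balloon directly pops all 100.
walking_regret = regret(BALLOONS_IN_ROOM, 100)

print(throwing_regret)  # 80
print(walking_regret)   # 0
```

Note that the score depends on knowing (or estimating) the best possible outcome, which is the hard part in any real setting.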

Now imagine someone else pops 120 balloons. That number is bigger, so people are taking all of your social status and transferring it on to the second popper. The second popper used the throwing strategy from before on a room with many, many more balloons.

This leads to something stupid we’re going to call the CEO problem for now. A CEO with a terrible market terrain (only 100 balloons to work with) can do a perfect job. That job will still appear worse than his successor who inherits a strong firm and improving economy–then does a terrible job. The tide raises all ships and the second popper is heralded as a hero because lines went up. Then he goes to write books about “stack ranking” and everyone takes it as gospel.
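To make the CEO problem concrete, here is a rough sketch that ranks the two poppers both ways. The text only says the second room had "many, many more balloons," so the figure of 600 below is purely my assumption (it is what you get if the second popper hits at the same 20% rate as the throwing strategy). The `Popper` class and its field names are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class Popper:
    name: str
    popped: int     # balloons actually popped
    available: int  # balloons that could have been popped

    @property
    def regret(self) -> int:
        # Distance from the best outcome available in this popper's room.
        return self.available - self.popped


first = Popper("first popper", popped=100, available=100)
# ASSUMPTION: 600 balloons in the second room; the source gives no number.
second = Popper("second popper", popped=120, available=600)

poppers = [first, second]

# Ranking by raw score rewards whoever had the bigger room.
hero_by_raw_score = max(poppers, key=lambda p: p.popped)

# Ranking by regret rewards whoever left the least on the table.
hero_by_regret = min(poppers, key=lambda p: p.regret)

print(hero_by_raw_score.name)  # second popper
print(hero_by_regret.name)     # first popper
```

Under that assumed room size the second popper left 480 balloons unpopped, yet still wins any ranking that only looks at the raw count.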

Behaviorists don’t like to talk about this one.

Theoretical Profits & Losses

Economists and strategists sometimes refer to something called opportunity cost. The idea is that you can choose one of three options and the payment is what you lost by not picking the others. The opportunity cost for staying around to paint a house is that you get left out of seeing the movie with friends. The idea here being that a profitable choice may not have been the most profitable choice and therefore is still objectively wrong⁴ despite seeming very good.

This is exactly Bayesian regret in an economist trenchcoat.
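A small sketch of the "profitable but still not the best" case, using the same subtraction as the balloon example. The three options and their payoffs are made-up numbers in arbitrary units, and `regret_of` is a hypothetical helper name.

```python
# ASSUMPTION: hypothetical payoffs for three mutually exclusive options.
payoffs = {"option A": 50, "option B": 80, "option C": 10}


def regret_of(choice: str, payoffs: dict[str, int]) -> int:
    """What we gave up by not picking the best available option."""
    return max(payoffs.values()) - payoffs[choice]


# Option A is profitable (50 > 0), yet it still leaves 30 on the table.
regret_a = regret_of("option A", payoffs)

# Option B is the best option, so there is nothing to regret.
regret_b = regret_of("option B", payoffs)

print(regret_a)  # 30
print(regret_b)  # 0
```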

If we flip the terms then not gaining as much as optimal can be seen as a loss. This is exactly the argument used by the RIAA et al. in old music piracy lawsuits: a lot of unpaid downloads occurred, which theoretically could have been paid sales but were not, therefore they lost infinity trillion dollars. Then there is a dispute about whether the losses were real⁵ and the judge sides with the RIAA because he’s in an undeclared conflict of interest⁶ and the points don’t matter.

One final note on this: be very careful. Real losses exist. Punishment and pain happen, injuries happen, capabilities are lost. These are real and tangible outcomes that are serious⁷. Simply not getting as much as you could have is never as serious as actual hazards being realized.

Re-summarized

Regret is the difference from how well you did to how well you could possibly or realistically have done⁸
A lack of getting a thing can be conversely framed as a theoretical loss of the thing (you can treat regret as a “loss function”)
It is possible to model and calculate these things
Keep in mind not getting what you want is not the same as losing what you had
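As one way to "model and calculate these things," here is a rough Monte Carlo sketch in the spirit of the voting argument from the introduction: draw many random elections, find the candidate who would have made voters happiest overall, and average how far each voting method's winner falls short. Everything about the voter model is my assumption (uniform random utilities, fully honest voters, range scores rescaled to each voter's own best and worst candidate), and all function names are hypothetical.

```python
import random


def social_utility(utilities, candidate):
    # Total happiness across all voters if this candidate wins.
    return sum(voter[candidate] for voter in utilities)


def plurality_winner(utilities):
    # Each voter names only their single favorite candidate.
    n = len(utilities[0])
    tallies = [0] * n
    for voter in utilities:
        favorite = max(range(n), key=lambda c: voter[c])
        tallies[favorite] += 1
    return max(range(n), key=lambda c: tallies[c])


def range_winner(utilities):
    # Each voter scores every candidate, rescaled so their worst is 0
    # and their best is 1. Highest total score wins.
    n = len(utilities[0])
    totals = [0.0] * n
    for voter in utilities:
        lo, hi = min(voter), max(voter)
        span = (hi - lo) or 1.0  # guard against a voter with no preference
        for c in range(n):
            totals[c] += (voter[c] - lo) / span
    return max(range(n), key=lambda c: totals[c])


def average_regret(pick_winner, elections):
    # Regret per election: best achievable utility minus what we got.
    total = 0.0
    for utilities in elections:
        n = len(utilities[0])
        best = max(social_utility(utilities, c) for c in range(n))
        total += best - social_utility(utilities, pick_winner(utilities))
    return total / len(elections)


rng = random.Random(0)  # fixed seed so reruns give the same elections
# ASSUMPTION: 500 elections, 25 voters, 4 candidates, uniform utilities.
elections = [
    [[rng.random() for _ in range(4)] for _ in range(25)]
    for _ in range(500)
]

plurality_regret = average_regret(plurality_winner, elections)
range_regret = average_regret(range_winner, elections)
print(f"plurality: {plurality_regret:.3f}  range: {range_regret:.3f}")
```

The numbers that come out depend entirely on the seed and the voter model, so treat this as scaffolding for experiments rather than a result. Strategic voters, in particular, are not modeled at all.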

¹ The Olympics’ voting system. Judges write down a score from 1 to 10 and the average of all scores is the answer. It’s simple and it does work quite well in practice.
² A grand popularity contest where PR firms and spy agencies convince uninformed peasantry why selling their future to adversaries is good actually.
³ B. F. Skinner. Famous bird scientist. Inventor of the “Skinner box,” the premier way to imprison people’s attention when violence isn’t an option.
⁴ At least in a post-Dodge v. Ford world where companies no longer exist to accomplish objectives profitably but to simply make lines on charts go up for shareholders.
⁵ Economic friction suggests that cost destroys activity that isn’t worth doing. So the argument that an unpaid download is a lost sale hinges on the idea that they would have bought it if they couldn’t steal it. While this does happen, it’s incalculable how many of these “sales” were never going to happen in the first place. For example, I don’t buy or download games with Denuvo, period, and this protest behavior will never show up in Excel balance sheets.
⁶ Judges are apparently allowed to be members of rightsholding groups and hear cases that the rightsholding group has clearly stated desired outcomes for. Despite being the most obvious conflict of interest in the universe this is not considered a capital failure in the legal system.
⁷ “System-Theoretic Process Analysis” or STPA is a structured method of enumerating losses that can occur, what harms can cause those losses, what processes a proposal does and which hazards those processes embody. It’s a rubric published by MIT for making sure people keep all their limbs.
⁸ A certain dose of reality is needed here. Just be careful to thread the needle: discount cases as unrealistic only because they are genuinely improbable, lest you write off very probable outcomes in your post-loss depression while trying to make yourself feel better about it.