Goodhart’s Law states: When a measure becomes a target, it ceases to be a good measure.
Take evaluating real estate as an example.
When most people are buying or renting a house or a condo, the internet search order goes something like:
- Pick the location
- Pick the price range
- Pick the square footage (or number of bedrooms)
- Finally, bring in other parameters, such as a garden, backyard, etc.
These are all, in their way, good measures of a place to live. However, when they become targets, they start to miss the point.
Rory Sutherland gives a wonderful example of when he moved into an apartment designed by Robert Adam, a Scottish neoclassical architect in the 1700s and one of the best of his era.
“Eighteen years ago, when we had twins and decided to move out of London, my wife discovered a four-bedroom apartment in the roof of a Robert Adam house a mile outside the M25. To our astonishment, it was barely more expensive than ordinary housing of similar size nearby. I recently asked my neighbour, an economist, what premium we pay for the house’s architectural quality. ‘Between 0 and 5 per cent,’ he estimated.”
Architectural quality is decisive only at the end stages of looking for a place to live when people have already filtered their choice down to a dozen places that match their previous criteria. Since it is difficult to quantify architectural quality, it gets relegated to an afterthought.
My current apartment has far more natural light and higher ceilings than my previous place but it’s cheaper because it has fewer square feet.
Because things like location, price, and square footage are easily measurable, they have become not just measures, but targets. There is nothing wrong with them as measures, but living in a large house in a nice location at an affordable price that has no light, low ceilings, and feels sterile is a much worse lived experience than the alternative. It’s just that things like natural light or architectural style are hard to quantify, illegible, and so they tend to get ignored.
Imagine what the art market would look like if it worked this way. “I want a painting that’s 2ft x 4ft, painted in France, and features some women in dresses.” Only in the fifth iteration or so would we decide between artists!
A measure is always a proxy. It is a form of compression that loses some of its original fidelity. Revenue is an important measure of a business’s health, but it is not the only one. There’s nothing wrong with using measures as proxies as long as you recognize that’s what’s going on.
However, when we forget that they are simplifications of a much more complex underlying reality and make them into targets, they can cease to be good measures. Real estate developers know that location, size and price are the most important factors.
Not surprisingly, most of them develop big houses in good locations while minimizing the costs that go into things like creative or unique architectural approaches. When these measures become targets, you get massive, soulless developments that look great on a listing page but offer a far inferior lived experience, a sort of Stepford vibe.
In the case of buying a house, Goodhart’s Law seems fairly innocuous. Does it really matter that much that people don’t tend to buy beautiful houses?
Consider another, more pernicious, example of Goodhart’s Law: the use of simplified measures in investing and investment management. Take, as a single example, the Sharpe Ratio.
The Sharpe ratio is the asset management industry’s go-to statistic for summarizing achieved (or back-tested) performance. According to Institutional Investor magazine, it is the most-cited reason to hire or fire individual money managers or invest in an asset.
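The Sharpe ratio is an asset’s average return above the risk-free rate, divided by the volatility (standard deviation) of its returns. So, an asset with a high historical return that hasn’t been very volatile has a high Sharpe ratio while something with a comparatively lower return and more volatility has a lower Sharpe.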
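A minimal sketch of that comparison, using the conventional excess-return-over-standard-deviation calculation. The function name and both return series are hypothetical, made up purely for illustration, and I use the population standard deviation where some practitioners would use the sample version.

```python
from statistics import mean, pstdev

def sharpe_ratio(returns, risk_free=0.0):
    """Average excess return divided by the standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / pstdev(excess)

# Hypothetical annual returns, invented for illustration only.
steady = [0.12, 0.07, 0.10, 0.06, 0.13]    # higher average return, small swings
bumpy = [0.20, -0.12, 0.15, -0.08, 0.10]   # lower average return, large swings

print(f"steady asset Sharpe: {sharpe_ratio(steady):.2f}")
print(f"bumpy asset Sharpe:  {sharpe_ratio(bumpy):.2f}")
```

The steady series scores far higher than the bumpy one, and nothing in the calculation asks whether either asset could go to zero.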
This has a reasonable logic underlying it. Investments usually don’t go to zero at a slow and steady rate; a collapse tends to be accompanied by volatility, so using volatility as one proxy for risk is not entirely unreasonable.
However, using it as a target becomes deeply problematic. It is Goodhart’s Law in all its glory and ugly consequences: responsible for many billions (trillions?) in unexpected losses.
First, Sharpe isn’t a measure of risk in the way any rational person would define risk. The Sharpe ratio was originally called “reward-to-variability” because volatility is not the same thing as risk. In 2007, volatility measures would have told you that U.S. mortgage bonds had never been safer, on a risk-adjusted basis. There had been very little volatility over the past few decades.
This would be like saying “because this area hasn’t had a forest fire in a long time, it is unlikely to have forest fires in the future.” It’s possible that is true – maybe the area is a swamp full of moisture that will never catch on fire. But, as many Californians found out in 2020, it may just be that there’s a whole lot of dry tinder waiting to spark off.
What matters is not the volatility of an investment over the past few years (or even a few decades), but the risk of ruin – can it go to zero?
If you knew with 100% confidence that an investment had no chance of going to zero and that it would generate 20% returns every year (not possible to know, but as a thought experiment…), but it had large, volatile swings of 5% from day to day, would you consider that risky?
According to the Sharpe ratio, that would be riskier than something which could go to zero, but had less historical volatility.
Second, a long period of good risk-adjusted returns (whether risk is defined by Sharpe or something else) doesn’t mean anything about the future.
Long-Term Capital Management boasted a glowing 4.35 Sharpe ratio (anything above 1 is considered good; 4.35 is amazing) before it collapsed in 1998, losing $4.6 billion and nearly taking the financial system down with it. There’s a lesson here: We don’t know what the future will look like, but we can be sure it won’t look like the past. History is a non-ergodic process.
Six months before hedge fund Malachite Capital Management’s spectacular failure (they lost about a billion dollars), consultants were recommending it as a “diversifying strategy.” Malachite’s extremely attractive Sharpe (around 1.2) made it easy to sell but certainly did not capture the fund’s true risk.
Just as real estate developers build houses to fit the existing measures, some investment managers manipulate their products to engineer a favorable Sharpe statistic. They are akin to teachers who teach to the test rather than to an actual understanding of the subject.
Consider an ETF provider, which will remain nameless, that launched maybe five years ago and whose main products are cannabis, uranium and e-commerce ETFs.
Did this person, in a burst of vision in the early 2010s, realize that the wave of the next half decade would include cannabis, uranium and e-commerce and launch these ETFs? No, they launched a bunch of random ones and killed the ones that didn’t have good returns and Sharpe ratios. If you’d taken financial advice from this person, you would not be happy with the outcome.
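Here is a toy simulation of that launch-and-kill process. Everything in it is an assumption for illustration: the number of funds, the return distribution, and the rule of keeping the top three. No fund has any skill; they all draw from the same random distribution.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_FUNDS, YEARS, KEEP = 20, 5, 3  # hypothetical numbers

def random_track_record(years):
    # Every hypothetical fund draws from the same distribution: no skill, no insight.
    return [random.gauss(0.05, 0.20) for _ in range(years)]

def average(xs):
    return sum(xs) / len(xs)

funds = [random_track_record(YEARS) for _ in range(N_FUNDS)]

# The provider keeps the funds with the best track records and closes the rest.
survivors = sorted(funds, key=average, reverse=True)[:KEEP]

all_avg = average([average(f) for f in funds])
survivor_avg = average([average(f) for f in survivors])

# Going forward, the survivors draw from the same distribution as everyone else.
survivor_future_avg = average([average(random_track_record(YEARS)) for _ in survivors])

print(f"average past return, all launched funds: {all_avg:.1%}")
print(f"average past return, surviving funds:    {survivor_avg:.1%}")
print(f"average later return, surviving funds:   {survivor_future_avg:.1%}")
```

The survivors' track record beats the average of everything that was launched by construction, since they were picked for exactly that. Their later returns have no such advantage built in.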
This is how Goodhart’s Law functions. Because investors only look at one or two simplified metrics (typically return and one measure of risk, usually Sharpe), the market (rationally) crafts products to fit that even though it’s worse for everyone in the long-term.
The third problem with the Sharpe ratio is that investors tend to look at the return per unit of risk of an individual investment as opposed to their overall portfolio.
In the same way that someone looking for a house first screens by location, price and size, most investors evaluate investments by a small number of factors. Most investors just look at returns and some measure of risk like Sharpe and that’s it. They never look at a strategy’s holistic effect on their portfolio, even though that’s really all that matters in the long run.
Imagine you have an opportunity to buy an asset, “Zig,” whose price will either go up 10% or down 4% over the next 12 months. There is an equal chance of it going up or down.
In statistical terms, Zig has an expected return of 3%. A 50% chance of going up 10% and a 50% chance of going down 4% works out to an expected return of 3% on average. However, in any given 12-month period it won’t actually earn that “expected” 3%; it will either gain 10% or lose 4%.
There are no predictable patterns to Zig’s ups and downs. It may go up 10% three years in a row, then down 4% for two years, then up four years and back down for three. Zig feels like a risky investment: what if it goes down the next three or four years? Even though it may feel like it should have a positive outcome in the long run, many “down” years could happen before an “up” year and you may be left wondering whether Zig really behaves like you expected, or if your analysis was wrong. On its own, Zig looks kind of risky. It would have a Sharpe ratio of about 0.43 (assuming a risk-free rate of zero and equal numbers of up and down years).
Now imagine another investment, Zag, which also goes up 10% or down 4% per year. But here’s the interesting thing: Whenever Zig has an “up” year, Zag has a “down” year, and vice versa. If Zig goes up by 10%, Zag goes down by 4%. And if Zig goes down by 4%, Zag goes up by 10%. Since it has the same profile as Zig, Zag also looks kind of risky and would have the same Sharpe ratio of about 0.43 (again, assuming a risk-free rate of zero and equal numbers of up and down years).
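The Zig and Zag numbers can be checked with a few lines of Python. This is a sketch under the same assumptions as the text (a risk-free rate of zero and equal numbers of up and down years). The ten-year alternating pattern is my own simplification, and the ratio uses the population standard deviation.

```python
from statistics import mean, pstdev

UP, DOWN = 0.10, -0.04

# Expected return: a 50% chance of each outcome.
expected = 0.5 * UP + 0.5 * DOWN

# Ten hypothetical years with equal numbers of up and down years.
zig = [UP, DOWN] * 5
zag = [DOWN, UP] * 5   # Zag does the opposite of Zig every year

def sharpe_ratio(returns, risk_free=0.0):
    """Average excess return divided by the standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / pstdev(excess)

print(f"expected return: {expected:.2%}")          # 3.00%
print(f"Zig Sharpe: {sharpe_ratio(zig):.2f}")      # 0.43
print(f"Zag Sharpe: {sharpe_ratio(zag):.2f}")      # 0.43
```

The average return is 3% and each year lands 7 percentage points away from it, so the ratio is 3/7, or about 0.43, for both assets.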
However, what if you combined them? If you invest equal amounts of money in Zig and Zag — with ups and downs that are perfectly offsetting — you would earn 3% every year as long as you rebalance between them.
Most investors fail to understand that combining two seemingly “risky” but negatively correlated assets actually creates a less risky portfolio. The Zig+Zag portfolio has a Sharpe ratio of, effectively, infinity.
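To see why the combined ratio blows up, here is a small sketch of the 50/50 portfolio, assuming it is rebalanced back to equal weights every year. The ten-year alternating pattern is a simplification of mine.

```python
import math
from statistics import mean, pstdev

UP, DOWN = 0.10, -0.04
zig = [UP, DOWN] * 5
zag = [DOWN, UP] * 5   # perfectly offsetting, as in the toy example

# With annual rebalancing to 50/50, each year's portfolio return is the
# simple average of the two assets' returns for that year.
portfolio = [0.5 * a + 0.5 * b for a, b in zip(zig, zag)]

volatility = pstdev(portfolio)

# Zero volatility would mean dividing by zero, so treat the ratio as infinite.
if math.isclose(volatility, 0.0, abs_tol=1e-12):
    sharpe = float("inf")
else:
    sharpe = mean(portfolio) / volatility

print([round(r, 4) for r in portfolio])   # 0.03 every year
print(f"volatility: {volatility:.2%}")    # 0.00%
print(f"Sharpe: {sharpe}")                # inf
```

Every year has one asset up 10% and the other down 4%, so the portfolio earns 3% each time with no variation at all.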
Of course, there is no guarantee that two assets will be perfectly negatively correlated in the real world, but it is a good toy model for the power of constructing a portfolio of truly diversified, negatively correlated assets.
This means it is impossible to know if something is a good investment for an individual without understanding what the rest of their portfolio looks like. If your entire portfolio is in cash because you are terrified of another 2008, then getting a little exposure to stocks may be a good idea. If your entire portfolio is in Tesla stock and Bitcoin and you work at a Bitcoin company and your spouse works at Tesla then you should probably hedge that out some and maybe hold some more cash.
The whole is greater than the sum of its parts: a portfolio of seemingly risky assets that are not correlated to each other actually creates a safe portfolio.
A combination of individual assets with good historical returns or high Sharpe ratios doesn’t necessarily result in a portfolio with a good Sharpe ratio. On the contrary, strategies and asset classes that have performed well over a period likely share exposure to something in common. If Zig has had three positive years in a row, its recent Sharpe ratio will look excellent and many investors will build a portfolio of Zig 1, Zig 2, and Zig 3.
In that case, the whole is worse than the sum of its parts: three assets which look less risky by themselves create a more risky portfolio.
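A quick sketch of the contrast, assuming for illustration that Zig 1, Zig 2, and Zig 3 are perfectly correlated copies of Zig. Real look-alike strategies would only be partly correlated, and the helper name `equal_weight` is hypothetical.

```python
from statistics import pstdev

UP, DOWN = 0.10, -0.04
zig = [UP, DOWN] * 5
zag = [DOWN, UP] * 5

def equal_weight(*assets):
    """Yearly returns of an equal-weight portfolio rebalanced every year."""
    return [sum(year) / len(year) for year in zip(*assets)]

three_zigs = equal_weight(zig, zig, zig)   # Zig 1, Zig 2, Zig 3 move together
zig_and_zag = equal_weight(zig, zag)       # offsetting assets

print(f"single Zig volatility: {pstdev(zig):.2%}")
print(f"three Zigs volatility: {pstdev(three_zigs):.2%}")
print(f"Zig + Zag volatility:  {pstdev(zig_and_zag):.2%}")
```

Holding three things that move together buys no diversification: the portfolio swings exactly as much as a single Zig. The offsetting pair is the one that removes the swings.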
Strategies that generate steady profits punctuated by periods of sharp losses are in vogue right now across a range of asset classes. As bond yields have fallen since the financial crisis, investors have looked for ways to increase returns. One common approach is shorting volatility, effectively selling insurance against stock market crashes. Until March of 2020, the Sharpe ratio for many of these types of strategies was fantastic.
But when losses do occur, they tend to quickly spiral into giant, brutal wounds. As Alberta, Canada’s public investment arm learned in March 2020, it’s not very hard to lose a couple of billion selling volatility. That’s upward of C$480 ($363) per woman, man, and child in the province; but who’s counting? It seemed so safe. The Sharpe ratio was amazing. Until…
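The shape of the problem can be sketched with made-up numbers. The monthly returns below are entirely hypothetical (they are not Alberta's or any real fund's figures), and the Sharpe ratio is computed on monthly data without annualizing.

```python
from statistics import mean, pstdev

# Hypothetical monthly returns, invented for illustration only:
# three years of small, steady premiums from selling insurance...
calm_months = [0.009, 0.011] * 18
# ...followed by one month in which the insured-against crash arrives.
crash_month = -0.35

def sharpe_ratio(returns, risk_free=0.0):
    """Average excess return divided by the standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / pstdev(excess)

def growth_of_one(returns):
    """Value of 1 unit invested, compounding each period's return."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value

sharpe_before = sharpe_ratio(calm_months)
value_before = growth_of_one(calm_months)
value_after = growth_of_one(calm_months + [crash_month])

print(f"Sharpe before the crash: {sharpe_before:.1f}")
print(f"value of 1 before the crash: {value_before:.2f}")
print(f"value of 1 after the crash:  {value_after:.2f}")
```

In this toy series the track record looks superb right up to the month before the loss, and that single month takes back more than three years of gains.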
And, to be honest, I am probably being particularly generous with how most people make investment decisions. Most people just look at historical returns and hold their thumb up in the air and do some mental calculation of risk that is typically like “do my friends and neighbors do this and if so, then I’m sure one of them did due diligence and it’s not too risky” (Spoiler: nobody does due diligence because everyone assumes everyone else did it!).
Focusing purely on returns is an even more dramatic example of Goodhart’s Law. When returns become the only measure that matters, people tend to end up with highly correlated portfolios where everything goes down at once, often in a spectacular and tragic fashion.
The truth is that reality has a surprising amount of detail. Measures like square footage or Sharpe ratios or historical returns can be useful measures, but they should never become the target.
Recognizing where Goodhart’s law is at play can be a big advantage: it creates what I call an illegibility arbitrage. There are many areas where Goodhart’s Law takes effect and you can benefit from recognizing which things aren’t being measured, but are very valuable. In the case of choosing where to live, you might be able to get a home that is somewhat smaller but vastly more comfortable and livable for a lower price than a larger, more sterile place.
In the case of your portfolio, it could mean the difference between a sustainable long-term approach and holding a ticking time bomb.
Because reality has a surprising amount of detail, we can never fully understand the nuances of every last thing. This is why Goodhart’s law exists. In an incredibly complex world, we must rely on simplified measures.
For the most important areas of your life (your money, your health, and where you live are good candidates), I think it’s worth learning at least enough about the basics to be able to ask intelligent questions and see where the illegibility arbitrage can be exploited.
I’ll never understand human biology as well as a doctor, but I think I benefit a lot from spending a couple of hours before a doctor’s appointment reading or listening to podcasts and making a list of questions to ask them. (If you do this, you will be dismayed by how few doctors can actually respond to them intelligently.)
If an investment advisor is pointing to a strategy’s Sharpe ratio as proof of its value, you want to be able to ask them: Is there a hidden risk of ruin? What are the fundamental drivers of its correlations to other asset classes or strategies? How would it affect the rest of the portfolio, not just perform as a standalone?
We can never fully understand the complexity of the world around us, but having some epistemic humility and admitting what you do and don’t know is a tremendous long-term advantage.
Last Updated on May 5, 2022 by Taylor Pearson