Sunday, March 20, 2016

Economists Get Real About Measuring Inequality

Inequality is exceptionally hard to measure: you need lots of data, on lots of different people, in lots of different circumstances, and then a bunch of high level tools to figure it out.

Claims about inequality from politicians, bureaucrats, and advocates should probably be dismissed out of hand as not credible.

Having said that, it doesn’t take a rocket scientist to recognize that there’s some inequality out there.

There are four big, easy ideas, right at the top of the list, that should be grasped by every student at this level.


Income and wealth are not the same. Income is a flow, wealth is a stock. Economists know a lot more about income, because it’s in everyone’s paychecks, and the government keeps tabs on most of us. Economists know a lot less about wealth because it’s harder to value (for example, what your car is actually worth isn’t known until someone buys it from you).

But, what you know about your neighbors is mostly about wealth. You don’t see their paychecks. You don’t know all their sources of income. But their wealth is sitting right there in front of you.

So, we worry about what is mostly wealth inequality, and then politicians, bureaucrats, and advocates pull a bait and switch and tell us they can fix it by going after income. And they do.


What we really want to be more equal is consumption not income (both are flows). We could get a lot less income inequality if we went and punished people for working long hours … you know … go watch more TV or we’ll get the IRS after you. But we don’t do that because it’s dumb. We could also get less income inequality by making sure the poor worked more … you know … no TV for you until you put in more hours. Somehow our society seems to think this one is a little better, but I don’t always agree with it.

Think about it, who bugs you more, the rich person who eats in fancy restaurants, or the rich person who eats at home? It’s not their income that bugs us, it’s what they do with it.

This means we need to be concerned about consumption inequality. So, politicians, bureaucrats, and advocates … focus instead on income. Learn to ask them why. The answers aren’t pretty.


A lot of inequality is between young and old. Some of the poorest zip codes in the country are located in … college towns (because of the low income of most students). Do note that I do not want to diminish the areas of real poverty, I just want to point out that the data isn’t what we think it is. And, a lot of the rich live in coastal Florida, south Texas, Arizona, and southern California. Hmmm … the places that attract retirees.

But the old vote. So the response of politicians, bureaucrats, and advocates has been to shelter a lot of their income (which is often lower because the house is paid off and the kids are done with college anyway) from redistribution. One of the biggest sources of consumption for the old is … Medicare.

BTW: our fiscal problems with Medicare are much larger than those with Social Security (even though the latter gets more press). The reason is that Medicare is open ended during any given month for seniors, while their Social Security checks have known and stable values.


We already do a lot to reduce inequality. We can debate about whether that’s enough or not, but we can’t deny that we’re trying.

This means that if we want to assess whether inequality is getting better or worse, we have to do it after taxes have been collected and redistributed as transfer payments.

That’s sensible. The only reason to not do that would be that you really don’t want to know the answer.

So what do our politicians, bureaucrats, and advocates do? Base almost all of their discussion and policy prescriptions on pre-tax, pre-transfer, income.


All of that was a preface. There’s a new paper out that does a better job at this. It’s entitled “U.S. Inequality, Fiscal Progressivity, and Work Disincentives: An Intragenerational Accounting”, and it’s by Alan J. Auerbach, Laurence J. Kotlikoff, and Darryl R. Koehler. It’s part of the NBER Working Paper Series: a lot of papers by top economists appear there first before they come out in a formal journal publication.

These guys are not conservatives/Republicans. A more accessible discussion appeared in The New Republic (a liberal/progressive magazine). It’s entitled “We’ve Been Measuring Income Inequality All Wrong” and it’s required reading.

Let me note that none of this is surprising to economists. It’s just a big job, and holy cow am I glad someone else did the work so that I just have to read it.

Here are their big findings.


First, spending inequality—what we should really care about—is far smaller than wealth inequality. This is true no matter the age cohort you consider.

… The poorest are able to spend far more than their wealth would imply …

The fact that spending inequality is dramatically smaller than wealth inequality results from our highly progressive fiscal system, as well as the fact that labor income is distributed more equally than wealth.


… U.S. fiscal policy acts as a serious disincentive to work longer hours or harder for more pay.


Our standard means of judging whether a household is rich or poor is based on current income. But this classification can produce huge mistakes.

… Nearly a third of the people we identified as middle income are being mis-classified as either richer or poorer.

And what are their policy recommendations?

Raising taxes and benefits as Democrats advocate will, unless existing tax and benefit systems are properly reformed, come at the cost of even larger work disincentives. Lowering taxes as Republicans advocate—presumably funding this with benefit cuts—will improve work incentives but may exacerbate spending inequality unless the benefit cuts disproportionately hit the rich.

Wednesday, March 16, 2016

Just to be Fair - Let's Beat Up Trump's Proposals Too

Unlike the other candidates, Trump’s proposals are backed up with very few specifics. In some sense, serious critics have very little tinder to start their fires.

By far the biggest of Trump’s proposals is a series of tax reforms that amount to very large tax cuts.


A problem of sorts with Republican proposals over the last few decades has been a ridiculous sales pitch — supporters are sold a story that tax cuts will “pay for themselves” when even conservative economists can’t support any conclusion even close to that. All of these proposals (or actions in the case of Bush II’s tax cuts) amount to back door plans to increase government borrowing to offset reduced tax revenue.

Yes, there is something called a Laffer curve, which says that cutting rates when they are already low will reduce tax revenue. That makes sense. It also says that cutting rates when they are already high will increase tax revenue. That also makes sense: it’s because the tax rate is a discouragement, and cutting it creates more taxable economic activity. That’s the theory. The empirical evidence is that almost all tax rates are way below the level where behavior would switch. There are some exceptions, usually involving taxes on one-time purchases or sales of assets. But for most taxes most of the time, you should assume that cutting rates will cut revenue. Unfortunately, Republican politicians have bought wholesale into the part of the argument they like (that cutting tax rates raises tax revenue) and ignored the evidence that this result is just not common at all.
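The Laffer-curve logic can be sketched with a toy revenue function. This is a hypothetical illustration, not calibrated to any real tax system: it just assumes the taxable base shrinks as the rate rises.

```python
# A minimal Laffer-curve sketch (hypothetical; not calibrated to real taxes).
# Assume taxable activity shrinks as the rate rises: base(rate) = 1 - rate.
# Revenue = rate * base(rate) then peaks at an interior rate (here, 50%).

def revenue(rate):
    """Tax revenue when a higher rate discourages taxable activity."""
    base = max(0.0, 1.0 - rate)  # activity falls as the rate rises
    return rate * base

# Cutting an already-low rate loses revenue...
assert revenue(0.10) < revenue(0.20)
# ...while cutting an already-high rate gains revenue.
assert revenue(0.60) > revenue(0.70)
```

The empirical point in the paragraph above is that real-world rates sit almost entirely on the left side of a curve like this one, where cuts lose revenue.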


What Republicans should pay more attention to is something called Ricardian equivalence (this is related to a newer result in finance known as the Modigliani-Miller theorem). The evidence is that how governments finance themselves — either through taxes or borrowing — is Ricardian equivalent. This means it doesn’t matter, and also implies the corollary that the size of the deficit doesn’t matter.

When people first hear of Ricardian equivalence, they think it sounds crazy. Let me explain why it’s not. The way the government pays for the stuff it provides is either with taxes or with borrowing. What Ricardian equivalence says is that when evaluating a program, how it’s paid for doesn’t make any difference to its effectiveness. When you put it that way, it makes a lot of sense.

A useful analogy is buying vegetables or illegal drugs. Let’s presume the first is good for you, and the second one is bad. For the government, tax revenue is their income, and deficits are their borrowing. So the analogous choice for a person is whether to pay for something out of cash you received as income, or to charge it. Would you ever tell someone that vegetables are less good for you if you charge them? Or would you tell someone that illegal drugs are less harmful if you pay for them with cash? Of course not. If that analogy works for you, then it shouldn’t make a difference whether a government program is financed by tax revenues or deficit borrowing.

That’s actually a useful thing if you’re predisposed to liking smaller government. Because it says that all you really ought to think about is not how big or how small the government budget, deficit, debt, or taxes are … but just whether the program the money is being spent for is a good one or not. I think a lot of the public would benefit from thinking that way.

But honestly, I don’t think anyone involved in government likes smaller government. So, while Republicans say they want smaller government, when they’re in power they seem to be pretty good at making it bigger. For example, a Republican President and Republican-dominated Congress passed Medicare Part D (providing subsidization for seniors to afford pharmaceuticals). That’s a fairly liberal/progressive program to be added by the people who claim they want smaller government. Oh … and in case you had any doubts … they were paying for wars in Iraq and Afghanistan at the same time.

So Republicans cover their tracks by saying they want to cut taxes. And for a lot of people, reducing the pain of government taxes is hard to differentiate from reducing the size of government.

Except that Ricardian equivalence means that to be in favor of cutting taxes without cutting spending is to also be in favor of more borrowing.

To go back to the analogy, the Democrats are saying “more vegetables for everyone, all the time, and the government will pay for it”. And the Republicans are saying “more vegetables for everyone, but just when we’re in power, and government will pay for it” … “oh … and … um … the vegetables will be better for you if we pay for them with a credit card”. Then the whole Laffer curve bit amounts to justifying the credit card purchases because you get points you can use to buy even more vegetables (when, really, we’re all supposed to know that the points are a way to take away the sting of spending your money, and not a worthwhile financial objective in and of themselves).


Lastly, I think everyone should always, everywhere, be in favor of simplifying the tax code. Republicans tend to push this more than Democrats. That’s a good thing, but it’s hard to quantify how useful it is.


This is all important because when the Republicans make concrete proposals, they’re usually about tax rates, and when Republicans are criticized the basis is often the amount of borrowing the government does. Ricardian equivalence should be part of that conversation, and it is at some levels, but not in common public discussion.

So, what can we say about Trump’s proposals?

Here’s a piece from The New York Times entitled "Analysts Question Viability of Deep Tax Cuts Proposed by Republican Candidates" from which I drew this chart:

16-02-22 NYT capture about Republican tax cut proposals

Most of the article is about the size of the projected cuts to tax revenue: “By most estimates of the outside groups, the costliest plan is Mr. Trump’s.”

Do note that The Times does make the Ricardian equivalence argument, but it doesn’t use that name:

“I believe by cutting taxes and simplifying the tax code, we will grow our economy and create more taxpayers rather than more taxes,” Senator Marco Rubio of Florida has said.

Tax policy groups agree generally, but only if the revenue losses are offset by budget savings that avoid piling up more debt that would be counterproductive to spurring the economy.

“The candidates need to present real specifics for how they would address our record levels of debt,” said Maya MacGuineas, the president of the bipartisan Committee for a Responsible Federal Budget.

“Massive tax cuts and few specifics for what spending to reduce will only make the challenges much worse,” she added. “And miraculous growth projections and ‘waste, fraud and abuse’ are just not credible solutions.”

Now, here’s a much larger analysis of all Trump’s proposals from the Committee for a Responsible Federal Budget, a bipartisan think tank. It’s pretty detailed, but if you parse it out, by far the biggest economic proposal of Trump’s is the tax cuts. So it’s fair to stay relatively focused on that.

Trump has also argued that he’d like to balance the government’s budget. Think about that: 1) the government budget is already in deficit (so it’s partially financed by borrowing), and 2) Trump wants to cut taxes (which will require even more borrowing), so 3) he’s implicitly stating that he wants to cut spending. Now that we’ve gotten to the heart of the matter, the useful question is by how much?

But … wait for it … Trump has also said there are some categories of spending that he won’t cut. He wants more spending for veterans and immigration, so no cuts there. But he doesn’t want to cut anything for seniors, which takes Social Security and Medicare off the table. And he won’t cut defense either. Here’s the chart:

The result is on the right: Trump is arguing for cutting 60-80% of everything else.

Of course, he can’t say that, because no one would vote for that.

And, I actually doubt that he means it. What politicians do is throw out policy proposals and hope that no one actually does the analysis I’ve just reported here.

Now, obviously, economic growth could make all of this possible. But the estimates are that Trump’s proposals would collectively require sustained real GDP growth rates of about 11% per year. Compare that with Sanders’ estimates assuming sustained real GDP growth rates of 5.3% per year to pay for his proposals.


Paul Krugman, the Nobel Prize winning economist, who writes a column for The New York Times, is a solid Democrat/Progressive on the leftish side of their spectrum.

He’s written about how he’s conflicted about the Democrats’ proposals because they’re so disconnected from reality. He says worse things about the Republicans, but his own Democrats are making him feel conflicted.

Anyway, he has a great metaphor about this. He says that Clinton is proposing to give voters a unicorn, and Sanders is proposing to give them a magic unicorn.

You probably shouldn’t believe either one of them. And you probably shouldn’t prefer the magic unicorn because … you’re never going to get any unicorn at all. And … oh my gosh … if Sanders’ numbers are propped up by a math mistake, then it’s more like Sanders is offering a unicorn that he says is magical because the unicorn already appeared to him and told him to say that.

The Ricardian equivalence argument implies that choosing among the Republicans is more innocuous: it’s along the lines of each candidate offering a differently colored unicorn. No doubt, Trump’s unicorn would be aggressively colored in bold hues of red and blue stripes, with white stars. And when Trump (or others) starts making outlandish claims about cutting taxes and balancing the budget and not cutting most spending … then their unicorn is far more magical than Sanders’.

Tuesday, March 15, 2016

Towards a Measure of National Wealth

I’ve emphasized in class that we’d get better macroeconomic decision-making if we spent more time looking at national wealth and less time looking at GDP.

Do note that this is not because we don’t have data on national wealth. But what we do have is not as accurate as GDP, and nowhere near as commonly used.

Someone in class mentioned that Josh Price had talked about measuring national wealth in class. I talked to Josh, and he was referring to a paper entitled “Sustainability and the Measurement of Wealth” by Arrow, Dasgupta, Goulder, Mumford, and Oleson.


First a note about Ken Arrow. He’s huge in economics. He won a Nobel Prize. You usually win these when you are old for work you did in your 30’s. Arrow won his over 40 years ago for work he did in the 1950’s. This was mostly about how to generalize the idea of equilibrium from supply and demand, including the first proof that perfect competition is Pareto optimal (which is the basis for doing so much supply and demand in principles of micro).

Just because Arrow wrote it doesn’t make it great. But it’s a pretty good sign.


The idea of the paper is that economic growth is sustainable (forever) if national wealth per capita is persistently increasing.

They look at 5 countries. Here’s what they find:

… Our results show that the United States, China, India and Brazil are currently meeting the sustainability criterion, although Brazil meets the requirement by a narrow margin. Venezuela fails to meet this requirement as a result of substantial depletion of natural capital and negative estimated TFP growth. In the United States and India, investments in human capital prove to be very important contributors to increases in per capita wealth; in China, investments in reproducible capital dominate. Accounting for improvements in health dramatically affects the estimates of changes in per capita wealth. We estimate the value of health capital to be more than twice as large as all other forms of capital combined.

Now, this isn’t weird to an economist, but it might be something you haven’t thought about before. The biggest component of national wealth is the value of citizens themselves. Educate them and they’re worth more. Keep them healthy and they’re worth more. And young people are worth more than old people because so much of their productivity is in the future rather than the past.

For example, I personally might earn the equivalent of $4,000,000 (in constant 2016 dollars) over my lifetime. So, in some sense, if we went back to 1985 we could have said that I was a piece of capital with a net present value to society of $4,000,000 (and the value of my leisure might add half as much again on top of that). Now compare that to the other capital I have: a house worth about $250,000, a $30K car, a retirement account, and some other stuff that’s just not as significant. In short, if we just count up the physical stuff we see, then we’re missing most of the wealth.
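To make the human-capital arithmetic concrete, here’s a rough sketch. The $100,000-a-year earnings stream, 40-year horizon, 3% discount rate, and $100,000 retirement account value are my illustrative assumptions (undiscounted, that stream sums to a $4,000,000 lifetime figure):

```python
# A rough human-capital sketch: discount a stream of future earnings back to
# a present value, and compare it to visible physical assets. All the dollar
# figures and rates here are illustrative assumptions, not data.

def present_value(annual_earnings, years, discount_rate):
    """PV of a constant earnings stream received at the end of each year."""
    return sum(annual_earnings / (1 + discount_rate) ** t
               for t in range(1, years + 1))

human_capital = present_value(100_000, 40, 0.03)
physical_stuff = 250_000 + 30_000 + 100_000  # house, car, retirement account (assumed)

# Even after discounting, human capital dwarfs the visible physical wealth.
assert human_capital > 5 * physical_stuff
```

The point of the comparison: counting only the house and the car misses most of what a person is economically worth.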

What this means is that the most important things that countries do are keeping their populations alive for a long time, and keeping them productive for a long time.

And if we want to find the wealthy countries around the globe, we need to look for the ones that are capable of nurturing citizens through a lifetime of productive work, and putting them in a position where they can work when they are old (but don’t have to if they don’t want to).

Some Growth Isn’t Just Economic

Macroeconomists stress that real GDP is correlated with a lot of desirable non-economic outcomes.

I feel that sometimes non-economists don’t believe us when we say that.

Here’s an example from North and South Korea. After World War II this was the poorest place on Earth (or not far off of that).* Since then South Korea has displayed phenomenal growth, arguably the longest sustained high growth period of any country. On the other hand, North Korea has struggled, and even had periods of widespread famine.

There’s now been a formal study of a tidbit of data that economists have been talking about for a couple of decades: you can tell that the South is richer because its people are 1-3 inches taller than those from North Korea. This is interesting because the Korean peninsula has a relatively homogeneous population, with little historical colonization. So the only way to reasonably explain that difference is with a better diet coming from a richer economy.

* Interestingly, South Korea was the poorer of the two. Most of the limited industrialization before 1950 had taken place in the North.

Watch This Space

It’s Tuesday morning. The interesting data news this morning is that the initial data on (nominal) retail sales shows a drop in February, but also that the revision of (nominal) retail sales for January also shows a large drop.

Real retail sales is one of the “big four” coincident indicators used to evaluate whether or not the economy has peaked.

To convert those nominal retail sales into real ones, we will have to wait for the release of the CPI for February. That’s at 6:30 tomorrow morning.

Lots of people will be watching that because recession indicators are being pulled down strongly by industrial production. This has been offset by good numbers in employment, real income … and real retail sales. If the initial numbers for that last one have fallen enough, then maybe we do need to be thinking that the economy has peaked.


March 16, 2016 @ 1:15

Here are the updates.

16-03-16 4 Big Turning Point Indicators

Real (Retail) Sales fell in January, but grew ever so slightly last month. Industrial production had a good month in January. Put it all together, and it looks less like a peak than it did 2 months ago when we looked at this.

Keep in mind that these data series do continue to get revised. And, the NBER Business Cycle Dating Committee would wait until all those revisions are done to go and look several months back and pick out where the turning point occurred. What they’re doing here is mimicking that process.

Roughly, that is to add up all the current green ones (0.17 + 0.02 + 0.43 = 0.62), and all the consecutive pink ones within a row (just the –0.49 now), to get 0.13. The threshold I was looking for to declare a peak was –2.00 or below.
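That tally can be reproduced in a few lines of code. The numbers are the ones read off the chart above, and the –2.00 threshold is the one stated in the text:

```python
# The turning-point tally from the text. Each entry is a recent monthly
# reading from one of the "big four" coincident indicators; a total at or
# below -2.00 would signal a peak (threshold as stated in the post).

green = [0.17, 0.02, 0.43]    # current positive (green) readings
consecutive_pink = [-0.49]    # consecutive negative (pink) readings in a row

total = sum(green) + sum(consecutive_pink)
peak_threshold = -2.00

print(round(total, 2))              # 0.13
print(total <= peak_threshold)      # False: no peak signaled
```

At 0.13, the tally sits well above the peak threshold, which is why the post concludes the economy looks less like it has peaked than it did two months earlier.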

Tuesday, March 1, 2016

Extrapolation, Sanders, Proposed Minimum Wage Increases, and the Challenger Disaster

You all know what extrapolation is: you measure some effect in the range of data that you have, and you presume that it will continue to operate in a range of data that you don’t have.

All of us extrapolate, but we also know that we should be careful because it can get us into trouble. For example, if we extrapolated Stephen Curry’s shooting, we might claim in a few years he’ll be hitting mostly 80 footers. He’s good, but not that good.

Anyway, this has come up in a couple of contexts in lecture, so I figured I’d lay them out for you.

One of those is Galbraith’s support of Sanders. It’s basically that Sanders is projecting bigger results because he’s making bigger proposals. That’s an extrapolation because Sanders is making proposals of a size about which we have very little experience. Just because it’s an extrapolation doesn’t make it wrong, but it should signal you to be extra careful.

The same thing has also happened with the large minimum wage increases that have been proposed nationwide. There is some empirical evidence that labor demand is inelastic with respect to the minimum wage (see below). This makes the typical theoretical complaint that minimum wage increases cause job losses into a weak counterargument. The thing is, all the evidence showing this is drawn from modest increases in the minimum wage of something like 10%. It’s an extrapolation to presume that if we increase the minimum wage by 100% — which is the amount under discussion in many places — that we will get the same behavior.

I mention the Challenger Disaster because it’s the best known example of extrapolation gone wrong. Challenger was the space shuttle that blew up after launch in 1986. It blew up because it was launched on a cold day, some rubber o-rings shrank in the cold, and hot combustion gases blew through the resulting gaps until they burned through other critical parts.

The thing is, rubber parts shrinking in the cold is to be expected, so the engineers had been measuring o-ring damage versus launch temperature for a while. The measurements of all flights before the crash are the points on the right. The temperature for the Challenger launch is the gray bar on the left. Managers extrapolated the large number of launches with no damage at all to assert that it was safe to launch on such a cold day. To do this, they had to dismiss as outliers the four points in the middle showing that on the coldest launches to date they always observed damage, and further that the damage was negatively associated with temperature.

This chart is part of the discussion of poor decision-making due to bad graphics in Edward Tufte’s (no relation) treatment of the Challenger disaster in his book Visual Explanations, pages 38 – 53.


About that minimum wage evidence. Check out this chart. Menzie Chinn posted this at EconBrowser. It shows all published estimates of the elasticity of labor with respect to wage changes. This is called a funnel graph: points toward the top are more accurate than those below them.

Most of the estimates are negative: increasing wages decreases labor demand. That’s the sensible theory you hear in micro. But note that quite a few of them are positive too: that’s the more newfangled idea that labor markets are a bit goofy, and minimum wage increases might actually be beneficial. The red line seems like a reasonable guess at the true effect: an elasticity of –0.2. Values that close to zero mean that labor demand is about as responsive to wage increases as smokers are to cigarette price increases — not very. If you work out the math it means that a 5% job loss would be associated with a 25% increase in the minimum wage (that the other 95% would get). These results tend to support the Democratic position strongly. Having said that, I’d be pretty leery of any increase larger than the 10% or so that the studies were based on: if that turns out OK, we can repeat it.
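The elasticity arithmetic in that paragraph works out as follows (an elasticity is the percentage change in quantity per percentage change in price):

```python
# The worked elasticity arithmetic: with a labor-demand elasticity of -0.2,
# a 25% minimum wage increase implies roughly a 5% drop in employment.
# (The extrapolation warning in the text still applies: the underlying
# studies looked at increases of only about 10%.)

elasticity = -0.2        # reasonable guess at the true effect (red line)
wage_increase = 0.25     # a 25% minimum wage hike

employment_change = elasticity * wage_increase
print(f"{employment_change:.0%}")  # -5%
```

So a 25% hike costing roughly 5% of the affected jobs is the trade-off implied by the red line, for increases in the range the studies actually cover.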


Figure 2 from Doucouliagos, Hristos, and Tom D. Stanley. “Publication Selection Bias in Minimum-Wage Research? A Meta-Regression Analysis.” British Journal of Industrial Relations 47.2 (2009): 406-428. [ungated working paper version], with red line drawn in at elasticity = -0.2.

The Sanders Saga Continues

Here’s the story so far:

  • Sanders, running as a Democrat, has proposed sweeping changes on the campaign trail. Proposals that are quite far to the left of Obama, and arguably further left than any presidential hopeful since Wallace in the 1940’s.
  • A letter supporting Sanders was signed by 170 economists.
  • Friedman wrote a paper providing economic support for Sanders’ claims.
  • This did not attract much attention since the claims sounded too good to be true. Sanders has been making claims like that for a while, and he didn’t attract much attention before Christmas.
  • Four former chairs of the Council of Economic Advisors (Krueger, Goolsbee, Romer, and Tyson) denounced all the above.
  • Galbraith fired back with a letter noting some flaws in that argument, and emphasizing that Sanders’ proposals get big results from big changes.

Now we have a paper from Christina Romer and David Romer. She’s the Romer listed above. They’re both macroeconomists at Berkeley. David Romer wrote the first level Ph.D. macroeconomics text that just about everyone uses (including me when I taught that class from 2008 to 2011 as a guest at another university).

It’s scathing on the majority of issues, and presents them at an accessible level. More on that later (even though this is what the media will focus on).

But they were very diplomatic about a technical issue. They can’t explain a result that Friedman got. But they’re willing to speculate. And their speculation is that he made a conceptual mistake in his economics that led to math errors.

That’s a big deal: a presidential candidate makes economic proposals that sound too good to be true, supported by an economist who can’t do the economics right.

Even better, the mistake is at an advanced undergraduate level, and related to issues covered in your handbook. They relate to growth vs. level effects, and permanent vs. transitory effects of macroeconomic shocks.

An example from retirement planning may be helpful.

When investing, everyone tries to get an edge: either a better return from the same risk, or the same return from less risk. If you can do that, you can permanently outperform your competitors. Not surprisingly, an investment edge that yields permanent performance improvements is very hard (if not impossible) to find. Instead, most of the time if you get an edge it’s transitory: you beat the market for a bit, but someone mimics what you’re doing until that edge is arbitraged away.

As to growth and level effects, which would you rather have gifted to you: access to a better return on your investments, or a lump-sum gift? Most people would prefer the access to a better return, because with compounding eventually you’ll make more money. That’s a growth effect because it helps your investment grow. A lump-sum gift, like an inheritance, is a level effect: it increases the level of your investment … but just once.

The way that it works almost any time someone gets an investment edge is that they get a transitory growth effect, that makes permanent improvements to the level of their investment. Basically, good luck makes your investment bigger, but you can’t plan on it. The whole point of learning about efficient markets in your finance classes is that permanent growth effects are very hard to come by, and if you could develop one it would require a lot of input on your part each period to sustain.
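A quick numeric sketch of the growth-versus-level distinction in these retirement terms; the dollar amounts and rates are made-up illustrative assumptions, not figures from the post:

```python
# Growth effects vs. level effects, in retirement-planning terms.
# All figures are illustrative assumptions.

def grow(start, rate, years, lump_sum=0.0):
    """Compound `start` (plus an optional one-time lump sum) at `rate`."""
    return (start + lump_sum) * (1 + rate) ** years

baseline    = grow(100_000, 0.05, 30)                   # no edge
level_gift  = grow(100_000, 0.05, 30, lump_sum=20_000)  # one-time gift: level effect
growth_gift = grow(100_000, 0.06, 30)                   # better return: growth effect

# Over a long horizon, the growth effect eventually dominates the level effect.
assert growth_gift > level_gift > baseline
```

This is why most people would prefer access to a better return: compounding lets even a one-percentage-point edge overtake a sizable one-time gift, given enough time.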

That whole argument carries over to macroeconomics. We think most shocks have transitory effects on growth rates, so to maintain a growth effect you need to somehow continue providing beneficial shocks to the economy. If you do that, you could get a bigger level effect every period.

So what do Romer and Romer find in Friedman? They can’t explain some of his more outlandish assumptions about growth. Here’s what they suspect: Friedman presumed that a temporary shock to growth rates had a permanent effect on them, leading to estimates of ongoing growth and level effects. In the investment example, this is like assuming that one lucky stock pick in turn makes all your stock picks lucky … forever … and your investment nest egg pulls away rapidly and permanently from your competitors. The implication is that Friedman’s work is no better than a fairy tale.

We have a conjecture about how Friedman may have incorrectly found such large effects. Suppose one is considering a permanent increase in government spending of 1% of GDP, and suppose one assumes that government spending raises output one-for-one. Then one might be tempted to think that the program would raise output growth each year by a percentage point, and so raise the level of output after a decade by about 10%.

To the public, this sounds like jargon. To a macroeconomist, this sounds like “made a mistake on Tufte’s ECON 3020 Exam 3 that he’ll take off full credit for”.
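Their conjecture is easy to check numerically. The 2% baseline growth rate here is my illustrative assumption; the 1%-of-GDP spending increase and the ten-year horizon are from their quote:

```python
# The Romer-and-Romer conjecture as arithmetic. A permanent spending increase
# of 1% of GDP with a one-for-one output effect should raise the LEVEL of
# output by about 1%. The suspected mistake treats it as raising the GROWTH
# RATE by a percentage point every year. Baseline 2% growth is an assumption.

baseline_growth = 0.02
years = 10

baseline = (1 + baseline_growth) ** years                # no-policy path
correct  = 1.01 * (1 + baseline_growth) ** years         # level effect, once
mistaken = (1 + baseline_growth + 0.01) ** years         # growth effect, compounding

print(f"correct:  +{correct / baseline - 1:.1%}")   # about +1%
print(f"mistaken: +{mistaken / baseline - 1:.1%}")  # roughly +10%
```

The tenfold gap between those two numbers is the whole dispute: a level effect is a one-time gain, while a (mistaken) growth effect compounds year after year.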

I remarked above that the surface issues of Romer and Romer are more accessible to the general public. Here’s their summary of what they find (their original had emphasis that does not come through a cut and paste operation):

Unfortunately, careful examination of Friedman’s work confirms the old adage, “if something seems too good to be true, it probably is.” We identify three fundamental problems in Friedman’s analysis.

  • First, all the effects of Senator Sanders’s policies that he identifies are assumed to come through their impact on demand. However, his estimates of those demand effects are far too large to be credible—even given Friedman’s own assumptions.
  • Second, in assuming that demand stimulus can raise output 37% over the next 10 years relative to the Congressional Budget Office’s baseline forecast, Friedman is implicitly assuming that the U.S. economy is (and will continue to be for a long time) dramatically below its productive capacity. However, while some output gap likely still exists, the plausible range for the output gap is much too small to accommodate demand effects nearly as large as Friedman finds. As a result, capacity constraints would likely lead to inflation and the Federal Reserve raising interest rates long before such high growth rates were realized.
  • Third, a realistic examination of the impact of the Sanders policies on the economy’s productive capacity suggests those effects are likely to be small at best, and possibly even negative.

I encourage you to, but won’t require you to, read the Romer and Romer paper. It’s fairly accessible, and has lots of clear thinking about the data, different viewpoints, and how economists assess policy.

Testing, Forecasting, Positives, and Negatives

There was some weakness in the exam answers, mostly in the background for thinking about false positives and false negatives. I’m going to start at the beginning and walk you through it.

First, there has to be something we’re measuring. In macro, this is most often real GDP. I’ve written in a bunch of other places (like the handbook, and this semester’s quodlibet) about why we measure real GDP.

Second, we need to have some reason for forecasting that series. If we knew its future values, there’d be no need to forecast other than curiosity. The adjective we use for a series whose future values are known is deterministic. Parts of real GDP might be deterministic, but the whole thing is not, so we want to forecast it. Alternatively, if no forecast is very good, then we might give up. GDP isn’t too hard to forecast, but it’s hard to forecast well: not because it requires a lot of skill, but rather just because it’s fairly unpredictable (that’s part of the reason no one believes the numbers coming out of China — they’re too predictable to be real).

Third, fitting is a little different than forecasting. With both, we’re trying to predict one thing (that we can’t figure out well) from something else. With fitting, we’re using a variable we have right now to predict values for the thing we can’t figure out either at the same time or in the past. This is what we do with a coincident indicator like industrial production: we get new values of it first and we use them to fit what we think real GDP is going to be before we actually get its measurements. Forecasting is doing something similar, but now we’re trying to use a variable we have now to figure out some other variable we don't have yet. This is what we use leading indicators for.
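The distinction comes down to which period’s value you’re targeting. A toy sketch, with made-up series and made-up coefficients (nothing here is real data):

```python
# Toy series. "ip" stands in for a coincident indicator like industrial
# production; "li" for a leading indicator. All numbers are invented.
ip = [100, 102, 101, 104]
li = [50, 53, 51, 55]

def fit_gdp(ip_value):
    """Fitting: estimate real GDP in the SAME period as the indicator."""
    return 0.9 * ip_value      # hypothetical coefficient

def forecast_gdp(li_value):
    """Forecasting: predict real GDP one period AFTER the indicator."""
    return 1.8 * li_value      # hypothetical coefficient

current_gdp_estimate = fit_gdp(ip[-1])     # what GDP probably is right now
next_gdp_forecast = forecast_gdp(li[-1])   # what GDP will probably be next period
```

Either way we’re predicting something we can’t observe yet; the only difference is whether the unobserved thing sits in the same period as the indicator or out in front of it.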

Fourth, testing is a little different from forecasting. When we test for something, we’re looking for a yes or no answer. This is usually along the lines of asking if this thing is different than these other things. And that difference has to be worthwhile or we wouldn’t bother doing the test. In macroeconomics, we’re interested in whether we continue to be in an expansion (or have switched) or continue to be in a contraction (or have switched).

Note that it’s one thing (forecasting) to say that you have data on a leading indicator, and you expect it to help explain the future behavior of a variable you’re interested in. But it’s another thing (testing) to ask whether, because your indicator has peaked, the variable you’re really interested in will peak too.

We do a lot of this very naturally when we care for a sick child at home. The first thing we’re interested in is the well-being of the child. The second thing is that we have some reason to forecast how the child feels, because this can tell us about their well-being. Third, we have coincident and leading indicators for well-being, like how much the child eats and whether they have the sniffles. And fourth, sometimes we do a test to assess how the child is doing, by taking their temperature: most people will keep a child home from school if they have a fever even if they say they feel OK.

There’s an interesting notion here that you may not have thought about before: no one knows what temperature you should be when you’re sick! What we know is the temperature you should be when you’re not sick.

When we put the implication of that in words it sounds confusing: we test whether you’re sick by assessing whether you’re not sick. When we take a child’s temperature, it’s either normal, or it’s not. When it’s not, we declare that the child is sick.

Statisticians are pretty careful about the language they use for this. Most people aren’t very good at following their lead.

At home, we might say to ourselves that a child seems sick. It sounds technical, but some might say that we’re hypothesizing that the child is sick. So we take their temperature.

A statistician will be careful to state a null hypothesis and an alternative hypothesis. A test differentiates between those two. What’s important about choosing a null hypothesis is not that it be true (we may never know that) or even plausible. Instead, what’s important is that we know how the data will behave if it is true.

In the case of a child, sometimes we may never really know for sure whether they are sick or not. But, whether we know that or not, we know that when they are not sick their temperature will be around 98.6° F.

So the null hypothesis is that the child is not sick, and the alternative is that they are sick. We take their temperature, and if it is close to 98.6° F we don’t reject the null hypothesis that the child is not sick. If their temperature is far enough away from 98.6° F we reject the null that the child is not sick.

I put all those "nots" in italics for a reason. Since we’re testing the null that they’re not sick, it’s called a negative if we can’t reject it. If we can reject then it’s called a positive.

But is the negative or positive correct? We may never know. We don’t usually use the modifiers for true, but we do use them for false, and I’ll use both here to make a point. So, our test might deliver a negative to us, but we have to figure out whether it’s a (true) negative or a false negative: either the child is not sick or they have some illness which doesn’t cause fever. Alternatively, the test might deliver a positive to us, but we have to figure out whether it’s a (true) positive or a false positive: either we reject the null that the child is not sick or their temperature is off for some other reason.
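The temperature test can be written down as a tiny decision rule. The 1.0° cutoff is a number I made up for illustration, not medical advice; the only thing we actually know is the behavior under the null:

```python
NORMAL_TEMP = 98.6   # known behavior under the null: the child is NOT sick
CUTOFF = 1.0         # hypothetical rejection threshold, in degrees F

def fever_test(temp):
    """Reject the null ('not sick') when temperature is far from normal."""
    if abs(temp - NORMAL_TEMP) > CUTOFF:
        return "positive"   # reject 'not sick' -- could still be a false positive
    return "negative"       # fail to reject -- could still be a false negative

print(fever_test(98.8))    # negative: close to normal, we don't reject
print(fever_test(101.2))   # positive: far from normal, we reject 'not sick'
```

Notice that nothing in the rule ever checks whether the child actually is sick; the test only checks whether the data look like what we’d expect under the null.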

Now I’m ready to add the fifth thing we need: an ability to explain how the data is going to behave if the null hypothesis is true (whether or not we can ever know that for sure). We take kids’ temperatures because there’s a fair presumption that if the kid is OK their temperature will be normal.

OK, are you with me? Let’s work backwards through what I just explained. If you get a false positive or false negative, it’s only because you did a test. But you’d only do the test if you can explain how the data will behave if some particular thing is true. And the test is only worthwhile if something is different that can be differentiated by it. Ideally this helps you make forecasts about a variable you’re interested in, and maybe that helps you figure out something even more important. One more thing: you're doing tests all the time whether or not you think about them that way (just like parents who listen for sniffles and coughs around the house) ... but your thinking about those tests may be clearer once you realize they're all around you.

What were some of the exam answers that motivated me to do this post then?

Some students just confused getting a positive or negative from a test, with data being positively or negatively correlated. Industrial production is negatively correlated with the unemployment rate, and it does lead it so it might form a good basis for a test. But if industrial production fell, and unemployment rose, I'd call that a positive (result from my test) because it's probably not dumb luck that I found something.

Some students said false positives and false negatives are just errors. They’re actually much more than that, because we’ve chosen to do the test, and we never would have gotten to the error if we hadn’t taken that action.

Further, those errors are always in relation to a null hypothesis, so you have to know what that was to figure out what could go wrong. Sending a sick kid to school is unfortunate, but it's not an error unless you took their temperature, found out it was normal, and acted on that.

False positives and negatives are also not just a matter of incorrectly reporting something that should be obvious; instead they’re mistakes we make in interpreting something that’s murky to begin with. But they’re not even really misinterpretations: we’d be better off acknowledging that mistakes are going to be made even when we do everything right. Caregivers aren’t always sure how sick kids are, and kids fake being sick too.

It's also not enough for the data to, say, go up and then down. You have to be thinking about actually doing something with those movements. You're probably not a good caregiver if you keep taking a child's temperature and then not doing anything with the information when you see they're running a fever.

Finally, while it can be about symptoms you’ve already observed, the point of doing the test is that you usually haven’t observed them yet. In my house I used to drive my wife crazy when my kids were sick: I didn’t bug them too much with the thermometer if it was already clear to me they were pretty sick — I’d just be bugging them, and it wouldn’t tell me much I didn’t know already. This is why hospitals take patients’ temperatures continuously in many cases, and do almost nothing with that information most of the time. They monitor it as a matter of routine, but it’s just one component of what they do.

A lot of this probably seems like common sense. It is; I'm just being specific about the wording and the implications.

Except that when we get to macroeconomics, a lot of people don’t practice this common sense.

Historians, politicians, bureaucrats, pundits, and many economists tell us that the stock market is like the thermometer. Fair enough. What evidence do they provide to support that position? Well ... because it seems to have been right once in 1929. It’s been right other times (I can confirm that), but if you think about it, it’s really weird that you can’t actually name any of those other times. You probably don’t know anyone who can either.

Then the news media trumpets what our stock market thermometer is doing ... every day, and more often if you'd like to pay attention. Investors may have reason to pay attention to that, but there are also people around who've bought into following the stock market and have been waiting a long time to get their 1929 signal. How on Earth are so many people so certain that stock market behavior caused the 2007-9 recession? Heck, economists aren't even sure it contributed to causing the recession, and we have all the data and computers and know-how and stuff. It's so bad that if we admit we looked and didn't find much, the public tunes us out.

And almost no one admits that business cycles might be inherently difficult to predict, that the data is uncertain, and that it's a big world with a lot of transactions to worry about. It's messy! Every family has stories about how hard it is to figure their kids out, how they tested out different approaches and got contradictory results, and society accepts that. Few people accept that business cycles are hard to figure out, or that our tests might not work very well.

But we plug away at it. Macroeconomic outcomes are important for personal well-being. We can measure them fairly well with real GDP. We can forecast that with some accuracy. But the economy changes from well to sick and back once in a while. It’s even pretty easy to forecast real GDP if we assume it will stay expanding, or stay contracting if we’re in a recession ... it’s the timing of the switches from the one to the other that is tough to figure out. And we’re testing all the time, whether it’s explicit or implicit, but the tests aren’t very good.

In some sense, getting false positives and false negatives is a good sign. It means we care enough to be trying to figure things out. And we can't get false ones at all without running the tests and getting some true results too. Caregivers are doing their job when they take a kid's temperature once in a while, and we cut them some slack about how they interpret the results. We need to take the same approach to macroeconomics.