## Saturday, January 25, 2014

### High R-Squared

An ongoing theme this semester has been that we need to take seriously the possibility that some aspects of macroeconomics should not be predictable, and for good reason.

Econbrowser (the blog of Nobel Prize medium lister James Hamilton and Menzie Chinn) has an excellent post by Hamilton on this, entitled “On R-squared and Economic Prediction”. More specifically, Hamilton shows how you can get both a high R-squared and a low R-squared, not just from the same data, but from the same regression.

This is an idea that we’ll be brushing the surface of in February and March.

N.B. Dave Berri and I have a discussion that devolves to this about once a semester. Dave does mostly cross-sectional econometrics, where high R-squared is more or less the acid test of decent results. I do mostly time series (which is pretty common amongst macroeconomics and finance professors). And in time series, you have this issue that R-squared doesn’t mean what you think it does. So if Dave tells you something different, it doesn’t mean that either one of us is wrong.

Anyway, back to Hamilton’s piece. He’s showing examples of something called the Engle-Granger Representation Theorem (both Engle and Granger won Nobel Prizes). This is just a little bit of algebra applied to econometrics (where it makes a huge difference).

In algebra, if you have the following equation:

X = a + bY

you wouldn’t think anything of rearranging it to get:

X – Y = a + (b-1)Y

Seems OK, right?
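The rearrangement is easy to check numerically. Here's a quick sketch with made-up values for a, b, and Y (the numbers are just for illustration):

```python
# Made-up values, purely to check the algebra.
a, b, Y = 2.0, 0.7, 10.0

X = a + b * Y             # the original equation: X = a + bY
lhs = X - Y               # left-hand side of the rearranged equation
rhs = a + (b - 1) * Y     # right-hand side of the rearranged equation

# The two forms agree (up to floating-point rounding).
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```

Subtracting Y from both sides doesn't change anything substantive; it just relabels what the equation is describing.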

But, what if you’re doing a regression, and X and Y are the same variable measured at two different points in time? That sounds weird, but it would just mean that you’ve got a spreadsheet with X in one column, and Y in another column, and every cell for X is repeated one row down for Y.

Why might you have that? Well, you might be regressing stock price today on stock price yesterday.
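Here's a minimal sketch of that spreadsheet layout, using a handful of made-up prices (not data from Hamilton's post):

```python
# A short, made-up price series.
prices = [100.0, 101.5, 101.2, 102.8, 102.1]

# Column X holds today's price, column Y holds yesterday's price:
# every value in X reappears one row down in Y.
X = prices[1:]     # P(t): rows 2 onward
Y = prices[:-1]    # P(t-1): the same series, shifted down one row

for today, yesterday in zip(X, Y):
    print(f"X = {today:6.1f}   Y = {yesterday:6.1f}")
```

That's all "lagging" a variable means: the same column, slid down one row.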

In this case, the first equation above becomes:

P(t) = a + bP(t-1)

Note that the t in parentheses is just keeping track of the passage of time, or the row in your spreadsheet.

You’d probably suspect that this would fit really well, and you’d be right: the R-squared is really high, indicating that (perhaps) the best predictor for today’s price is yesterday’s price. Big deal, right?
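If you want to see that for yourself, here's a small sketch that simulates a random walk as a stand-in for a stock price (purely illustrative data, not anything from Hamilton's post) and regresses today's value on yesterday's:

```python
import numpy as np

# Simulate a random walk: P(t) = P(t-1) + noise, starting near 100.
rng = np.random.default_rng(0)
n = 500
P = 100 + np.cumsum(rng.normal(0, 1, n))

y = P[1:]      # P(t)
x = P[:-1]     # P(t-1)

# Fit P(t) = a + b*P(t-1) by ordinary least squares.
b, a = np.polyfit(x, y, 1)     # polyfit returns slope first, then intercept
resid = y - (a + b * x)
r2 = 1 - resid.var() / y.var()

print(f"slope b = {b:.3f}, R-squared = {r2:.3f}")   # b near 1, R-squared near 1
```

The estimated slope comes out close to one and the R-squared close to one, exactly as the text suggests.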

But, the Engle-Granger Representation Theorem says that you can rearrange the third equation the same way you did the first one, to get:

P(t)-P(t-1) = a + (b-1)P(t-1)

Now, if you’ve learned a little about how to do regressions, you might say that you can’t have a variable like that on the left hand side. Well, you can, and the way to do that is just to create a third column where you do the subtraction, and then use that in your regression. So … no problem … just an inconvenience.

And now, maybe you’re ready to go back and read Hamilton’s post and get more out of it. Because Hamilton actually runs a few of those regressions and shows that the first one has a high R-squared and the second one has a low R-squared, and yet the coefficients in them are the same: the intercepts match, and the slopes differ by exactly one, just as the algebra says they should.
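To see both halves of that point in one place, here's a sketch (again with simulated, made-up data rather than Hamilton's) that runs the levels regression and the rearranged regression on the same series:

```python
import numpy as np

# Simulate the same kind of random walk as before.
rng = np.random.default_rng(0)
P = 100 + np.cumsum(rng.normal(0, 1, 500))
x = P[:-1]                                  # P(t-1)

# Levels regression: P(t) = a + b*P(t-1)
b, a = np.polyfit(x, P[1:], 1)
r2_levels = 1 - (P[1:] - (a + b * x)).var() / P[1:].var()

# Rearranged regression: P(t) - P(t-1) = a + (b-1)*P(t-1)
d = P[1:] - x                               # the extra "subtraction" column
b1, a1 = np.polyfit(x, d, 1)
r2_diff = 1 - (d - (a1 + b1 * x)).var() / d.var()

print(f"levels:     a = {a:.3f}, b   = {b:.3f}, R-squared = {r2_levels:.3f}")
print(f"rearranged: a = {a1:.3f}, b-1 = {b1:.3f}, R-squared = {r2_diff:.3f}")
```

The intercepts match, the slopes differ by exactly one, and yet one R-squared is near one while the other is near zero. Same data, same relationship, wildly different "fit."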

Read the comments too … some people get Hamilton’s point, and some are in denial.

This insight is huge in finance (where it’s related to why you can make money, but are unlikely to beat the market), and very big in macroeconomics (where it underlies why Obamacare isn’t working out as planned).

It’s also huge in politics, where we have a lot of people who obsess about controlling things. If what they desire to control can’t be controlled … are they just wasting their time (and our money)?