One version looks like this:
We look pretty good … almost as good as … say … North Dakota
You can go to The New York Times website and play with this.
But, there’s a statistical problem with this sort of thing. What’s worse is that this problem has been known for a long time (I first heard of it in 1989 shortly after Places Rated Almanac started being published) and the legacy media does not take measures to fix it.
It’s similar to the easily avoidable problem in regression analysis: you need more observations than you have right-hand-side variables (more rows than columns in your spreadsheet). In the case of a map like this, you need more ranking variables than things being ranked (the districts). This site uses 20 ranking variables, which is not enough even to rank the 50 states appropriately.
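Here’s a tiny sketch of the regression half of that analogy, using made-up numbers (the matrix X and both coefficient vectors are hypothetical). With 2 rows and 3 columns, two quite different sets of coefficients reproduce the data exactly, so the data alone can’t tell you which one is right.

```python
# Hypothetical toy data: 2 observations (rows), 3 right-hand-side variables (columns).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def predict(X, beta):
    """Return the fitted values X @ beta as a plain list."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

beta_a = [1.0, 0.0, 0.0]

# (-3, 6, -3) is the cross product of the two rows, so it is orthogonal to both;
# adding it to beta_a changes the coefficients but not a single fitted value.
null_direction = [-3.0, 6.0, -3.0]
beta_b = [a + n for a, n in zip(beta_a, null_direction)]

print(beta_a, predict(X, beta_a))  # [1.0, 0.0, 0.0] [1.0, 4.0]
print(beta_b, predict(X, beta_b))  # [-2.0, 6.0, -3.0] [1.0, 4.0]
```

Any multiple of (-3, 6, -3) can be added the same way, so there are infinitely many perfect fits and no way to pick the “best” one from the data.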
So, what’s the problem? It’s not that you can’t do the ranking; obviously, you can. It’s that there is no unique best ranking. Every ranking uses weights to add the variables together. So, this may be a pretty good ranking (based on its weights). But … there are other, equally good rankings that come out differently (because they use different weights). It also might be a pretty bad ranking, but you can’t tell that either. And there is no criterion to establish the best set of weights when you have fewer “rows than columns”.
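To see how the weights drive the result, here’s a minimal sketch of a weighted-sum ranking. The three “states” and their two scores are invented for illustration, and a simple weighted sum is an assumption about how a map like this combines its variables. Each set of weights below is perfectly defensible, and each one puts a different state on top.

```python
# Hypothetical scores on two ranking variables (higher is better).
scores = {
    "State A": [9.0, 2.0],
    "State B": [6.0, 6.0],
    "State C": [1.0, 9.0],
}

def rank(scores, weights):
    """Order items by the weighted sum of their variables, best first."""
    total = {name: sum(w * v for w, v in zip(weights, vals))
             for name, vals in scores.items()}
    return sorted(total, key=total.get, reverse=True)

print(rank(scores, [0.7, 0.3]))  # ['State A', 'State B', 'State C']
print(rank(scores, [0.5, 0.5]))  # ['State B', 'State A', 'State C']
print(rank(scores, [0.3, 0.7]))  # ['State C', 'State B', 'State A']
```

Nothing in the data says which of these three orderings is the “right” one; that choice lives entirely in the weights.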
None of this makes this sort of choropleth useless. It does mean that every time you examine one of these, you need to play with it a bit and make sure that it behaves in a way you find sensible.
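One way to “play with it” is a quick sensitivity check: draw lots of random weight vectors and record the best and worst position each item ever lands in. This Monte Carlo approach, the rank_ranges helper, and the scores are hypothetical, for illustration only; they are not anything the Times site provides. An item with a narrow range is ranked robustly; one that swings from first to last is at the mercy of the weights.

```python
import random

# Hypothetical scores on two ranking variables (higher is better).
scores = {
    "State A": [9.0, 2.0],
    "State B": [6.0, 6.0],
    "State C": [1.0, 9.0],
}

def rank(scores, weights):
    """Order items by the weighted sum of their variables, best first."""
    total = {name: sum(w * v for w, v in zip(weights, vals))
             for name, vals in scores.items()}
    return sorted(total, key=total.get, reverse=True)

def rank_ranges(scores, trials=1000, seed=0):
    """Best and worst position (1 = top) each item reaches under random weights."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    n_vars = len(next(iter(scores.values())))
    best = {name: len(scores) for name in scores}
    worst = {name: 1 for name in scores}
    for _ in range(trials):
        raw = [rng.random() for _ in range(n_vars)]
        total = sum(raw)
        weights = [r / total for r in raw]  # nonnegative weights summing to 1
        for pos, name in enumerate(rank(scores, weights), start=1):
            best[name] = min(best[name], pos)
            worst[name] = max(worst[name], pos)
    return {name: (best[name], worst[name]) for name in scores}

print(rank_ranges(scores))
```

With these made-up scores, State B never falls below second, while States A and C can land anywhere from first to last depending on the weights.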