In my last post I plotted the NASA CO_{2} and HadCRUT5 records from 1850 to 2020 and compared them. This was in response to a plot posted on Twitter by Robert Rohde implying that they correlate well. The two records appear to correlate because the resulting R^{2} is 0.87. The least-squares function used made the global temperature anomaly a function of the base-2 logarithm of the CO_{2} concentration (or ‘log_{2}CO_{2}’). This means the temperature change was assumed to be linear with each doubling of the CO_{2} concentration, a common assumption. The least squares (or ‘LS’) methodology assumes there is no error in the measurements of the CO_{2} concentration and that all error resulting from the correlation (the residuals) resides in the HadCRUT5 global average surface temperature estimates.
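To make the log_{2} assumption concrete, here is a minimal Python sketch of a least-squares fit of temperature on log_{2}CO_{2}. The inputs are synthetic and noise-free, not the NASA or HadCRUT5 records, and the helper `fit_line` and the value `true_per_doubling` are hypothetical names chosen only for illustration. The point is that the fitted slope is the temperature change per doubling of CO_{2}.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic, noise-free inputs (NOT the NASA or HadCRUT5 records).
co2_ppm = [285.0 + 0.75 * i for i in range(171)]  # hypothetical concentrations
true_per_doubling = 2.0  # arbitrary illustrative value, not an estimate
temp = [true_per_doubling * math.log2(c / 285.0) for c in co2_ppm]

# Regress temperature on log2(CO2); all "error" is assigned to temperature.
x = [math.log2(c) for c in co2_ppm]
intercept, slope = fit_line(x, temp)
print(f"fitted change per CO2 doubling: {slope:.3f}")
```

Because the synthetic temperatures were built from the same log_{2} relationship, the fit simply recovers the slope that was put in. With real data the residuals would not be zero, and LS would attribute all of them to the temperature series.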

In the comments to the previous post, it became clear that some readers understood that the computed R^{2} (often called the coefficient of determination) from LS was artificially inflated because both X (log_{2}CO_{2}) and Y (HadCRUT5) were autocorrelated and increased with time. But a few did not understand this vital point. As most investors, engineers, and geoscientists know, two time series that are both autocorrelated and increase with time will almost always have an inflated R^{2}. This is one type of “spurious correlation.” In other words, the high R^{2} does not necessarily mean the variables are related to one another. Autocorrelation is a big deal in time series analysis and in climate science, but it is too frequently ignored. To judge any correlation between CO_{2} and HadCRUT5 we must look for autocorrelation effects. The most commonly used tool is the Durbin-Watson statistic.
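The inflation described above is easy to reproduce with simulated data. The sketch below, in Python, builds two random walks with drift that are generated independently of each other, so neither has any influence on the other. The series length of 171 mirrors the number of years from 1850 to 2020, but the drift, noise level, and seed are arbitrary assumptions, and nothing here uses the actual CO_{2} or HadCRUT5 data.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, which equals R^2 for a one-variable LS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def drifting_walk(rng, n, drift=1.0, sigma=1.0):
    """Random walk with drift: previous value plus drift plus Gaussian noise."""
    values, level = [], 0.0
    for _ in range(n):
        level += drift + rng.gauss(0.0, sigma)
        values.append(level)
    return values

def diff(series):
    """Step-to-step changes, which removes the shared upward trend."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)       # fixed seed so the run is repeatable
x = drifting_walk(rng, 171)
y = drifting_walk(rng, 171)  # generated independently of x

r2_levels = r_squared(x, y)
r2_changes = r_squared(diff(x), diff(y))
print(f"R^2 of the two trending series:      {r2_levels:.3f}")
print(f"R^2 of their year-to-year changes:   {r2_changes:.3f}")
```

The levels of the two series give a high R^{2} simply because both rise with time, while the step-to-step changes, where the shared trend is removed, show essentially no relationship. This does not by itself say anything about CO_{2} and temperature. It only shows that a high R^{2} between two trending, autocorrelated series is weak evidence.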

The Durbin-Watson statistic tests the null hypothesis that the residuals from an LS regression are not autocorrelated against the alternative that they are. The statistic is a number between 0 and 4. A value near 2 indicates no autocorrelation, a value below 2 suggests positive autocorrelation, and a value above 2 suggests negative autocorrelation. Since the computation of R^{2} assumes that each observation is independent of the others, we hope to get a value near 2, so that the R^{2} is valid. If the regression residuals are autocorrelated and not random (that is, normally distributed about the mean), the R^{2} is invalid and too high. In the statistical program R, the test is run, using a linear fit, with only one statement, as shown below:

…

…
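For readers who want to see what the Durbin-Watson statistic measures, here is a minimal Python sketch of the standard formula: the sum of squared differences between successive residuals divided by the sum of squared residuals. It is not the R statement referred to above. The function name `durbin_watson`, the seed, and the AR(1) coefficient `rho` are assumptions for illustration, and the two simulated series merely stand in for regression residuals.

```python
import random

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 500

# Independent residuals: the statistic should land near 2.
white = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Positively autocorrelated residuals from a hypothetical AR(1) process,
# e[t] = rho * e[t-1] + noise, with rho chosen arbitrarily for illustration.
rho = 0.9
ar1, prev = [], 0.0
for _ in range(n):
    prev = rho * prev + rng.gauss(0.0, 1.0)
    ar1.append(prev)

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print(f"Durbin-Watson, independent residuals:    {dw_white:.2f}")
print(f"Durbin-Watson, autocorrelated residuals: {dw_ar1:.2f}")
```

The statistic is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals, which is why independent residuals give a value near 2 and strongly positively autocorrelated residuals give a value well below 2.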
