arrest the crimatologists
Posts tagged data manipulation
Hadley Centre releases data–but questions remain
Dec 26th
From examiner.com //
December 23, 2009 //
The UK Meteorological Office released what it described as temperature data for over a century and a half from more than 1500 land weather stations throughout the world, along with what the Met Office described as the code used to plot temperature trends for the time frame in question. But the data are apparently not the raw data, and questions remain as to the validity of the adjustment of the data, an inconsistency in reporting a lack of valid temperature measurements in one year, and the accuracy of the source code.
The Met Office states that the land station records, going back to 1850, came from the East Anglia Climatic Research Unit (CRU). This has long been the understanding of the genesis of the HadCRUT dataset: the CRU collected land station data, and the Hadley Centre collected temperatures from ships at sea.
John Graham-Cumming was the first person to take note of the release and to examine the source code for accuracy. Within hours, Graham-Cumming had found two possible errors. First, he found an apparent error in the code that caused the program to use suspect data in its plots. Graham-Cumming introduced a quick correction to the code and was able to produce a trend line similar to the official version, with this difference: the data point for the year 1855 is missing, and this gap in the data is not shown on the official version.
That’s a bit odd, but not serious. But it makes me suspect something: I’ll bet a mince pie that this code the Met Office has released is not the code they actually use to create CRUTEM3. I bet they wrote it especially for this release.
Blogger Bishop Hill took note of the release next, and of Graham-Cumming’s initial analysis. Subsequently Steve McIntyre of Climate Audit noted that the Hadley Centre had only last summer refused to release any such data to him when he had requested it, on the grounds that some of the data were subject to previous agreements that forbade data sharing. How any such legal or diplomatic considerations had suddenly been resolved was not clear to McIntyre, and still cannot be reliably discerned from the Met Office’s FAQ links. McIntyre speculated that the Met Office might have realized that their legal position was never sound and that they prepared the release to keep themselves out of legal peril.
Jeff Id at The Air Vent bluntly noted that even this release is not entirely satisfactory. (He misidentified the CRU as the agency doing the release; in fact the latest release comes from the Hadley Centre using CRU-generated material.) His major complaint is that these data are still not the raw station data in all cases. The Met Office says this to describe their own release:
The database consists of the “value added” product that has been quality controlled and adjusted to account for identified non-climatic influences. It is the station subset of this value-added product that we have released. Adjustments were only applied to a subset of the stations so in many cases the data provided are the underlying data minus any obviously erroneous values removed by quality control. The Met Office do not hold information as to adjustments that were applied and so cannot advise as to which stations are underlying data only and which contain adjustments.
The Met Office now appears to be saying that even it does not know which numbers are raw data and which are adjusted data, nor which adjustments were made.
The Air Vent post also compares the answers given in the FAQ with some of the e-mails in the CRU Archive, including many written by Phil Jones, now under suspension from his post as CRU Director.
Perhaps the most problematic answer from the Met Office is their answer to the question of how one can be sure of the global temperature record:
The methodology is peer reviewed. There are three independent sets of global temperature that all clearly show the rise in global temperatures over the last 150 years. Furthermore, the strong scientific evidence that climate is changing as a result of human influence is also based on the growing evidence that other aspects of the climate system are changing; these include the atmosphere getting moister, global rainfall patterns changing, reductions in snow cover, glacier volume, and Arctic sea ice, increases in sea level and changes in global scale circulation patterns. There are also numerous changes in phenological records which point towards a general warming and support the veracity of the instrumental record.
As Jeff Id and others have pointed out, the peer-review process, certainly at CRU and presumably elsewhere, is now known to be fatally flawed on account of such questionable practices as friend reviewing friend and the rejection of non-“friendly” papers on what appears to be an a priori determination based solely on the paper’s conclusion. Peer review is supposed to check premises, not conclusions.
But the worse problem is that the Met Office tries to argue that the temperature record is trustworthy because a raft of other evidence supports it. The difficulties with that statement are twofold. First and foremost, that other evidence might or might not support a given conclusion says nothing about the validity of the specific evidence under discussion. Second, the validity of the other evidence cited by the Met Office is in question. American and British residents, having only recently finished digging out from record snowfalls in their countries, might be inclined to dispute the finding of “reduction in snow cover.” More to the point, much of what the Met Office cites depends on a host of confounding variables in addition to temperature.
This Examiner has previously reported that the Hadley Centre’s connection to CRU and its largely disgraced scientists means that it might not be as innocent as it pretends. More recently, the Centre came under fire following reports that it had overstated the warming trend in Russia by excluding land station data from 40% of Russian territory and relying on 25% of stations concentrated mainly in areas of dense human habitation.
Separately, the CRU website appears to be back on-line, but their home page still asks for patience during a “rebuilding” of their site.
Russians Are Right! Met Office Uses Selective Data
Dec 19th
via Canada Free Press
Barry Napier / December 17, 2009
The UK’s Meteorological Office has been accused by the Russians of lying. Well, I never! They say the Met Office has “cherry-picked climate change figures in a bid to increase evidence of global warming.”
I don’t like to keep on about it, but I said this in my book last year! I got the information via a science forum I belong to, populated by high-level scientists, including IPCC contributors. To get into my book I had to have had the information in early 2008, meaning that the truth about the Russian weather stations was known at least as early as 2007!
What we knew at that time was simple: the Russians closed down a huge number of temperature measuring stations, at the same time as Western scientists were collating world figures subsequently used in the IPCC Reports. Suddenly, there were incomplete data from a vast area of the northern hemisphere. But, the westerners continued without these vital figures! And, make no mistake, proxy measurements, whether tree-rings or satellite, are not equal to thermometer measurements and should never be mixed.
I see no problem in using them, but I see a very big problem when they are mixed by people like Michael Mann. The only reason he mixed them was that between the various proxies they could be manipulated to produce the results he wanted. And we already know this is what he did. So, when the Russian stations were suddenly closed, Mann and others began to mix ‘n’ match their figures, selecting whatever served their cause.
Russians Correct
So, the Russian accusation is correct. We knew about this manipulation and selective use of data a few years ago. The assurances given by the Met Office are, then, highly suspect, because everyone knows they, and others of pro-green persuasion, are being very selective in their data. The Met Office claim the measuring sites are chosen for them by someone else. Do we believe that? Frankly, no. Even if we believe it, do we believe they don’t tamper with the figures? No, we don’t.
Do I believe the Moscow-based Institute of Economic Analysis, when, as critics tell us, they are only making the accusations to safeguard their gas, coal and electricity industries? Yes, I do! If their reason is to safeguard their economy, I don’t blame them. Indeed, I applaud them for doing so. Obama and Brown et al certainly don’t care what happens to their economies, and seem bent on destroying them for the sake of socialist ideals not shared by their electorate. The Russians say the British have picked figures that suit their theories on global warming. This certainly appears to be the case, especially when they are adamantly protecting the frauds and liars mentioned in the email scandal and won’t release raw data.
Yamal: A “Divergence” Problem – Climate Audit
Dec 10th
by Steve McIntyre, posted on Sep 27, 2009 at 10:08 AM
The second image below is, in my opinion, one of the most disquieting images ever presented at Climate Audit.
Two posts ago, I observed that the number of cores used in the most recent portion of the Yamal archive at CRU was implausibly low. There were only 10 cores in 1990 versus 65 cores in 1990 in the Polar Urals archive and 110 cores in the Avam-Taimyr archive. These cores were picked from a larger population – measurements from the larger population remain unavailable.
One post ago, I observed that Briffa had supplemented the Taimyr data set (which had a pronounced 20th century divergence problem) not just with the Sidorova et al 2007 data from Avam referenced in Briffa et al 2008, but with a Schweingruber data set from Balschaya Kamenka (russ124w), also located over 400 km from Taimyr.
Given this precedent, I examined the ITRDB data set for potential measurement data from Yamal that could be used to supplement the obviously deficient recent portion of the CRU archive (along the lines of Briffa’s supplementing the Taimyr data set). Hantemirov and Shiyatov 2002 describe the Yamal location as follows:
The systematic collection of subfossil wood samples was begun, in 1982, in the basins of the Khadytayakha, Yadayakhodyyakha and Tanlovayakha rivers in southern Yamal in the region located between 67°00′ and 67°50′ N and 68°30′ and 71°00′ E (Figure 1). These rivers flow from the north to the south; hence, no driftwood can be brought from the adjacent southern territories. At the present time, the upper reaches of these rivers are devoid of trees; larch and spruce-birch-larch thin forests are located mainly in valley bottoms in the middle and lower reaches.
Sure enough, there was a Schweingruber series that fell squarely within the Yamal area – indeed on the first named Khadyta River – russ035w, located at 67 12N 69 50E. This data set had 34 cores, nearly 3 times more than the 12 cores selected into the CRU archive. Regardless of the principles for the selection of the 12 CRU cores, one would certainly hope to obtain a similar-looking RCS chronology using the Schweingruber population for living trees in lieu of the selection by CRU (or whoever).
As a sensitivity test, I constructed a variation on the CRU data set, removing the 12 selected cores and replacing them with the 34 cores from the Schweingruber Yamal sample. As shown below, this resulted in a substantial expansion of the data set in the 19th and 20th centuries and a modest decline in the 18th century. (Hantemirov and Shiyatov 2002 had reported a selection of long cores of 200-400 years; while the CRU archive does not appear to be precisely the same as the unavailable Hantemirov and Shiyatov 2002 archive, it does appear to be related. This pattern of change indicates that the age of the CRU cores is systematically higher than the age of the Schweingruber cores.)

Figure 1. Comparison of core count. Black – variation with Schweingruber instead of CRU; red- archived version with 12 picked cores.
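The core-count comparison in Figure 1 amounts to counting, for each calendar year, how many cores have a ring for that year. A minimal Python sketch follows. It is not the generating script referred to in this post, and the core date ranges are hypothetical toy values chosen only to mimic the pattern described (more cores in the recent period, fewer in the 18th century); the names `core_counts`, `subfossil`, `picked` and `replacement` are illustrative assumptions.

```python
from collections import Counter


def core_counts(cores):
    """Count how many cores cover each calendar year.

    `cores` is a list of (first_year, last_year) pairs, both inclusive.
    Returns a Counter mapping year -> number of cores; years with no
    cores simply read as 0.
    """
    counts = Counter()
    for first_year, last_year in cores:
        for year in range(first_year, last_year + 1):
            counts[year] += 1
    return counts


# Hypothetical toy date ranges, NOT the real archives.
subfossil = [(1500, 1750), (1600, 1820)]                    # kept in both versions
picked = [(1650, 1996), (1700, 1990)]                       # stands in for the 12 picked cores
replacement = [(1780, 1990), (1800, 1990), (1820, 1990)]    # stands in for the 34 russ035w cores

archived = core_counts(subfossil + picked)       # version as archived
variant = core_counts(subfossil + replacement)   # sensitivity-test version

for year in (1700, 1850, 1990):
    print(year, "archived:", archived[year], "variant:", variant[year])
```

With real data, each pair would be the first and last ring year of one measured core, and the two resulting series would be plotted against year.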
The next graphic compares the RCS chronologies from the two slightly different data sets: red – the RCS chronology calculated from the CRU archive (with the 12 picked cores); black – the RCS chronology calculated using the Schweingruber Yamal sample of living trees instead of the 12 picked trees used in the CRU archive [leaving the rest of the data set unchanged i.e. all the subfossil data prior to the 19th century]. The difference is breathtaking.

Figure 2. A comparison of Yamal RCS chronologies. red – as archived with 12 picked cores; black – including Schweingruber’s Khadyta River, Yamal (russ035w) archive and excluding 12 picked cores. Both smoothed with 21-year gaussian smooth. The y-axis is in dimensionless chronology units centered on 1 (as in subsequent graphs) and represents age-adjusted ring width. [Amended Sep 28 6 pm.]
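For readers unfamiliar with the terms in the caption, here is a rough Python sketch of an RCS-style chronology followed by a 21-year Gaussian smooth. It is an illustration under stated assumptions, not the script used for these figures: the regional curve is taken as the raw mean ring width at each ring age (real implementations usually fit or smooth that curve), the first measured ring is assumed to be age 0, the Gaussian `sigma` of one sixth of the window is an arbitrary choice, the ends of the series are handled by renormalizing the truncated weights, and the chronology is assumed to have no missing years. All function names and the toy cores are hypothetical.

```python
import math
from collections import defaultdict


def rcs_chronology(cores):
    """cores: list of (first_year, [ring widths]). Returns {year: index}."""
    # Step 1: regional curve = mean ring width at each ring age.
    by_age = defaultdict(list)
    for _, widths in cores:
        for age, width in enumerate(widths):
            by_age[age].append(width)
    regional = {age: sum(ws) / len(ws) for age, ws in by_age.items()}

    # Step 2: divide each ring width by the expected width for its age,
    # then average the resulting indices by calendar year.
    by_year = defaultdict(list)
    for first_year, widths in cores:
        for age, width in enumerate(widths):
            by_year[first_year + age].append(width / regional[age])
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}


def gaussian_smooth(values, window=21, sigma=None):
    """Gaussian-weighted moving average; weights are renormalized at the ends."""
    if sigma is None:
        sigma = window / 6.0  # assumption: window spans about +/- 3 sigma
    half = window // 2
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(values):
                num += weights[k + half] * values[j]
                den += weights[k + half]
        smoothed.append(num / den)
    return smoothed


# Toy cores with identical age-related growth decline (hypothetical values).
toy_cores = [(1900, [2.0, 1.0, 0.5]), (1901, [2.0, 1.0, 0.5])]
chronology = rcs_chronology(toy_cores)
smoothed = gaussian_smooth(list(chronology.values()))
print(chronology)
print(smoothed)
```

Because the two toy cores follow the same growth curve, every index comes out at 1, which is what "dimensionless chronology units centered on 1" means: values above 1 indicate rings wider than expected for the tree's age, values below 1 narrower.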
Finally, here is another graphic showing the same two RCS chronologies, but adding in an RCS chronology on the merged data set obtained by appending the Schweingruber population to the CRU archive – this time retaining the 12 cores. Unsurprisingly this is in between the other two versions, but most importantly it has no HS.

Figure 3. Also showing merged version up to 1990. (After 1990, there are only the few CRU cores and it tracks the CRU version.) [Amended Sep 28 6 pm.]
I hardly know where to begin in terms of commentary on this difference.
The Yamal chronology has always been an exception to the large-scale “Divergence Problem” that characterizes northern forests. However, using the Schweingruber population instead of the 12 picked cores, this chronology also has a “divergence problem” – not just between ring widths and temperature, but between the two versions.
Perhaps there’s some reason why Schweingruber’s Khadyta River, Yamal larch sample should not be included with the Yamal subfossil data. But given the use of a similar Schweingruber data set in combination with the Taimyr data (in a case where it’s much further away), it’s very hard to think up a valid reason for excluding Khadyta River, while including the Taimyr supplement.
Perhaps the difference between the two versions is related to different aging patterns in the Schweingruber population as compared to the CRU population. The CRU population consists, on average, of older trees than the Schweingruber population. It is highly possible and even probable that the CRU selection is derived from a prior selection of old trees described in Hantemirov and Shiyatov 2002 as follows:
In one approach to constructing a mean chronology, 224 individual series of subfossil larches were selected. These were the longest and most sensitive series, where sensitivity is measured by the magnitude of interannual variability. These data were supplemented by the addition of 17 ring-width series, from 200–400 year old living larches.
The subfossil collection does not have the same bias towards older trees. Perhaps the biased selection of older trees introduces an unintentional bias when combined with the RCS method. This bias would not have similarly affected the “corridor method” used by Hantemirov and Shiyatov themselves, since this method did not preserve centennial-scale variability, and Hantemirov and Shiyatov would not have been concerned about potential bias introduced by how their cores were selected into an RCS chronology method that they themselves were not using.
Briffa’s own caveats on RCS methodology warn against inhomogeneities, but, notwithstanding these warnings, his initial use of this subset in Briffa 2000 may well have been done without fully thinking through the very limited size and potential unrepresentativeness of the 12 cores. Briffa 2000 presented this chronology in passing and it was never properly published in any journal article. However, as CA readers know, the resulting Yamal chronology with its enormous HS blade was like crack cocaine for paleoclimatologists and got used in virtually every subsequent study, including, most recently, Kaufman et al 2009.
As CA readers also know, until recently, CRU staunchly refused to provide the measurement data used in Briffa’s Yamal reconstruction. Science(mag) acquiesced in this refusal in connection with Osborn and Briffa 2006. While the Yamal chronology was used in a Science article, it originated with Briffa 2000 and Science(mag) took the position that the previous journal (which had a different data policy) had jurisdiction. Briffa used the chronology in Briffa et al (Phil Trans B, 2008) and the Phil Trans editors finally seized the nettle, requiring Briffa to archive the data. As noted before, Briffa asked for an extension and, when I checked earlier this year, the Yamal measurement data remained unarchived. A few days ago, I noticed that the Yamal data was finally placed online. With the information finally available, this analysis has only taken a few days.
If the non-robustness observed here proves out (and I’ve provided a generating script), this will have an important impact on many multiproxy studies that have relied on this study. Studies illustrated in the IPCC AR4 spaghetti graph, Wikipedia spaghetti graph or NAS Panel spaghetti graph (consult them for bibliographic refs) that use the Yamal proxy include: Briffa 2000; Mann and Jones 2003; Jones and Mann 2004; Moberg et al 2005; D’Arrigo et al 2006; Osborn and Briffa 2006; Hegerl et al 2007, plus more recently Briffa et al 2008, Kaufman et al 2009. (Note that spaghetti graph studies not included in the above list all employ strip bark bristlecone pines – some use both.)
Update: Sep 30: Here’s a blow-up of Figure 3 above, from 1850 on. Legend as in Figure 3. The “combined” information is shown to 1990, since post-1990 is, as noted above, limited to the CRU version and, obviously, reverts back to the CRU.




