Reply to Bell and Rosenfeld
An unpublished manuscript.
DAVID M. SCHULTZ (1,2), SANTTU MIKKONEN (3), MICHAEL B. RICHMAN (4), AND ARI LAAKSONEN (2,3)
(1) Division of Atmospheric Sciences and Geophysics, Department of Physics, University of Helsinki, Helsinki, Finland
(2) Finnish Meteorological Institute, Helsinki, Finland
(3) Department of Physics, University of Kuopio, Kuopio, Finland
(4) School of Meteorology, University of Oklahoma, Norman, Oklahoma, USA
1. Introduction
We appreciate Bell and Rosenfeld's (2008; hereafter BR) comments on our paper (Schultz et al. 2007; hereafter S07). We are thankful for this opportunity to address the concerns raised by BR, extend our results from S07, and clarify the differences between S07 and Bell et al. (2008; hereafter B08). BR's principal concern seems to lie with the statement in S07 that our results contradict results from B08. This statement in S07 is meant to characterize the differences between B08, which purports to find weekly cycles of precipitation over certain locations over and near the United States during the summer in recent years, versus S07, which does not find any statistically significant signals. As BR discuss, no overlap exists between the TRMM dataset (1998-2005) in B08 and the rain-gauge data in S07, and clarification of that point by S07 might have mitigated BR's response.
2. Comparison of rain-gauge data from S07 with that in B08
B08 analyze rain-gauge data from 1901-2005 and claim to find a weekly cycle "strong enough to be detectable beginning in the 1980s" in the southeastern United States (B08). In contrast, BR state a "lack of a weekly cycle before about 1990." S07 could not address the validity of those statements because they did not subdivide their data as a function of year or by region of the United States. We do so in this section.
To see if there were spatially coherent patterns of weekly cycles naturally within the data, several varieties of cluster analysis, some with a spatial component, were applied. No statistically significant clustering could be seen. Next, to mimic region B in B08's Fig. 3b, we chose a study region including all stations with latitudes south of 40 deg N and longitudes east of 100 deg W. The resulting region included 74 stations from the southeastern United States. Temporally, we tested 1980-1992 (all 12 months) and the summer alone (June, July, and August) for these 74 stations.
First, to examine whether the precipitation amount varies as a function of the day of the week, we applied the Kruskal-Wallis Rank Sum Test (Hollander and Wolfe 1973, pp. 115-120) to all 12 months together and to the three summer months, both for each of the 74 stations separately and for the data averaged over the study area. Specifically, for the whole year of data, one station out of 74 had statistically significant differences (p<0.05) in the amount of rain between the days of the week, and, for summer only, two stations out of 74 had statistically significant differences. Thus, neither of the datasets showed any statistically significant weekly patterns, nor does the number of significant stations exceed that expected by random chance (about 3.7 of 74 tests at the 0.05 level).
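As an illustration of the kind of test described above, the following is a minimal sketch of a Kruskal-Wallis test applied to daily precipitation grouped by day of the week. It is not the code used for this reply: the function names, the synthetic rainfall series, and its parameters are hypothetical, and the p-value uses a closed form of the chi-square tail that is exact only for an even number of degrees of freedom (six, for seven weekdays).

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank; also return tie sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        h += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / float(n ** 3 - n)
    if correction == 0.0:
        return 0.0, 1.0  # every value identical: no evidence of any difference
    h /= correction
    return h, chi2_sf_even_df(h, len(groups) - 1)


random.seed(0)
# Hypothetical station record: dry on about 70% of days, exponential amounts (mm) otherwise.
daily = [random.expovariate(0.2) if random.random() < 0.3 else 0.0
         for _ in range(7 * 600)]
by_weekday = [daily[d::7] for d in range(7)]  # one sample per day of the week
H, p = kruskal_wallis(by_weekday)
print("H = %.2f, p = %.3f" % (H, p))
```

The tie correction matters for precipitation because the many dry days share the same value of zero. Because the synthetic series has no weekly cycle built in, a small p-value here would be a false alarm.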
Second, to see if precipitation occurrence varies as a function of the day of the week, we used the chi square test, following the methodology in section 4 of S07. For the whole year of data, two stations out of 74 had statistically significant differences in precipitation occurrence between the days of the week, and, for summer only, there were five stations with significant differences. Thus, neither of the datasets showed any statistically significant weekly patterns.
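A test of precipitation occurrence against day of the week can be sketched as a chi-square test on a 2 x 7 table of wet and dry days. This is a generic contingency-table test under stated assumptions, not necessarily the exact procedure of section 4 of S07. The counts below are invented for illustration, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def occurrence_chi2(wet_counts, day_counts):
    """Test whether the fraction of wet days is the same on every day of the week.

    wet_counts[d] is the number of wet days and day_counts[d] the number of
    observed days for weekday d. Assumes at least one wet and one dry day overall.
    """
    total_days = sum(day_counts)
    wet_frac = sum(wet_counts) / float(total_days)
    stat = 0.0
    for wet, days in zip(wet_counts, day_counts):
        exp_wet = days * wet_frac
        exp_dry = days * (1.0 - wet_frac)
        stat += (wet - exp_wet) ** 2 / exp_wet
        stat += ((days - wet) - exp_dry) ** 2 / exp_dry
    return stat, chi2_sf_even_df(stat, len(day_counts) - 1)


# Invented counts for one station, Monday through Sunday.
days_observed = [678] * 7
wet_days = [200, 205, 198, 210, 195, 202, 199]
stat, p = occurrence_chi2(wet_days, days_observed)
print("chi-square = %.2f, p = %.3f" % (stat, p))
```

With seven weekdays the test has six degrees of freedom, which is why the even-df closed form for the p-value suffices here.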
Third, we looked for cyclic structures in our data using autocorrelation and partial autocorrelation functions (similar to those described in section 6 of S07), but they did not produce any positive results for any of the stations. A seasonal Autoregressive Integrated Moving Average model with a seven-day cyclic structure was also fit to data averaged over the study area, but the output from this model did not differ significantly from that of the null model (i.e., no weekly cycles were found).
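To show what an autocorrelation check for a seven-day cycle looks for, the sketch below computes the sample autocorrelation of two synthetic series: pure white noise, and the same noise with an arbitrary, deliberately exaggerated boost on one day of the week. Both series, the size of the boost, and the function name are assumptions made for illustration; none of this is the analysis code or data of S07.

```python
import math
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / float(n)
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


random.seed(1)
n_days = 13 * 365  # roughly the length of a 1980-1992 daily record
noise = [random.gauss(0.0, 1.0) for _ in range(n_days)]
# Hypothetical weekly signal: add 1.0 on one fixed day of every week.
cyclic = [x + (1.0 if t % 7 == 3 else 0.0) for t, x in enumerate(noise)]

# Approximate 95% band for a white-noise series of this length.
bound = 1.96 / math.sqrt(n_days)
for name, series in (("noise", noise), ("cyclic", cyclic)):
    r = autocorrelation(series, 14)
    print("%s: r1 = %.3f, r7 = %.3f, band = +/-%.3f" % (name, r[1], r[7], bound))
```

A genuine weekly cycle would appear as autocorrelation at lags 7 and 14 that stands outside the band, whereas day-to-day persistence alone would appear only at the shortest lags.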
Finally, all tests mentioned above were repeated for the last seven years of the data (1985-1992) to see if any weekly cycles would occur closer to the 1990s, as B08 suggested, but no cyclic structures emerged. Thus, we conclude from this new analysis that we are not able to replicate B08's results using our data and a variety of statistical techniques that differ from those in B08.
3. Critique of Cerveny and Balling (1998)
BR disagree with our critique of Cerveny and Balling (1998). Specifically, BR argue that using ozone and carbon monoxide concentrations at Sable Island as indicators of pollution transport into the North Atlantic Ocean is a "reasonable assumption." Remember that when considering aerosol effects on clouds and rain, cloud condensation nuclei (CCN) are what matter, and CCN are only a subset of the total aerosol, which moreover depends on environmental conditions (specifically, on updraft velocities) during cloud formation. The ocean is a source of very efficient CCN (sea-salt particles), which may damp any signal in anthropogenic CCN at a location such as Sable Island. Cerveny and Balling (1998) combined carbon monoxide and ozone - two gases with different sources, sinks, and atmospheric lifetimes - into a single variable, which they found to display a statistically significant weekly cycle at Sable Island. High concentrations of ozone, carbon monoxide, and aerosol particles all represent "pollution," but finding a cycle in the combined gas variable (or in the gases separately) in no way proves a similar cycle in aerosol, or CCN, concentration. Based on the fact that aerosol concentrations display weekly cycles over the continental United States, we agree that a similar cycle may exist over the nearby Atlantic Ocean. Cerveny and Balling's (1998) analysis, however, does very little to strengthen this assumption, and provides no quantitative information at all. This reasoning is one example of why we are critical of Cerveny and Balling (1998) - other examples are discussed in S07.
4. Statistical tests
In their section 3, BR apparently criticize our choice of statistical tests, but then state that "addressing the issues we mention below would probably only reinforce the null result S07 obtained." In fact, a careful reading of S07 shows that these concerns were addressed. Even so, we have concerns about their criticisms.
The first issue addressed by BR is the issue of field significance testing. BR state, "it is not clear to us how S07 address this problem." Further, they are troubled by the fact that the number of stations exceeding 90% confidence is less than we would expect by chance. In fact, these results indicate the strength of our approach. When a predictor (e.g., day of the week) is used to find a possible statistical relation to a response variable (e.g., precipitation), the strength of that relation is tested at a number of discrete locations. In most meteorological applications, the domain is oversampled. From the view of physical interpretation, such oversampling is advantageous as it allows for precise location of contours and identification of gradients. From a statistical view, however, oversampling is a disadvantage as it violates the assumption of independent samples. If no stations are locally significant, as was found in S07, testing for field significance is unnecessary. The lack of field significance testing resulting from finding no locally significant tests in our study should not be construed as suggesting that field significance is not useful. Indeed, field significance can be instrumental in determining significance (e.g., Montroy et al. 1998).
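The chance expectation that underlies this argument, and the station counts reported in section 2, can be made concrete with a simple binomial count. The sketch below assumes the 74 local tests are independent. As noted above, oversampled station networks violate that assumption, so the numbers are only a rough guide and not a substitute for a resampling-based field significance test. The function name is hypothetical.

```python
import math


def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha): k or more local rejections by chance."""
    return sum(math.comb(n, j) * alpha ** j * (1.0 - alpha) ** (n - j)
               for j in range(k, n + 1))


n_stations = 74   # stations in the study region of section 2
alpha = 0.05      # local significance level used in section 2
expected = n_stations * alpha  # about 3.7 stations expected to reject by chance
print("expected by chance: %.1f" % expected)
for k in (1, 2, 5):  # station counts reported in section 2
    print("P(at least %d of %d) = %.3f" % (k, n_stations, prob_at_least(k, n_stations, alpha)))
```

Under the independence assumption, even the largest count reported in section 2 (five stations) is reached by chance with a probability of roughly 0.3. Spatial correlation between stations reduces the effective number of independent tests and widens the range of counts that chance alone can produce.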
The second issue of statistical testing raised by BR regards spatial and temporal data dependence. Since each station did not show any statistically significant signal, spatial dependence is not relevant (as discussed above for field significance testing). Furthermore, the autocorrelation analysis performed in section 6 of S07 tests specifically for temporal dependence. Repeating our conclusion from S07, "we find no significant autocorrelation beyond a weak 1-day persistence, which is easily explainable from synoptic reasoning due to the persistence of rainfall from one day to the next."
The third issue of statistical testing raised by BR involves "appropriately averaging the data over many sites" to improve the signal detection of climate change. We agree, in part, with BR, that subdividing the data may have utility, provided that each member of the subgroup has been tested for uniformity. Given the network of stations, variation from station to station will exist that must be accounted for. Averaging without ensuring each member of the group represents the same physical process will: (1) reduce the interstation variation, which can make statistical tests that depend on such variability (e.g., those involving the standard error) appear more significant, (2) reduce nonlinearities in the relationships being sought, and (3) possibly merge data that are inhomogeneous, resulting in a new type of relationship that does not portray well any of the individual station data. Clearly, one advantage to clustering data that represent the same process is addressing multiplicity, one of the primary reasons field significance is applied in a post-hoc fashion. However, averaging is a statistical model and must be done meticulously to document the gains against the losses. As reported in section 2, cluster analysis did not yield any gains in "appropriate averaging" using our dataset.
Finally, BR support harmonic analysis over the techniques S07 discussed, claiming that sinusoidal fitting is more physically likely. However, they neglect two aspects. First, the forcing function for the emission of anthropogenic aerosols is not sinusoidal - there are five weekdays and two weekend days. S07's nonparametric tests do not impose sinusoids. Second, S07 state why their tests are more powerful than harmonic analysis: "any existing seven-day cycle would be detected with the statistical tests already in the paper." Rather than asking whether a seven-day cycle exists, S07 ask a simpler question: whether any day of the week differs from any other day of the week (in terms of precipitation amount or occurrence). S07's results (as well as the additional calculations provided in section 2 of this reply) provide no evidence for a statistically significant favored day of the week. Simply put, we believe that B08's assumption of a seven-day cycle in their analysis methodology likely biases their conclusions in favor of finding a seven-day cycle.
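The point that a weekday/weekend forcing is not sinusoidal can be quantified with a small sketch. It takes an idealized step forcing (1 on five weekdays, 0 on two weekend days, an assumption made purely for illustration) and computes the fraction of its variance captured by a single seven-day sinusoid. The function name is hypothetical, and the shape of any precipitation response need not match the shape of the emission forcing.

```python
import cmath
import math


def harmonic_variance_fraction(cycle, k=1):
    """Fraction of the variance of one period explained by harmonic k.

    Valid for 0 < k < len(cycle) / 2, so that harmonic k and its mirror
    len(cycle) - k are distinct terms of the discrete Fourier transform.
    """
    n = len(cycle)
    mean = sum(cycle) / float(n)
    variance = sum((x - mean) ** 2 for x in cycle) / n
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(cycle))
    # Harmonic k and its mirror together contribute 2 |X_k|^2 / n^2 to the variance.
    return 2.0 * abs(coeff) ** 2 / n ** 2 / variance


# Idealized forcing: 1 on Monday through Friday, 0 on Saturday and Sunday.
step_forcing = [1, 1, 1, 1, 1, 0, 0]
print("fraction explained by the 7-day sinusoid: %.2f"
      % harmonic_variance_fraction(step_forcing))
```

For this idealized step, the single seven-day sinusoid accounts for about 65% of the variance, and the remainder sits in the higher harmonics. A test that compares the days of the week directly does not depend on how the signal is distributed among harmonics.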
B08, on the other hand, prefer their method of subdividing the data and selecting those regions, times, and situations where an effect is seen. If one were to keep subdividing data and running dozens of tests or creating dozens of experiments, some post hoc correction may be required to assess the real significance. Ironically, a good parallel exists between that approach and applying field significance. Field significance is supposed to make seemingly significant results at the local level more conservative by resampling tests accounting for the spatial autocorrelation of the data. By running many experiments on the same dataset, subdivided, a similar type of problem could arise. Moreover, the conditional probability looks good (e.g., high p values), but if those conditions occur in only very specific instances, when one examines the full probability space (such as forecasting the impact), the percentage of times a subdivided sample occurs must be factored back in, putting the situation into perspective. Such an assessment can be made in a Bayesian framework to determine the impact of false alarms.
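The multiplicity problem described above can be illustrated with the family-wise error rate: the chance of at least one spuriously significant result when many tests are run at the same level. The sketch assumes independent tests, which subdivisions of a single dataset generally are not, and the numbers of experiments shown are hypothetical. It also shows the Bonferroni adjustment, one simple post hoc correction among several.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false rejection among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


for m in (1, 12, 24):  # hypothetical numbers of subdivided experiments
    print("%2d tests: P(at least one false alarm) = %.3f, Bonferroni level = %.4f"
          % (m, familywise_error_rate(m), bonferroni_alpha(m)))
```

With two dozen experiments at the 0.05 level, at least one nominally significant result is more likely than not even when no real effect exists.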
5. "Physical understanding" of aerosol-cloud interaction
BR state that because their study takes into "account current physical understanding of how aerosols might affect storm development, B08 were able to construct averages of the data that emphasized where and when the expected signal should be strongest." However, B08 overstate the extent to which current physical understanding can be applied to their results.
First, a cloud-resolving model study (Tao et al. 2008) showed that increased aerosol concentrations can have either a suppressing or an enhancing effect on deep convective precipitation, depending on the environment. Specifically, they found rain to be suppressed in Oklahoma and almost unaffected in Florida, casting doubt on how good the physical understanding claimed by BR actually is.
Second, BR discuss "the well-documented change over the decades in the concentrations of particulate types over the U.S." In B08, however, they argue that the "change in composition of air pollution ... from absorbing to less absorbing aerosols" might explain the appearance and disappearance of a weekly cycle at times during the twentieth century. This change in composition is attributed to two journal articles, both of which must estimate historical properties of aerosols because direct measurements were not available. Although we agree that changes have occurred in the concentrations (and likely also in the sizes and chemical compositions) of aerosols over time, BR overstate the evidence for these changes and provide little in the way of a possible physical argument that would explain the appearance of a weekly cycle around 1945-1950 and again after 1990 (B08, their Fig. 13a).
6. Conclusion
In closing, we wish to make two final points, both of which follow from the comment and reply exchange between BR and us.
Statistical significance does not equal physical significance. Just because our computations did not produce any statistically significant differences from day to day does not mean that we believe that anthropogenic aerosols are not affecting the weather and climate. Such a point was made in discussing the lack of statistical significance of the January thaw (Godfrey et al. 2002).
Satellites do not measure precipitation rates or amounts directly. In fact, changes in the cloud hydrometeor properties caused by changes in aerosol composition or content would change the properties of the clouds as viewed from satellite. Thus, extracting the indirect aerosol effect from satellite data is not possible with 100% confidence. The analysis by B08 using several datasets (satellite, rain gauge, model reanalysis) mitigates our concerns somewhat in this regard, although, as we have discussed in this reply, it raises other significant questions.
References
Bell, T. L., D. Rosenfeld, K.-M. Kim, J.-M. Yoo, M.-I. Lee, and M. Hahnenberger (2008), Midweek increase in U.S. summer rain and storm heights suggests air pollution invigorates rainstorms, J. Geophys. Res., 113, D02209, doi:10.1029/2007JD008623.
Cerveny, R. S., and R. C. Balling, Jr. (1998), Weekly cycles of air pollutants, precipitation and tropical cyclones in the coastal NW Atlantic region. Nature, 394, 561-563.
Godfrey, C. M., D. S. Wilks, and D. M. Schultz (2002), Is the January Thaw a statistical phantom? Bull. Amer. Meteor. Soc., 83, 53-62.
Hollander, M., and D. A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons, pp. 115-120.
Montroy, D. L., M. B. Richman, and P. J. Lamb (1998), Observed nonlinearities of monthly teleconnections between tropical Pacific sea surface temperature anomalies and central and eastern North American precipitation. J. Climate, 11, 1812-1835.
Schultz, D. M., S. Mikkonen, A. Laaksonen, and M. B. Richman (2007), Weekly precipitation cycles? Lack of evidence from United States surface stations, Geophys. Res. Lett., 34, L22815, doi:10.1029/2007GL031889.