Objective
The first objective of this project was to develop an online data archive of high-resolution climate model projections for North America from the COordinated Regional Downscaling Experiment (CORDEX) program as a resource for research and decision-making communities concerned with climate impacts and adaptation, including those operating and managing military installations at different organizational levels. This archive provides easy access to high-quality, well-vetted future climate information for variables of interest to end-users at spatial and temporal resolutions needed for different decision-making contexts.
The archive collects outputs from regional climate models driven with boundary conditions from global climate model simulations performed for the Coupled Model Intercomparison Project Phase 5 (CMIP5) as part of the global CORDEX program. These simulations were performed using latest-generation models running at high resolution (25 and 50 km) over North America and saving daily data for a continuous 150-year period from 1950 to 2100. The data were stored in standards-compliant Network Common Data Form (NetCDF) files and published through a web-based data portal, built on the Thematic Real-Time Environmental Distributed Data Services (THREDDS) Data Server platform, that provides sophisticated search and subsetting capabilities. In addition to the raw model data, the archive also provides versions of the data that have been regridded to a common latitude-longitude grid and bias-corrected for improved utility and usability.
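As an illustration of the kind of remote access and subsetting this enables, the following is a minimal sketch, assuming Python with xarray and a THREDDS OPeNDAP endpoint; the URL, variable name, and coordinate ranges are placeholders, not actual archive paths.

    import xarray as xr

    # Hypothetical OPeNDAP endpoint on the portal's THREDDS server;
    # actual catalog paths and file names differ.
    URL = ("https://tds.example.org/thredds/dodsC/"
           "nacordex/tasmax.rcp85.example-model.day.raw.nc")

    # Open the remote dataset lazily; only the requested subset is transferred.
    ds = xr.open_dataset(URL)

    # Subset to a region and period of interest (assumes the regridded
    # latitude-longitude version with ascending lat/lon coordinates).
    subset = ds["tasmax"].sel(
        lat=slice(31.0, 37.0),       # placeholder latitude range
        lon=slice(-115.0, -105.0),   # placeholder longitude range
        time=slice("1976-01-01", "2005-12-31"),
    )

    # Save the subset locally as a NetCDF file for offline analysis.
    subset.to_netcdf("tasmax_subset.nc")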
An additional objective of the project was to perform regional, process-oriented evaluations of the model outputs in selected regions, in order to provide guidance for proper use of the data, demonstrate appropriate methodologies for performing such evaluations, and advance both the state of the art of those methodologies and scientific understanding of the issues evaluated. These analyses followed an approach established in previous scientific work: identify the processes that drive the climate impacts of interest, then evaluate how the models represent those processes in order to judge whether the projected changes are credible.
Finally, the last objective of the project was to apply the “Perfect Model” (PM) evaluation framework to popular statistical downscaling (SD) and bias-correction techniques, using data from the archive, in order to evaluate whether the assumptions underlying those techniques hold under conditions of climate change. In this framework, high-resolution climate model outputs were used as a proxy for observations from the future in order to evaluate whether the statistical relationships between model outputs and observations used in these techniques remain applicable as climate changes.
Technology Description
The North American (NA)-CORDEX Data Archive is a collection of high-resolution Regional Climate Model (RCM) outputs downscaling CMIP5 global climate models over North America. The archive stores data at daily and longer frequencies from 1950-2100 for important impacts-relevant variables and includes regridded and bias-corrected versions of the data. The data portal used to access the data provides customized search and subsetting capabilities, and the archive website provides documentation and guidance. The data can be downloaded for use in climate impacts research, evaluation, and decision-making.
The project team performed evaluations of the simulations in the archive over two sub-regions: the Colorado River Basin and the Deep South. These evaluations were scientific analyses focusing on causal processes, the response of these processes to future forcing, and whether those responses were credible. The details and results of these analyses are described in the Performance Evaluation section of the full report.
The project team used data from the archive to apply the PM evaluation framework to commonly used statistical downscaling and bias-correction methods to evaluate whether the assumptions underlying those methods held under climate change. In this framework, high-resolution RCM results were used as a proxy for future observations.
Interim Results
To evaluate a statistical model, one applies it to data it has not been trained on and compares the results to the corresponding correct values (“truth”). In this case, however, there were no observations from the future to use for evaluation. Statistical downscaling and bias-correction methods rest on the assumption that the relationship between model outputs and observations is unchanging (stationary) over time, yet climate is known to be changing. The important question is therefore whether the stationarity assumption holds as climate changes. In the PM framework, 25-km regional climate model outputs served as “truth” (proxies for observations) for both the historical and future periods, allowing a quantitative assessment of the extent to which SD performance degrades when applied to future conditions. This approach isolates the stationarity assumption described above, something that cannot be accomplished using more typical, real-world applications of these statistical methods.
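The following is a minimal sketch of that logic in Python, using a simple empirical quantile mapping as a stand-in for the SD and bias-correction methods actually evaluated; all of the data here are synthetic placeholders rather than archive output.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for the Perfect Model experiment:
    # "truth" = high-resolution RCM output; "model" = a biased version of it.
    truth_hist = rng.gamma(2.0, 5.0, 10_000)                              # historical truth
    model_hist = truth_hist * 0.8 + rng.normal(0, 1.0, truth_hist.size)   # biased model, historical
    truth_fut = rng.gamma(2.0, 6.5, 10_000)                               # future truth (shifted climate)
    model_fut = truth_fut * 0.8 + rng.normal(0, 1.0, truth_fut.size)      # biased model, future

    def train_quantile_map(model, truth, n_quantiles=99):
        """Fit an empirical quantile mapping from model values to truth values."""
        q = np.linspace(1, 99, n_quantiles)
        return np.percentile(model, q), np.percentile(truth, q)

    def apply_quantile_map(values, model_q, truth_q):
        """Correct values by interpolating along the trained transfer function."""
        return np.interp(values, model_q, truth_q)

    # Train on the historical period only, as in real-world applications.
    model_q, truth_q = train_quantile_map(model_hist, truth_hist)

    # Apply the historical transfer function to the future period and score it
    # against the future "truth", the step that is impossible with real observations.
    corrected_hist = apply_quantile_map(model_hist, model_q, truth_q)
    corrected_fut = apply_quantile_map(model_fut, model_q, truth_q)
    hist_error = abs(np.percentile(corrected_hist, 95) - np.percentile(truth_hist, 95))
    fut_error = abs(np.percentile(corrected_fut, 95) - np.percentile(truth_fut, 95))
    print(f"95th-percentile error, historical: {hist_error:.2f}, future: {fut_error:.2f}")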
This project succeeded in its performance objectives, detailed in the following paragraphs.
- Archive existence: The project team has successfully collected data for the archive and published it through the National Center for Atmospheric Research (NCAR) Climate Data Gateway, and in-house testing and communications with end users have shown that they can use it to download data. The project team expects to continue to add data to the archive as it becomes available, but all currently available essential and high-priority data have been archived and published.
- Added value: For all of the published simulations, the project team has archived versions of the data aggregated to longer timescales and regridded to a common latitude-longitude grid. The project team has also produced bias-corrected versions of the data, although this is an ongoing effort. In addition, the team has generated a collection of visualizations that let users explore the data before downloading it; this, too, is an ongoing effort, with more additions planned in the near future. A minimal sketch of this kind of post-processing follows this list.
- Usability: Working with NCAR’s data portal development team, the project team leveraged capabilities of the THREDDS Data Server software to provide a subsetting service that allows users to download data only for specified periods and regions. The project team also implemented a specialized search page that allows users to more easily find the elements of the dataset that meet their needs.
- Improved scientific understanding and guidance for end users: The Regional Analysis and Statistical Downscaling Evaluation portions of this project were scientific research analyses. The goal of the Regional Analysis was to understand how and why the simulations do or do not successfully represent important climate processes relevant to the study regions. The goal of the Statistical Downscaling Evaluation was to understand under what conditions the assumptions underlying statistical methodologies commonly applied to the simulation outputs hold, given a changing climate. The performance objectives for these elements were to improve scientific understanding and provide information that could be used as guidance for users of the data archive. In both cases the analyses were generally successful, although not yet complete: scientific papers based on these analyses are in preparation, and the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory Physical Sciences Division team has developed web atlases of analysis results that will be linked from the project website as guidance materials for end users.
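As referenced in the added-value item above, the following is a minimal sketch of the kind of post-processing involved, assuming Python with xarray and a daily NetCDF input on a rectilinear grid; the file names, variable name, and target grid are placeholders, and regridding the curvilinear native grids of the RCMs properly requires a dedicated tool such as ESMF rather than the simple interpolation shown here.

    import numpy as np
    import xarray as xr

    # Hypothetical daily file on a model's grid (placeholder name).
    ds = xr.open_dataset("tas.hist.native_grid.day.nc")

    # 1. Temporal aggregation: daily values to monthly means.
    tas_monthly = ds["tas"].resample(time="1MS").mean()

    # 2. Regridding to a common latitude-longitude grid. This simple
    #    interpolation only works if the source grid is already rectilinear;
    #    it is shown purely to illustrate the idea.
    target_lat = np.arange(20.0, 75.0, 0.5)    # placeholder half-degree grid
    target_lon = np.arange(-165.0, -50.0, 0.5)
    tas_regridded = tas_monthly.interp(lat=target_lat, lon=target_lon)

    # Write the added-value product under its own descriptive file name.
    tas_regridded.to_netcdf("tas.hist.common_grid.mon.nc")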
Benefits
Based on the experience of developing data archives for NA-CORDEX and the North American Regional Climate Change Assessment Program (NARCCAP), some recommendations can be made regarding the creation and management of this kind of data archive.
- First, plan to automate as much of the process as possible. Automation is the only way to make dealing with very large datasets manageable, and it reduces the number and variety of errors to correct. Automation requires uniformity, so aim for draconian uniformity that eliminates every element of variability in the dataset that isn’t essential: any piece of data or metadata that is common to more than one element should be identical in absolutely every instance. Stringent conformity to standards and specifications will also solve many problems, but requires a thorough understanding of the often-complex governing documents. A minimal sketch of this kind of automated metadata check follows this list.
- With regard to archive and file structure, aim to maximize the segregation of things that are different and minimize the splitting of things that are the same. Lumping different elements together leads to confusion and muddles the metadata; splitting a single element into many pieces introduces opportunities for inconsistency, gaps, and overlaps between the pieces.
- Plan for iteration and change. Occasionally, a dataset element will flow only once through the data pipeline to reach its “finished” state, but that is more the exception than the rule. More often, there will be problems to solve that result in multiple passes through the pipeline. There will also be errors that are not (and in some cases, cannot be) detected until after the data has been published. The odds are very good that any given modeling group will need to re-run at least one of its simulations, and possibly all of them. It is also likely that new elements will be added and planned elements will drop out. Keep this in mind while planning the high-level organization of the project. Assign version numbers to the data and add unique identifiers to the metadata.
- Archiving and publishing data is not an indivisible and irreversible process; it has many steps that take significant time, and the archive will be in an intermediate state of completion for much of its life. Prioritize the elements of the archive based on what has the most value to the target user community.
- Evaluate what the target users know and don’t know about the data and determine what domain-specific details they need a specialist to handle in order to make best use of the data; then generate data products or data handling systems that take care of those details.
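As referenced in the automation recommendation above, the following is a minimal sketch of an automated metadata-consistency check, which also stamps each file with a version number and unique identifier in line with the iteration recommendation; the attribute names, file pattern, and version string are illustrative, not the archive's actual conventions.

    import glob
    import uuid
    from collections import defaultdict

    from netCDF4 import Dataset

    # Global attributes that should be identical across every file in a
    # dataset element (illustrative names; actual conventions differ).
    SHARED_ATTRS = ["driving_model_id", "model_id", "experiment_id", "frequency"]

    # Collect the distinct values of each shared attribute across all files.
    seen = defaultdict(set)
    for path in sorted(glob.glob("archive/*.nc")):
        with Dataset(path) as nc:
            for attr in SHARED_ATTRS:
                seen[attr].add(getattr(nc, attr, "<missing>"))

    # Report any attribute that varies where it should not.
    for attr, values in seen.items():
        if len(values) > 1:
            print(f"INCONSISTENT {attr}: {sorted(values)}")

    # Versioning: stamp each file with a dataset version and a unique tracking
    # identifier so that retractions and re-runs can be identified later.
    for path in sorted(glob.glob("archive/*.nc")):
        with Dataset(path, "a") as nc:
            nc.setncattr("product_version", "v1.0")         # placeholder version string
            nc.setncattr("tracking_id", str(uuid.uuid4()))  # unique per-file identifier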
Specific issues that affected the development of the project:
- The simulations from the Ouranos modeling group were not performed until the project was well underway and necessitated an update to the naming conventions, because there were then two sets of Fifth-generation Canadian Regional Climate Model (CRCM5) simulations with different configurations. The project team changed the name of the first set from “CRCM5” to “CRCM5-UQAM” (for Université du Québec à Montréal) and added the Ouranos configuration of CRCM5 as a separate entry.
- Some simulations were found to have problems serious enough to require a re-run: the original CRCM5-UQAM future run driven by the Max Planck Institute for Meteorology (MPI) Earth System Model-MR (MPI-ESM-MR) was continued from an MPI Earth System Model-LR (MPI-ESM-LR) historical run; both the original Weather Research and Forecasting (WRF) and Regional Climate Model 4 (RegCM4) MPI-ESM-LR runs had mis-specified surface temperatures over the oceans; and the original WRF runs driven by the European Centre for Medium-Range Weather Forecasts Interim Reanalysis had no sea ice due to configuration issues. In all these cases, the problems were not detected until after the data had been published, resulting in retractions from the data portal.
- There were differences in the handling of time coordinates between the native code for Kernel Density Distribution Mapping (KDDM) and its re-implementation in the PM framework that made it very difficult to validate the equivalence of the two, especially in the case of precipitation (which requires additional processing not needed for temperature). This caused significant delays in processing Phase 1 data for precipitation, and ultimately precipitation had to be dropped from the analysis.