The Global Forecast System (GFS), the flagship numerical weather prediction system of NOAA’s National Centers for Environmental Prediction (NCEP), is used by meteorologists throughout the U.S. and around the world. The next upgrade to the GFS, version 16, is scheduled for implementation in February 2021. Among other differences from the current GFS, one significant change is the increase in vertical resolution, which nearly doubles from 64 layers to 127, the first change to the GFS’s vertical resolution in nearly 20 years. This change not only provides more detail at all levels but also raises the model top to better characterize the upper atmosphere, such as the stratosphere. The higher model top helps improve seasonal forecasts, which are essential for agriculture and energy trading, among other things, and the added resolution near the surface can help better predict conditions favorable for tornadoes and other severe weather.
GFS version 15 uses a proprietary binary format for its model output, and each 3D file is just under 17GB in size. Doubling the number of model layers while keeping the same binary format also doubles the output file size, to over 33GB. Because of this significant increase in disk usage, NCEP decided to change the file format to netCDF version 4, the standard for many applications in the atmospheric sciences, which supports compression with Zlib through the underlying HDF5 library. With Zlib compression applied, the new 127-level 3D netCDF files were reduced to under 7GB, approximately 1/5 the size of the same data in the original binary format. For an entire day’s worth of model output, this change in output format reduces the GFS’s total on-disk footprint from 43TB per day in version 15 to 29TB per day in version 16, even with the doubling of the number of model grid points.
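To make the effect of Zlib compression on gridded data concrete, the following is a minimal Python sketch using only the standard library. It does not use netCDF or HDF5, and the synthetic field, its dimensions, and the compression levels tried are all assumptions made for illustration; real model output will compress differently.

```python
import math
import struct
import zlib


def synthetic_level(nx=360, ny=180):
    """Return one smooth, made-up 2D field as little-endian float32 bytes."""
    values = []
    for j in range(ny):
        for i in range(nx):
            wave = math.sin(math.pi * j / ny) * math.cos(2.0 * math.pi * i / nx)
            # Rounding to 0.1 gives the field limited precision; this is an
            # assumption of the sketch, not a property of GFS output.
            values.append(round(280.0 + 20.0 * wave, 1))
    return struct.pack("<%df" % len(values), *values)


def compressed_fraction(raw, level):
    """Compress with zlib at the given level; return compressed/raw size."""
    packed = zlib.compress(raw, level)
    # zlib is lossless: decompressing must reproduce the input exactly.
    assert zlib.decompress(packed) == raw
    return len(packed) / len(raw)


raw = synthetic_level()
for level in (1, 4, 9):
    fraction = compressed_fraction(raw, level)
    print("zlib level %d: %.2f of original size" % (level, fraction))
```

Higher levels generally trade more CPU time for smaller output, which is the same trade-off between disk usage and write speed that applies to compressed model output.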
The significant savings in disk space from Zlib compression come at the cost of write speed, however. Using the original binary format, the 33GB file can be written out in 80 seconds on NOAA’s operational HPC platforms. Writing the same data as a Zlib-compressed netCDF file takes 400 seconds, so the reduction to 1/5 of the file size is offset by a 5x increase in write time. For NCEP operations, software run time needs to be minimized. Users and downstream applications depend on the prompt arrival of these products, and thus every second counts in meeting the National Weather Service’s mission to protect lives and property and advance the nation’s economy.
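The trade-off described so far can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in the text ("over 33GB", "under 7GB", and so on), so the results are approximate; the variable names are chosen here for illustration.

```python
# Rounded figures quoted in the text; all values are approximate.
BINARY_GB = 33.0               # 127-level file in the original binary format
NETCDF_GB = 7.0                # same data as Zlib-compressed netCDF
BINARY_WRITE_S = 80.0          # serial write time, binary format
NETCDF_SERIAL_WRITE_S = 400.0  # serial write time, compressed netCDF
V15_TB_PER_DAY = 43.0
V16_TB_PER_DAY = 29.0

size_ratio = NETCDF_GB / BINARY_GB                    # about 0.21, roughly 1/5
time_ratio = NETCDF_SERIAL_WRITE_S / BINARY_WRITE_S   # 5.0
binary_rate = BINARY_GB / BINARY_WRITE_S              # GB of data per second
compressed_rate = BINARY_GB / NETCDF_SERIAL_WRITE_S   # same data, compressed path
daily_saving = 1.0 - V16_TB_PER_DAY / V15_TB_PER_DAY  # fraction of disk saved

print("file size ratio:   %.2f" % size_ratio)
print("write time ratio:  %.1fx" % time_ratio)
print("binary throughput: %.4f GB/s" % binary_rate)
print("serial compressed: %.4f GB/s of model data" % compressed_rate)
print("daily disk saving: %.0f%%" % (100.0 * daily_saving))
```

In other words, serial compression processes the same model data at about one fifth of the binary write rate, while the daily footprint shrinks by roughly a third despite the doubled number of grid points.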
RedLine support staff worked with NCEP scientists and engineers to alleviate this increase in I/O time by implementing parallelized reads and writes of Zlib-compressed netCDF files throughout the GFS. While parallel reads and writes have been supported by the netCDF library for quite some time, the ability to write a Zlib-compressed file in parallel was added only in March 2020, to support development and testing of GFS version 16. First, the latest versions of the netCDF C and Fortran libraries had to be deployed on all NOAA HPC platforms. Once parallel writes were included in the forecast model, the time to write the 7GB compressed netCDF file dropped from 400 seconds to 40 seconds. Beyond the modifications to the GFS model itself, support for compressed parallel I/O of netCDF files had to be added to all other components of the GFS. This included changes to the model’s post-processing software, additional supporting utilities, and the Gridpoint Statistical Interpolation (GSI) data assimilation system, the last of these efforts led directly by a RedLine programmer.
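In HDF5-backed netCDF-4 files, compression is applied chunk by chunk, which is what makes it possible for several writers to compress different parts of a file at once. The following Python sketch illustrates only that general idea using the standard library. It is not the GFS implementation and does not use the netCDF or HDF5 APIs; the chunk size, worker count, and synthetic data are assumptions chosen for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, chunk_size):
    """Split a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_chunk(chunk, level=4):
    """Compress one chunk independently of all others."""
    return zlib.compress(chunk, level)


def compress_serial(chunks):
    """Compress chunks one after another."""
    return [compress_chunk(c) for c in chunks]


def compress_parallel(chunks, workers=4):
    """Compress chunks concurrently, preserving chunk order."""
    # Each chunk is an independent unit of work, so workers never need to
    # coordinate; map() returns results in the original chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))


# Hypothetical stand-in for model output: 2 MB of repetitive bytes.
data = bytes((i * i) % 251 for i in range(2_000_000))
chunks = split_chunks(data, 256 * 1024)

serial = compress_serial(chunks)
parallel = compress_parallel(chunks)

# Parallel compression must give byte-identical chunks, and the chunks
# must decompress back to the original data.
assert parallel == serial
assert b"".join(zlib.decompress(c) for c in parallel) == data
print("%d chunks, %d -> %d bytes" % (len(chunks), len(data), sum(map(len, parallel))))
```

Any speedup from the threaded version depends on the machine and the Python build, so no timings are claimed here. In an HPC setting the concurrent workers would typically be MPI tasks rather than threads.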
The switch from serial binary I/O to parallel netCDF I/O with Zlib compression allowed the files in GFS version 16, which contain twice as many grid points as those in version 15, to shrink by more than half rather than double in size, while keeping total I/O wall clock times similar between versions. Without these changes, the implementation of GFS version 16 would have been difficult to achieve given the operational constraints on total model execution time and disk usage. In addition, adoption of the netCDF file format improves portability and ease of use for the broader meteorological community, including the private sector and academia. RedLine is proud to be part of this team effort to significantly advance the U.S. global weather prediction capability.