Scientists and researchers increasingly use high-performance computers, among other tools, to acquire, organize, and process data. As experts in their own domains, they are likely to be more productive and successful when they can focus on the frontiers of their fields of study rather than on mastering the details of their digital and computational tools.
That’s where high-performance software libraries come in. As we’ll learn, such libraries serve as time-saving computational tools for scientists and researchers who utilize HPC in their work.
The Broad HPC View
Most HPC application software is very complex. Models of ever-increasing complexity are being used to simulate and solve problems with larger and larger data sets. As computer, networking, and memory system advances become available, HPC applications should be updated to take full advantage of the improved system features to solve larger problems with better models in a timely fashion.
How can HPC software be updated efficiently to take advantage of new hardware features?
Many HPC applications, such as weather and climate modeling, airplane wing design, radar cross-section analysis, structural analysis of dams and bridges, and drug design, spend considerable compute time solving dense linear algebra problems. In addition, deep learning is an emerging application that relies heavily on matrix-matrix multiplication, often on many small matrices, which is again a linear algebra operation. Computational linear algebra therefore supplies a set of compute-intensive kernels that many HPC applications share.
Scientific researchers could invest a lot of time developing the expertise needed to optimize these kernels for emerging hardware: multi-core architectures, graphics processing units of various sizes, and interconnects with their own data transfer performance characteristics. Fortunately, experts and vendors in this domain provide scientific high-performance computing libraries, such as Automatically Tuned Linear Algebra Software (ATLAS), Intel Math Kernel Library (Intel MKL), IBM Engineering and Scientific Subroutine Library (ESSL), AMD Core Math Library (ACML), and, more recently, the GPU-accelerated libraries included in Nvidia’s CUDA Toolkit, among others.
These scientific high-performance computing libraries are an invaluable aid to the clarity, portability, modularity, maintainability, and performance of scientific code, and they are among the first software components to be ported to and optimized for new hardware.
Dense Linear Algebra Evolution
There are several key libraries for dense linear algebra that cover functionality used by many scientific applications. Over time, their implementations and internal algorithms have evolved to map linear algebra kernels and functions efficiently onto the available HPC hardware. The evolution runs roughly as follows: de facto standard APIs → optimize for vector hardware → optimize for parallel vector machines → optimize for distributed, hierarchical memory on parallel machines → optimize for highly distributed memory and parallel machines.
The progression of high-performance dense linear algebra libraries, driven by hardware improvements, has been shaped by experts in the field of numerical analysis. New kernels and libraries were introduced to extend or supersede older ones. The BLAS grew from Level 1 (vector-vector operations) to Level 2 (matrix-vector operations) to Level 3 (matrix-matrix operations), with each level added alongside the earlier ones rather than replacing them. Likewise, the path LINPACK → LAPACK → ScaLAPACK → PLASMA updated the interfaces and algorithms so that the computational kernels and functions could reach higher performance on each new class of hardware. Most of these are included as kernels and functions provided by the scientific high-performance computing libraries mentioned earlier.
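To make the three BLAS levels concrete, here is a minimal sketch in plain Python of one representative operation from each level. The function names echo the conventional BLAS routine names (axpy, gemv, gemm), but these are unoptimized illustrations written for this article, not calls into any real library. The helper flops_per_element is hypothetical and uses leading-order operation counts only.

```python
# Reference (unoptimized) sketches of one operation from each BLAS level.
# Names mirror conventional BLAS routine names; nothing here calls a real library.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x, beta, y):
    """Level 2 (matrix-vector): return alpha*(A x) + beta*y."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]

def gemm(alpha, A, B, beta, C):
    """Level 3 (matrix-matrix): return alpha*(A B) + beta*C."""
    cols_of_B = list(zip(*B))  # columns of B, so each entry is a dot product
    return [[alpha * sum(a * b for a, b in zip(row, col)) + beta * cij
             for col, cij in zip(cols_of_B, c_row)]
            for row, c_row in zip(A, C)]

def flops_per_element(n):
    """Approximate arithmetic operations per operand element for size-n problems.

    Hypothetical helper: counts are leading-order estimates, not measurements.
    """
    return {
        "level1_axpy": (2 * n) / (2 * n),             # 2n flops on 2 vectors
        "level2_gemv": (2 * n * n) / (n * n + 2 * n),  # 2n^2 flops on a matrix + 2 vectors
        "level3_gemm": (2 * n ** 3) / (3 * n * n),     # 2n^3 flops on 3 matrices
    }

if __name__ == "__main__":
    for name, ratio in flops_per_element(300).items():
        print(f"{name}: about {ratio:.1f} flops per element")
```

The ratio for Level 1 and Level 2 stays constant as n grows, while the Level 3 ratio grows with n. That extra reuse of each loaded element is one reason matrix-matrix kernels map well onto machines with hierarchical memory.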
Because computational linear algebra is a basic building block of HPC, we can focus on a reasonably small set of computational kernels and on the approaches used to implement them. Performance comes from using the highly optimized implementations of these functions that scientific high-performance computing libraries provide. Because the libraries share de facto standard interfaces, applications written against them can be portable and highly efficient over a wide range of machine architectures. Thus, these libraries enable scientists and researchers to develop their HPC applications more quickly and to migrate them efficiently to new hardware.
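The portability argument can be sketched in a few lines of Python. The example below assumes a single calling contract, C ← αAB + βC, which follows the usual gemm convention, and supplies two interchangeable implementations: a plain triple loop and a tiled version that works block by block. All function names and the block parameter are hypothetical, and neither routine is tuned. The point is only that the application routine residual_norm never changes when the implementation behind the interface does.

```python
def gemm_naive(alpha, A, B, beta, C):
    """Return alpha*(A B) + beta*C using a plain triple loop."""
    m, k, n = len(A), len(B), len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            out[i][j] += alpha * s
    return out

def gemm_blocked(alpha, A, B, beta, C, block=2):
    """Same contract as gemm_naive, but computed tile by tile.

    Tiling is the kind of internal change a library can make for a
    hierarchical-memory machine without touching its interface.
    """
    m, k, n = len(A), len(B), len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                # Accumulate the contribution of one tile of A and one tile of B.
                for i in range(i0, min(i0 + block, m)):
                    for j in range(j0, min(j0 + block, n)):
                        s = 0.0
                        for p in range(p0, min(p0 + block, k)):
                            s += A[i][p] * B[p][j]
                        out[i][j] += alpha * s
    return out

def residual_norm(gemm, A, X, B):
    """Application code: largest absolute entry of A X - B.

    Written only against the gemm contract, so any conforming
    implementation can be passed in.
    """
    R = gemm(1.0, A, X, -1.0, B)
    return max(abs(v) for row in R for v in row)

if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for impl in (gemm_naive, gemm_blocked):
        print(impl.__name__, residual_norm(impl, A, identity, A))
```

In a real application, the role of these two Python functions would be played by different vendor or open-source builds of the same library interface, selected at link time or install time rather than passed as an argument.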
We’d be happy to talk about how making use of these high-performance scientific libraries could improve the performance of your own HPC codes. Just reach out.