
Baselining, Benchmarking for Optimal HPC Performance: Part 1

  • System Administration
  • August 10, 2016
  • Chris Young

HPC systems are frustratingly complex beasts to tame. Competing interests from hardware and software support teams, coupled with demands from IT security compliance, can make consistency across a cluster tough to achieve.

Baselining is the natural answer. With baselining, we establish a series of “custom” benchmarks. By custom, I mean a defined set of benchmarks that together provide a quantitative analysis of the performance of an HPC system. Are the numbers from the individual benchmarks industry leading? Perhaps not. But they quantify how your system needs to perform in order to run its applications properly.

Consider incorporating baselines into your normal preventative maintenance schedule. Work with your change management coordinator to ensure application benchmarks are performed prior to releasing the system to systems administrators for maintenance. Allow time for system hardware baselines to be executed during maintenance and make sure that application baselines are completed and the results are analyzed prior to exiting maintenance.

The importance of pre-maintenance baselines can’t be overstated: they demonstrate to everyone that the system is running as expected before maintenance begins. With a pre-maintenance baseline, system operations teams and application development teams have assurance that the system was operating consistently going in. Executing the same baselines once maintenance is complete provides a similar assurance that unintended or unknown changes were not introduced by the maintenance event.
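One way to make that before-and-after check concrete is a small comparison script. The sketch below is illustrative only: the benchmark names, the numbers, and the 5% tolerance are assumptions rather than values from any real system, and it assumes every metric is "higher is better."

```python
# Hypothetical sketch: compare post-maintenance results to the pre-maintenance baseline.
# Benchmark names, numbers, and the 5% tolerance are illustrative assumptions.

def compare_baselines(pre, post, tolerance=0.05):
    """Return {benchmark: relative_change} for results outside the tolerance.

    Assumes every metric is "higher is better" (for example GB/s or GFLOP/s).
    A benchmark missing from the post run is reported with a value of None.
    """
    flagged = {}
    for name, before in pre.items():
        after = post.get(name)
        if after is None:
            flagged[name] = None
            continue
        change = (after - before) / before
        if abs(change) > tolerance:
            flagged[name] = change
    return flagged


pre_maintenance = {"memory_bandwidth_gbs": 180.0, "cpu_gflops": 950.0, "interconnect_score": 100.0}
post_maintenance = {"memory_bandwidth_gbs": 178.5, "cpu_gflops": 840.0, "interconnect_score": 101.0}

for name, change in compare_baselines(pre_maintenance, post_maintenance).items():
    if change is None:
        print(f"{name}: missing from post-maintenance run")
    else:
        print(f"{name}: {change:+.1%} versus pre-maintenance baseline")
```

With these made-up numbers, only `cpu_gflops` is flagged, since the other two results move by about 1% or less. Flagging changes in both directions is deliberate: an unexpected improvement is also a sign that something changed.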

Should the pre-maintenance baseline fail, it may be unwise to hand the system over to the systems administration team, who would then be introducing more change to a system that already has an unknown error. Rescheduling the maintenance, if possible, so that the existing problem can be investigated without new change layered on top may be the better course of action.
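That go/no-go decision can be written down as a simple gate. This is a minimal sketch under stated assumptions: the function name, benchmark names, and minimum values are all hypothetical, and a real site would set its own acceptance criteria.

```python
# Hypothetical maintenance gate; names and thresholds are illustrative assumptions.

def maintenance_decision(results, minimums):
    """Return ("proceed", []) if every baseline meets its minimum,
    otherwise ("reschedule", [names of failing benchmarks]).

    A benchmark with no result at all is treated as a failure.
    """
    failures = sorted(
        name for name, floor in minimums.items()
        if results.get(name, float("-inf")) < floor
    )
    if failures:
        return ("reschedule", failures)
    return ("proceed", [])


minimums = {"memory_bandwidth_gbs": 170.0, "cpu_gflops": 900.0}
healthy = {"memory_bandwidth_gbs": 180.0, "cpu_gflops": 950.0}
degraded = {"memory_bandwidth_gbs": 180.0, "cpu_gflops": 610.0}

print(maintenance_decision(healthy, minimums))   # ('proceed', [])
print(maintenance_decision(degraded, minimums))  # ('reschedule', ['cpu_gflops'])
```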

While change in an HPC system is inevitable, change in the core benchmarks and baselines is not. Finding the right mix of benchmarks to adequately represent your system’s performance takes time and effort, but once established, they should be maintained. They can live for years and even across systems.

These core baselines create a basis for comparing change in your system across updates, and even permit performance comparisons between dissimilar systems. For the life of the system, consider the core baseline benchmarks to be sacrosanct and immutable. There’s immense value in being able to quantify the performance of your system across its lifetime. Changing the baseline benchmarks invalidates any comparison of future runs against the old baseline results; it becomes an apples-to-oranges comparison.
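One possible way to enforce that immutability is to store a fingerprint of each benchmark's definition alongside its results and refuse to compare runs whose fingerprints differ. The sketch below assumes the definition can be expressed as a JSON-serializable dictionary; the field names (`name`, `input_size`, `ranks`) are hypothetical.

```python
import hashlib
import json

# Hypothetical sketch: fingerprint a benchmark definition so that results are
# only compared when the benchmark itself has not changed.

def baseline_fingerprint(definition):
    """Return a SHA-256 hex digest of a canonical JSON form of the definition."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def comparable(fingerprint_a, fingerprint_b):
    """Two results are comparable only if they came from the same definition."""
    return fingerprint_a == fingerprint_b


original = {"name": "memory_bandwidth", "input_size": 1_000_000, "ranks": 64}
reordered = {"ranks": 64, "name": "memory_bandwidth", "input_size": 1_000_000}
modified = {"name": "memory_bandwidth", "input_size": 2_000_000, "ranks": 64}

print(comparable(baseline_fingerprint(original), baseline_fingerprint(reordered)))  # True
print(comparable(baseline_fingerprint(original), baseline_fingerprint(modified)))   # False
```

Sorting the keys before hashing means that cosmetic differences, such as the order of fields in a config file, do not register as a change, while any change to a parameter does.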

For system administrators, having immutable core baselines gives them a target to keep the system running optimally as they apply system updates and changes. For application management teams, the immutability of a local application benchmark provides some interesting insights:

  • Certainty that the HPC system is executing as expected.
  • A point-in-time lookback at how their code executed in the past, using a codebase that cannot have changed.
  • The quantitative expression, through an application benchmark result, of any changes in the underlying system infrastructure. For example, a change in an OFA or OFED driver may alter RDMA communications and produce a performance increase (or worse, a decrease).

Consider carefully the core benchmarks you want to include in your system’s baseline profile. Once set, they should be maintained. This does not mean additional benchmarks cannot be added. The overall suite does need to grow to reflect new hardware and new coding methods, but you need to keep a core set that doesn’t change so that you can measure and compare clusters from a historical or maintenance-upgrade perspective.

As an example, MPI-IO is the parallel I/O API defined in the MPI standard, and you may want to include that type of benchmark if your applications start using it. You may also want to include different GPU benchmarks or updated applications. However, that doesn’t mean you should get rid of the old benchmarks or older applications if you want to maintain the historical perspective on performance.
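The split between a fixed core and a growing set of extras can be sketched as follows. The benchmark names are hypothetical placeholders; the point is only that historical comparisons are restricted to the core set, while newer benchmarks such as an MPI-IO or GPU test ride alongside.

```python
# Hypothetical sketch: a frozen core suite plus extra benchmarks added over time.
# All benchmark names and numbers are illustrative assumptions.

CORE_BENCHMARKS = frozenset({"memory_bandwidth", "cpu_intensive", "interconnect"})


def historical_view(run):
    """Keep only core benchmarks so runs from different years or systems line up."""
    return {name: value for name, value in run.items() if name in CORE_BENCHMARKS}


def missing_core(run):
    """List core benchmarks a run failed to report; these should never be dropped."""
    return sorted(CORE_BENCHMARKS - set(run))


older_run = {"memory_bandwidth": 150.0, "cpu_intensive": 700.0, "interconnect": 90.0}
newer_run = {"memory_bandwidth": 180.0, "cpu_intensive": 950.0, "interconnect": 100.0,
             "mpi_io": 12.5, "gpu": 4200.0}

print(sorted(historical_view(newer_run)))
print(missing_core({"memory_bandwidth": 180.0, "gpu": 4200.0}))
```

Here the newer run carries two extra results, but `historical_view` reduces it to the same three benchmarks the older run reported, so the two remain directly comparable.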

In my next post, I’ll get into some specific advice about what sort of baselining you need to be conducting to optimize the performance of your HPC system, including guidance on memory bandwidth tests, high-intensity CPU checks, interconnect networks, and more. In the meantime, reach out with any questions or for a deeper conversation about your own situation.
