EvoNexus Graduate Company, GigaIO’s New SuperNODE Takes Off with Record-Breaking AMD GPU Performance

HPC WIRE / AUGUST 10TH, 2023 / The HPC user’s dream is to keep stuffing GPUs into a rack-mount box and make everything go faster. Some servers offer up to eight GPUs, but the standard server usually offers only four GPU slots. Fair enough, four modern GPUs provide a significant amount of HPC heft, but can we go higher? Before answering that question, consider a collection of eight servers, each with four GPUs, for a total of 32 GPUs. There are ways to leverage all these GPUs for one application by using MPI across servers, but this is often inefficient. In addition, shared computing environments often have GPU nodes that sit idle because they are restricted to GPU-only jobs, leaving the CPUs and memory unavailable for other work.

Stranded Hardware

In the past, servers with a single-socket processor, a moderate amount of memory, and a single GPU were much more granular than today’s systems. This granularity allowed resources to be allocated more effectively. As servers have packed in more hardware (i.e., large-memory, multi-core nodes with multiple GPUs), sharing resources has become trickier. A four-GPU server works great, but it may be used exclusively for GPU jobs and otherwise sit idle. The large granularity of such a server means some memory and CPUs may be stranded from use. Simply put, packing more memory, cores, and GPUs into a single server may reduce the overall cost, but for HPC workloads it may end up stranding a lot of hardware over time.

Composable Hardware

The “stranded” hardware situation has not gone unnoticed, and Compute Express Link™ (CXL™) was established to address it. The CXL standard, which is rolling out in phases, is an industry-supported cache-coherent interconnect for processors, memory expansion, and accelerators. CXL technology maintains memory coherency between the CPU memory space and memory on attached devices, allowing resource sharing for higher performance, reduced software-stack complexity, and lower overall system cost.

While CXL is not quite available yet, one company, GigaIO, does offer CXL capabilities today. Indeed, GigaIO has just introduced a single-node supercomputer that can support up to 32 GPUs, all visible to a single host system. There is no partitioning of the GPUs across server nodes; the GPUs are fully usable and addressable by the host node. Basically, GigaIO offers a PCIe-based network called FabreX™ that creates a dynamic memory fabric able to assign resources to systems in a composable fashion.

Using the FabreX technology, GigaIO demonstrated 32 AMD Instinct MI210 accelerators running in a single-node server. Available today, the 32-GPU engineered solution, called SuperNODE, offers a simplified system capable of scaling multiple accelerator technologies such as GPUs and FPGAs without the latency, cost, and power overhead required for multi-CPU systems. The SuperNODE has the following benefits over existing server stacks:

  • Hardware agnostic — use any accelerator including GPU or FPGAs.
  • Connects up to 32 AMD Instinct™ GPUs or 24 NVIDIA A100s to a single-node server.
  • Ideal for dramatically boosting the performance of single-node applications.
  • The simplest and quickest deployment for large GPU environments.
  • Instant support through TensorFlow and PyTorch libraries (no code changes).

As noted by Andrew Dieckmann, corporate vice president and general manager, Data Center and Accelerated Processing, AMD, “The SuperNODE system created by GigaIO and powered by AMD Instinct accelerators offers compelling TCO for both traditional HPC and generative AI workloads.”

Benchmarks Tell the Story

GigaIO’s SuperNODE system was tested with 32 AMD Instinct MI210 accelerators on a Supermicro 1U server powered by dual 3rd Gen AMD EPYC™ processors. As the following figure shows, two benchmarks, Hashcat and ResNet50, were run on the SuperNODE.

  • Hashcat: Workloads that utilize GPUs independently, such as Hashcat, scale perfectly linearly all the way to the 32 GPUs tested.
  • ResNet50: For workloads that utilize GPU Direct RDMA or peer-to-peer communication, such as ResNet50, the scale factor drops slightly as the GPU count rises. With roughly one percent degradation per added GPU, the overall scale factor at 32 GPUs is 70 percent.

These results demonstrate significantly improved scalability compared to the legacy alternative of scaling the number of GPUs using MPI to communicate between multiple nodes. When testing a multi-node MPI model, GPU scalability is reduced to 50 percent or less.
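As a back-of-envelope illustration, these scaling figures can be modeled in a few lines of Python. The one-percent-per-GPU degradation and the 50-percent MPI figure come from the article; the simple linear-penalty formula itself is an assumption made for illustration, not GigaIO’s methodology.

```python
def scaling_efficiency(n_gpus, per_gpu_penalty=0.01):
    """Linear-degradation model (an illustrative assumption): each GPU
    added beyond the first costs `per_gpu_penalty` of efficiency."""
    return max(0.0, 1.0 - per_gpu_penalty * (n_gpus - 1))

def effective_speedup(n_gpus, per_gpu_penalty=0.01):
    """Effective speedup over a single GPU under the model above."""
    return n_gpus * scaling_efficiency(n_gpus, per_gpu_penalty)

# Hashcat-style independent workloads: no penalty, perfectly linear.
print(effective_speedup(32, per_gpu_penalty=0.0))   # 32.0

# ResNet50-style peer-to-peer workloads: ~1% per added GPU.
print(round(scaling_efficiency(32), 2))             # 0.69, i.e. ~70%
print(round(effective_speedup(32), 1))              # 22.1 GPUs' worth of work

# Legacy multi-node MPI scaling (per the article): 50% or less.
print(32 * 0.50)                                    # 16.0
```

Under this model, the single-node SuperNODE delivers roughly 22 GPUs’ worth of effective throughput from 32 devices, versus 16 or fewer for the multi-node MPI alternative.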

CFD Takes Off on the SuperNODE

Recently, Dr. Moritz Lehmann posted on X/Twitter about his experiences using the SuperNODE for a CFD simulation. The incredible videos are viewable there and on YouTube.

Over the course of a weekend, Dr. Lehmann tested FluidX3D on the GigaIO SuperNODE. He produced one of the largest CFD simulations ever of the Concorde, flying for one second at 300 km/h (186 mph), using 40 billion cells of resolution. The simulation took 33 hours to run on the SuperNODE’s 32 AMD Instinct MI210 GPUs and 2 TB of VRAM. Dr. Lehmann explains, “Commercial CFD would need years for this, FluidX3D does it over the weekend. No code changes or porting were required; FluidX3D works out of the box with 32-GPU scaling on AMD Instinct and an AMD Server.”
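A quick sanity check on those numbers is instructive. This is only a sketch: the 64 GB of HBM2e per MI210 is AMD’s published spec, and treating 2 TB as 2×10¹² bytes is an assumption for round-number arithmetic.

```python
cells = 40e9           # 40 billion lattice cells (from the article)
total_vram = 2e12      # 2 TB of VRAM, taken as 2x10^12 bytes (assumption)
n_gpus = 32            # AMD Instinct MI210 accelerators

bytes_per_cell = total_vram / cells
cells_per_gpu = cells / n_gpus
vram_per_gpu_gb = total_vram / n_gpus / 1e9

print(bytes_per_cell)   # 50.0 bytes of state per cell
print(cells_per_gpu)    # 1.25e9 cells per GPU
print(vram_per_gpu_gb)  # 62.5 GB per GPU, within the MI210's 64 GB of HBM2e
```

In other words, each GPU holds about 1.25 billion cells in roughly 62.5 GB of its 64 GB memory, which is consistent with a simulation of this scale fitting on a single 32-GPU node.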


About GigaIO

As the data explosion and AI revolution unfold, global computing, storage, and networking infrastructure are undergoing a fundamental transformation. GigaIO’s technology enables data centers to leverage their existing hardware infrastructure and benefit from these new technology paradigms, to power the next generation of data centers. GigaIO FabreX breaks the constraints of old static architectures, opening up new configuration possibilities with Composable Disaggregated Infrastructure (CDI) to maximize utilization of all the elements within your racks. The network is finally the computer, with your entire rack, rather than just the server, as the computing unit. We exist to free you from the constraints of your infrastructure, so you can focus on your company’s mission: invent more, solve more, achieve more. To learn more, visit gigaio.com.

About EvoNexus

EvoNexus is California’s leading nonprofit technology startup incubator located in San Diego. EvoNexus has successfully incubated over 260 startups with a survival rate of over 85%. Since the incubator’s formation in 2010, EvoNexus companies have secured over $1.6B in venture funding, $10B in pre-exit valuation, and had 50 acquisitions valued at over $2.1B. EvoNexus is supported by corporate partners, including some of the largest multinational corporations in the world. Learn more at evonexus.org.