The convergence of artificial intelligence and high-performance computing is coming. Not in some far-off future, but now.
In large part, that’s because AI and HPC work with similar data types. Both run highly compute-intensive workloads. And both demand serious scalability.
Until recently, most deep-learning training was done on single-node systems beefed up with multiple GPUs.
HPC workloads, by contrast, were run on clusters of multiple nodes. In a typical commercial setting, an HPC system has 16 or 32 nodes. But in theory, an HPC system could have thousands, or even hundreds of thousands, of nodes at true supercomputing scale.
What’s changed is that AI training can now be done on local workstations and small clusters running optimized software. These HPC clusters are a lot more flexible than single-node, AI-specific platforms.
Mainly, that’s because HPC clusters can be used efficiently for other tasks; specialized single-node systems can’t. When they’re not running AI, they’re effectively stranded.
HPC clusters are also a lot more scalable, and that’s important for some AI problems. On a single-node setup, there are real limits to how much performance you can deliver.
An HPC cluster can exceed those limitations. Its discrete nodes work together in parallel, potentially delivering orders of magnitude higher performance.
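The way cluster nodes cooperate on a training job can be sketched in a few lines. The snippet below is a minimal, illustrative data-parallel SGD loop in plain Python: each simulated "node" computes a gradient on its own shard of the data, and the gradients are then averaged before the shared weight is updated, standing in for the all-reduce step a real cluster performs over its interconnect. All function names, the toy model, and the numbers are assumptions for illustration, not Intel software.

```python
# Illustrative sketch of data-parallel training across simulated "nodes".
# Each node computes the gradient of a simple least-squares loss on its
# own shard of the data; the gradients are then averaged (the all-reduce
# a real cluster would do over its network) before updating the weight.

def local_gradient(w, shard):
    """Gradient of 0.5*(w*x - y)^2, averaged over one node's data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_sgd(shards, w=0.0, lr=0.02, steps=100):
    for _ in range(steps):
        # Each node works on its shard; a cluster would run these in parallel.
        grads = [local_gradient(w, shard) for shard in shards]
        # All-reduce: average the gradients, then apply the same update everywhere.
        w -= lr * sum(grads) / len(grads)
    return w

# Four "nodes", each holding a shard of points from the line y = 3x.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = data_parallel_sgd(shards)
print(round(w, 3))  # converges toward 3.0
```

Because every node sees only its own shard but all nodes apply the same averaged update, adding nodes lets the cluster chew through more data per step without changing the model each node holds.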
That’s also why clustered platforms can help solve a huge AI problem: training the software. Because real-world data sets are huge and complex, the process can take months.
An AI system needs to be shown how to avoid errors and ignore misleading data. All that makes AI training “computationally expensive” — and a great fit with an HPC cluster.
Intel, by far the leading provider of data-center processors, is here to help. Intel is committed to making its processors the “platform of choice” for AI, and that includes:
> Intel Xeon Scalable family: These are highly scalable processors for evolving AI workloads and intensive deep-learning training. They’re versatile, too: any neural-network architecture can be run, and adding nodes is easy.
The latest Intel Xeon Scalable processors deliver a 1.63x average performance increase over prior generations on HPC applications. (That figure is based on Intel’s internal testing on Normalized Generational Performance going from an Intel Xeon processor E5-26xx v4 to an Intel Xeon Scalable processor.)
> Intel FPGAs: These are programmable accelerators for deep-learning inference.
> BigDL: To simplify deep-learning development, this distributed deep-learning library for Apache Spark software lets users write deep-learning applications as standard Spark programs. These programs can then run on top of existing Spark or Hadoop clusters.
> Software Frameworks: These frameworks help users train neural networks faster on Intel architecture. Offerings include: Neon, Nervana’s deep-learning framework; TensorFlow, a Python-based deep-learning framework; and Caffe, a framework known for image recognition.
Intel is not only talking about AI, but also investing in it. For example, Intel in 2016 acquired Nervana Systems, maker of neural-network processors for deep learning. And its Intel Capital group has invested some $1 billion in AI startups, including Mighty AI, DataRobot and Lumiata.
“We are 100 percent committed to creating the roadmap of optimized products to support emerging mainstream AI workloads,” Intel’s CEO, Brian Krzanich, wrote in a recent editorial.
AI, data analytics and traditional simulation are converging. Now you can offer your customers a single HPC cluster that does it all.
> AI-HPC is happening now: InsideBigData white paper
> Intel Invests $1 Billion in the AI Ecosystem to Fuel Adoption and Product Innovation: editorial by Brian Krzanich, CEO of Intel.
> How Intel is bringing AI and machine learning to the people: video interview with Pradeep Dubey, director of Intel’s parallel-computing lab.