Posted on 21 March, 2021
Four times a year, we release a major update to our Poplar software stack. Typically, each one brings new features that help users develop novel applications for the IPU, simplifies the process of creating models natively or porting them from third-party frameworks, and delivers performance gains.
Because Graphcore’s software and hardware are co-designed, Poplar’s features are inextricably linked to the evolving capabilities of our IPU systems. Our latest release, Poplar SDK 2.0, again delivers significant improvements that will allow customers to achieve more with their Graphcore technology.
Included in the update are new tools that expand the scale-out capabilities of Graphcore systems, as users look to deploy larger configurations to solve the toughest challenges in machine intelligence. These include support for IPU-Fabric up to IPU-POD128 with the Graphcore Communication Library, as well as distributed multi-host support.
This release also includes a number of optimisations for popular applications, and refinements to the tools that help users optimise their own workloads, with major updates to our PopVision analysis tools – the Graph Analyser and System Analyser.
New Features in Poplar SDK 2.0
- Support for IPU-Fabric up to IPU-POD128: new Graphcore Communication Library (GCL)
- Easy to use, distributed multi-host scale-out support – new command line feature (PopRun) and Poplar Distributed Configuration Library (PopDist)
- Major updates to our PopVision analysis tools – Graph Analyser and System Analyser
- Offline compilation: load and save compiled models from frameworks
- Long-sequence LSTM optimisation: LSTM and GRU enhancements
- Generalised CTCLoss: optimised CTC Loss support in PopLibs
- Enhanced support for Cholesky and triangular solve (TensorFlow)
- New major optimisations to sorting and topK operations
- New support for PyTorch 1.7.1 with additional ops added
- Improved documentation, optimisation and user guides, white papers
Supporting applications at scale
Support for IPU-Fabric up to IPU-POD128
The new Graphcore Communication Library (GCL) is designed to enable high-performance scale-out for IPU systems. GCL uses the IPU’s built-in hardware support for transferring data directly from one IPU to another IPU’s memory (via remote memory access, or RMA) to provide a low-overhead, high-throughput communication library. Specifically targeted at scale-out, this extension to the Poplar software stack supports our IPU-Fabric interconnect architecture up to IPU-POD128 systems.
In creating GCL, Graphcore has been able to eliminate several of the major communication challenges facing modern machine learning networks.
Graphcore’s deterministic communication scheduling efficiently eliminates jitter, a common performance limitation in high-performance computing. Jitter in this context is the timing variation in packet flows over a network. In host NIC-based systems, it can be caused by timing variations in the host operating system due to the rescheduling of tasks (for example, handling high-priority interrupts or garbage collection), or by the network itself, such as differences in network latency between source and destination pairs and traffic-dependent network congestion.
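To make the idea concrete, jitter can be quantified as the spread of packet inter-arrival gaps. The toy sketch below (plain Python, purely illustrative and unrelated to the GCL implementation) shows how a perfectly deterministic delivery schedule yields zero jitter, while host-side perturbation does not.

```python
from statistics import pstdev

def jitter(arrival_times_ms):
    """Jitter measured as the standard deviation of inter-arrival gaps."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    return pstdev(gaps)

# A deterministic schedule delivers packets at a fixed cadence: zero jitter.
assert jitter([0.0, 1.0, 2.0, 3.0, 4.0]) == 0.0
# Host-side rescheduling perturbs the cadence and introduces jitter.
assert jitter([0.0, 1.0, 2.6, 3.0, 4.0]) > 0.0
```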
No operating system runtime overhead is incurred for memory management and the need for asynchronous communication handling is eliminated, enabling faster communication and improved performance.
Distributed multi-host scale-out support
PopRun and PopDist allow developers to run their applications across multiple IPU-POD systems.
PopRun is a command line utility for launching distributed applications on IPU-POD systems and the Poplar Distributed Configuration Library (PopDist) provides a set of APIs which developers can use to prepare their application for distributed execution.
When using large systems such as IPU-POD128, PopRun automatically launches multiple instances on the host servers of the interconnected IPU-PODs. Depending on the type of application, launching multiple instances can increase performance. With PopRun, developers can launch multiple instances on the host server with NUMA support, enabling optimal NUMA node placement.
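The way work divides across launched instances can be pictured with a generic sharding pattern. The sketch below is plain Python, not the PopDist API: each instance uses its index and the total instance count (names here are illustrative only) to select its slice of the input.

```python
def shard(dataset, instance_index, num_instances):
    """Each instance takes every num_instances-th item, offset by its index.
    This mirrors the kind of data partitioning a launcher performs when it
    starts one process per instance; a generic pattern, not the PopDist API."""
    return dataset[instance_index::num_instances]

data = list(range(10))
shards = [shard(data, i, 4) for i in range(4)]
# Every item is processed exactly once across the four instances.
assert sorted(sum(shards, [])) == data
assert shards[0] == [0, 4, 8]
```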
For SDK 2.0, PopDist has been updated to include PyTorch support.
There are also new efficiency optimisations for PopRun:
- Offline mode: for running applications without requiring IPUs
- Temporary executable caching: to eliminate redundant same-host compilations
- Option to use and pin all available NUMA nodes consecutively
- Interactive runtime progress status
A new PopRun and PopDist User Guide is available here.
Major updates to PopVision analysis tools
The PopVision Graph Analyser and System Analyser provide granular detail for developers regarding application and host system performance on IPUs and IPU systems through a series of interactive, visual reports.
New functionality in the tools for this release is summarised in our recent PopVision tools blog and includes new floating point operation counts, enhanced framework debug information and a new table view for memory variables.
Optimisations for developers to accelerate experimentation
Offline compilation has been added to this latest release of the Poplar software stack. Now customers can load and save compiled models from machine learning frameworks such as PyTorch and TensorFlow. This accelerates experimentation, as developers can compile once and then run multiple experiments with different hyper-parameters, without needing to wait for recompilation.
We have also made significant general improvements to compilation time.
New Features supporting applications and models
New LSTM (long short-term memory) and GRU (gated recurrent unit) enhancements optimise long-sequence processing for improved application efficiency.
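Why long sequences matter for LSTMs can be seen from the cell recurrence itself: each step depends on the previous step's hidden and cell state, so the time dimension is inherently serial. The toy scalar cell below (plain Python, with illustrative weights, nothing to do with the IPU implementation) spells out that recurrence.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One scalar LSTM cell step; w maps gate name -> (input weight,
    recurrent weight, bias).  Toy weights for illustration only."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2])  # candidate
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

w = {k: (0.5, 0.5, 0.0) for k in "ifog"}
h = c = 0.0
# Each step consumes the previous (h, c): the loop cannot be parallelised
# over time, which is why long sequences dominate LSTM runtime.
for x in [1.0, -1.0, 0.5, 0.25]:
    h, c = lstm_step(x, h, c, w)
assert -1.0 < h < 1.0
```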
Our new generalised CTCLoss feature enables optimised CTC (connectionist temporal classification) loss support in the Poplar libraries (PopLibs), helping developers to efficiently run a wider selection of time-series-based models.
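For readers unfamiliar with CTC loss, it is the negative log-probability of a label sequence given per-timestep symbol probabilities, computed with the standard forward (alpha) recursion over a blank-extended label. The sketch below is a toy reference implementation of that textbook algorithm, not the PopLibs kernel.

```python
import math

def ctc_loss(probs, label, blank=0):
    """CTC negative log-likelihood via the standard forward (alpha)
    recursion.  probs: list over timesteps of per-symbol probabilities.
    A toy reference implementation, not the PopLibs kernel."""
    ext = [blank]
    for l in label:
        ext += [l, blank]            # interleave blanks: a -> [_, a, _]
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s > 0:
                a += alpha[t - 1][s - 1]
            # Skip transition allowed between distinct non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    p = alpha[T - 1][-1] + (alpha[T - 1][-2] if S > 1 else 0.0)
    return -math.log(p)

# Alphabet: 0 = blank, 1 = 'a'.  One timestep, label "a": P = p(a at t0).
assert math.isclose(ctc_loss([[0.7, 0.3]], [1]), -math.log(0.3))
```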
We have also provided enhanced support for Cholesky and triangular solve in TensorFlow.
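As a refresher, the Cholesky factorisation writes a symmetric positive-definite matrix A as L Lᵀ with L lower triangular, after which linear systems reduce to cheap triangular solves. The plain-Python sketch below illustrates the mathematics behind what TensorFlow exposes as `tf.linalg.cholesky` and `tf.linalg.triangular_solve`; it is not the IPU implementation.

```python
import math

def cholesky(A):
    """Lower-triangular L with L @ L.T == A, for symmetric
    positive-definite A.  Textbook algorithm, for illustration only."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L @ x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i])
    return x

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)
assert math.isclose(L[0][0], 2.0) and math.isclose(L[1][0], 1.0)
x = solve_lower(L, [2.0, 3.0])
assert math.isclose(L[0][0] * x[0], 2.0)
```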
We have introduced bitonic sort for PopLibs to enable major optimisations to sorting and topK operations.
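Bitonic sort is attractive on parallel hardware because its compare-exchange network is fixed and data-independent, so every comparison can be scheduled ahead of time. The serial sketch below shows the classic recursive formulation (power-of-two input lengths) and a topK built on it; it illustrates the algorithm only, not the PopLibs code.

```python
def bitonic_sort(xs, ascending=True):
    """Classic recursive bitonic sort for power-of-two input lengths.
    Serial illustration of the algorithm, not the PopLibs implementation."""
    def merge(xs, ascending):
        if len(xs) <= 1:
            return xs
        half = len(xs) // 2
        lo, hi = xs[:half], xs[half:]
        for i in range(half):
            # Fixed compare-exchange pattern, independent of the data.
            if (lo[i] > hi[i]) == ascending:
                lo[i], hi[i] = hi[i], lo[i]
        return merge(lo, ascending) + merge(hi, ascending)
    if len(xs) <= 1:
        return xs
    half = len(xs) // 2
    first = bitonic_sort(xs[:half], True)       # ascending half
    second = bitonic_sort(xs[half:], False)     # descending half
    return merge(first + second, ascending)     # merge the bitonic sequence

def top_k(xs, k):
    """topK via a descending bitonic sort of the input."""
    return bitonic_sort(list(xs), ascending=False)[:k]

data = [5, 1, 7, 3, 8, 2, 6, 4]
assert bitonic_sort(data) == sorted(data)
assert top_k(data, 3) == [8, 7, 6]
```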
Following the launch of our PyTorch production release for the IPU, we have added new support for PyTorch 1.7.1 with additional ops including numerous activation functions, random sampling operations and much more.
New and improved developer resources
Graphcore is constantly making new content available to IPU developers to ensure they can get the most out of their applications. For Poplar SDK 2.0, we have added new documentation, user guides and tutorials to support innovators as they program IPU systems. The examples and tutorials available on GitHub are now listed in full on our Documents Portal.
Visit our Developer Portal to access our extensive suite of Poplar SDK documents, GitHub code tutorials, video walkthroughs and application examples.