
AutoDist, an Open Source distributed deep learning training engine from Petuum


July 21, 2020

Introducing AutoDist

Petuum is excited to announce our latest Open Source project, AutoDist, a distributed deep learning training engine. It provides an easy-to-use interface that automatically distributes the training of a wide variety of deep learning models across many CPUs and GPUs at scale, with minimal code changes.

Why use AutoDist?

Are you searching for an intuitive library on top of TensorFlow for distributed training? As an alternative to Horovod with better performance, AutoDist lets a developer scale a model from a single GPU to many without changing the model-building script. We approached this from a different perspective: graph optimization with a composable representation of strategies, enabling either manual crafting or automated selection of the best options for your model's training.

How to use AutoDist? 

AutoDist is easy and intuitive to use. It follows a philosophy of minimizing user code changes by exposing a small set of interfaces that can distribute arbitrary single-node code with little to no modification.

For example, after a developer prototypes and finalizes a model design, they may want to accelerate training with multiple GPU devices or nodes. However, it takes substantial effort to adapt a model built for a single device to parallel training; add to that the effort of configuring the cluster, installing the required technologies, adopting another launcher, and so on. AutoDist aims to free developers from those concerns: once you have a TensorFlow environment for prototyping models, all you need to do to use AutoDist is install one package:

pip install autodist 

Next, you make AutoDist aware of your resource specification. 

from autodist import AutoDist
ad = AutoDist(resource_spec_file="resource_spec.yml")

################ resource_spec.yml #####################
nodes:
  - address: 192.168.1.1
    gpus: [0,1,2,3]
########################################################
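For a multi-node cluster, the spec simply lists more nodes. The sketch below follows the same pattern; note that field names beyond `nodes`, `address`, and `gpus` (such as `chief`) are assumptions here, so consult the AutoDist documentation for the authoritative schema:

```yaml
# resource_spec.yml -- hypothetical two-node cluster
nodes:
  - address: 192.168.1.1   # the machine launching the job
    gpus: [0, 1, 2, 3]
    chief: true            # assumed flag marking the coordinating node
  - address: 192.168.1.2
    gpus: [0, 1, 2, 3]
```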

This lets you keep the single-device model-building script in graph mode while training it in a distributed way. (If you are curious about eager mode, see the FAQ page.)

import tensorflow as tf

with tf.Graph().as_default(), ad.scope():
    ###################################################
    # Build the original single-device model here,
    # then train it distributedly.
    ###################################################
    sess = ad.create_distributed_session()
    sess.run(...)

For more detailed instructions on how to run a simple example using AutoDist, please see the Getting Started doc.

Composable Strategies 

As deep learning models become more structurally complex, existing distributed Machine Learning (ML) systems often struggle to deliver all-round performance across a wide variety of models, since many of them are specialized for one monolithic architecture or technology, e.g. MPI.

Different ML models (or different components of a complex model) exhibit different runtime characteristics, and different learning algorithms demonstrate distinct computational patterns. This demands model- and algorithm-aware systems and parallelization treatments for distributed execution. AutoDist therefore develops a unified strategy representation space that adapts different computational units or states to different synchronization configurations.

[Diagram: unified strategy representation space and code examples]

Such a representation is composed of atomic deep learning parallelization techniques and factors that contribute to performance. A complete strategy representation directs how the model should be parallelized on the target cluster, specifically, how a single-device model is transformed into a distributed one. This approach isolates strategy prototyping from low-level distributed systems, allows complex synchronization strategies to be composed, assessed, and executed through a unified interface, and extends naturally to emerging synchronization techniques.
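To make the idea of composability concrete, here is a toy sketch of such a representation in plain Python. The class and field names below are illustrative inventions, not AutoDist's actual API (see the Strategy Representation docs for that); the point is that per-variable synchronization choices and a graph-level config compose into one complete strategy.

```python
from dataclasses import dataclass, field


@dataclass
class NodeConfig:
    """Synchronization choice for one variable in the graph (toy model)."""
    var_name: str
    synchronizer: str          # e.g. "PS" or "AllReduce"
    options: dict = field(default_factory=dict)


@dataclass
class Strategy:
    """A complete strategy: per-variable configs plus a graph-level config."""
    node_configs: list
    graph_config: dict         # e.g. which replicas to create


# Compose a mixed strategy: dense layers use AllReduce, while a large
# sparsely-updated embedding uses a parameter server.
strategy = Strategy(
    node_configs=[
        NodeConfig("dense/kernel", "AllReduce", {"spec": "NCCL"}),
        NodeConfig("embedding/weights", "PS",
                   {"reduction_destination": "192.168.1.1"}),
    ],
    graph_config={"replicas": ["192.168.1.1:0", "192.168.1.1:1"]},
)

# The same variable-level building blocks compose into arbitrarily mixed
# strategies, which is what makes the space searchable and extensible.
chosen = [nc.synchronizer for nc in strategy.node_configs]
print(chosen)  # ['AllReduce', 'PS']
```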

If you are interested in learning more, please refer to our developer reference, including the sections on Strategy Representation and Strategy Customization.

Automatic Distribution 

The introduced representation spans a combinatorial space enclosing all possible strategies. AutoDist provides several built-in strategy builders to choose from as a constructor argument, which direct the automatic graph transformation and optimization:

AutoDist(resource_spec_file="resource_spec.yml", strategy_builder=?) 

More importantly, AutoDist provides AutoStrategy, an auto-optimization pipeline that efficiently proposes a strategy adapted to a given model and set of resources. Our team at Petuum designed AutoDist not only to improve parallel performance, but also for greater user convenience by eliminating the need to manually select a strategy. The experimental AutoStrategy optimizer is built on top of data-driven ML models trained on low-shot trial-run data, and it will improve as more data is acquired.
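The gist of automatic selection can be illustrated with a toy example: enumerate candidate synchronizers per variable and pick the cheapest under a cost model. The cost formulas and numbers below are made up for illustration; AutoDist's real AutoStrategy relies on ML models trained on trial-run data rather than a hand-written formula.

```python
CANDIDATES = ("PS", "AllReduce")


def comm_cost(var_size, density, synchronizer, num_workers):
    """Very rough per-step communication cost (arbitrary units)."""
    if synchronizer == "PS":
        # Push + pull of only the touched entries; cheap for sparse updates.
        return 2 * var_size * density
    # Ring all-reduce moves roughly 2*(n-1)/n of the dense tensor per worker.
    return 2 * var_size * (num_workers - 1) / num_workers


def auto_strategy(variables, num_workers):
    """Pick the cheapest synchronizer for each variable independently."""
    return {
        name: min(CANDIDATES,
                  key=lambda s: comm_cost(size, density, s, num_workers))
        for name, (size, density) in variables.items()
    }


variables = {
    "dense/kernel": (1_000_000, 1.0),          # dense, fully updated each step
    "embedding/weights": (50_000_000, 0.001),  # huge but sparsely updated
}
choice = auto_strategy(variables, num_workers=4)
print(choice)  # {'dense/kernel': 'AllReduce', 'embedding/weights': 'PS'}
```

Even this naive version shows why per-variable choices matter: the dense kernel favors all-reduce while the sparse embedding favors a parameter server, and a single monolithic scheme would be suboptimal for one of them.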

Looking ahead 

We, as ML technologists, constantly face the challenges of training and tweaking models of ever-increasing complexity and data scale. Rather than defaulting to the common develop-a-new-model-optimizer technique and further evolving expert-designed optimizers for better performance, we chose a different, principled path. Here at Petuum we set out to build a system that optimizes, compiles, and transforms a computational graph together with the cluster resources in a distributed setting. Note that, as of now, the system only supports declarative frameworks like TensorFlow, but in the near future it can be extended to eager frameworks with JIT support.

This article is a brief introduction to one of the exciting projects the Scalable ML team at Petuum is working on: AutoDist, which has recently been open-sourced. We invite you to join us on our journey to explore novel approaches that push the limits of distributed machine learning, and we welcome any feedback and ideas you might have.

To learn more about AutoDist, please review our documents covering Rationale, Performance, and Architecture, among other topics.

Thanks for reading! We at Petuum hope you have as much fun using AutoDist as we have developing it! 
