Training neural networks has placed substantial demands on computational resources in recent years, which has led to distributed training environments in which a neural network is split across multiple GPU and CPU devices. The challenge in distributed training is to find an optimized way to place operations across multiple heterogeneous devices so as to achieve the fastest possible training speed.
This blog post introduces the basic concepts of device placement and surveys the different directions of research in this area.
What is device placement?
Let’s say we have 10 GPUs and a graph of TensorFlow operations such as convolutions and max pooling, and we need to place these operations across the 10 GPUs. Humans can do this task by hand, but determining an optimal device placement is very challenging. With the help of machine learning, we can instead learn an algorithm that decides where each graph operation should execute.
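To make the problem concrete, here is a minimal sketch of device placement as an assignment problem. The operation names, per-op compute costs, and transfer cost are all made up for illustration; a toy brute-force search works here only because the graph is tiny, which is exactly why larger graphs need a learned placer.

```python
# Toy sketch of the device-placement problem. All costs below are
# hypothetical numbers chosen for illustration only.
import itertools

OPS = ["conv1", "conv2", "max_pool", "dense"]
DEVICES = ["gpu:0", "gpu:1"]

COMPUTE = {"conv1": 4.0, "conv2": 4.0, "max_pool": 1.0, "dense": 2.0}
EDGES = [("conv1", "max_pool"), ("conv2", "max_pool"), ("max_pool", "dense")]
TRANSFER = 0.5  # extra cost whenever an edge crosses devices

def runtime(placement):
    # Makespan: the busiest device's total compute, plus communication
    # overhead for every edge whose endpoints sit on different devices.
    load = {d: 0.0 for d in DEVICES}
    for op, dev in placement.items():
        load[dev] += COMPUTE[op]
    comm = sum(TRANSFER for a, b in EDGES if placement[a] != placement[b])
    return max(load.values()) + comm

# Brute force over all 2^4 placements -- feasible only for toy graphs.
best = min(
    (dict(zip(OPS, assign))
     for assign in itertools.product(DEVICES, repeat=len(OPS))),
    key=runtime,
)
print(best, runtime(best))
```

Even in this toy, splitting the two convolutions across devices beats placing everything on one GPU, despite the transfer cost the split incurs.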
Methods used in device placement:
Device placement algorithms have been constructed with different deep learning methods such as RNNs and LSTMs, but the most popular approach uses Reinforcement Learning (RL). A drawback of many RL methods is limited generalizability: they necessitate re-training from scratch every time a new graph is to be placed.
The figure below shows how the initial runtime for the given environment is measured and then iteratively updated by the placement algorithm to minimize execution time. The main goal of RL here is to use a reward signal to reduce the overall execution time of the graph and place the operations optimally across devices.
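The measure-then-improve loop described above can be sketched as follows. The cost function and the random move proposal are stand-ins of my own: a real agent would time the actual graph execution and would propose moves with a learned policy, updating that policy from the reward instead of greedily keeping improvements.

```python
# Sketch of an iterative placement-improvement loop with a
# runtime-reduction reward. Costs and ops are hypothetical.
import random

DEVICES = ["gpu:0", "gpu:1"]
OPS = ["conv", "pool", "dense"]

def measure_runtime(placement):
    # Stand-in for executing the graph and timing it: the busiest
    # device's total (made-up) compute cost.
    cost = {"conv": 4.0, "pool": 1.0, "dense": 2.0}
    load = {d: 0.0 for d in DEVICES}
    for op, dev in placement.items():
        load[dev] += cost[op]
    return max(load.values())

placement = {op: DEVICES[0] for op in OPS}  # start: everything on one device
best_time = measure_runtime(placement)

random.seed(0)
for step in range(50):
    # Propose moving one random op to a random device -- a crude
    # stand-in for sampling an action from the learned policy.
    op = random.choice(OPS)
    candidate = dict(placement, **{op: random.choice(DEVICES)})
    new_time = measure_runtime(candidate)
    reward = best_time - new_time  # positive reward = faster placement
    if reward > 0:  # an RL agent would instead update its policy weights
        placement, best_time = candidate, new_time
```

The key idea the sketch preserves is the reward definition: the agent is rewarded for each reduction in measured runtime, so maximizing cumulative reward minimizes execution time.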
The algorithm executes an iterative placement-improvement policy on the graph. The learning procedure frames placement improvement as a Markov Decision Process (MDP) and uses graph embeddings together with a neural network architecture to encode the placement policy.
As shown in the above figure, the state input to the agent is represented as a DAG (Directed Acyclic Graph) with features attached to each node. The agent uses a graph neural network to encode the input and a policy network to output a probability distribution over devices for the current node. The incremental reward is the difference between the runtimes of consecutive placement plans.
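The policy network's output for one node can be sketched as a softmax over per-device scores. The node embedding and the linear policy head below are hypothetical placeholders: in the real system the embedding comes from the graph neural network and the head is a trained network, not random weights.

```python
# Sketch: turning a node embedding into a distribution over devices.
# The embedding and weights are random placeholders for illustration.
import math
import random

random.seed(0)
NUM_DEVICES = 4
EMBED_DIM = 8

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical node embedding, as a graph neural network might produce it.
node_embedding = [random.gauss(0, 1) for _ in range(EMBED_DIM)]

# A linear policy head: one weight vector per device (a stand-in for
# the actual policy network).
weights = [[random.gauss(0, 0.1) for _ in range(EMBED_DIM)]
           for _ in range(NUM_DEVICES)]

logits = [sum(w_i * e_i for w_i, e_i in zip(w, node_embedding))
          for w in weights]
device_probs = softmax(logits)  # P(place this node on device d)
```

During training, the device for the node is sampled from `device_probs`, and the incremental reward (the runtime difference between consecutive placements) is used to reinforce or discourage that choice.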
More in-depth details about the experimentation, datasets, and results for the baseline approach can be found in the research papers below.