Data analysts predict that the GPU as a service (GPUaaS) market will grow to support 3D modeling, animated video processing, gaming, and deep learning model training. The main cloud providers already offer VMs with varying types and numbers of GPUs in their catalogs. Because these VMs differ significantly in both performance and cost, selecting the most appropriate one for the job at hand is essential to minimizing training cost. Motivated by these considerations, this paper proposes performance models to predict the training time of neural networks (NNs) deployed on GPUs. The proposed approach is based on machine learning and exploits two main sets of features, capturing both NN properties and hardware characteristics. These data enable the learning of multiple linear regression models that, coupled with an established feature selection technique, become accurate prediction tools, with errors below 12% on average. An extensive experimental campaign, performed on both public and in-house private cloud deployments, considers popular deep NNs used for image classification and speech transcription. The results show that prediction errors remain small even when extrapolating outside the range spanned by the input data, with important implications for the models' applicability.
- Marco Lattuada, Politecnico di Milano, Italy
- Eugenio Gianniti, Politecnico di Milano, Italy
- Danilo Ardagna, Politecnico di Milano, Italy
- Li Zhang, Amazon, USA
GPUaaS, Deep learning, Machine learning-based performance prediction
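The kind of pipeline the abstract describes, multiple linear regression paired with a feature selection step, can be sketched as follows. This is an illustrative toy example, not the paper's implementation: the four synthetic features (stand-ins for NN properties and hardware characteristics such as batch size or GPU memory bandwidth), the synthetic training-time target, and the greedy forward selection on a holdout split (a simple stand-in for whatever established feature selection technique the paper uses) are all assumptions.

```python
import numpy as np

# Illustrative only: synthetic data stands in for real profiling
# measurements. Columns mimic mixed NN/hardware features.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(1.0, 10.0, size=(n, 4))
# Synthetic "training time": depends linearly on features 0 and 2 only;
# features 1 and 3 are irrelevant noise columns.
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0.0, 0.1, size=n)

def fit_predict(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept term."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_te @ coef

def forward_selection(X, y, holdout=0.25):
    """Greedy forward feature selection scored on a holdout split.
    Stops as soon as adding a feature no longer reduces the error."""
    split = int(len(X) * (1 - holdout))
    X_tr, X_te = X[:split], X[split:]
    y_tr, y_te = y[:split], y[split:]
    selected, best_err = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        errs = {}
        for j in remaining:
            cols = selected + [j]
            pred = fit_predict(X_tr[:, cols], y_tr, X_te[:, cols])
            # Mean absolute percentage error, the metric behind
            # statements like "errors below 12% on average".
            errs[j] = np.mean(np.abs((pred - y_te) / y_te))
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:
            break
        best_err = errs[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_err

features, mape = forward_selection(X, y)
print("selected features:", features)
print(f"holdout MAPE: {mape:.1%}")
```

On this synthetic data the procedure recovers the two informative columns and discards the noise columns; in the paper's setting the same scheme would score candidate NN/hardware feature subsets by their prediction error on held-out training-time measurements.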