Abstract
Data analysts predict that the GPU as a service (GPUaaS) market will grow to support 3D models, animated video processing, gaming, and deep learning model training. The main cloud providers already offer VMs with different types and numbers of GPUs in their catalogs. Because these VMs differ significantly in performance and cost, selecting the most appropriate one for a given job is essential to minimize the training cost. Motivated by these considerations, this paper proposes performance models to predict the training performance of neural networks (NNs) deployed on GPUs. The proposed approach is based on machine learning and exploits two main sets of features, capturing both NN properties and hardware characteristics. These data enable the learning of multiple linear regression models that, coupled with an established feature selection technique, become accurate prediction tools, with errors below 12% on average. An extensive experimental campaign, performed on both public and in-house private cloud deployments, considers popular deep NNs used for image classification and speech transcription. The results show that prediction errors remain small even when extrapolating outside the range spanned by the input data, with important implications for the models’ applicability.
Authors
- Marco Lattuada, Politecnico di Milano, Italy
- Eugenio Gianniti, Politecnico di Milano, Italy
- Danilo Ardagna, Politecnico di Milano, Italy
- Li Zhang, Amazon, USA
Keywords
GPUaaS, Deep learning, Machine learning-based performance prediction
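
Illustrative sketch
The modeling idea summarized in the abstract (multiple linear regression combined with feature selection, then evaluated outside the training range) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names (iterations, network complexity, an irrelevant feature) are hypothetical, and greedy forward selection stands in for the feature selection technique, which the abstract does not name. It is not the authors' implementation and does not reproduce their results.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y, cols):
    """Ordinary least squares (with intercept) on the chosen feature columns."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    k = len(cols) + 1
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(A, b)  # normal equations: (X^T X) w = X^T y

def predict(w, X, cols):
    return [w[0] + sum(wi * r[c] for wi, c in zip(w[1:], cols)) for r in X]

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(y, yhat)) / len(y)

def forward_select(X_tr, y_tr, X_va, y_va, n_features):
    """Greedy forward selection: add the feature that most reduces validation MAPE."""
    selected, best_err = [], float("inf")
    while True:
        candidates = []
        for c in range(n_features):
            if c in selected:
                continue
            cols = selected + [c]
            w = fit(X_tr, y_tr, cols)
            candidates.append((mape(y_va, predict(w, X_va, cols)), c))
        if not candidates:
            break
        err, c = min(candidates)
        if err >= best_err - 1e-9:  # no further improvement: stop
            break
        selected.append(c)
        best_err = err
    return selected, best_err

def make_data(rng, n, lo, hi):
    """Synthetic (hypothetical) data: training time grows linearly with
    iterations (thousands) and a network-complexity proxy; feature 2 is noise."""
    X, y = [], []
    for _ in range(n):
        iters = rng.uniform(lo, hi)
        complexity = rng.uniform(1, 5)
        irrelevant = rng.uniform(0, 1)
        X.append([iters, complexity, irrelevant])
        y.append(5 + 12 * iters + 3 * complexity + rng.gauss(0, 0.5))
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(rng, 60, 1, 10)
X_val, y_val = make_data(rng, 30, 1, 10)
X_far, y_far = make_data(rng, 30, 20, 30)  # outside the training range

selected, val_err = forward_select(X_train, y_train, X_val, y_val, 3)
w = fit(X_train, y_train, selected)
extrap_err = mape(y_far, predict(w, X_far, selected))
print("selected features:", selected)
print("validation MAPE (%):", round(val_err, 2))
print("extrapolation MAPE (%):", round(extrap_err, 2))
```

Because the synthetic target is linear by construction, the extrapolation error stays small here; on real measurements it depends on how well a linear model fits the actual training behavior.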