AI-SPRINT partners have published a new paper, "Performance prediction of deep learning applications training in GPU as a service systems". The paper starts from the premise that the GPU as a service (GPUaaS) market will grow in the coming years to support 3D models, animated video processing, gaming, and deep learning model training. The main cloud providers already offer catalogues of GPU-equipped VMs, but these VMs vary in type and in their GPUs. This variety leads to significant differences in performance and cost, so choosing the most appropriate VM for a given job is essential to minimize the training cost.
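As a minimal sketch of the selection problem described above: once a training time is predicted for each candidate VM, the cost of a job is the predicted time multiplied by the hourly price, and the cheapest VM is the one that minimizes that product. The VM names, predicted hours, and prices below are made-up illustrative values, not figures from the paper or from any cloud provider, and `cheapest_vm` is a hypothetical helper.

```python
def cheapest_vm(predicted_hours, hourly_price):
    """Return the VM with the lowest predicted training cost, and that cost."""
    costs = {vm: predicted_hours[vm] * hourly_price[vm] for vm in predicted_hours}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Hypothetical catalogue: bigger VMs train faster but cost more per hour.
predicted_hours = {"vm_small": 10.0, "vm_medium": 4.0, "vm_large": 3.0}
hourly_price = {"vm_small": 1.0, "vm_medium": 2.0, "vm_large": 4.0}

best, cost = cheapest_vm(predicted_hours, hourly_price)
print(best, cost)  # -> vm_medium 8.0
```

In this toy catalogue neither the cheapest nor the fastest VM wins, which is why an accurate training-time prediction matters for the choice.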
The paper:
- Proposes performance models to predict the training performance of neural networks (NNs) deployed on GPUs. The approach is based on machine learning and exploits two main sets of features, capturing both NN properties and hardware characteristics.
- Trains multiple linear regression models that, coupled with an established feature selection technique, become accurate prediction tools, with average errors below 12%.
- Shows that prediction errors remain small even when extrapolating outside the range spanned by the input data, with important implications for the models’ applicability.
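The ideas in the list above can be sketched on synthetic data. The example below is only an illustration, not the authors' implementation: it fits a multiple linear regression on one NN-related feature (`iterations`), one feature mixing NN and hardware properties (`compute_s`), and one deliberately unrelated feature, then applies greedy forward selection and measures the error on runs outside the training range. Forward selection is used here as one common choice, since the announcement does not name the technique; all feature names, coefficients, and data are assumptions made for the example.

```python
import random


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def mape(weights, rows, targets):
    """Mean absolute percentage error, in percent."""
    errors = [abs(predict(weights, r) - t) / t for r, t in zip(rows, targets)]
    return 100.0 * sum(errors) / len(errors)


def project(rows, cols):
    """Keep only the selected feature columns."""
    return [[r[c] for c in cols] for r in rows]


def forward_select(X_train, y_train, X_valid, y_valid):
    """Greedily add the feature that most reduces validation MAPE."""
    chosen, best_err = [], float("inf")
    remaining = list(range(len(X_train[0])))
    while remaining:
        scored = []
        for f in remaining:
            cols = chosen + [f]
            w = fit_ols(project(X_train, cols), y_train)
            scored.append((mape(w, project(X_valid, cols), y_valid), f))
        err, f = min(scored)
        if err >= best_err:
            break
        chosen.append(f)
        remaining.remove(f)
        best_err = err
    return chosen


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible


def dataset(iteration_values):
    """Synthetic runs; the 'true' timing law below is an assumption."""
    rows, targets = [], []
    for iterations in iteration_values:
        for gflops_per_iter in (50, 100, 200):   # NN property
            for gpu_tflops in (5, 10, 15):       # hardware property
                compute_s = iterations * gflops_per_iter / (gpu_tflops * 1000.0)
                unrelated = rng.uniform(0.0, 100.0)
                time_s = 5.0 + 0.02 * iterations + 4.0 * compute_s
                time_s *= 1 + rng.uniform(-0.03, 0.03)  # +/-3% measurement noise
                rows.append([iterations, compute_s, unrelated])
                targets.append(time_s)
    return rows, targets


X, y = dataset(range(100, 1001, 100))
X_train = [x for i, x in enumerate(X) if i % 4]
y_train = [t for i, t in enumerate(y) if i % 4]
X_valid = [x for i, x in enumerate(X) if i % 4 == 0]
y_valid = [t for i, t in enumerate(y) if i % 4 == 0]

chosen = forward_select(X_train, y_train, X_valid, y_valid)
weights = fit_ols(project(X_train, chosen), y_train)

# Extrapolation: iteration counts well beyond the training maximum of 1000.
X_far, y_far = dataset((1500, 2000))
extrapolation_error = mape(weights, project(X_far, chosen), y_far)
print("selected feature indices:", chosen)
print("extrapolation MAPE (%):", round(extrapolation_error, 2))
```

Because the synthetic target really is linear in the informative features, the extrapolation error stays small here by construction. On real measurements, that behaviour has to be checked empirically, which is what the paper reports doing.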
Authors:
- Marco Lattuada, Politecnico di Milano, Italy
- Eugenio Gianniti, Politecnico di Milano, Italy
- Danilo Ardagna, Politecnico di Milano, Italy
- Li Zhang, Amazon, USA
Keywords: GPUaaS, Deep learning, Machine learning-based performance prediction