The GPU Scheduler tool determines the best scheduling and GPU allocation for Deep Learning training jobs, reducing energy and execution costs (in both private and public clouds) while meeting deadline constraints. It takes as input the list of submitted jobs (provided as Docker containers) with their characteristics (expected execution times, priorities and deadlines) and a description of the resources available in the system.
Open source / proprietary
The GPU Scheduler tool is open source. The source code is available on the AI-SPRINT GitLab repository (https://gitlab.polimi.it/ai-sprint/GPUScheduler) and is licensed under the Apache License, Version 2.0.
The GPU Scheduler architecture includes two main components. The first (denoted in the figure as Jobs manager) is responsible for the submission process: it collects all data about the submitted jobs and their characteristics, together with the description of the resources available in the system. These data are passed to the second component (denoted as Jobs optimiser), which determines the optimal schedule for the submitted jobs using different heuristic algorithms based on Randomised Greedy and Path Relinking strategies. The performance models are used to extrapolate the job profiling data stored in the Jobs Data database.
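To make the randomised greedy idea mentioned above more concrete, the following is a minimal sketch in Python. It is not the GPU Scheduler implementation: the `Job` and `Gpu` classes, their field names, the `alpha` parameter and the cost model (execution time multiplied by an hourly rate) are all assumptions made for illustration, and Path Relinking is not covered. At each step the sketch ranks pending jobs by deadline, picks one at random from the most urgent fraction of the list, and assigns it to the cheapest GPU that still meets its deadline.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job description; field names are illustrative only.
    name: str
    exec_time: dict   # GPU type -> expected execution time (hours)
    deadline: float   # hours from the start of the schedule
    priority: int     # larger value = more important


@dataclass
class Gpu:
    # Hypothetical resource description.
    name: str
    gpu_type: str
    cost_per_hour: float  # energy or rental cost rate (arbitrary units)


def randomized_greedy(jobs, gpus, alpha=0.5, rng=None):
    """Build one schedule; returns (schedule, total_cost, late_jobs)."""
    rng = rng or random.Random()
    free_at = {g.name: 0.0 for g in gpus}  # time at which each GPU is free
    pending = list(jobs)
    schedule, total_cost, late_jobs = [], 0.0, 0
    while pending:
        # Rank by urgency: earliest deadline first, then higher priority.
        pending.sort(key=lambda j: (j.deadline, -j.priority))
        # Restricted candidate list: the most urgent fraction of the jobs.
        k = max(1, math.ceil(alpha * len(pending)))
        job = rng.choice(pending[:k])
        pending.remove(job)
        options = []
        for g in gpus:
            if g.gpu_type not in job.exec_time:
                continue  # no profiling data for this GPU type
            start = free_at[g.name]
            end = start + job.exec_time[g.gpu_type]
            cost = job.exec_time[g.gpu_type] * g.cost_per_hour
            # Prefer on-time options, then lower cost, then earlier finish.
            options.append((end > job.deadline, cost, end, g.name, start))
        if not options:
            raise ValueError(f"no compatible GPU for job {job.name}")
        late, cost, end, gpu_name, start = min(options)
        free_at[gpu_name] = end
        schedule.append((job.name, gpu_name, start, end))
        total_cost += cost
        late_jobs += int(late)
    return schedule, total_cost, late_jobs


def best_of(jobs, gpus, iterations=50, alpha=0.5, seed=0):
    """Repeat the randomised construction and keep the best schedule found."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        result = randomized_greedy(jobs, gpus, alpha=alpha, rng=rng)
        # Fewer deadline violations first, then lower cost.
        if best is None or (result[2], result[1]) < (best[2], best[1]):
            best = result
    return best


if __name__ == "__main__":
    jobs = [
        Job("a", {"V100": 2.0, "K80": 5.0}, deadline=6.0, priority=1),
        Job("b", {"V100": 1.0, "K80": 3.0}, deadline=3.0, priority=2),
        Job("c", {"V100": 4.0, "K80": 9.0}, deadline=12.0, priority=1),
    ]
    gpus = [Gpu("g0", "V100", 3.0), Gpu("g1", "K80", 1.0)]
    schedule, cost, late = best_of(jobs, gpus)
    for entry in schedule:
        print(entry)
    print("cost:", cost, "late jobs:", late)
```

Setting `alpha=0` makes the construction fully greedy (always the most urgent job), while `alpha=1` picks uniformly among all pending jobs. Repeating the randomised construction, as `best_of` does, is what lets a greedy heuristic explore more than one schedule.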