IDLEWiSE. A Project Concept for AI-Assisted Energy Efficiency in HPC Clusters
Abstract
The growing energy demand of high-performance computing (HPC) systems raises serious concerns about their environmental impact. Novel system paradigms and computational schemes are needed to limit energy consumption while ensuring the efficiency and availability of computing resources. In this contribution, we introduce a concept for an Intelligent Decision Tool for Lowering Energy Waste in System Efficiency (IDLEWiSE), which aims to decrease the energy consumption of HPC clusters operating below full capacity by selectively shutting down idle computational units. We outline an optimization tool that uses efficient machine-learning algorithms, such as decision trees, to learn optimal shutdown policies online. We further situate our approach in the context of existing energy-saving instruments and perform a strategic analysis and stepwise validation of the proposed concept. The study also includes qualitative, anonymized findings from a survey of administrators of German scientific HPC clusters, which corroborate the urgent need for energy-efficient tools and practices for practitioners.