Data centers consume massive amounts of energy. The vision of a fully digitalized society can only be realized by creating sustainable digital infrastructures operating energy-efficiently based on precise utilization of resources. We develop an adaptive AI approach for solving complex combinatorial optimization problems at scale.
Dynamic resource orchestration is key to sustainable digitalization. Today, data centers hosting cloud services and applications consume massive amounts of energy while serving billions of connected devices; by 2025, the number of devices is estimated to grow well beyond 75 billion. Common practice is to overprovision resources to guarantee adherence to service level agreements. Overprovisioning leads to underutilized infrastructure resources, which at large scale quickly adds up to a significant amount of wasted energy. Evidently, the vision of a fully digitalized society can only be realized by creating sustainable digital infrastructures operating energy-efficiently based on precise utilization of resources. Resource orchestration is also a key factor in realizing the (promised) large gains in shared cloud architectures.
Resource orchestration functions have to be fast and precise. Ongoing research includes the development of intelligent and unifying resource orchestration mechanisms for cloud, edge and fog infrastructures. The general purpose is to extend computing capabilities and resource availability to support digitalization at scale and to address the issue of overprovisioning. To succeed, resource orchestration functions must deliver solutions fast enough to support control loops operating at short time scales, and must identify configurations that jointly meet compute resource demands and network performance requirements in the way best suited to applications with different workload intensities and data transfer requirements. Hence, the ability to efficiently process complex combinatorial problems is fundamental. Such problems involve performance requirements (e.g., latency, bandwidth and service reliability), resource demands (e.g., compute, memory and storage) and other parameters related to a specific cloud computing task, application or service.
Existing approaches are slow, suboptimal and one-sided. In general, resource allocation problems are formulated as (more or less) complex mixed-integer constraint optimization problems and solved by heuristics. The primary drawback of heuristics is unpredictable behavior: there are basically no mathematical guarantees of finding a reasonably good solution within a bounded time frame. Alternative approaches based on greedy algorithms enable fast online resource allocation, but provide suboptimal solutions with insufficient means to adjust the overall efficiency of the deployment strategy. Because of the general complexity of constraint optimization problems, resource allocation approaches are typically designed to solve problems limited to one aspect of resource orchestration (e.g., either workload balancing or routing). Altogether, these issues severely reduce the practical applicability of existing approaches to dynamic resource orchestration at scale.
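The trade-off between greedy speed and solution quality can be illustrated with a toy placement problem. The sketch below is an illustration only, not part of the project: it assumes a single resource dimension, identical servers, and that every demand fits on one server. The function names (`first_fit`, `exhaustive`, `servers_used`) are hypothetical.

```python
from itertools import product

def first_fit(demands, capacity):
    """Greedy online placement: each workload goes to the first server with room.
    Assumes every demand is <= capacity."""
    free = []   # remaining capacity of each opened server
    plan = []   # plan[i] = index of the server hosting workload i
    for d in demands:
        for i, room in enumerate(free):
            if d <= room:
                free[i] -= d
                plan.append(i)
                break
        else:
            # No opened server has room: open a new one.
            free.append(capacity - d)
            plan.append(len(free) - 1)
    return plan

def exhaustive(demands, capacity):
    """Exact search for the fewest servers; run time grows exponentially."""
    n = len(demands)
    for k in range(1, n + 1):                      # try 1, 2, ... servers
        for plan in product(range(k), repeat=n):   # every possible assignment
            load = [0] * k
            for d, s in zip(demands, plan):
                load[s] += d
            if max(load) <= capacity:
                return list(plan)
    return None  # only reached if a single demand exceeds the capacity

def servers_used(plan):
    return len(set(plan))

demands, capacity = [5, 6, 4, 5], 10
print(servers_used(first_fit(demands, capacity)))   # greedy result
print(servers_used(exhaustive(demands, capacity)))  # optimal result
```

For these demands the greedy plan opens three servers while the exact search needs only two, but the exact search enumerates an exponentially growing number of assignments, which is what rules it out for short control loops.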
Pre-training enables optimal solutions fast. In the long term, continuous underutilization of digital infrastructures at massive (and growing) scale will result in unsustainable overconsumption of energy, which will inevitably have a negative impact on the global environment. The ability to dynamically scale applications by instantiating virtual service components (i.e., virtual machines and virtual network functions) and defining effective communication paths on top of shared physical infrastructure is a key technique for sustainable digitalization, but is hard to achieve with existing approaches.
Our solution and project objective. We propose an adaptive AI-based approach for pre-training the solution space of complex constraint optimization problems, which enables resource allocation algorithms to retrieve resource allocation plans effectively within bounded time frames. Continuous learning will lead to even better solutions over time, thereby substantially increasing overall resource utilization and energy efficiency.
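The general idea of moving expensive computation offline so that online retrieval stays bounded can be sketched as follows. This is not the project's AI-based method: it swaps in a plain lookup table filled by an exact solver, and the size classes, server capacity, and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations_with_replacement, product

SIZE_CLASSES = (2, 4, 6)  # hypothetical demand classes (ascending)
CAPACITY = 10             # hypothetical capacity of each identical server

def solve_exact(demands, capacity):
    """Slow exact solver: fewest servers, found by exhaustive search."""
    n = len(demands)
    for k in range(1, n + 1):
        for plan in product(range(k), repeat=n):
            load = [0] * k
            for d, s in zip(demands, plan):
                load[s] += d
            if max(load) <= capacity:
                return plan
    return None

def precompute(max_workloads):
    """Offline phase: solve every sorted profile of up to max_workloads demands."""
    table = {}
    for n in range(1, max_workloads + 1):
        for profile in combinations_with_replacement(SIZE_CLASSES, n):
            table[profile] = solve_exact(profile, CAPACITY)
    return table

def round_up(demand):
    """Map a demand to the smallest size class that covers it.
    Assumes demand <= max(SIZE_CLASSES)."""
    return next(c for c in SIZE_CLASSES if c >= demand)

def retrieve(table, demands):
    """Online phase: one dictionary lookup instead of a search."""
    order = sorted(range(len(demands)), key=lambda i: round_up(demands[i]))
    profile = tuple(round_up(demands[i]) for i in order)
    plan = table[profile]
    result = [None] * len(demands)
    for pos, i in enumerate(order):
        result[i] = plan[pos]   # map the stored plan back to the input order
    return result

table = precompute(4)
print(retrieve(table, [5, 1, 6, 3]))
```

Rounding demands up to a size class keeps the table small and every retrieved plan feasible, at the cost of some wasted capacity. A learned model would replace the table when the profile space is too large to enumerate.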
Coordinator, participant, project manager