Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design

Abstract

NVIDIA’s Multi-Instance GPU (MIG) technology partitions a GPU into separate hardware instances, each with its own dedicated compute and memory resources. However, the last-level TLB (L3 TLB) remains shared across instances, so in multi-tenant environments co-running applications can interfere with one another's address translation and lose performance. The proposed STAR design dynamically adjusts how TLB entries are shared, improving sub-entry utilization and reducing interference. It improves performance by an average of 30.2% across a range of multi-tenant workloads.
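
The interference described above can be illustrated with a toy model. This is a generic sketch of shared-TLB contention, not the paper's model of the MIG L3 TLB: the fully associative LRU organization, the 6-entry capacity, the tenant access patterns, and the helper names `LRUTLB` and `run` are all hypothetical.

```python
# Hypothetical toy model: not the MIG L3 TLB organization or the paper's simulator.
from collections import OrderedDict


class LRUTLB:
    """Fully associative TLB with least-recently-used (LRU) replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (tenant, vpn) -> placeholder translation
        self.hits = {}
        self.accesses = {}

    def lookup(self, tenant, vpn):
        """Return True on a hit; on a miss, insert the entry, evicting the LRU one if full."""
        key = (tenant, vpn)
        self.accesses[tenant] = self.accesses.get(tenant, 0) + 1
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits[tenant] = self.hits.get(tenant, 0) + 1
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[key] = True
        return False

    def hit_rate(self, tenant):
        return self.hits.get(tenant, 0) / self.accesses[tenant]


def run(capacity, rounds, co_runner):
    """Tenant A loops over 4 pages; tenant B, if present, streams through fresh pages."""
    tlb = LRUTLB(capacity)
    b_vpn = 0
    for _ in range(rounds):
        for vpn in range(4):
            tlb.lookup("A", vpn)
            if co_runner:
                tlb.lookup("B", b_vpn)  # B never reuses a page
                b_vpn += 1
    return tlb.hit_rate("A")


print(run(6, 10, co_runner=False))  # 0.9: A's 4 pages fit in the TLB
print(run(6, 10, co_runner=True))   # 0.0: B's entries push A's pages out before reuse
```

Running alone, tenant A hits on 36 of its 40 lookups. With B interleaved, 7 distinct entries are touched between two uses of the same A page, which exceeds the 6-entry capacity, so every A lookup misses even though A's own footprint would fit.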

Key Highlights

TLB Sharing Interference: Applications on different MIG instances contend for the shared L3 TLB, which can degrade the performance of co-running applications.
STAR Method: Dynamically shares L3 TLB entries to improve sub-entry utilization and address-translation efficiency.
Performance Improvement: Improves performance by an average of 30.2% across the evaluated multi-tenant workloads.
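
Sub-entry utilization can be made concrete with another toy model. It is a hypothetical illustration, not STAR's actual design: it assumes each TLB entry has `SUB_ENTRIES` slots indexed by a page's offset within an aligned region, and that a "shared" entry may hold the sub-entries of up to two regions whose occupied slots do not overlap, packed greedily (first fit). All names are illustrative.

```python
# Hypothetical toy model of sub-entry sharing; not the STAR hardware design.
from collections import defaultdict

SUB_ENTRIES = 8  # hypothetical number of sub-entries per TLB entry


def group_by_region(vpns):
    """Map each touched region base to the set of sub-entry slots it needs."""
    regions = defaultdict(set)
    for vpn in vpns:
        regions[vpn // SUB_ENTRIES].add(vpn % SUB_ENTRIES)
    return dict(regions)


def entries_private(vpns):
    """Baseline: each entry is tagged with a single region base."""
    return len(group_by_region(vpns))


def entries_shared(vpns, max_bases=2):
    """Sharing: an entry holds sub-entries of up to max_bases regions whose
    occupied slots do not overlap, packed greedily (first fit)."""
    entries = []  # each entry is (list of region bases, set of occupied slots)
    for base, slots in sorted(group_by_region(vpns).items()):
        for bases, used in entries:
            if len(bases) < max_bases and not (used & slots):
                bases.append(base)
                used.update(slots)  # mutate the entry's slot set in place
                break
        else:  # no existing entry could take this region
            entries.append(([base], set(slots)))
    return len(entries)


def utilization(vpns, n_entries):
    """Fraction of allocated sub-entry slots that hold a valid translation."""
    return len(set(vpns)) / (n_entries * SUB_ENTRIES)


# Sparse pattern: 8 regions, each touching only 2 of its 8 pages.
vpns = [r * SUB_ENTRIES + o for r in range(4) for o in (0, 1)] + [
    r * SUB_ENTRIES + o for r in range(10, 14) for o in (4, 5)
]
n_priv, n_shared = entries_private(vpns), entries_shared(vpns)
print(n_priv, utilization(vpns, n_priv))  # 8 0.25
print(n_shared, utilization(vpns, n_shared))  # 4 0.5
```

On this sparse pattern, the one-region-per-entry baseline needs 8 entries with only 25% of sub-entries in use, while the sharing variant packs the same 16 translations into 4 entries at 50%, freeing capacity that co-running applications would otherwise contend for.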

Importance

This study addresses an address-translation bottleneck in datacenters that use GPU virtualization to serve multiple tenants. By improving L3 TLB utilization, STAR raises overall performance, which is directly relevant to cloud computing and other multi-tenant GPU deployments.
