29503

Posts

Nov, 3

LLload: An Easy-to-Use HPC Utilization Tool

The increasing use and cost of high performance computing (HPC) requires new easy-to-use tools to enable HPC users and HPC systems engineers to transparently understand the utilization of resources. The MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed a simple command, LLload, to monitor and characterize HPC workloads. LLload plays an important role in identifying […]
Nov, 3

Scheduling Languages: A Past, Present, and Future Taxonomy

Scheduling languages express to a compiler a sequence of optimizations to apply. Compilers that support a scheduling language interface allow exploration of compiler optimizations, i.e., exploratory compilers. While scheduling languages have become a common feature of tools for expert users, the proliferation of these languages without unifying common features may be confusing to users. Moreover, […]
Nov, 3

Data-Driven Dynamic Autotuning: Optimizing Autotuning Overhead with Prior Tuning Data

Modern high performance computing applications often rely on heterogeneous hardware resources to achieve maximum performance. This approach presents obvious benefits, combining the processing power of multiple different processors and allowing them to be more specialized. However, since HPC applications typically need to be programmed in a hardware-aware manner to achieve maximum performance, this places more […]
Nov, 3

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs

Serving systems for Large Language Models (LLMs) improve throughput by processing several requests concurrently. However, multiplexing hardware resources between concurrent requests involves non-trivial scheduling decisions. Practical serving systems typically implement these decisions at two levels: First, a load balancer routes requests to different servers which each hold a replica of the LLM. Then, on each […]
Nov, 3

MambaCPU: Enhanced Correlation Mining with State Space Models for CPU Performance Prediction

Forecasting CPU performance, which involves estimating performance scores based on hardware characteristics during operation, is crucial for computational system design and resource management. This research field currently faces two primary challenges. First, the diversity of CPU products and the specialized nature of hardware characteristics make real-world data collection difficult. Second, existing approaches, whether reliant on […]
Oct, 27

Jailbreaking LLM-Controlled Robots

The recent introduction of large language models (LLMs) has revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction in domains as varied as manipulation, locomotion, and self-driving vehicles. When viewed as a stand-alone technology, LLMs are known to be vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing […]
Oct, 27

Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration

In this paper we develop the first fine-grained rounding error analysis of finite element (FE) cell kernels and assembly. The theory includes mixed-precision implementations and accounts for hardware-acceleration via matrix multiplication units, thus providing theoretical guidance for designing reduced- and mixed-precision FE algorithms on CPUs and GPUs. Guided by this analysis, we introduce hardware-accelerated mixed-precision […]
Oct, 27

Fastrack: Fast IO for Secure ML using GPU TEEs

As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decrypted, and verified, increasing costs by 12.69 […]
Oct, 27

General-Purpose Computing on Tensor Processors

Modern computer systems have become heterogeneous and consist of many emerging kinds of hardware accelerators as Dennard Scaling discontinues. Also, such domain-specific hardware accelerators fulfill the rapidly growing computing demands for applications including artificial intelligence (AI) and machine learning (ML). Beyond conventional computer components such as central processing units (CPUs) and memory, modern computers typically […]
Oct, 27

Using modern C++ to improve CUDA programs

The classic style of writing and porting HPC applications to the GPU uses pointers to buffers or data-structures as kernel parameters. This style discards type information, leading to “flattening” of CPU-side data-structures before using them as kernel parameters, followed by a need to reconstruct them in GPU code to retain flexibility. In this thesis, we […]
Oct, 20

Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach

Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures, from small wearable devices to large-scale leadership computing facilities. The predominant methods in energy management optimization are focused on CPUs. However, GPUs are increasingly significant and account for the majority of energy consumption in heterogeneous high […]
Oct, 20

Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models

The rapid growth in machine learning models, especially in natural language processing and computer vision, has led to challenges when running these models on hardware with limited resources. This paper introduces Superpipeline, a new framework designed to optimize the execution of large AI models on constrained hardware during both training and inference. Our approach involves […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: