This three-day intensive workshop is a collaboration between Texas A&M High Performance Research Computing (HPRC) and the Texas A&M Institute of Data Science (TAMIDS) to advance GPU programming skills for massively parallel computing applications. Participants will gain hands-on experience with CUDA programming fundamentals, MPI parallel programming patterns, and the integration of both technologies for multi-node, multi-GPU applications using Texas A&M’s world-class supercomputing infrastructure.
The rapid evolution of high-performance computing has made GPU programming an essential skill for researchers and engineers working with computationally intensive applications. This workshop addresses the growing demand for expertise in massively parallel programming by combining CUDA for GPU acceleration with Message Passing Interface (MPI) for distributed computing across multiple nodes.
The workshop will utilize Texas A&M’s state-of-the-art computing clusters, giving participants direct access to modern GPU hardware for learning advanced parallel programming techniques. The hands-on approach ensures that attendees will develop practical skills applicable to real-world computational challenges in fields ranging from materials science and quantum computing to climate modeling and artificial intelligence.
Participants must have solid programming experience in C or C++, including familiarity with pointers, arrays, and function implementations. A basic understanding of parallel computing concepts is helpful but not required, as fundamental principles will be covered during the workshop. Previous experience with Linux command-line environments and batch job submission systems is recommended, as all practical exercises will be conducted on Texas A&M’s supercomputing clusters.
All participants must bring a laptop capable of SSH connectivity to Texas A&M’s computing resources. Workshop organizers will provide temporary accounts on the HPRC systems, including access to HPRC’s GPU-equipped clusters. Participants should ensure their laptops have an SSH client installed and support secure file transfer (e.g., scp or sftp) for code development and data analysis.
Participants will master the essential concepts of GPU programming using CUDA, building from basic kernel development to advanced memory management techniques. The curriculum covers GPU architecture, threading models, memory hierarchy optimization, and performance analysis tools that are crucial for effective GPU programming. Hands-on exercises will provide immediate application of theoretical concepts using Texas A&M’s GPU clusters, ensuring participants develop both understanding and practical experience.
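To preview the kind of program written on day one, the sketch below shows the canonical first CUDA exercise: a vector-addition kernel together with the host-side pattern of allocating device memory, copying data across, launching a grid of thread blocks, and copying results back. It is a minimal illustration, not workshop material; the sizes and block configuration are arbitrary choices for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard: grid may overrun n
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Allocate device memory and copy inputs host -> device.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int block = 256;
    const int grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(da, db, dc, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // 1.0 + 2.0 = 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The explicit host/device copies here are exactly the memory-hierarchy traffic that the optimization and profiling sessions teach participants to minimize.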
The second day focuses on distributed computing using MPI, covering fundamental communication patterns, collective operations, and scalable algorithm design. Participants will learn to decompose computational problems across multiple processors and implement efficient communication strategies for large-scale parallel applications. The training emphasizes modern MPI features and best practices for achieving optimal performance on contemporary supercomputing architectures.
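As a small taste of the day-two material, the following C sketch shows the simplest form of problem decomposition plus a collective operation: each rank sums its own slice of an index range, and MPI_Reduce combines the partial sums on rank 0. The problem size and slicing are illustrative choices, not part of the workshop curriculum.

```c
#include <mpi.h>
#include <stdio.h>

/* Each rank computes a partial sum over its slice of [0, N),
 * then MPI_Reduce combines the partial sums on rank 0. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long N = 1000000;
    long chunk = N / size;
    long lo = (long)rank * chunk;
    long hi = (rank == size - 1) ? N : lo + chunk;  /* last rank takes the remainder */

    double local = 0.0;
    for (long i = lo; i < hi; ++i)
        local += (double)i;

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", total);  /* N*(N-1)/2 = 499999500000 */

    MPI_Finalize();
    return 0;
}
```

Built with `mpicc` and run with `mpirun -np <ranks>`, the same code scales from one process to many nodes unchanged, which is the property the scalable-algorithm-design sessions build on.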
The final day synthesizes knowledge from the previous sessions, teaching participants to develop applications that leverage both GPU acceleration and distributed computing. This includes managing GPU resources across multiple nodes, optimizing data movement between CPUs and GPUs in distributed environments, and implementing scalable multi-GPU algorithms. Real-world application examples will demonstrate the practical implementation of these advanced programming paradigms.
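The day-three pattern of combining both technologies can be sketched as follows: each MPI rank claims one GPU, runs a kernel on its local data, and then participates in a collective reduction. This is a minimal illustration assuming at most one rank per GPU on each node and a standard (non-CUDA-aware) MPI build, so GPU results are staged through host memory before the MPI call; a CUDA-aware MPI can pass device pointers to MPI routines directly.

```cuda
#include <mpi.h>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Scale every element of x by s on the GPU.
__global__ void scale(float *x, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Map each rank to its own GPU (assumes ranks per node <= GPUs per node).
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Local GPU work: double every element on this rank's device.
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    cudaDeviceSynchronize();

    // Staged exchange: device -> host copy, then reduce across all ranks.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    MPI_Allreduce(MPI_IN_PLACE, h, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("h[0] = %f\n", h[0]);  // 2.0 * number of ranks

    cudaFree(d);
    free(h);
    MPI_Finalize();
    return 0;
}
```

The host-staging copies bracketing the MPI_Allreduce are precisely the CPU–GPU data-movement cost that the day-three optimization material addresses.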
Dr. Jian Tao, TAMIDS Digital Twin Lab – jtao@tamu.edu
Dr. Honggao Liu, High Performance Research Computing Center – honggao@tamu.edu