Schedule

Module 1 - Introduction

Aug 25

Introduction to the course. Post-lecture activity: Make an account on the Perlmutter supercomputer

Aug 27

Introduction to transformers. Optional reading: (1) Attention Is All You Need (2) The Illustrated Transformer

Module 2 - Systems implications of the Transformer architecture

Sep 1

No Class (Labor Day)

Sep 3

Memory Use in Transformer-based LLMs. Optional reading: (1) Reducing Activation Recomputation in Large Transformer Models (2) FlashAttention

Module 3 - Hardware Infrastructure for Machine Learning

Sep 8

Multi-GPU servers and interconnects: GPU architecture, NVLinks, NVSwitches

Sep 10

ML-centric Datacenters: Datacenter Clos, TPU torus, and rail-optimized datacenters for ML

Sep 15

Training an LLM (hands-on activity): Bring your laptops to class

Sep 17

Communication infrastructure: RDMA, InfiniBand (IB)

Module 4 - Distributed Training with Data Parallelism

Sep 22: Class cancelled
Sep 24: Introduction to distributed training
Sep 29: Data Parallelism with ZeRO
Oct 06: Data Parallelism with ZeRO-3 or FSDP
Oct 08: Hands-on session on supervised finetuning
Oct 13: Fall Break
Oct 15: Introduction to CUDA programming

Module 5 - Distributed Training with Tensor/Pipeline/Sequence/Expert Parallelisms

Oct 20: Tensor parallelism
Oct 22: Introduction to Machine Learning Compilers
Oct 27: Sequence and Pipeline parallelism
Oct 29: Expert Parallelism and MoE
Nov 03: Collective Communication
Nov 05: Hybrid Parallelism Conclusion

Module 6 - Advanced Topics

Nov 10: Parameter-Efficient Fine-Tuning
Nov 12
Nov 17: LLM Agents
Nov 19
Nov 24

Module 7 - Group Mid-point Presentations

Dec 01
Dec 03
Dec 08