Cornell CS5470: Systems for Large-scale ML

This course explores the systems challenges of training and serving large-scale ML models such as GPT, LLaMA, and DeepSeek. You will learn how to design and operate distributed training and inference on multi-accelerator hardware, with attention to performance, memory, communication, and fault tolerance. The course emphasizes both theory and practice, so it includes hands-on programming sessions, assignments, and projects. By the end, you will have practical experience tackling the core bottlenecks of modern ML systems.

Acknowledgement: this course is supported by a NERSC Education Allocation Award.

Access to GPU Compute Resources

If you are enrolled in the class or on the waitlist, create a NERSC account by following the instructions here. Every new account undergoes vetting, so please do this as soon as you can.

Course Policies

You may use generative AI tools of your choice for graded components of the class. Submissions may ask you to report which tools you used and the corresponding prompts.

Academic Honesty Policies

You are not allowed to share any code or text (including reports, summaries, and prompts) that you use to complete projects and assignments.

Grading Policy:

Class Participation (10%)
Programming Assignments (40%)
Course Project (45%)
End-of-Semester Survey (5%)

Programming Assignments

This course has 3 programming assignments:

Assignment 1: vLLM benchmarking and profiling
Assignment 2: prompt scheduling for improved performance
Assignment 3: custom CUDA Attention kernel

For each assignment and hands-on session, we will allocate GPU hours on Perlmutter to every student. Use these GPU hours carefully so that you do not run out before completing the activity.

Course Projects

Students will complete a course project in groups of four. We will design a specific project for each group. Once you are assigned a project, you will have one week to scope it out (10% of the project score). There will be midterm progress presentations (30% of the project score). At the end of the finals period, you will submit your code (30% of the project score) and project report (30% of the project score).
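To see how the project breakdown feeds into the overall grade, here is a minimal sketch that converts each project deliverable into its share of the final grade. It assumes the four project components are weighted 10/30/30/30 of the project score and that the project counts for 45% of the final grade, as listed in the grading policy. The dictionary keys and the function name are illustrative, not official course terminology.

```python
from fractions import Fraction

# Top-level weights from the grading policy, in percent of the final grade.
COURSE_WEIGHTS = {
    "participation": 10,
    "assignments": 40,
    "project": 45,
    "survey": 5,
}

# Project sub-weights, in percent of the project score (assumed 10/30/30/30).
PROJECT_WEIGHTS = {
    "scoping": 10,
    "midterm_presentation": 30,
    "code": 30,
    "report": 30,
}


def project_share_of_final_grade():
    """Return each project deliverable's share of the final grade, in percent."""
    project = Fraction(COURSE_WEIGHTS["project"])
    # Exact rational arithmetic avoids floating-point rounding in the shares.
    return {name: project * weight / 100 for name, weight in PROJECT_WEIGHTS.items()}


shares = project_share_of_final_grade()
for name, share in shares.items():
    print(f"{name}: {float(share):.1f}% of the final grade")
```

Under these assumptions, scoping is worth 4.5% of the final grade, and the midterm presentation, code, and report are each worth 13.5%.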