Rapid LLM deployments: with great power comes great responsibility

Authors
  • MLSys @UCSD
  • Host: Jishen Zhao

Our MLSys seminar series kicks off with a talk by Esha Choukse from Microsoft Azure Research. It will take place on 4/29 at 3:30pm in CSE1242. We may opt for a hybrid format, with Zoom attendance accommodated.

Talk title: Rapid LLM deployments: with great power comes great responsibility

Talk Abstract: With the ubiquitous use cases of modern LLMs, the deployment scale of these models is unprecedented. This has led to a large-scale datacenter expansion with GPUs, which is currently running into an energy wall worldwide. This talk will focus on the properties of generative LLMs that can be used to make the deployment of these models more power-efficient. The talk will also introduce POLCA and Splitwise, two techniques to reduce the power consumption of LLM serving.


Bio: Esha is a Senior Researcher in the Azure Research - Systems group at Microsoft. She has been focused on LLM training and inference deployment efficiency across the layers of serving platforms, hardware, and data center optimizations. Her research background spans the systems and computer architecture areas, with most of her work falling in the domain of resource efficiency and sustainability in cloud applications. She has published at top systems conferences such as ISCA, MICRO, ASPLOS, and NSDI. More information at https://www.microsoft.com/en-us/research/people/eschouks/.