Mlsys-seminar

Published on March 6, 2025
The EAGLE Series: Lossless Inference Acceleration for LLMs
MLSys-Seminar
Speaker: Prof. Hongyang Zhang, University of Waterloo
This talk presents the EAGLE series, an approach to accelerating large language model inference without compromising output quality. Instead of traditional token-level drafting, EAGLE performs autoregression at the more structured feature level and incorporates the sampled tokens to reduce uncertainty in feature prediction. The technique has been widely adopted in industry and is integrated into major frameworks including vLLM, SGLang, and TensorRT-LLM, as well as several frameworks from AWS and Intel.
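The "lossless" guarantee rests on the verification step of speculative sampling, on which EAGLE builds. A cheap draft proposes a token x, and the target model accepts it with probability min(1, p(x)/q(x)). On rejection, it resamples from the normalized residual max(0, p - q), so the emitted tokens follow the target distribution p exactly. The Python sketch below shows only this generic acceptance rule for a single drafted token over a three-token toy vocabulary. It is not EAGLE's feature-level draft model, and the names `verify_draft_token` and `sample` and the distributions `p` and `q` are hypothetical.

```python
import random
from collections import Counter


def sample(dist, rng):
    """Draw one token index from a list of (possibly unnormalized) weights."""
    return rng.choices(range(len(dist)), weights=dist)[0]


def verify_draft_token(p, q, draft_token, rng):
    """Speculative-sampling acceptance rule for one drafted token.

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities (draft_token was sampled from q)
    Returns the emitted token, whose distribution equals p.
    """
    # Accept the draft with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights, so no explicit division is needed.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return sample(residual, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]  # hypothetical target-model distribution
    q = [0.2, 0.2, 0.6]  # hypothetical, deliberately mismatched draft distribution
    n = 100_000
    counts = Counter(verify_draft_token(p, q, sample(q, rng), rng) for _ in range(n))
    # Empirical frequencies should be close to p, not q.
    print({tok: round(counts[tok] / n, 3) for tok in range(len(p))})
```

Even though every proposal comes from the mismatched `q`, the printed frequencies should land close to `p`. A better-matched draft only raises the acceptance rate. It never changes the output distribution.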
Published on February 20, 2025
LLM360: From 360° Open Source to 360° Collaboration in AI
MLSys-Seminar
Speaker: Dr. Zhengzhong (Hector) Liu, MBZUAI
The LLM360 project advances AI through open-source foundation models and datasets. This talk explores its key initiatives, including K2, which the project describes as the most capable fully open-source language model, and the TxT360 dataset. It also examines what open source truly means and proposes new approaches to collaboration between academia and industry in open-source AI.
Published on February 6, 2025
Enable Large Language Model Deployment Across Cloud and Edge with ML Compilation
MLSys-Seminar
Speaker: Prof. Tianqi Chen, CMU
In this talk, we will discuss the lessons learned in building an efficient system for deploying large language models in both server and edge settings. We will cover general techniques in machine learning compilation and system support for efficient structured generation. We will also discuss future opportunities in system co-design for cloud-edge model deployments.
Published on May 9, 2024
OpenXLA: Compiling Machine Learning for Peak Performance
MLSys-Seminar
Speaker: Dr. Jinliang Wei, Google
Numerous domain-specific accelerators (DSAs) have recently been developed to meet the growing computational demands of machine learning. The success of these DSAs hinges on effective ML compilers such as Google's XLA, which improves ML performance across a range of hardware and supports multiple frameworks. XLA is being developed further through the collaborative OpenXLA project.
Published on April 30, 2024
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
MLSys-Seminar
Speaker: Dr. Jason Ansel, Meta AI
PyTorch 2 introduces two new technologies, TorchDynamo and TorchInductor, that significantly speed up training and inference without sacrificing PyTorch's ease of use, flexibility, and Pythonic programming model. TorchDynamo captures and optimizes unmodified PyTorch code at the Python bytecode level, while TorchInductor compiles the resulting programs for efficient execution on GPUs and CPUs. Together they preserve the dynamism inherent in PyTorch and let users customize compilation easily.
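To make "the Python bytecode level" concrete: CPython compiles every function to bytecode, and TorchDynamo hooks into CPython's frame-evaluation API (PEP 523) so it can analyze those instructions before they run. The sketch below uses only the standard-library `dis` module to list the bytecode of a hypothetical function `scale_and_shift`. It does not call PyTorch, and the helper `opnames` is also hypothetical. Exact opcode names differ between Python versions.

```python
import dis


def scale_and_shift(x, w, b):
    # Hypothetical example of the kind of code TorchDynamo would analyze.
    return x * w + b


def opnames(fn):
    """Return the names of the bytecode instructions CPython executes for fn."""
    return [ins.opname for ins in dis.get_instructions(fn)]


if __name__ == "__main__":
    dis.dis(scale_and_shift)  # human-readable listing with arguments
    print(opnames(scale_and_shift))
```

The listing shows the loads, arithmetic, and return instructions that a bytecode-level tool like TorchDynamo sees in place of the original source text.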
Published on April 29, 2024
Rapid LLM deployments: with great power comes great responsibility
MLSys-Seminar
Speaker: Dr. Esha Choukse, Microsoft
Because modern LLMs serve so many use cases, they are being deployed at an unprecedented scale. This has driven a large-scale expansion of GPU datacenters, which is now running into an energy wall worldwide. This talk will focus on properties of generative LLMs that can be exploited to make their deployment more power-efficient. It will also introduce POLCA and Splitwise, two techniques for reducing the power consumption of LLM serving.
