All Events

  • Speaker: Prof. Hongyang Zhang, University of Waterloo
    Preview of The EAGLE Series: Lossless Inference Acceleration for LLMs
    This talk presents the EAGLE series, a groundbreaking approach to accelerating large language model inference without compromising output quality. Instead of traditional token-level processing, EAGLE operates at the more structured feature level and incorporates sampling results to reduce uncertainty. The technology has gained significant industry adoption, with integration into major frameworks including vLLM, SGLang, and TensorRT-LLM, as well as frameworks from AWS and Intel.
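    The draft-and-verify pattern that underlies this style of lossless acceleration can be illustrated with a minimal sketch. The sketch below is not EAGLE itself (which drafts at the feature level rather than with a separate token-level model); it only shows the generic speculative loop, with hypothetical `draft_next` and `target_argmax` callables standing in for the draft and target models and greedy acceptance used for clarity.

    ```python
    # Illustrative draft-and-verify loop behind speculative decoding, the family
    # of techniques the EAGLE series builds on. `draft_next` and `target_argmax`
    # are hypothetical stand-ins; EAGLE itself drafts at the feature level, which
    # this token-level sketch does not reproduce.
    from typing import Callable, List

    def speculative_decode(
        draft_next: Callable[[List[int]], int],     # cheap draft model: next token id
        target_argmax: Callable[[List[int]], int],  # expensive target model: next token id
        prompt: List[int],
        max_new_tokens: int = 64,
        draft_len: int = 4,
    ) -> List[int]:
        tokens = list(prompt)
        while len(tokens) - len(prompt) < max_new_tokens:
            # 1. Draft a short continuation cheaply.
            draft = []
            for _ in range(draft_len):
                draft.append(draft_next(tokens + draft))
            # 2. Verify the draft against the target model and keep the longest
            #    agreeing prefix (greedy acceptance, for clarity).
            accepted = 0
            for i, tok in enumerate(draft):
                if target_argmax(tokens + draft[:i]) == tok:
                    accepted += 1
                else:
                    break
            tokens.extend(draft[:accepted])
            # 3. The target always contributes the next token itself, so the
            #    output is identical to plain greedy decoding with the target.
            tokens.append(target_argmax(tokens))
        return tokens[: len(prompt) + max_new_tokens]
    ```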
  • Speaker: Dr. Zhengzhong (Hector) Liu, MBZUAI
    Preview of LLM360: From 360° Open Source to 360° Collaboration in AI
    The LLM360 project advances AI through open-source foundation models and datasets. This talk explores key initiatives, including K2, the most capable fully open-source language model, and TxT360; examines what open source truly means; and proposes new approaches to academic and industry collaboration in open-source AI.
  • Speaker: Prof. Tianqi Chen, CMU
    Preview of Enable Large Language Model Deployment Across Cloud and Edge with ML Compilation
    In this talk, we will discuss the lessons learned in building an efficient large language model deployment system for both server and edge settings. We will cover general techniques in machine learning compilation and system support for efficient structured generation. We will also discuss future opportunities in system co-design for cloud-edge model deployment.
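    To make the structured-generation idea concrete, here is a minimal, illustrative sketch of grammar-constrained decoding: at each step, tokens that would violate the required structure are masked out of the logits before sampling. The `next_logits` and `allowed_token_ids` callables are hypothetical stand-ins, not the API of any system covered in the talk, and the grammar is assumed to always permit at least one token.

    ```python
    # Minimal sketch of structured (grammar-constrained) generation: at each step,
    # tokens that would break the required structure are masked out of the logits
    # before sampling. Names here are illustrative, not a real system's API.
    import math
    import random
    from typing import Callable, List, Set

    def constrained_generate(
        next_logits: Callable[[List[int]], List[float]],     # model forward pass
        allowed_token_ids: Callable[[List[int]], Set[int]],  # grammar/automaton check
        prompt: List[int],
        eos_id: int,
        max_new_tokens: int = 32,
    ) -> List[int]:
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            logits = next_logits(tokens)
            allowed = allowed_token_ids(tokens)  # assumed non-empty at every step
            # Mask disallowed tokens, then sample from the renormalized distribution.
            masked = [l if i in allowed else -math.inf for i, l in enumerate(logits)]
            m = max(masked)
            probs = [math.exp(l - m) for l in masked]
            total = sum(probs)
            r, acc = random.random() * total, 0.0
            choice = max(allowed, key=lambda i: logits[i])  # fallback: best allowed token
            for i, p in enumerate(probs):
                acc += p
                if p > 0.0 and r <= acc:
                    choice = i
                    break
            tokens.append(choice)
            if choice == eos_id:
                break
        return tokens
    ```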