All Events

  • Published on
    Speaker: Dr. Jinliang Wei, Google
    Preview of OpenXLA: Compiling Machine Learning for Peak Performance
    Numerous domain-specific accelerators (DSAs) have been developed recently to meet the growing computational needs of machine learning. Their success hinges on effective ML compilers such as Google's XLA, which improves ML performance across a range of hardware and supports multiple frameworks. XLA is now being advanced through collaborative development in OpenXLA.
  • Published on
    Speaker: Dr. Jason Ansel, Meta AI
    Preview of PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph
    PyTorch 2 uses new technologies such as TorchDynamo and TorchInductor to significantly speed up training and inference without compromising its ease of use, flexibility, or Pythonic environment. TorchDynamo optimizes unmodified PyTorch code at the Python bytecode level, while TorchInductor translates programs for efficient execution on GPUs and CPUs. Both preserve the dynamism inherent in PyTorch and allow for easy user customization.
  • Published on
    Speaker: Dr. Esha Choukse, Microsoft
    Preview of Rapid LLM deployments: with great power comes great responsibility
    Modern LLMs have become ubiquitous, and they are being deployed at an unprecedented scale. This has driven a large-scale expansion of GPU datacenters, which is now running into an energy wall worldwide. This talk will focus on the properties of generative LLMs that can be exploited to make their deployment more power-efficient. It will also introduce POLCA and Splitwise, two techniques for reducing the power consumption of LLM serving.