Published on November 15, 2024. DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training. COLM 2024. Dacheng Li*, Rulin Shao*, Anze Xie, Eric P Xing, Joseph E Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
Published on October 4, 2024. Learning to Maximize Mutual Information for Chain-of-Thought Distillation. ACL 2024. Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding
Published on September 16, 2024. Multi-modal Learning for WebAssembly Reverse Engineering. ISSTA 2024. Hanxian Huang, Jishen Zhao
Published on August 30, 2024. WikiDT: Visual-based Table Recognition and Question Answering Dataset. ICDAR 2024. Hui Shi, Yusheng Xie, Luis Goncalves, Sicun Gao, Jishen Zhao
Published on July 25, 2024. Optimizing Speculative Decoding for Serving Large Language Models Using Goodput. PREPRINT. Xiaoxuan Liu, Cade Daniel, Lanxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
Published on July 21, 2024. InferCept: Efficient Intercept Support for Augmented Large-Language Model Inferencing. ICML 2024. Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang
Published on July 15, 2024. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. ICML 2024. Wei-Lin Chiang*, Lianmin Zheng*, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E. Gonzalez, Ion Stoica
Published on July 15, 2024. CLLMs: Consistency Large Language Models. ICML 2024. Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
Published on July 15, 2024. Break the Sequential Dependency of LLM Inference using Lookahead Decoding. ICML 2024. Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
Published on July 15, 2024. Online Speculative Decoding. ICML 2024. Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
Published on June 19, 2024. AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models. PREPRINT. Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng
Published on June 9, 2024. Sibyl: Forecasting Time-Evolving Query Workloads. SIGMOD 2024. Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesús Camacho-Rodríguez, Yuanyuan Tian
Published on June 4, 2024. Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment. ICS 2024. Hanxian Huang, Xin Chen, Jishen Zhao
Published on May 31, 2024. Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction. PREPRINT. Hanxian Huang, Zhenghan Lin, Zixuan Wang, Xin Chen, Ke Ding, Jishen Zhao
Published on May 17, 2024. Preble: Efficient Distributed Prompt Scheduling for LLM Serving. PREPRINT. Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang
Published on May 13, 2024. Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing. ICRA 2024. Haolan Liu, Liangjun Zhang, Siva Hari, Jishen Zhao
Published on May 10, 2024. LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset. ICLR 2024. Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric Xing, Joseph E Gonzalez, Ion Stoica, Hao Zhang
Published on April 15, 2024. In-Storage Domain-Specific Acceleration for Serverless Computing. ASPLOS 2024. Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-ting Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh, Mohammad Alian, Hadi Esmaeilzadeh
Published on April 15, 2024. Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators. PREPRINT. Sean Kinzer, Soroush Ghodrati, Rohan Mahapatra, Byung Hoon Ahn, Edwin Mascarenhas, Xiaolong Li, Janarbek Matai, Liang Zhang, Hadi Esmaeilzadeh
Published on April 15, 2024. Tandem processor: Grappling with Emerging Operators in Neural Networks. ASPLOS 2024. Soroush Ghodrati, Sean Kinzer, Hanyang Xu, Rohan Mahapatra, Yoonsung, Byung Hoon Ahn, Dong Kai Wang, Lavanya Karthikeyan, Amir Yazdanbakhsh, Jongse Park, Nam Sung Kim, Hadi Esmaeilzadeh
Published on April 3, 2024. Toward Inference-optimal Mixture-of-Expert Large Language Models. PREPRINT. Longfei Yun*, Yonghao Zhuang*, Yao Fu, Eric P Xing, Hao Zhang
Published on March 1, 2024. DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. OSDI 2024. Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang
Published on February 15, 2024. Data Motion Acceleration for Heterogeneous Cross-Domain Accelerator Chaining. HPCA 2024. Shu-Ting Wang, Hanyang Xu, Amin Mamandipoor, Rohan Mahapatra, Byung Hoon Ahn, Soroush Ghodrati, Krishnan Kailas, Mohammad Alian, Hadi Esmaeilzadeh
Published on December 20, 2023. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. NeurIPS 2023. Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng*, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, Hao Zhang, Joseph E Gonzalez, Ion Stoica
Published on December 1, 2023. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. OSDI 2023. Zhuohan Li*, Lianmin Zheng*, Yinmin Zhong*, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E Gonzalez, Ion Stoica
Published on November 1, 2023. How Long Can Context Length of Open-Source LLMs truly Promise? Instruction Tuning and Instruction Following Workshop @ NeurIPS 2023. Dacheng Li*, Rulin Shao*, Anze Xie, Ying Sheng, Lianmin Zheng, Joseph Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
Published on October 23, 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. SOSP 2023. Woosuk Kwon*, Zhuohan Li*, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Yu, Joey Gonzalez, Hao Zhang, Ion Stoica
Published on October 2, 2023. TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching. ICCV 2023. Cheng Fu, Hanxian Huang, Zixuan Jiang, Yun Ni, Lifeng Nai, Gang Wu, Liqun Cheng, Yanqi Zhou, Sheng Li, Andrew Li, Jishen Zhao
Published on July 23, 2023. Everyone’s Preference Changes Differently: A Weighted Multi-Interest Model for Retrieval. ICML 2023. Hui Shi, Yupeng Gu, Yitong Zhou, Bo Zhao, Sicun Gao, Jishen Zhao
Published on May 15, 2023. On Optimizing the Communication of Model Parallelism. MLSYS 2023. Yonghao Zhuang*, Hexu Zhao*, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
Published on May 1, 2023. MPCFormer: Fast, Performant and Private Transformer Inference with MPC. ICLR 2023. Dacheng Li*, Rulin Shao*, Hongyi Wang*, Han Guo, Eric P. Xing, Hao Zhang
Published on December 1, 2022. AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness. NeurIPS 2022. Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
Published on October 10, 2022. Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition. PACT 2022. Cheng Fu, Hanxian Huang, Bram Wasti, Chris Cummins, Riyadh Baghdadi, Kim Hazelwood, Yuandong Tian, Jishen Zhao, Hugh Leather
Published on July 1, 2022. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. OSDI 2022. Lianmin Zheng*, Zhuohan Li*, Hao Zhang*, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica
Published on February 24, 2022. Learning Bounded Context-Free-Grammar via LSTM and the Transformer: the Difference and the Explanations. AAAI 2022. Hui Shi, Sicun Gao, Yuandong Tian, Xinyun Chen, Jishen Zhao
Published on July 1, 2021. Terapipe: Token-level Pipeline Parallelism for Training Large-scale Language Models. ICML 2021. Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
Published on February 1, 2021. Ada-segment: Automated Multi-loss Adaptation for Panoptic Segmentation. AAAI 2021. Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang