
Our paper received an ACM SIGSOFT Distinguished Paper Award

Authors
  • MLSys @UCSD

The paper, co-authored by Hanxian Huang and Jishen Zhao, was one of 11 papers (out of 143) to receive an ACM SIGSOFT Distinguished Paper Award at the 2024 International Symposium on Software Testing and Analysis (ISSTA).

Figure: WasmRev overview

WebAssembly (Wasm) is a low-level, portable bytecode format compiled from high-level languages such as C, C++, and Rust, delivering near-native performance when executed on the web. Recent studies have introduced machine learning (ML)-based tools for WebAssembly reverse engineering. However, generalizing these task-specific ML solutions remains challenging, because their effectiveness hinges on an ample supply of high-quality, task-specific labeled data.

The paper proposes WasmRev, a multi-modal pre-trained language model for WebAssembly reverse engineering. WasmRev is pre-trained with self-supervised learning on a large-scale multi-modal corpus encompassing source code, code documentation, and the compiled WebAssembly, without requiring labeled data. It incorporates three tailored multi-modal pre-training tasks to capture various characteristics of WebAssembly and cross-modal relationships. WasmRev is pre-trained only once to produce general-purpose representations that broadly support WebAssembly reverse engineering tasks through few-shot fine-tuning with much less labeled data, improving data efficiency. WasmRev is then fine-tuned on three important reverse engineering tasks: type recovery, function purpose identification, and WebAssembly summarization. Our results show that WasmRev, pre-trained on the corpus of multi-modal samples, establishes a robust foundation for these tasks, achieving high task accuracy and outperforming state-of-the-art ML methods for WebAssembly reverse engineering. More details can be found in the paper: https://arxiv.org/pdf/2404.03171
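To make the idea of self-supervised pre-training on a multi-modal sample more concrete, here is a minimal Python sketch. It is not WasmRev's actual pipeline: the special tokens, the whitespace tokenizer, the `Sample` class, and the helper names `build_sequence` and `mask_tokens` are all assumptions made for illustration. It shows only the generic masked-token idea, in which documentation, source code, and WebAssembly are joined into one sequence and the hidden tokens themselves serve as labels, so no manual annotation is needed.

```python
import random
from dataclasses import dataclass

# Hypothetical special tokens; the real model's vocabulary may differ.
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


@dataclass
class Sample:
    doc: str     # natural-language documentation
    source: str  # high-level source code
    wasm: str    # compiled WebAssembly in text form


def build_sequence(sample):
    """Concatenate the three modalities into a single token sequence."""
    tokens = [CLS]
    for segment in (sample.doc, sample.source, sample.wasm):
        # Whitespace tokenization, used here only to keep the sketch short.
        tokens.extend(segment.split())
        tokens.append(SEP)
    return tokens


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of ordinary tokens behind MASK.

    Returns the masked sequence and a dict mapping each masked position
    to its original token, which acts as the training label.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if tok in (CLS, SEP):
            continue  # never mask the structural tokens
        if rng.random() < mask_prob:
            labels[i] = tok
            masked[i] = MASK
    return masked, labels


sample = Sample(
    doc="add two integers",
    source="int add(int a, int b) { return a + b; }",
    wasm="local.get 0 local.get 1 i32.add",
)
tokens = build_sequence(sample)
masked, labels = mask_tokens(tokens)
```

Because a masked WebAssembly token can often be inferred from the matching source code or documentation in the same sequence, an objective of this kind encourages a model to learn cross-modal relationships. The three pre-training tasks actually used by WasmRev are described in the paper.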