It’s all semantics. A short overview of WebAssembly’s language specification.

Macromedia Shockwave Player loading
screen

Before diving into how WebAssembly’s specification was mechanised, it’s important to understand the backstory of the language and why it was devised.

If you’ve been a user of the web for longer than a few years you may remember the loading screen in the above image. Flash, Java Applets, Shockwave and others were technologies that allowed developers to embed applications with multimedia features into a website. This could be apps such as games, videos and other computationally intensive programs. These technologies were not part of the website’s technology stack of HTML, CSS and Javascript. Instead they were essentially a separate entity stuck onto the webpage. As a result there were often performance and security issues with these applets as they relied on users to install and update plugins which were owned and developed by third party companies. As the web became more mature and users’ requirements changed there was a shift away from these multimedia plugins. Steve Jobs famously slated Flash in 2010 when Apple decided not to support the plugin on the iPhone, saying:

“Letting a third party layer of software come between the platform and the developer ultimately results in sub-standard apps and hinders the enhancement and progress of the platform"

Jobs continued, stating:

“While Adobe’s Flash products are widely available, this does not mean they are open, since they are controlled entirely by Adobe and available only from Adobe. By almost any definition, Flash is a closed system."

Job’s ultimatum was the first nail in the coffin for web software such as Flash, with its demise finally coming in 2020 when Adobe officially stopping support the product. This is where WebAssembly comes in. After JavaScript it is the first language to be implemented across all major web browsers, and although it is very different it accomplished similar things to products such as Adobe Flash. WebAssembly has been developed by several top tech companies including Apple, Google and Microsoft, with the aim of enabling high-performance programs on web pages. Due to the fact WebAssembly is an open standard and a joint venture between several actors the language represents the shared interests of multiple stakeholders, both individuals and companies.

In this post I will provide a short overview of the mechanisation of WebAssembly since its announcement in 2015. The development of WebAssembly has been unique as multiple organisations have come together to define a collective open standard. This allows for an analysis of the language from the perspective of its formal methods.

Mechanisation can be considered the process of implementing a formal definition of a language using a mechanised theorem prover. WebAssembly is an ideal target for mechanisation due to the fact it is stable, not large and formally specified - unlike other more mainstream languages.

Technical details

WebGL game engine built using
WebAssembly

WebAssembly is a binary instruction format for a stack-based virtual machine. It is designed as a target for compilation of higher-level languages like C and Rust, enabling deployment for web applications. Although designed for the web, the language can be theoretically used across any operating systems for any number of programs. The primary use case of WebAssembly is running inside of a web browser’s JavaScript virtual machine. The WebAssembly function that is called by the host environment exists within a sandbox environment, unable to directly access the host’s resources. As previously mentioned the language’s official specification includes formal semantics, which is normally reserved for academic work on programming languages and not a characteristic of industry projects. A goal of the language is that all undefined behaviour is eliminated and the specification precisely states the intended type soundness property - before a WebAssembly program is executed all programs undergo type checking validation.

A history of mechanisation

In this section I will describe the timeline of WebAssembly and how it has been mechanised over its short history.

The plan for the development of the WebAssembly project was first announced in 2015, with several browser vendors coordinating a public statement. The aim of the project was to replace JavaScript as the the assembly language of the Web. Brendan Eich, creator of JavaScript and founder of Mozilla stated on his blog:

“Yes, we are aiming to develop the Web’s polyglot-programming-language object-file format."

In its initial iteration WebAssembly was to be co-expressive with asm.js, a subset of JavaScript designed to behave like native code. Asm.js was the predecessor to WebAssembly and was Mozilla’s attempt at enabling programming languages such as C to run natively in the web browser. However in the long term WebAssembly was to diverge from JavaScript’s semantics in order "to best serve as common object-level format for multiple source-level programming languages". JavaScript was never intended to compile to low-level targets, and there were performance issues using the language for such a purpose.

In 2017 WebAssembly was moved to a minimum viable product status and the initial draft of the language’s semantics were released. This draft allowed researchers to mechanise the language according to its formal rules. The first mechanised formalisation of WebAssembly was conducted by Conrad Watt, a PhD researcher at Cambridge. Watt utilised the proof assistant Isabelle in order to mechanise the language’s specification. The W3C claimed that the language’s type system was sound, implying both type safety and memory safety with respect to the WebAssembly semantics. While conducting the proof of this claim, Watt discovered a few key errors which meant that the type system of the language were unsound. In the next section we will go into more detail on the specifics of the errors discovered in the specification.

At the Programming Language Design and Implementation conference in 2017 the language’s formal semantics were released, including details such as its typing rules, small step reduction rules and abstract syntax. The paper included suggested fixes by Watt to the errors he discovered, and he was mentioned by name in the paper’s "Acknowledgements" section. Later on in 2018 Watt officially published his mechanisation of the language, publicising his type soundness proof for the improved semantics.

In 2019 WebAssembly 1.0 was released (known as wasm 1.0). The W3C announced that WebAssembly was now an official web standard, and the language became the fourth "language" to run natively in the browser (alongside HTML, CSS and JavaScript). Wasm 1.0 had some slight refactoring done to it since its initial published 2017 version. The official specification contained a stronger statement of type soundness, however didn’t contain any proof, instead citing Watt’s earlier 2019 work as proof that its standard soundness theorems hold.

In 2021 a second paper is published by researchers matching this new official specification and to prove the stronger statement of type soundness. This time no errors were found in the language’s specifications and W3C’s claims were backed up. This paper contains two Wasm 1.0 mechanised specifications, which together prove type soundness and include an end-to-end verified interpreter. The paper’s objective was to replace Watt’s earlier 2018 work and become "the canonical source for the Wasm 1.0 type soundness proof".

Bugs in WebAssembly’s draft specification

Conrad Watt’s 2018 paper describes the bugs discovered in WebAssembly’s initial formal specification. These errors were related to the claim of type soundness in the language, and meant that the original specification’s type system was unsound. Here I will briefly describe two of Watt’s highlighted errors related to the type system.

Traps

WebAssembly contains traps as a core concept. Similar to software traps in classic assembly languages, a trap is issued by a program in order to immediately abort execution and transfer control to the host. When the trap value is generated in WebAssembly it propagates through the stack terminating all execution of functions, until it reaches the top of the stack and stops the program. However in the draft version of WebAssembly the trap value could become stuck during its reduction progression, meaning exceptions wouldn’t bubble up to the top of the program and would potentially be ignored.

Return

According to the specification, Result types classify the result of executing instructions or functions, which is a sequence of values. In its original iteration the specification defined a return type simply as a "break" operation, where the break procedure would be called for the number of nested labels within the current function call. According to this specification a return operation could occur outside of a function and still be considered well typed instead of being rejected.

Nothing left to prove

In April of 2022 the initial draft of WebAssembly 2.0 was released. The new specification contained a number of key changes, among which are significant alterations to the language’s definition and types. This includes the addition of three new types: number, vector and reference. A key update is the inclusion of reference types, as with this a WebAssembly program can handle references to complex host objects instead of being limited to integer and floating-point values. The goal with reference types is to potentially bring more complex features to WebAssembly such as garbage collection - which has been discussed as a potential addition since 2015. Despite these changes, this new 2.0 specification points to the same 2018 paper by Watt as proof of its soundness:

A machine-verified version of the formalisation and soundness proof is described in the following article”

Researchers such as Watt are working on new and updated mechanisations of WebAssembly 2.0, as stated in the 2021 paper

“we intend to keep our mechanisations abreast of these new features in WebAssembly changes”.

References

– “From asm.js to webassembly”, https://brendaneich.com/2015/06/from-asm-js-to-webassembly/, Brendan Eich, 2015, Jun

– “Steve Jobs slams adobe flash in open letter - and maybe kills it for mobile”, The Guardian, 2010, Apr

– “Adobe Flash Player End of life General Information Page”, Adobe, 2020, Dec

– “WebAssembly is now ready for browsers to use”, InfoWorld, 2017, March

– “WebAssembly Core Specification 1.0”, W3C, 2019, Dec

– “Mechanising and Verifying the WebAssembly Specification”, Watt Conrad, 2018, Association for Computing Machinery, https://doi.org/10.1145/3167082

– “Bringing the Web up to Speed with WebAssembly”, Haas, Andreas and Rossberg, Andreas and Schuff, Derek L. and Titzer, Ben L. and Holman, Michael and Gohman, Dan and Wagner, Luke and Zakai, Alon and Bastien, JF, 2017, Association for Computing Machinery, https://doi.org/10.1145/3062341.3062363

– “Two Mechanisations of WebAssembly 1.0”, Watt, Conrad and Rao, Xiaojia and Pichon-Pharabod, Jean and Bodin, Martin and Gardner, Philippa, https://doi.org/10.1007/978-3-030-90870-6_4

– “WebAssembly Core Specification 2.0”, W3C, 2022, May

– “WebAssembly Github”, https://github.com/WebAssembly/gc, W3C, 2022, May