What is the Web Audio API: A primer

DevToolsGuy / Monday, August 31, 2015

Audio is a huge part of what makes interactive experiences so compelling. Well-placed sounds are critical for notifications, chimes, and of course audio and video communication applications. For visually impaired computer users, audio cues, speech synthesis, and speech recognition are critically important for a usable experience. Web Audio API is built to control audio on the Web, allowing developers to choose audio sources, add effects to audio, create audio visualizations, apply spatial effects (such as panning) and much more.

What is Web Audio API?


Web Audio API is a high-level JavaScript API for processing and synthesizing audio in web applications. The goal of this API is to include capabilities found in modern game engines and some of the mixing, processing, and filtering tasks that are found in modern desktop and web applications.

A brief history


Incorporating audio into browsers started in the late 1990s with the <bgsound> tag in Internet Explorer, which could automatically play MIDI files when a website was opened. Browsers later began to use third-party plugins like Flash, QuickTime or Silverlight, but all of these plugins had shortcomings. Then, with the arrival of mobile browsers which didn’t support Flash, there was a need for something new. The solution came with HTML5, which introduced the <audio> element for playing audio natively in the browser.


Nonetheless, playing audio wasn’t enough - we wanted to analyze, react to and manipulate our audio. The HTML5 <audio> element also has significant limitations for implementing sophisticated games and interactive applications, and this is where the Web Audio API comes into the picture by providing many capabilities, including the following (a couple of them are sketched in code after the list):


  • Playing different sounds with rhythm
  • Adjusting the volume
  • Cross-fading between two sounds
  • Equal Power Crossfading
  • Playlist Crossfading
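
Volume control and cross-fading, for instance, come down to routing sources through GainNode objects and adjusting their gain values. Below is a minimal sketch of an equal-power cross-fade between two tracks; the element ids trackA and trackB are hypothetical, standing in for two <audio> elements on the page:

  // Create an audio context (prefixed in some 2015-era browsers).
  var context = new (window.AudioContext || window.webkitAudioContext)();

  // Wrap two <audio> elements on the page (the ids are assumed for this example).
  var sourceA = context.createMediaElementSource(document.getElementById('trackA'));
  var sourceB = context.createMediaElementSource(document.getElementById('trackB'));

  // One GainNode per source gives independent volume control.
  var gainA = context.createGain();
  var gainB = context.createGain();
  sourceA.connect(gainA);
  sourceB.connect(gainB);
  gainA.connect(context.destination);
  gainB.connect(context.destination);

  // Equal-power cross-fade: x runs from 0 (only A audible) to 1 (only B audible).
  function crossfade(x) {
    gainA.gain.value = Math.cos(x * 0.5 * Math.PI);
    gainB.gain.value = Math.cos((1 - x) * 0.5 * Math.PI);
  }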


What’s in it for Developers?

Modular Routing


Modular routing allows arbitrary connections between different AudioNode objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output, while a destination node has one input and no outputs; the most common example is AudioDestinationNode - the final destination to the audio hardware. Other nodes, such as filters, can be placed between the source and destination nodes. Developers don't have to worry about low-level stream format details when two objects are connected together; the right thing just happens.


In the simplest case, a single source is routed directly to the output.
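
A minimal sketch of that graph, assuming an AudioBuffer named buffer has already been loaded and decoded (the variable name is hypothetical):

  // A single source connected straight to the context's destination (the speakers).
  var context = new (window.AudioContext || window.webkitAudioContext)();
  var source = context.createBufferSource();
  source.buffer = buffer;              // an already-decoded AudioBuffer
  source.connect(context.destination); // source -> output
  source.start(0);                     // play immediately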

In many games, multiple sources of sound are combined to create the final mix. Sources include background music, game sound effects, UI feedback sounds and, in a multiplayer setting, other players’ voices. With modular routing in Web Audio API, you can keep all of these sounds separate, with full control over each sound individually as well as over the mix as a whole.


In more complex modular routing, all the channels are combined into a single output.
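
One way such a graph might be sketched is with a GainNode per channel feeding a master GainNode. This reuses the context variable from the sketches above; musicSource, sfxSource and voiceSource are hypothetical, standing in for already-created source nodes:

  // Per-channel gain nodes feed a master gain, which feeds the destination.
  var musicGain  = context.createGain();
  var sfxGain    = context.createGain();
  var voiceGain  = context.createGain();
  var masterGain = context.createGain();

  musicSource.connect(musicGain);
  sfxSource.connect(sfxGain);
  voiceSource.connect(voiceGain);

  musicGain.connect(masterGain);
  sfxGain.connect(masterGain);
  voiceGain.connect(masterGain);
  masterGain.connect(context.destination);

  // Muting the background music is now a single assignment.
  musicGain.gain.value = 0;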

With this setup, it is easy for players to control the level of each channel separately. For example, many people prefer to play games with the background music turned off.


Precise Timing Model


For web applications, the time delay between mouse and keyboard events and a sound being heard is important. This time delay is called latency and is caused by several factors (input device latency, internal buffering latency, DSP processing latency, output device latency, distance of user's ears from speakers, etc.), and is cumulative. The larger this latency is, the less satisfying the UX. In the extreme, it can make musical production or game-play impossible.


Even at moderate levels, latency can affect timing and give the impression of sounds lagging behind or the game being non-responsive. For musical applications the timing problems affect rhythm. For gaming, the timing problems affect precision of gameplay. For interactive applications, it generally cheapens the user’s experience in much the same way as low animation frame-rates.


With Web Audio API, latency can be kept low, and its precise timing model enables you to schedule events at exact times in the future. This is very important for scripted scenes and musical applications. Below are the features available in Web Audio API to address this, a few of which are sketched in code after the list:


  • Precise Playback and Resume
  • Scheduling Precise Rhythms
  • Changing Audio Parameters
  • Gradually Varying Audio Parameters
  • Custom Timing Curves
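
As a rough sketch of a few of these, again assuming an AudioContext named context and a decoded AudioBuffer named buffer (both hypothetical here), everything is scheduled against the context’s own clock rather than setTimeout:

  // Precise playback: start half a second from now, then fade out over two seconds.
  var source = context.createBufferSource();
  var gain = context.createGain();
  source.buffer = buffer;
  source.connect(gain);
  gain.connect(context.destination);

  var startTime = context.currentTime + 0.5;
  source.start(startTime);
  gain.gain.setValueAtTime(1.0, startTime);                // set the initial value
  gain.gain.linearRampToValueAtTime(0.0, startTime + 2.0); // gradually vary the parameter
  // A custom timing curve could be applied with gain.gain.setValueCurveAtTime(curve, t, duration).

  // Precise rhythm: one BufferSource per beat, laid out on the audio clock.
  var beat = 60 / 120; // seconds per beat at 120 bpm
  for (var i = 0; i < 4; i++) {
    var click = context.createBufferSource();
    click.buffer = buffer;
    click.connect(context.destination);
    click.start(startTime + i * beat);
  }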


Analysis and Visualization


There is more to Web Audio API than audio synthesis and processing: it also provides a way of understanding the sound that is being played. A good visual analyzer can act as a sort of debugging tool for tweaking sounds to be just right, and visualization is critical for games and music-related applications.


We can analyze the sound wave in the time domain as well as the frequency domain, and we can animate the visualization by setting up a loop that queries the analyzer for its current frequency data and renders it on every frame.


Below are the key steps to achieve a simple analysis and visualization of sound; a rough sketch follows the list.


  • Frequency Analysis
  • Animating with requestAnimationFrame
  • Visualizing Sound
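
A rough sketch of those steps, reusing the context variable from the earlier sketches and assuming a source node named source is already playing (both names are placeholders), uses an AnalyserNode together with requestAnimationFrame:

  // Insert an AnalyserNode between the source and the destination.
  var analyser = context.createAnalyser();
  analyser.fftSize = 2048;
  source.connect(analyser);
  analyser.connect(context.destination);

  var data = new Uint8Array(analyser.frequencyBinCount);

  function draw() {
    requestAnimationFrame(draw);          // keep the loop in sync with the display
    analyser.getByteFrequencyData(data);  // current frequency-domain snapshot, 0-255 per bin
    // ...render `data` to a canvas, CSS bars, and so on.
  }
  draw();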


A final thought …


Web Audio API makes audio synthesis and processing very simple with its enormous range of functionality. However, developers need to use it cautiously: processing audio in JavaScript itself (rather than with the API’s native nodes) can cause performance issues in browsers, particularly on today’s generation of mobile devices. While processing audio in JavaScript, it is extremely challenging to get reliable, glitch-free audio at reasonably low latency, especially under heavy processor load. We can expect browsers and JavaScript engines to keep improving here, but the API is already very useful and turns the tide towards the developer.