From Microphone To .Wav With GetUserMedia And Web Audio

Microphones are a great way to get audio into your computer, and with the power of the Web Audio API, you can do a lot with that audio.

In this article, we’ll look at how to use the getUserMedia() method to access the microphone input, then process it in real time using Web Audio. We’ll also look at how to save the processed audio as a .wav file.

Can JavaScript Access Microphone?

Yes. JavaScript can access microphone input through the browser's media capture APIs, which grew out of the WebRTC effort.

WebRTC (Web Real-Time Communication) is a technology that lets web browsers exchange audio, video, and data directly, peer to peer, rather than relaying the media through a server.

One of the features that came with WebRTC is the getUserMedia() API, which allows us to access the microphone input and use it in our web applications.

Strictly speaking, getUserMedia() is defined in the W3C Media Capture and Streams specification, a companion to WebRTC. It is available in all modern browsers as navigator.mediaDevices.getUserMedia(), and only on secure pages (HTTPS or localhost).

How to Use getUserMedia()

The getUserMedia() method is used to request microphone access from the user.

When called, it will open up a permission prompt asking the user to allow the web page to access the microphone.

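If the user grants permission, getUserMedia() delivers a MediaStream object, which represents a stream of audio data from the microphone. The modern navigator.mediaDevices.getUserMedia() returns a Promise that resolves with the stream, while the older navigator.getUserMedia() used later in this article passes it to a success callback.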

We can then use this MediaStream object in our Web Audio application.
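
In current browsers the same request is usually written against the promise-based navigator.mediaDevices.getUserMedia(). The following is a minimal sketch of that form, not the code the rest of this article builds on. requestMicrophone and describeMicError are hypothetical helper names, and the media devices object is passed in as a parameter only so the functions can be tried outside a browser.

```javascript
// Hypothetical helper: turn the error names getUserMedia can reject with
// into messages for the user.
function describeMicError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      return "Microphone permission was denied.";
    case "NotFoundError":
      return "No microphone was found.";
    case "NotReadableError":
      return "The microphone is in use or unavailable.";
    default:
      return "Could not capture audio.";
  }
}

// Hypothetical helper: request an audio-only stream.
// In a page, pass navigator.mediaDevices as the argument.
function requestMicrophone(mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    return Promise.reject(new Error("getUserMedia not supported in this browser."));
  }
  // Resolves with a MediaStream once the user allows access.
  return mediaDevices.getUserMedia({ audio: true });
}
```

In a page this could be called as requestMicrophone(navigator.mediaDevices).then(success, function (err) { alert(describeMicError(err)); }), where success is the callback that receives the stream.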

How To Use Web Audio To Process Audio

Web Audio is a JavaScript API for processing and playing audio in the web browser.

It can be used to create custom sound effects, process and synthesize sounds, or create complex audio applications.

It works by connecting audio nodes (sources, effects, and destinations) into a processing graph inside an AudioContext.

In this article, we’ll use the Web Audio API to take microphone input and turn it into a .wav file.

What Is A .WAV File?

A .wav file is a digital audio file that stores waveform data.

Waveform data is a representation of the amplitude of sound waves over time.

.WAV files are typically used for storing uncompressed audio on PCs. They usually hold PCM samples, the same kind of sample encoding that audio CDs use.
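
Because the samples are stored uncompressed, the size of a .wav file follows directly from its duration and format. Here is a rough sketch of that arithmetic. wavFileSize is a hypothetical helper, and the 44-byte header assumes the plain PCM layout with no extra chunks.

```javascript
// Size of the canonical PCM WAV header (RIFF + fmt + data chunk headers).
const WAV_HEADER_BYTES = 44;

// Hypothetical helper: estimate the size in bytes of an uncompressed PCM WAV file.
function wavFileSize(seconds, sampleRate, channels, bitsPerSample) {
  // One frame holds one sample for every channel.
  const bytesPerFrame = channels * (bitsPerSample / 8);
  const frames = Math.round(seconds * sampleRate);
  return WAV_HEADER_BYTES + frames * bytesPerFrame;
}

// 10 seconds of 16-bit stereo audio at 44.1 kHz.
console.log(wavFileSize(10, 44100, 2, 16)); // 1764044
```

At that format, one minute of audio comes to roughly 10 MB, which is worth keeping in mind before recording long sessions into memory.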

From Microphone To .Wav Using GetUserMedia And Web Audio

First, we’ll need access to the microphone, which we can do using the getUserMedia API.

// legacy callback-style API: fall back to vendor-prefixed versions in older browsers
if (!navigator.getUserMedia) {
    navigator.getUserMedia = navigator.webkitGetUserMedia ||
                             navigator.mozGetUserMedia || navigator.msGetUserMedia;
}

if (navigator.getUserMedia) {
    navigator.getUserMedia({audio: true}, success, function(e) {
        alert("Error capturing audio.");
    });
} else {
    alert("getUserMedia not supported in this browser.");
}

The first argument to getUserMedia() describes what we want access to, here the microphone. If we wanted the camera instead, we would pass an object with the video flag turned on:

navigator.getUserMedia({video: true}, success, function(e) {
    alert("Error capturing video.");
});

The second and third arguments are callbacks, invoked when hardware access succeeds or fails respectively. When the user clicks "Allow" in the permission prompt, the success callback is triggered.

After the user has granted permission to use the microphone, we need to get at the raw PCM samples; this is where it gets difficult and the Web Audio APIs enter the game. If you haven't studied the Web Audio specification yet, its surface can look huge and daunting at first, because the API covers a wide range of tasks: audio filters, synthesized music, 3D audio engines, and more. All we need from it here are the PCM samples, which we will then pack into a WAV container using an ArrayBuffer.
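
To make "PCM samples" concrete: Web Audio hands out samples as floats nominally between -1 and 1, while a 16-bit WAV file stores integers between -32768 and 32767. A minimal sketch of the conversion follows. floatToInt16 is a hypothetical helper, and scaling by 0x7FFF is one common convention (the same factor this article's packaging code uses), not the only one.

```javascript
// Hypothetical helper: convert one Web Audio float sample to a 16-bit PCM value.
function floatToInt16(sample) {
  // Clamp first, so samples outside [-1, 1] cannot wrap around.
  const clamped = Math.max(-1, Math.min(1, sample));
  // 0x7FFF (32767) is the largest positive 16-bit signed value.
  return Math.round(clamped * 0x7FFF);
}

console.log(floatToInt16(1));   // 32767
console.log(floatToInt16(-1));  // -32767
console.log(floatToInt16(2.5)); // 32767 (clamped)
```

Without the clamp, a sample slightly above 1 would overflow the 16-bit range and come out as a loud click in the recording.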

Now that the user has granted access, we create an audio context and start capturing the audio data:

// recording state, shared with the WAV packaging code further down
var leftchannel = [];
var rightchannel = [];
var recordingLength = 0;
var audioContext, context, sampleRate, volume, audioInput, recorder;

function success(stream){
    // creates the audio context
    audioContext = window.AudioContext || window.webkitAudioContext;
    context = new audioContext();

    // retrieve the current sample rate to be used for WAV packaging
    sampleRate = context.sampleRate;

    // creates a gain node
    volume = context.createGain();

    // creates an audio node from the microphone incoming stream
    audioInput = context.createMediaStreamSource(stream);

    // connect the stream to the gain node
    audioInput.connect(volume);

    /* From the spec: This value controls how frequently the audioprocess event is
    dispatched and how many sample-frames need to be processed each call.
    Lower values for buffer size will result in a lower (better) latency.
    Higher values will be necessary to avoid audio breakup and glitches */
    var bufferSize = 2048;

    // createScriptProcessor replaces createJavaScriptNode, its original name,
    // which browsers have since removed
    recorder = context.createScriptProcessor(bufferSize, 2, 2);

    recorder.onaudioprocess = function(event){
        console.log("recording");
        var left = event.inputBuffer.getChannelData(0);
        var right = event.inputBuffer.getChannelData(1);
        // we clone the samples
        leftchannel.push(new Float32Array(left));
        rightchannel.push(new Float32Array(right));
        recordingLength += bufferSize;
    };

    // we connect the recorder
    volume.connect(recorder);
    recorder.connect(context.destination);
}

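The createScriptProcessor method (named createJavaScriptNode in early WebKit builds) takes the buffer size as its first parameter. As noted in the comments, this value determines how often the audioprocess event is dispatched. It must be a power of two between 256 and 16384: choose a lower value such as 2048 for better latency, or a higher one if you hear glitches. Note that ScriptProcessorNode is deprecated in favor of AudioWorklet, although browsers still support it.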
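
The trade-off behind the buffer size can be put into numbers: each audioprocess event covers bufferSize sample-frames, so the time between events is bufferSize divided by the sample rate. The sketch below uses two hypothetical helpers, isValidBufferSize and bufferDurationMs, and assumes the range of 256 to 16384 that the Web Audio specification allows for script processor buffers.

```javascript
// Hypothetical helper: true for a power of two between 256 and 16384.
function isValidBufferSize(size) {
  return Number.isInteger(size) && size >= 256 && size <= 16384 &&
         (size & (size - 1)) === 0; // powers of two have a single bit set
}

// Hypothetical helper: milliseconds of audio delivered per audioprocess event.
function bufferDurationMs(bufferSize, sampleRate) {
  return (bufferSize / sampleRate) * 1000;
}

console.log(isValidBufferSize(2048));                  // true
console.log(isValidBufferSize(1000));                  // false
console.log(bufferDurationMs(2048, 44100).toFixed(1)); // "46.4"
```

So at 44.1 kHz a 2048-frame buffer fires roughly 21 times per second, which is frequent enough for recording while leaving the callback some room to do its copying.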

We call the getChannelData APIs for each channel (left and right) and obtain a new Float32Array buffer for each channel that we clone (sorry GC), then store into two separate Arrays every time the event is dispatched.

We could have stored the samples in a single Float32Array per channel instead. However, typed arrays have a fixed length that must be known when they are created, and we don't know in advance how long the recording will last, so we fall back to plain Arrays of chunks.

Why must we clone the channels? It drove me crazy for a long time. The arrays returned by getChannelData point at the samples currently coming in, so to snapshot them you need to copy them; otherwise, you'll wind up with samples that correspond to the sound coming from your microphone at the moment you stopped recording.
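
The difference between keeping a reference and keeping a copy can be reproduced with plain typed arrays. In this sketch, live stands in for the array returned by getChannelData, and overwriting it simulates the next audioprocess event reusing the same memory. That reuse is the behavior described above, assumed here rather than measured.

```javascript
// Stand-in for the buffer handed to the audioprocess callback.
const live = new Float32Array([0.25, 0.5]);

// One store keeps the same object, the other keeps a copy of its contents.
const storedByReference = [live];
const storedByCopy = [new Float32Array(live)];

// Simulate the next event writing new samples into the same buffer.
live.set([0.75, 1]);

console.log(Array.from(storedByReference[0])); // [0.75, 1]
console.log(Array.from(storedByCopy[0]));      // [0.25, 0.5]
```

Only the copy still holds the samples from the first event, which is why the recorder clones every chunk before pushing it.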

Once we have our arrays of buffers, we need to flatten each channel into a single Float32Array:

function mergeBuffers(channelBuffer, recordingLength){
  var result = new Float32Array(recordingLength);
  var offset = 0;
  var lng = channelBuffer.length;
  for (var i = 0; i < lng; i++){
    var buffer = channelBuffer[i];
    result.set(buffer, offset);
    offset += buffer.length;
  }
  return result;
}

Once both channels are flat, we can interleave them:

function interleave(leftChannel, rightChannel){
  var length = leftChannel.length + rightChannel.length;
  var result = new Float32Array(length);
  var inputIndex = 0;
  for (var index = 0; index < length; ){
    result[index++] = leftChannel[inputIndex];
    result[index++] = rightChannel[inputIndex];
    inputIndex++;
  }
  return result;
}
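
A quick way to convince yourself that flattening and interleaving do what they should is to run them on tiny inputs. The sketch below re-declares compact, hypothetical equivalents of the two functions (flatten and interleaveStereo) so that it runs on its own. It assumes both channels have the same length, as they do in the recorder.

```javascript
// Hypothetical compact equivalent of mergeBuffers: join chunks into one array.
function flatten(chunks, totalLength) {
  const result = new Float32Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return result;
}

// Hypothetical compact equivalent of interleave: L0, R0, L1, R1, ...
function interleaveStereo(left, right) {
  const result = new Float32Array(left.length * 2);
  for (let i = 0; i < left.length; i++) {
    result[2 * i] = left[i];
    result[2 * i + 1] = right[i];
  }
  return result;
}

// Two chunks per channel, as two audioprocess events would deliver them.
const left = flatten([new Float32Array([1, 2]), new Float32Array([3])], 3);
const right = flatten([new Float32Array([-1, -2]), new Float32Array([-3])], 3);

console.log(Array.from(interleaveStereo(left, right))); // [1, -1, 2, -2, 3, -3]
```

The interleaved order (left sample, then right sample, frame after frame) is exactly the layout the WAV data chunk expects for stereo audio.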

We then add the little writeUTFBytes utility function:

function writeUTFBytes(view, offset, string){
  var lng = string.length;
  for (var i = 0; i < lng; i++){
    view.setUint8(offset + i, string.charCodeAt(i));
  }
}
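
Despite its name, writeUTFBytes writes one byte per character code, so it is only safe for ASCII strings such as the four-character chunk tags used in a WAV header. As an illustration, here is a hypothetical stricter variant, writeAscii, which rejects anything else, followed by a check of the bytes it produces for "RIFF".

```javascript
// Hypothetical stricter variant of writeUTFBytes: one byte per character,
// refusing characters that do not fit in 7-bit ASCII.
function writeAscii(view, offset, text) {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0x7f) {
      throw new RangeError("Non-ASCII character at index " + i);
    }
    view.setUint8(offset + i, code);
  }
}

const tagView = new DataView(new ArrayBuffer(4));
writeAscii(tagView, 0, "RIFF");

// Read the four bytes back to see what was stored.
const tagBytes = [0, 1, 2, 3].map(function (i) { return tagView.getUint8(i); });
console.log(tagBytes); // [82, 73, 70, 70]
```

For the fixed tags in this article the original function behaves identically; the stricter check only matters if you reuse it for arbitrary text.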

You can now package the recording as a WAV file. If required, adjust the outputVolume variable (from 0 to 1) to scale the samples.

var leftBuffer = mergeBuffers(leftchannel, recordingLength);
var rightBuffer = mergeBuffers(rightchannel, recordingLength);

// we interleave both channels together
var interleaved = interleave(leftBuffer, rightBuffer);

// create the buffer and view to create the .WAV file
var dataLength = interleaved.length * 2; // 2 bytes per 16-bit sample
var buffer = new ArrayBuffer(44 + dataLength);
var view = new DataView(buffer);

// write the WAV container, check spec at: https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
// RIFF chunk descriptor
writeUTFBytes(view, 0, "RIFF");
// chunk size = file size minus the 8 bytes taken by "RIFF" and this field
view.setUint32(4, 36 + dataLength, true);
writeUTFBytes(view, 8, "WAVE");

// FMT sub-chunk
writeUTFBytes(view, 12, "fmt ");
view.setUint32(16, 16, true);             // fmt sub-chunk size (16 for PCM)
view.setUint16(20, 1, true);              // audio format (1 = PCM)
// stereo (2 channels)
view.setUint16(22, 2, true);
view.setUint32(24, sampleRate, true);     // sample rate
view.setUint32(28, sampleRate * 4, true); // byte rate = sample rate * block align
view.setUint16(32, 4, true);              // block align = channels * bytes per sample
view.setUint16(34, 16, true);             // bits per sample

// data sub-chunk
writeUTFBytes(view, 36, "data");
view.setUint32(40, dataLength, true);

// write the PCM samples
var lng = interleaved.length;
var index = 44;
var outputVolume = 1; // separate name, so the `volume` gain node is not overwritten
for (var i = 0; i < lng; i++){
    // clamp to [-1, 1] so loud samples don't wrap around in 16 bits
    var sample = Math.max(-1, Math.min(1, interleaved[i] * outputVolume));
    view.setInt16(index, sample * 0x7FFF, true);
    index += 2;
}

// our final binary blob that we can hand off
var blob = new Blob([view], { type: "audio/wav" });
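
Since a single wrong header field is enough to make players reject the file, it can help to read the header back and check that its fields agree with each other. The sketch below is one way to do that, under the assumption of the plain 44-byte PCM layout written above. readWavHeader and headerLooksConsistent are hypothetical helper names, and the RIFF size field is expected to equal the file size minus 8.

```javascript
// Read a four-character chunk tag such as "RIFF" or "data".
function readTag(view, offset) {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i));
  }
  return tag;
}

// Hypothetical helper: decode the fields of a canonical 44-byte PCM WAV header.
function readWavHeader(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  return {
    riff: readTag(view, 0),
    chunkSize: view.getUint32(4, true),
    wave: readTag(view, 8),
    audioFormat: view.getUint16(20, true),
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    byteRate: view.getUint32(28, true),
    blockAlign: view.getUint16(32, true),
    bitsPerSample: view.getUint16(34, true),
    dataTag: readTag(view, 36),
    dataSize: view.getUint32(40, true)
  };
}

// Hypothetical helper: cross-check the header fields against each other
// and against the total size of the file.
function headerLooksConsistent(header, totalBytes) {
  return header.riff === "RIFF" && header.wave === "WAVE" &&
    header.dataTag === "data" &&
    header.chunkSize === totalBytes - 8 &&
    header.dataSize === totalBytes - 44 &&
    header.blockAlign === header.channels * header.bitsPerSample / 8 &&
    header.byteRate === header.sampleRate * header.blockAlign;
}
```

With the variables from the packaging code, the check would read headerLooksConsistent(readWavHeader(buffer), buffer.byteLength).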

Obviously, if WAV packaging gets too costly, it's a good idea to offload it to a background worker (a Web Worker). We can now save the blob locally, upload it to a server, edit it further, or publish it. You may also try out the live demo.
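
For the simplest of those options, saving locally, a common approach is to point a temporary download link at an object URL for the blob. The sketch below shows that idea. saveBlobAs is a hypothetical helper, and the document and URL objects are passed in as parameters only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: offer a Blob to the user as a file download.
// In a page, pass `document` as doc and `URL` as urlApi.
function saveBlobAs(blob, filename, doc, urlApi) {
  // An object URL lets a link refer to data held in memory.
  const url = urlApi.createObjectURL(blob);
  const link = doc.createElement("a");
  link.href = url;
  link.download = filename; // suggested file name for the download
  doc.body.appendChild(link);
  link.click();
  doc.body.removeChild(link);
  // Returned so the caller can release it later with revokeObjectURL.
  return url;
}
```

In a page this could be called as saveBlobAs(blob, "recording.wav", document, URL). Once the download has started, passing the returned value to URL.revokeObjectURL frees the memory held by the object URL.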
