Silica Gel: Do Not Eat: July 2016

This isn't the demo.
This is just a screenshot.

I've been working on some JavaScript music code for a possible future project. So far I've learned a few interesting things about music theory, sound, and digital audio, and I've reached a milestone so here's Keyboard Demo, a little electronic keyboard demo.

Here are some things I needed to learn to get this far:

Generating audio with FFmpeg

I wanted to be able to play a range of pitches; the standard 88-key piano keyboard runs from A0 to C8, so at least those. I also wanted to support IE. There don't seem to be a lot of good baked-in options for that sort of thing, so I knew the lowest common denominator would be to have a bunch of tiny audio files, one for each pitch.

I considered generating all the pitches with MIDI in Anvil Studio, recording with Audacity, and then cutting that audio up into individual files, but that would have taken forever. After poking around, I discovered that you can generate audio in batch using FFmpeg.

FFmpeg is a very useful tool for audio and video manipulation. It can transcode audio and video files, change sample and frame rates, apply filters, crossfade, and probably hundreds of other tricks I don't even know about. I didn't know it could generate audio (or video) until I ran into this superuser.com question. The idea is to use FFmpeg's filtergraph feature as an input, with a command line like this:

ffmpeg -f lavfi -i "<filtergraph string>" <output>

FFmpeg's filtergraph strings are pretty complicated, but making a sine wave is pretty easy:

sine=frequency=440:duration=1

Yup. That's a sine wave.
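For illustration, here is a minimal sketch of how the command-line template and the sine filtergraph fit together. It is written in JavaScript to match the rest of the code in this post; the function name `sineCommand` and the output file name are made up for the example.

```javascript
// Build an FFmpeg command line that renders a pure sine tone to a file.
// `sineCommand` and the file name "A4.m4a" are hypothetical, for illustration.
function sineCommand(frequencyHz, seconds, outputFile) {
  // The lavfi "sine" source takes a frequency in Hz and a duration in seconds.
  const filtergraph = `sine=frequency=${frequencyHz}:duration=${seconds}`;
  return `ffmpeg -f lavfi -i "${filtergraph}" ${outputFile}`;
}

console.log(sineCommand(440, 1, "A4.m4a"));
// prints: ffmpeg -f lavfi -i "sine=frequency=440:duration=1" A4.m4a
```

Generating one file per pitch is then just a loop that calls this once for each frequency.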

A quick Python script to generate 88 calls to FFmpeg and now I've got all the pitches I need as individual m4a files (IE won't play ogg, sadly). There's some interesting music theory in just the choice of frequencies, of which I'll give a taste here:

I generated pitches using equal temperament, meaning that each pair of adjacent pitches has the same frequency ratio. As temperaments go, this is the one that all the cool kids are using. It beats out well temperament and meantone temperament, and generally makes sense if you want to play in multiple keys using the same set of pitches. Apparently it produces some slightly impure intervals, though I haven't yet compared it to just intonation with my own ear.

Then I based the frequencies around the notion that A4 (the A above middle C) is 440 hertz. That's called concert pitch, or at least that's what we call it lately, and it's not the only pitch reference out there. It's nice that it's a round number like that - makes it easier to remember. I've heard it said that music played in different keys can "feel" different, as though simply transposing a piece upward a semitone can turn it from melancholy to hopeful, or something. But the fact that the exact frequency of a pitch has changed through the ages, and that different ensembles might choose something other than the standard, makes me a little skeptical. Some people have perfect pitch, of course, but I don't know how that works with respect to different pitch references. I guess both the feeling-of-the-key and perfect-pitch concepts are relative to contemporary practice. Anyway, 440.
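In equal temperament there are twelve semitones per octave and an octave doubles the frequency, so adjacent pitches differ by a factor of 2^(1/12), about 1.0595. A minimal sketch of the arithmetic follows, assuming the usual piano key numbering where key 1 is A0 and key 49 is A4. The names `keyFrequency`, `A4_KEY` and `A4_HZ` are made up for the example.

```javascript
const A4_KEY = 49; // assumption: keys numbered 1..88, so A4 is key 49
const A4_HZ = 440; // concert pitch

// Frequency of a piano key in 12-tone equal temperament.
function keyFrequency(keyNumber) {
  return A4_HZ * Math.pow(2, (keyNumber - A4_KEY) / 12);
}

// Collect the frequencies of all 88 keys, lowest first.
const frequencies = [];
for (let key = 1; key <= 88; key++) {
  frequencies.push(keyFrequency(key));
}

console.log(frequencies[0].toFixed(2));  // A0: prints 27.50
console.log(frequencies[87].toFixed(2)); // C8: prints 4186.01
```

Changing `A4_HZ` to a different reference pitch shifts every key by the same ratio, which is the sense in which the whole tuning hangs off that one number.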

My ears

The range of frequencies that humans can hear is (very roughly) 20 Hz to 20 kHz, which are nice round numbers that should make you suspicious about the error bars around them. I don't know what the standard deviation is, but I was surprised when I went to listen to those audio files I made. I couldn't hear most of the low octave (C1-A1, ~32-55 Hz)! I was sure that I had made a mistake in my script. I've played an 88-key piano before and while my memory, like my pitch, isn't perfect, I seem to recall being able to hear those low notes.

I ran a highly scientific test using this YouTube video, and sure enough, my ears don't kick in until around 55 Hz. I didn't get to check on the high end of my range because the dogs started barking and I can't blame them because those are some annoying frequencies up there.

So what's going on? Well, as you know, a piano doesn't sound like a sine wave. It produces a crazy mess of noise that goes well beyond the fundamental frequency. Some of the strongest frequencies in the spectrum are integer multiples of the fundamental frequency (called "harmonics"), but also the whole audible range is spattered with low-amplitude impurities, and all that stuff together makes the timbre of the note. That's why those frequency spectrum visualizers are never very satisfying to me - it's all a mess, all the time. I guess it turns out that when I play C1 I'm not usually hearing the fundamental frequency at all (although more volume can help), but my brain is putting the harmonics and other noise together and I end up with a good approximation.

Adding harmonics

So I went back to those sound files and added some harmonics. This stackoverflow answer has some numbers in it for the relative amplitudes of the harmonics of a piano, supposedly. I don't know where they got those numbers from, but I figured it was better than just guessing. I plugged those into my script and generated some more complicated FFmpeg filtergraphs. For example, here's A4 with just the first three harmonics:

sine=frequency=440.00000:duration=1[s1];

[s1]volume=volume=3[i1];

sine=frequency=880.00000:duration=1[s2];

[s2]volume=volume=1.197[i2];

sine=frequency=1320.00000:duration=1[s3];

[s3]volume=volume=0.897[i3];

[i1][i2][i3]amix=inputs=3

The actual filtergraph I used was a little more complex: more harmonics, plus I added a delay to each harmonic so that I didn't end up with a sawtooth-like shape.
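The three-harmonic filtergraph shown above follows a regular pattern, so it can be generated from a list of levels. Here is a minimal sketch in JavaScript (my generator script was Python, and this is not that script). The function name `harmonicFiltergraph` is made up, and the per-harmonic delay is left out.

```javascript
// Build a filtergraph that mixes a fundamental with its harmonics.
// `levels[i]` is the volume applied to harmonic i + 1 (harmonic 1 is the
// fundamental). `harmonicFiltergraph` is a hypothetical name for this sketch.
function harmonicFiltergraph(fundamentalHz, levels, seconds) {
  const chains = levels.map((level, i) => {
    const n = i + 1;
    const frequency = (fundamentalHz * n).toFixed(5);
    // One sine source per harmonic, followed by its own volume filter.
    return `sine=frequency=${frequency}:duration=${seconds}[s${n}];` +
           `[s${n}]volume=volume=${level}[i${n}];`;
  });
  // Feed every scaled harmonic into a single amix filter.
  const inputs = levels.map((_, i) => `[i${i + 1}]`).join("");
  return chains.join("") + `${inputs}amix=inputs=${levels.length}`;
}

console.log(harmonicFiltergraph(440, [3, 1.197, 0.897], 1));
```

The result is the same graph as the A4 example, just on one line, and it can be passed to `ffmpeg -f lavfi -i` like any other filtergraph string.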

Now I can hear the low notes. Plus, the notes are a little less annoying. It's interesting that adding frequencies other than the one I want to play - impure, dirty, frequencies - ones that don't belong - actually makes the sound more real.

Aliasing

Hey wait a minute. I can hear the low octave now, just fine, but the high octave sounds just terrible. The notes I hear don't even seem related to the ones I'm asking for. Luckily, I remembered something from college about the Nyquist frequency (so I guess it wasn't a total waste). I had picked a sample rate of 16 kHz, because I wanted to save bandwidth and it made the audio files nice and small. Sure enough, if your sample rate is less than twice some frequency you'd like to hear then sorry, that's just not gonna happen. Instead you're going to hear a bizarro mirror-world frequency, and you probably won't like it.
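The "mirror-world" frequency can be worked out: a tone above the Nyquist frequency (half the sample rate) folds back to its distance from the nearest multiple of the sample rate. A minimal sketch follows, with a made-up function name, assuming an ideal sampler with no anti-aliasing filter.

```javascript
// Frequency actually heard when a pure tone is sampled at `sampleRateHz`.
// Assumes ideal sampling with no anti-aliasing filter (hypothetical helper).
function aliasedFrequency(frequencyHz, sampleRateHz) {
  const nearestMultiple = Math.round(frequencyHz / sampleRateHz) * sampleRateHz;
  return Math.abs(frequencyHz - nearestMultiple);
}

const sampleRate = 16000;  // Nyquist frequency is 8000 Hz
const c8 = 4186.01;        // fundamental of the top piano key, in Hz

console.log(aliasedFrequency(c8, sampleRate));     // below Nyquist, unchanged
console.log(aliasedFrequency(2 * c8, sampleRate)); // 2nd harmonic folds to about 7628 Hz
console.log(aliasedFrequency(3 * c8, sampleRate)); // 3rd harmonic folds to about 3442 Hz
```

So at 16 kHz the C8 fundamental itself survives, but its harmonics land on unrelated pitches, which fits the high octave sounding wrong only once harmonics were added.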

To me, the best analogy is to that old experiment where you look at a fan in a strobe light: at a certain strobe frequency the fan appears to stop spinning, and then if you strobe faster the fan seems to rotate backwards. It's not a perfect analogy, but basically this is the kind of crazy stuff that happens when analog meets digital. I probably could have fixed it by filtering out the high frequencies or increasing my sample rate, but these audio files are just for IE users. I can just drop those pitches from the range. The IE users probably won't miss them. What are the non-IE users going to get?

The Web Audio API

Modern browsers have the Web Audio API built in, and with that I don't need any silly audio files lying about. I can generate my noises in real time! It's as simple as this:

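    // Create the audio engine and a single oscillator (a sine wave by default).
    var context = new AudioContext();
    var oscillator = context.createOscillator();
    // Tune it to A4.
    oscillator.frequency.value = 440;
    // Route it to the speakers and start it playing.
    oscillator.connect(context.destination);
    oscillator.start();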

That produces a pure A4 sine wave. Add more oscillators for the harmonics, watch out for aliasing, put a "gain" node in there because the default oscillator is super loud, and we're in business. There are some cool tricks like amplitude envelopes that I could use to make better or crazier sounds (see this web synthesizer site for instance). Really, the engineering of synthesizers is an enormously deep rabbit-hole, and I'm tempted to jump in, but this will do for now.
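Here is a rough sketch of what "more oscillators plus a gain node" might look like. It is not the demo's actual code. The function name `playNote`, the `PARTIALS` levels (borrowed from the three-harmonic filtergraph earlier in this post) and the master level are all assumptions. The context is passed in as a parameter so the function can be tried with any object that exposes the same methods.

```javascript
// Relative levels for the first three harmonics, borrowed from the
// filtergraph example earlier in this post (an assumption for this sketch).
const PARTIALS = [3, 1.197, 0.897];

// Play `frequency` for `seconds` using one oscillator per harmonic.
// `context` is an AudioContext; `masterLevel` (e.g. 0.1) tames the volume.
function playNote(context, frequency, seconds, masterLevel) {
  const master = context.createGain();
  master.gain.value = masterLevel;
  master.connect(context.destination);

  const nyquist = context.sampleRate / 2;
  const total = PARTIALS.reduce((sum, level) => sum + level, 0);
  const started = [];

  PARTIALS.forEach((level, index) => {
    const partialFrequency = frequency * (index + 1);
    if (partialFrequency >= nyquist) {
      return; // skip harmonics at or above the Nyquist frequency
    }
    const oscillator = context.createOscillator();
    const gain = context.createGain();
    oscillator.frequency.value = partialFrequency;
    gain.gain.value = level / total; // scale so the full set sums to 1
    oscillator.connect(gain);
    gain.connect(master);
    oscillator.start(context.currentTime);
    oscillator.stop(context.currentTime + seconds);
    started.push(oscillator);
  });
  return started; // the oscillators that were actually scheduled
}
```

In a browser this would be called as something like `playNote(new AudioContext(), 440, 1, 0.1)` from a click handler.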

Volume

Another thing about sounds: Loudness. I used to think it was mostly about how far up and down the waveform moved. Nope. It's also not about the rate-of-change of the waveform at any given point. If you're thinking about amplitude of sine waves, that's closer, but still not right.

Uh-uh.

Nope.

From what I now understand, it's a combination of the amount of energy in the wave over a certain span of time, and the sensitivity of the human ear to the frequencies involved. A big spike in the wave means the speaker (for example) has to push a bunch of air molecules really hard, but it has to keep working on that air for a while before it will affect how loud we think the sound is. Plus, even a very energetic sound may be very quiet if it's near the edges of the range of our hearing, like those low-octave pure-sine-wave notes I couldn't hear. Loudness is a messy concept, and, when it comes to commercial music, thank goodness for ReplayGain.
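The "energy over a span of time" half of that can be put into numbers with a root-mean-square (RMS) measurement over a window of samples. The sketch below is only an illustration: the helper names are made up, and RMS ignores the ear's frequency sensitivity entirely.

```javascript
// Generate `seconds` of a sine wave as an array of samples (hypothetical helper).
function sineSamples(frequencyHz, amplitude, seconds, sampleRateHz) {
  const count = Math.round(seconds * sampleRateHz);
  const samples = new Float64Array(count);
  for (let i = 0; i < count; i++) {
    samples[i] = amplitude * Math.sin(2 * Math.PI * frequencyHz * i / sampleRateHz);
  }
  return samples;
}

// Root mean square: the square root of the average squared sample value.
function rms(samples) {
  let sumOfSquares = 0;
  for (const sample of samples) {
    sumOfSquares += sample * sample;
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// A sustained full-scale sine versus one full-scale spike in a second of silence.
const sustained = sineSamples(440, 1, 1, 16000);
const spike = new Float64Array(16000);
spike[0] = 1;

console.log(rms(sustained).toFixed(3)); // prints 0.707
console.log(rms(spike).toFixed(4));     // prints 0.0079
```

Both signals reach the same peak, but the spike carries almost no energy over the second, which matches the idea that the speaker has to keep working on the air for a while before the sound seems loud.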

In my demo you'll notice that some of the notes are louder than others, due partly to the fact that I didn't vary the oscillator amplitudes as the pitch went up, but also due to a phenomenon represented well by the Fletcher-Munson curves, which show that some frequencies just sound louder than others even when the "sound pressure level" is the same.
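One way to put rough numbers on that frequency dependence is A-weighting, a standard curve used in sound level meters. It is an approximation of how the ear responds at moderate levels, not the Fletcher-Munson curves themselves, and the demo doesn't use it. The sketch below uses the constants from the standard A-weighting formula; the function name is made up.

```javascript
// A-weighting gain in decibels for a pure tone at `frequencyHz`.
// A stand-in for the ear's sensitivity, not the Fletcher-Munson data itself.
function aWeightingDb(frequencyHz) {
  const f2 = frequencyHz * frequencyHz;
  const numerator = 12194 ** 2 * f2 * f2;
  const denominator =
    (f2 + 20.6 ** 2) *
    Math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2)) *
    (f2 + 12194 ** 2);
  // The +2.0 dB offset normalises the curve to roughly 0 dB at 1 kHz.
  return 20 * Math.log10(numerator / denominator) + 2.0;
}

// Compare a few piano pitches at equal sound pressure level.
for (const hz of [32.7, 55, 440, 1000, 4186]) {
  console.log(hz + " Hz: " + aWeightingDb(hz).toFixed(1) + " dB");
}
```

By this measure the 55 Hz tone comes out nearly 30 dB down on a 1 kHz tone at the same pressure level, which is in line with the low-octave sine waves being so hard to hear.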

The interface

The interface for this demo is pretty simple: Click and hold a key to play it. If you want to get fancy, you can hold down shift and click different notes to make chords. I took a tape ruler to the little electric keyboard I have for some measurements and then used a little dynamic SVG to generate the keys. There's a little dot on middle C just to get you oriented. That's about it. I have some bigger plans for this code, so stay... tuned. See what I did there? Tuned? It's.. oh, ok. Here's the link again:

Keyboard Demo

Monday, July 25, 2016

Music Theory, Sound Science and Digital Audio