Ten months ago, when I was first starting out on this project, I spoke to a friend with decades of experience in developing streaming platforms. I had a vague idea of what to do, and a slightly clearer picture of what I wanted to achieve, but the “how to glue it all together” stage was beyond me.

In my world there are two different types of code, client-side and server-side. Client-side happens in front of you, it’s the push-a-button-and-the-screen-colour-changes, or position-that-image-here stuff. You’ll mostly encounter it on the web as JavaScript running in your browser, or the design and layout of web pages. Server-side is stuff that happens elsewhere, working behind the scenes to manage data. I write both client- and server-side code, with PHP as my core language.

“You have to learn [server-side] Python”, my friend said. “There are various Python audio libraries that will do most of the hard work, collating the separate music tracks into a single audio stream for broadcast”.

So, comfortable that learning-new-stuff is the whole point of everything, I filed that away as “something to do later” and got to work on the management system.

“Later” was January and, after completing two online Python courses (beginner and advanced), I started chicken-pecking my way through my new and glorious solution… and failed to get much further than #!/usr/bin/env python.

Facing the limit of my understanding, and fairly invested in this whole thing by now, I advertised the work on a job site, eventually finding an Indian development company with experience in Python audio streaming. Having written the code for the management system, I had a clear picture of what the new script had to do, so neither of us expected any problems.

The algorithm and API (the brains of the station) were mostly written, and I was able to provide copious instructions, diagrams and pseudo-code of the entire logic. Lalit, the project’s primary developer, started on what was supposed to be a week’s work… and then he contracted Covid – reports from India were sounding horrific by then. Once recovered, he headed down some cul-de-sacs of his own and, after a while, the company pulled in an additional developer.

Then I was asked whether I really wanted to manage the volume level of tracks – absolutely! – a question accompanied by a fair amount of head-scratching on their side and, on mine, growing panic about the quickly diminishing fund I had scraped together. The company was, however, amazing; they only charged for a fraction over the scheduled week and spent considerably more development time than that [SubCoDevs if you’re looking for a recommendation]. Two months later, they finally gave up: “What you want is not possible”.

We had misunderstood a fundamental aspect of Python (and, fwiw, of server-side audio processing in general). Any changes or effects to an audio file must happen before it is broken up into chunks and sent to the broadcast platform. Yes, you can do everything I wanted – change the volume, fade the music over time, trigger the next track to play – and do it all in Python, but you need to do it in advance. Trying to change files dynamically once streaming has begun is incredibly complex (not going to say impossible, ‘cause I’m sure it can be done somehow, but it was certainly beyond my limited budget).

What I want is for the radio platform algorithm to select the best next track only once the current track has started playing. It might also decide that an “ident” (basically me saying “you’re listening to…”) should run over the tail end of the currently playing track. I don’t want that ident to compete with the underlying song, so the volume of the already-streaming music track must be lowered – changing the volume of an already streaming audio file. “What you want is not possible”.

Having spent all I could afford on the not-possible, I asked around for help. Eventually someone clever suggested that, since I had written the track management system in JavaScript, i.e. client-side, perhaps I could expand that to create the compiled stream. Client-side happens in real time, so changing the volume is as simple as turning down the dial at the right moment.
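To show what I mean, here’s a minimal sketch of that ducking idea using Howler.js (the audio library the eventual script is built on). The file paths and timings are made up; the point is simply that, client-side, a playing track exposes a live volume control you can fade at any moment.

```javascript
// Minimal sketch: duck a playing track under an ident, entirely client-side.
// Assumes Howler.js is loaded on the page; the file paths are hypothetical.
const track = new Howl({ src: ['music/current-track.mp3'], volume: 1.0 });
const ident = new Howl({ src: ['idents/you-are-listening-to.mp3'], volume: 1.0 });

track.once('play', () => {
  // Five seconds before the track ends, fade it down and speak the ident over the top.
  const tailStart = (track.duration() - 5) * 1000; // milliseconds from now until the tail begins
  setTimeout(() => {
    track.fade(1.0, 0.3, 2000); // drop the music to 30% volume over two seconds
    ident.play();               // the ident now sits on top of the lowered music
  }, tailStart);
});

track.play();
```

In practice the trigger would come from per-track cue points rather than a bare timer, but the principle is identical.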

Ideally the streaming process is hosted on an Amazon EC2 instance (a “server in the cloud”) that scales on demand and keeps resources clearly defined (so that when I get to the point of providing the platform to a third party, everything is simple). These servers don’t have monitors or soundcards, but I knew it should be possible to run my JavaScript, make the server think it had a soundcard, collect the signal from this pretend soundcard and pipe it to Icecast.

I started with a version of JavaScript developed to run server-side called Node.js (which I had used previously when developing for Alexa, so it wasn’t totally unfamiliar…), but after a week or so of thrashing around with web-audio-api and trying to hot-wire client-side audio libraries into a server-side environment, that went out the window. Node.js isn’t really built for dynamic audio control.

The only other thing I could think of was to write a script, host it on my server, and then run a web browser 24/7 on the EC2 instance that opened this page and mixed the music together in real time, playing it just like any web page with music does. If I could get the pretend soundcard to collect the audio output from the browser, then the rest of the chain should be simple.

I eventually managed to get this working by using a Google tool called Puppeteer, developed for web geeks to automate testing of big sites, or for running around the internet stealing content and taking screenshots. Puppeteer runs a headless (i.e. without a screen) version of Chrome that loads my script (built upon the Howler.js audio library) just like any other web page. The audio out from this browser is then sent to PulseAudio (a core Linux sound server that includes compression and other fancy tools), which directs it to a pretend soundcard PulseAudio has created. The pretend soundcard is monitored by DarkIce, which grabs the digital audio signal, encodes it and sends it to Icecast. Icecast provides the point to which multiple listeners can connect to listen to the single stream. All this must be set up in such a way that if the EC2 instance reboots, all these stages are rebuilt and the stream is automatically restored.
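For the curious, the Puppeteer end of that chain is surprisingly small. The sketch below is illustrative rather than a drop-in setup: the page URL, sink name and flags are assumptions, the PulseAudio null sink and DarkIce configuration have to exist on the instance already, and whether a given Chrome build emits audio while headless depends on its version and flags.

```javascript
// Sketch only: launch headless Chrome via Puppeteer and point it at the mixing page.
// Assumes the EC2 instance already runs PulseAudio with a null sink (created elsewhere,
// e.g. with: pactl load-module module-null-sink sink_name=radio) and that DarkIce is
// watching that sink's monitor source. The sink name and URL are made up.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    env: { ...process.env, PULSE_SINK: 'radio' },   // send Chrome's audio to the pretend soundcard
    args: [
      '--no-sandbox',
      '--autoplay-policy=no-user-gesture-required', // let the page start audio without a click
    ],
  });

  const page = await browser.newPage();
  await page.goto('https://example.com/stream-player', { waitUntil: 'networkidle2' });
  // From here the page's own JavaScript (Howler.js et al.) does the mixing;
  // PulseAudio, DarkIce and Icecast handle everything downstream.
})();
```

Pointing Chrome at the sink via PulseAudio’s PULSE_SINK environment variable is just one way to wire it up; the important part is that the browser’s output lands on the pretend soundcard that DarkIce is watching.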

So, I managed to get that all working with a basic “play this one song” script last week – yay! Since then I’ve been working on the final link in the chain, the “grab all the sources and create a lovely audio stream” script that the system will use to drive everything. I’m using my old-school radio knowledge, with three “carts” cueing sequentially: two for music, one for the idents. There’s a fair amount for this script to do, and bug-fixing audio problems is… tricky, so I spent a day adding a development-only front-end to provide me with real-time feedback.
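As a rough illustration of the cart idea (very much a sketch: hypothetical playlists, no cue points, no crossfades, none of the real volume handling), the structure looks something like this:

```javascript
// Rough sketch of three "carts" cueing sequentially: two music carts and one ident cart.
// Playlists are hypothetical; the real script layers cue points, fades and volume control on top.
const carts = [
  { name: 'music-A', playlist: ['songs/track-01.mp3', 'songs/track-03.mp3'] },
  { name: 'music-B', playlist: ['songs/track-02.mp3', 'songs/track-04.mp3'] },
  { name: 'idents',  playlist: ['idents/station-ident.mp3'] },
];

let current = 0;

function playNextCart(attempts = 0) {
  if (attempts >= carts.length) return;           // every cart is empty: nothing left to play
  const cart = carts[current];
  current = (current + 1) % carts.length;         // rotate music-A -> music-B -> idents -> ...

  const src = cart.playlist.shift();              // take the next item loaded into this cart
  if (!src) return playNextCart(attempts + 1);    // empty cart: move straight on to the next one

  const sound = new Howl({ src: [src] });
  sound.once('end', () => playNextCart());        // when this cart finishes, cue the next
  sound.play();
}

playNextCart();
```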

I’m currently listening to a steady stream of music through it, cued and faded and with all the volume controls I could ask for. Next week I’ll integrate the idents into the mix, and then there’s just the small matter of adding cue points to the first 2000 tracks and I’ll be ready for y’all.