Posts Tagged accessibility

Audio Word Clouds

For my comprehensive channel trailer, I created a word cloud of the words used in titles and descriptions of the videos uploaded each month. Word clouds have been around for a while now, so that’s nothing unusual. For the soundtrack, I wanted to make audio versions of these word clouds using text-to-speech, with the most common words being spoken louder. This way people with either hearing or vision impairments would have a somewhat similar experience of the trailer, and people with no such impairments would have the same surplus of information blasted at them in two ways.

I checked to see if anyone had made audio word clouds before, and found Audio Cloud: Creation and Rendering, which makes me wonder if I should write an academic paper about my audio word clouds. That paper describes an audio word cloud created from audio recordings using speech-to-text, while I wanted to create one from text using text-to-speech. I was mainly interested in any insights into the number of words we could perceive at once at various volumes or voices. In the end, I just tried a few things and used my own perception and that of a few friends to decide what worked. Did it work? You tell me.

Part of the System Voice menu in the Speech section of the Accessibility panel of the macOS Catalina System Preferences


There’s a huge variety of English voices available on macOS, with accents from Australia, India, Ireland, Scotland, South Africa, the United Kingdom, and the United States, and I’ve installed most of them. I excluded the voices whose speaking speed can’t be changed, such as Good News, and a few novelty voices, such as Bubbles, which aren’t comprehensible enough when there’s a lot of noise from other voices. I ended up with 30 usable voices. I increased the volume of a few which were harder to understand when quiet.

I wondered whether it might work best with only one or a few voices or accents in each cloud, analogous to the single font in each visual word cloud. That way people would have a little time to adapt to understand those specific voices rather than struggling with an unfamiliar voice or accent with each word. On the other hand, maybe it would be better to have as many voices as possible in each word cloud so that people could distinguish between words spoken simultaneously by voice, just as we do in real life. In the end I chose the voice for each word randomly, and never got around to trying the fewer-distinct-voices version. Being already familiar with many of these voices, I’m not sure I would have been a good judge of whether that made it easier to get used to them.

Arranging the words

It turns out making an audio word cloud is simpler than making a visual one. There’s only one dimension in an audio word cloud — time. Volume could be thought of as sort of a second dimension, as my code would search through the time span for a free rectangle of the right duration with enough free volume. I later wrote an AppleScript to create ‘visual audio word clouds’ in OmniGraffle showing how the words fit into a time/volume rectangle.  I’ve thus illustrated this post with a visual word cloud of this post, and a few audio word clouds and visual audio word clouds of this post with various settings.

A visual representation of an audio word cloud of an early version of this post, with the same hubbub factor as was used in the video. The horizontal axis represents time, and the vertical axis represents volume. Rectangles in blue with the darker gradient to the right represent words panned to the right, while those in red with the darker gradient to the left represent words panned to the left.

However, words in an audio word cloud can’t be oriented vertically as they can in a visual word cloud, nor can there really be ‘vertical’ space between two words, so it was only necessary to search along one dimension for a suitable space. I limited the word clouds to five seconds, and discarded any words that wouldn’t fit in that time, since it’s a lot easier to display 301032 words somewhat understandably in nine minutes than it is to speak them. I used the most common (and therefore louder) words first, sorted by length, and stopped filling the audio word cloud once I reached a word that would no longer fit. It would sometimes still be possible to fit a shorter, less common word in that cloud, but I didn’t want to include words much less common than the words I had to exclude.

I set a preferred volume for each word based on its frequency (with a given minimum and maximum volume so I wouldn’t end up with a hundred extremely quiet words spoken at once) and decided on a maximum total volume allowed at any given point. I didn’t particularly take into account the logarithmic nature of sound perception. I then found a time in the word cloud where the word would fit at its preferred volume when spoken by the randomly-chosen voice. If it didn’t fit, I would see if there was room to put it at a lower volume. If not, I’d look for places it could fit by increasing the speaking speed (up to a given maximum) and if there was still nowhere, I’d increase the speaking speed and decrease the volume at once. I’d prioritise reducing the volume over increasing the speed, to keep it understandable to people not used to VoiceOver-level speaking speeds. Because of the one-and-a-bit dimensionality of the audio word cloud, it was easy to determine how much to decrease the volume and/or increase the speed to fill any gap exactly. However, I was still left with gaps too short to fit any word at an understandable speed, and slivers of remaining volume smaller than my per-word minimum.

A visual representation of an audio word cloud of this post, with a hubbub factor that could allow two additional words to be spoken at the same time as the others.

I experimented with different minimum and maximum word volumes, and maximum total volumes, which all affected how many voices might speak at once (the ‘hubbub level’, as I call it). Quite late in the game, I realised I could have some voices in the right ear and some in the left, which makes it easier to distinguish them. In theory, each word could be coming from a random location around the listener, but I kept to left and right — in fact, I generated separate left and right tracks and adjusted the panning in Final Cut Pro. Rather than changing the logic to have two separate channels to search for audio space in, I simply made my app alternate between left and right when creating the final tracks. By doing this, I could increase the total hubbub level while keeping many of the words understandable. However, the longer it went on for, the more taxing it was to listen to, so I decided to keep the hubbub level fairly low.

The algorithm is deterministic, but since voices are chosen randomly, and different voices take different amounts of time to speak the same words even at the same number of words per minute, the audio word clouds created from the same text can differ considerably. Once I’d decided on the hubbub level, I got my app to create a random one for each month, then regenerated any where I thought certain words were too difficult to understand.


The visual word cloud from December 2019, with both ‘Competition’ and the lowercase ‘competition’ featured prominently

In my visual word clouds, I kept the algorithm case-sensitive, so that a word with the same spelling but different capitalisation would be counted as a separate word, and displayed twice. There are arguments for keeping it like this, and arguments to collapse capitalisations into the same word — but which capitalisation of it? My main reason for keeping the case-sensitivity was so that the word cloud of Joey singing the entries to our MathsJam Competition Competition competition would have the word ‘competition’ in it twice.

Sometimes these really are separate words with different meanings (e.g. US and us, apple and Apple, polish and Polish, together and ToGetHer) and sometimes they’re not. Sometimes these two words with different meanings are pronounced the same way, other times they’re not. But at least in a visual word cloud, the viewer always has a way of understanding why the same word appears twice. For the audio word cloud, I decided to treat different capitalisations as the same word, but as I’ve mentioned, capitalisation does matter in the pronunciation, so I needed to be careful about which capitalisation of each word to send to the text-to-speech engine. Most voices pronounce ‘JoCo’ (short for Jonathan Coulton, pronounced with the same vowels as ‘go-go’) correctly, but would pronounce ‘joco’ or ‘Joco’ as ‘jocko’, with a different vowel in the first syllable. I ended up counting any words with non-initial capitals (e.g. JoCo, US) as separate words, but treating title-case words (with only the initial letter capitalised) as the same as all-lowercase, and pronouncing them in title-case so I wouldn’t risk mispronouncing names.

Further work

A really smart version of this would get the pronunciation of each word in context (the same way my rhyming dictionary finds rhymes for the different pronunciations of homographs, e.g. bow), group them by how they were pronounced, and make a word cloud of words grouped entirely by pronunciation rather than spelling, so ‘polish’ and ‘Polish’ would appear separately but there would be no danger of, say ‘rain’ and ‘reign’ both appearing in the audio word cloud and sounding like duplicates. However, which words are actually pronounced the same depend on the accent (e.g. whether ‘cot’ and ‘caught’ sound the same) and text normalisation of the voice — you might have noticed that some of the audio word clouds in the trailer have ‘aye-aye’ while others have ‘two’ for the Roman numeral ‘II’.

Similarly, a really smart visual word cloud would use natural language processing to separate out different meanings of homographs (e.g. bow🎀, bow🏹, bow🚢, and bow🙇🏻‍♀️) and display them in some way that made it obvious which was which, e.g. by using different symbols, fonts, styles, colours for different parts of speech. It could also recognise names and keep multi-word names together, count words with the same lemma as the same, and cluster words by semantic similarity, thus putting ‘Zoe Keating’ near ‘cello’, and ‘Zoe Gray’ near ‘Brian Gray’ and far away from ‘Blue’. Perhaps I’ll work on that next.

A visual word cloud of this blog post about audio word clouds, superimposed on a visual representation of an audio word cloud of this blog post about audio word clouds.

I’ve recently been updated to a new WordPress editor whose ‘preview’ function gives a ‘page not found’ error, so I’m just going to publish this and hope it looks okay. If you’re here early enough to see that it doesn’t, thanks for being so enthusiastic!

, , , , ,

Leave a comment

‘Accessible’ and ‘Back to the Future Song’ as actual songs

I’ve been away in the Bay Area, on JoCo Cruise, on trains, and at MarsCon, and too many things have happened for one blog post, but here are a few of them. Just before the cruise, Joey Marianer sang ‘Accessible‘, my parody of James Blunt’s ‘Beautiful’ about accessibility:

Joey sang a few other songs of mine during and after the cruise, but I’m going to save them for other posts so that this one is less of a mish-mash. If you would like a preview of those along with a recap of other things I wrote that he sang, here’s a playlist.

But Joey is not the only person whose name starts with ‘Jo’ who has sung words that I wrote! A while ago, my friend Joseph sang ‘Back to the Future Song‘, my parody of Moxy Früvous’s ‘Gulf War Song‘ as part of his Patreon. Lately he’s been opening up older posts to be visible to non-patrons, so now you can also hear Joseph singing Back to the Future Song. I changed that one line that I didn’t like very much.

You can also hear the cover of Moxy Früvous’s ‘Downsizing’ which Joseph sang for me after I lost my last job. If you like these covers, check out some of his other covers, short stories and poems on patron, and become a patron; I’m sure he’d appreciate the support, and you, too, would be able to request things like this.

I’ll post a few more times to update you on some other cool things, and who knows, perhaps I’ll participate in National Poetry Writing Month again. As is usual at this time of year, I’m spending most of my free time lately uploading videos from the JoCo Cruise, so if you want me to entertain you in some way and you can’t wait for the next blog post, subscribe to me on YouTube to see my latest uploads.

, , , , , ,

Leave a comment

Accessible (James Blunt ‘Beautiful’ parody lyrics)

I watched a speech by by Haben Girma yesterday, and as she was talking about making things accessible to people with disabilities and how that can lead to usefulness and innovations beyond accessibility, I decided to write a quick parody of James Blunt’s ‘Beautiful’ about accessibility. Here are the lyrics. They don’t always match the original tune exactly, but rather an approximation of it I have in my head from listening to parodies such as The Amateur Transplants’ Beautiful Song, Weird Al’s You’re Pitiful, and James Blunt’s own My Triangle. Feel free to come up with your own additions to include other aspects of accessibility or other things which could be more accessible.

My widget’s brilliant

My venue’s brilliant
Of that I’m sure
For all the people
I don’t ignore

I met someone who was different
And I knew I’d undershot
‘Cause I thought my plans were brilliant
and they were not

Accessible to all
To learn or say, in a different way
When one sense is essential
It makes sense to overhaul

So I fixed my stuff
It was not enough
I met more folk who tried and they still
found it tough
And I don’t think I can accommodate
But I have a goal now and I will innovate

Accessible to all
To simply manoeuvre, no matter how we move
Paralysed or pained or small
Still our movement shall not fall

la la la la…

Accessible to all
And the gains are shared with the unimpaired
Till there’s nothing that seems impossible

If you see, hear, feel the call
Make some tools; tear down a wall.

Of course, I can’t publish lyrics to a song about accessibility without mentioning The Accessibility Song by James Dempsey. If you didn’t like mine, try his, now also available along with his other songs about Cocoa development on iTunes:

I’ve actually seen the original song ‘Beautiful’ performed live, since I randomly went to a James Blunt concert with some co-workers the very first time I was in Vienna. However, even then, I did not really listen to the words. I did while writing this, and realised just how weird and creepy it is. The protagonist is not telling his partner she’s beautiful, as a casual listener might assume, but rather singing to a woman he caught a glimpse of once on a train as if she were some great lost love, despite the fact that they never spoke to each other and she was already involved with someone. It’s essentially Jackson Park Express, presented as a romantic short film rather than a feature-length comedy. With songs like this out there, no wonder random guys on the street sometimes think they can follow me home!

Unrelated to all this, Joey Marianer has once again sung some words I wrote! This time it’s A Few Things You’ll Need for the Cruise, slightly changed to be about the fact that there was, at the time of publishing, exactly one month until JoCo Cruise 2018. There is now less than a month, but still room for you to join us!


, , , , , ,


%d bloggers like this: