Posts Tagged speech synthesis
For my comprehensive channel trailer, I created a word cloud of the words used in titles and descriptions of the videos uploaded each month. Word clouds have been around for a while now, so that’s nothing unusual. For the soundtrack, I wanted to make audio versions of these word clouds using text-to-speech, with the most common words being spoken louder. This way people with either hearing or vision impairments would have a somewhat similar experience of the trailer, and people with no such impairments would have the same surplus of information blasted at them in two ways.
I checked to see if anyone had made audio word clouds before, and found Audio Cloud: Creation and Rendering, which makes me wonder if I should write an academic paper about my audio word clouds. That paper describes an audio word cloud created from audio recordings using speech-to-text, while I wanted to create one from text using text-to-speech. I was mainly interested in any insights into the number of words we could perceive at once at various volumes or voices. In the end, I just tried a few things and used my own perception and that of a few friends to decide what worked. Did it work? You tell me.
There’s a huge variety of English voices available on macOS, with accents from Australia, India, Ireland, Scotland, South Africa, the United Kingdom, and the United States, and I’ve installed most of them. I excluded the voices whose speaking speed can’t be changed, such as Good News, and a few novelty voices, such as Bubbles, which aren’t comprehensible enough when there’s a lot of noise from other voices. I ended up with 30 usable voices. I increased the volume of a few which were harder to understand when quiet.
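As a rough illustration of speaking one word with a chosen voice, speed and volume, here is a minimal Python sketch that builds a command line for the macOS `say` tool. It is an assumption that `say` is a reasonable stand-in here; the app described in this post may well use the speech synthesis API directly. The function name `say_command` and its parameters are made up for this example, and the `[[volm …]]` embedded command may not be honoured by every voice.

```python
def say_command(word, voice, words_per_minute, volume, out_path):
    """Build (but do not run) an argument list for the macOS `say` tool.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    # [[volm x]] is an embedded speech command that sets the volume.
    text = f"[[volm {volume:.2f}]] {word}"
    # -v picks the voice, -r the rate in words per minute, -o the output file.
    return ["say", "-v", voice, "-r", str(words_per_minute), "-o", out_path, text]
```

On a Mac, the resulting list could be passed to `subprocess.run` to write one audio file per word.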
I wondered whether it might work best with only one or a few voices or accents in each cloud, analogous to the single font in each visual word cloud. That way people would have a little time to adapt to understand those specific voices rather than struggling with an unfamiliar voice or accent with each word. On the other hand, maybe it would be better to have as many voices as possible in each word cloud so that people could distinguish between words spoken simultaneously by voice, just as we do in real life. In the end I chose the voice for each word randomly, and never got around to trying the fewer-distinct-voices version. Being already familiar with many of these voices, I’m not sure I would have been a good judge of whether that made it easier to get used to them.
Arranging the words
It turns out making an audio word cloud is simpler than making a visual one. There’s only one dimension in an audio word cloud — time. Volume could be thought of as sort of a second dimension, as my code would search through the time span for a free rectangle of the right duration with enough free volume. I later wrote an AppleScript to create ‘visual audio word clouds’ in OmniGraffle showing how the words fit into a time/volume rectangle. I’ve thus illustrated this post with a visual word cloud of this post, and a few audio word clouds and visual audio word clouds of this post with various settings.
However, words in an audio word cloud can’t be oriented vertically as they can in a visual word cloud, nor can there really be ‘vertical’ space between two words, so it was only necessary to search along one dimension for a suitable space. I limited the word clouds to five seconds, and discarded any words that wouldn’t fit in that time, since it’s a lot easier to display 301032 words somewhat understandably in nine minutes than it is to speak them. I used the most common (and therefore louder) words first, sorted by length, and stopped filling the audio word cloud once I reached a word that would no longer fit. It would sometimes still be possible to fit a shorter, less common word in that cloud, but I didn’t want to include words much less common than the words I had to exclude.
I set a preferred volume for each word based on its frequency (with a given minimum and maximum volume so I wouldn’t end up with a hundred extremely quiet words spoken at once) and decided on a maximum total volume allowed at any given point. I didn’t particularly take into account the logarithmic nature of sound perception. I then found a time in the word cloud where the word would fit at its preferred volume when spoken by the randomly-chosen voice. If it didn’t fit, I would see if there was room to put it at a lower volume. If not, I’d look for places it could fit by increasing the speaking speed (up to a given maximum) and if there was still nowhere, I’d increase the speaking speed and decrease the volume at once. I’d prioritise reducing the volume over increasing the speed, to keep it understandable to people not used to VoiceOver-level speaking speeds. Because of the one-and-a-bit dimensionality of the audio word cloud, it was easy to determine how much to decrease the volume and/or increase the speed to fill any gap exactly. However, I was still left with gaps too short to fit any word at an understandable speed, and slivers of remaining volume smaller than my per-word minimum.
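The search described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the app's actual code: time is cut into equal slots, each word arrives as a hypothetical `(text, preferred_volume, length_in_slots)` tuple already sorted from most to least common, and only two speeds are tried (normal and the fastest allowed) rather than adjusting the speed to fill a gap exactly.

```python
import math


def best_start(used, length, max_total):
    """Return (start, free) for the window of `length` slots with the most
    free volume, preferring the earliest; start is None if nothing is free."""
    best_slot, best_free = None, 0.0
    for start in range(len(used) - length + 1):
        free = max_total - max(used[start:start + length])
        if free > best_free:
            best_slot, best_free = start, free
    return best_slot, best_free


def place_words(words, slots, max_total=1.0, min_volume=0.1, max_speedup=1.5):
    """Place words on a timeline; returns (text, start, length, volume) tuples."""
    used = [0.0] * slots  # total volume already allocated in each slot
    placed = []
    for text, preferred, length in words:
        fastest = max(1, math.ceil(length / max_speedup))
        placement = None
        # Normal speed first, then the fastest allowed speed. Within each
        # speed, the preferred volume is used if it fits, else a lower one.
        for candidate in (length, fastest):
            start, free = best_start(used, candidate, max_total)
            if start is None or free < min_volume:
                continue
            placement = (text, start, candidate, min(preferred, free))
            break
        if placement is None:
            break  # stop at the first word that fits nowhere
        _, start, candidate, volume = placement
        for slot in range(start, start + candidate):
            used[slot] += volume
        placed.append(placement)
    return placed
```

Stopping at the first word that does not fit mirrors the choice not to include words much less common than the ones that had to be left out.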
I experimented with different minimum and maximum word volumes, and maximum total volumes, which all affected how many voices might speak at once (the ‘hubbub level’, as I call it). Quite late in the game, I realised I could have some voices in the right ear and some in the left, which makes it easier to distinguish them. In theory, each word could be coming from a random location around the listener, but I kept to left and right — in fact, I generated separate left and right tracks and adjusted the panning in Final Cut Pro. Rather than changing the logic to have two separate channels to search for audio space in, I simply made my app alternate between left and right when creating the final tracks. By doing this, I could increase the total hubbub level while keeping many of the words understandable. However, the longer it went on for, the more taxing it was to listen to, so I decided to keep the hubbub level fairly low.
The algorithm is deterministic, but since voices are chosen randomly, and different voices take different amounts of time to speak the same words even at the same number of words per minute, the audio word clouds created from the same text can differ considerably. Once I’d decided on the hubbub level, I got my app to create a random one for each month, then regenerated any where I thought certain words were too difficult to understand.
In my visual word clouds, I kept the algorithm case-sensitive, so that a word with the same spelling but different capitalisation would be counted as a separate word, and displayed twice. There are arguments for keeping it like this, and arguments to collapse capitalisations into the same word — but which capitalisation of it? My main reason for keeping the case-sensitivity was so that the word cloud of Joey singing the entries to our MathsJam Competition Competition competition would have the word ‘competition’ in it twice.
Sometimes these really are separate words with different meanings (e.g. US and us, apple and Apple, polish and Polish, together and ToGetHer) and sometimes they’re not. Sometimes these two words with different meanings are pronounced the same way, other times they’re not. But at least in a visual word cloud, the viewer always has a way of understanding why the same word appears twice. For the audio word cloud, I decided to treat different capitalisations as the same word, but as I’ve mentioned, capitalisation does matter in the pronunciation, so I needed to be careful about which capitalisation of each word to send to the text-to-speech engine. Most voices pronounce ‘JoCo’ (short for Jonathan Coulton, pronounced with the same vowels as ‘go-go’) correctly, but would pronounce ‘joco’ or ‘Joco’ as ‘jocko’, with a different vowel in the first syllable. I ended up counting any words with non-initial capitals (e.g. JoCo, US) as separate words, but treating title-case words (with only the initial letter capitalised) as the same as all-lowercase, and pronouncing them in title-case so I wouldn’t risk mispronouncing names.
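The capitalisation rule in the last sentence is small enough to sketch in Python. The function names here are invented for the example, and the real app may handle edge cases (such as punctuation inside words) differently.

```python
from collections import Counter


def has_inner_capital(word):
    """True if any letter after the first is upper case (e.g. JoCo, US)."""
    return any(ch.isupper() for ch in word[1:])


def spoken_form(word):
    """Spelling used both to group words and to send to text-to-speech."""
    if has_inner_capital(word):
        return word  # counted as its own word, spelling kept as-is
    # Title case and lowercase are merged and spoken in title case.
    return word.capitalize()


def count_words(words):
    """Count words after merging title-case and lowercase spellings."""
    return Counter(spoken_form(word) for word in words)
```

With this rule, ‘competition’ and ‘Competition’ are counted together, while ‘JoCo’ stays separate from ‘joco’ and ‘US’ stays separate from ‘us’.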
A really smart version of this would get the pronunciation of each word in context (the same way my rhyming dictionary rhyme.science finds rhymes for the different pronunciations of homographs, e.g. bow), group them by how they were pronounced, and make a word cloud of words grouped entirely by pronunciation rather than spelling, so ‘polish’ and ‘Polish’ would appear separately but there would be no danger of, say, ‘rain’ and ‘reign’ both appearing in the audio word cloud and sounding like duplicates. However, which words are actually pronounced the same depends on the accent (e.g. whether ‘cot’ and ‘caught’ sound the same) and on the text normalisation of the voice: you might have noticed that some of the audio word clouds in the trailer have ‘aye-aye’ while others have ‘two’ for the Roman numeral ‘II’.
Similarly, a really smart visual word cloud would use natural language processing to separate out different meanings of homographs (e.g. bow🎀, bow🏹, bow🚢, and bow🙇🏻♀️) and display them in some way that made it obvious which was which, e.g. by using different symbols, fonts, styles, colours for different parts of speech. It could also recognise names and keep multi-word names together, count words with the same lemma as the same, and cluster words by semantic similarity, thus putting ‘Zoe Keating’ near ‘cello’, and ‘Zoe Gray’ near ‘Brian Gray’ and far away from ‘Blue’. Perhaps I’ll work on that next.
I’ve recently been updated to a new WordPress editor whose ‘preview’ function gives a ‘page not found’ error, so I’m just going to publish this and hope it looks okay. If you’re here early enough to see that it doesn’t, thanks for being so enthusiastic!
A few months ago I wrote an app to download my YouTube metadata, and I blogged some statistics about it and some haiku I found in my video titles and descriptions. I also created a few word clouds from the titles and descriptions. In that post, I said:
Next perhaps I’ll make word clouds of my YouTube descriptions from various time periods, to show what I was uploading at the time. […] Eventually, some of the content I create from my YouTube metadata will make it into a YouTube video of its own, perhaps finally a real channel trailer.
Me, two and a third months ago
TL;DR: I made a channel trailer of audiovisual word clouds showing each month of uploads:
It seemed like the only way to do justice to the number and variety of videos I’ve uploaded over the past thirteen years. My channel doesn’t exactly have a content strategy. This is best watched on a large screen with stereo sound, but there is no way you will catch everything anyway. Prepare to be overwhelmed.
Now for the ‘too long; don’t feel obliged to read’ part on how I did it. I’ve uploaded videos in 107 distinct months, so creating a word cloud for each month using wordclouds.com seemed tedious and slow. I looked into web APIs for creating word clouds automatically, and added the code to my app to call them, but then I realised I’d have to sign up for an account, including a payment method, and once I ran out of free word clouds I’d be paying a couple of cents each. That could easily add up to $5 or more if I wanted to try different settings! So obviously I would need to spend many hours programming to avoid that expense.
I have a well-deserved reputation for being something of a gadget freak, and am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand. Ten seconds, I tell myself, is ten seconds. Time is valuable and ten seconds’ worth of it is well worth the investment of a day’s happy activity working out a way of saving it.
Douglas Adams in ‘Last chance to see…’
I searched for free word cloud code in Swift, downloaded the first one I found, and then it was a simple matter of changing it to work on macOS instead of iOS, fixing some alignment issues, getting it to create an image instead of arranging text labels, adding some code to count word frequencies and exclude common English words, giving it colour schemes, background images, and the ability to show smaller words inside characters of other words, getting it to work in 1116 different fonts, export a copy of the cloud to disk at various points during the progress, and also create a straightforward text rendering using the same colour scheme as a word cloud for the intro… before I knew it, I had an app that would automatically create a word cloud from the titles and descriptions of each month’s public uploads, shown over the thumbnail of the most-viewed video from that month, in colour schemes chosen randomly from the ones I’d created in the app, and a different font for each month. I’m not going to submit a pull request; the code is essentially unrecognisable now.
In case any of the thumbnails spark your curiosity, or you just think the trailer was too short and you’d rather watch 107 full videos to get an idea of my channel, here is a playlist of all the videos whose thumbnails are shown in this video:
It’s a mixture of super-popular videos and videos which didn’t have much competition in a given month.
Of course, I needed a soundtrack for my trailer. Music wouldn’t do, because that would reduce my channel trailer to a mere song for anyone who couldn’t see it well. So I wrote some code to make an audio version of each word cloud (or however much of it could fit into five seconds without too many overlapping voices) using the many text-to-speech voices in macOS, with the most common words being spoken louder. I’ll write a separate post about that; I started writing it up here and it got too long.
The handwritten thank you notes at the end were mostly from members of the JoCo Cruise postcard trading club, although one came with a pandemic care package from my current employer. I have regaled people there with various ridiculous stories about my life, and shown them my channel. You’re all most welcome; it’s been fun rewatching the concert videos myself while preparing to upload, and it’s always great to know other people enjoy them too.
I put all the images and sounds together into a video using Final Cut Pro 10.4.8. This was all done on my mid-2014 Retina 15-inch MacBook Pro, Sneuf.
I subjected Haiku Detector to some serious stress-testing with a 29MB text file (that’s 671481 sentences, containing 16810 haiku, of which some are intentional) a few days ago, and kept finding more things that needed fixing or could do with improvement. A few days in a nerdsniped daze later, I have a new version, and some interesting tidbits about the way Mac speech synthesis pronounces things. Here’s some of what I did:
- Tweaked the user interface a bit, partly to improve responsiveness after 10000 or so haiku have been found.
- Made the list of haiku stay scrolled to the bottom so you can see the new ones as they’re found.
- Added a progress bar instead of the spinner that was there before.
- Fixed a memory issue.
- Changed a setting that was supposed to make it work in Mac OS X 10.6, as I said here it would. I didn’t have a 10.6 system to test it on, and it turns out it does not run on one; I think 10.7 (Lion) is the lowest version it will run on.
- Added some example text on startup so that it’s easier to know what to do.
- Made it a Developer ID signed application, because now that I have a bit more time to do Mac development (since I don’t have a day job; would you like to hire me?), it was worth signing up to the paid Mac Developer Program again. Once I get an icon for Haiku Detector, I’ll put it on the app store.
- Fixed a few bugs and made a few other changes relating to how syllables are counted, which lines certain punctuation goes on, and which things are counted as haiku.
That last item is more difficult than you’d think, because the Mac speech synthesis engine (which I use to count syllables for Haiku Detector) is very clever, and pronounces words differently depending on context and punctuation. Going through words until the right number of syllables for a given line of the haiku are reached can produce different results depending on which punctuation you keep, and a sentence or group of sentences which is pronounced with 17 syllables as a whole might not have words in it which add up to 17 syllables, or it might, but only if you keep a given punctuation mark at the start of one line or the end of the previous. There are therefore many cases where the speech synthesis says the syllable count of each line is wrong but the sum of the words is correct, or vice versa, and I had to make some decisions on which of those to keep. I’ve made better decisions in this version than the last one, but I may well change things in the next version if it gives better results.
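To make the word-by-word approach concrete, here is a minimal Python sketch of splitting a sentence into lines of 5, 7 and 5 syllables. The big assumption is the syllable counter: `count_syllables` below just counts groups of vowels, which is a crude stand-in for asking the speech synthesis engine, and it ignores all the context and punctuation effects described above.

```python
import re


def count_syllables(word):
    """Crude stand-in for the speech engine: count groups of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def split_haiku(text, pattern=(5, 7, 5), syllables=count_syllables):
    """Return the lines of a haiku, or None if the words do not fit."""
    words = text.split()
    lines, index = [], 0
    for target in pattern:
        line, total = [], 0
        # Keep taking words until this line has enough syllables.
        while index < len(words) and total < target:
            line.append(words[index])
            total += syllables(words[index])
            index += 1
        if total != target:
            return None  # a word straddles the line break, or words ran out
        lines.append(" ".join(line))
    # Every word must be used for the whole text to count as a haiku.
    return lines if index == len(words) else None
```

Passing a different `syllables` function is where a real pronunciation-based counter would plug in.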
Here are some interesting examples of words which are pronounced differently depending on punctuation or context:
| Text | Pronunciation |
|---|---|
| ooohh | Pronounced with one syllable, as you would expect |
| ooohh. | Pronounced with one syllable, as you would expect |
| ooohh.. | Spelled out (Oh oh oh aitch aitch) |
| ooohh… | Pronounced with one syllable, as you would expect |
| H H | Pronounced aitch aitch |
| H H H | Pronounced aitch aitch aitch |
| H H H H H H H H | Pronounced aitch aitch aitch |
| Da-da-de-de-da | Pronounced with five syllables, roughly as you would expect |
| Da-da-de-de-da- | Pronounced dee-ay-dash-di-dash-di-dash-di-dash-di-dash |

The dashes are pronounced for anything with hyphens in it that also ends in a hyphen, despite the fact that when splitting Da-da-de-de-da-de-da-de-da-de-da-de-da-da-de-da-da into a haiku, it’s correct punctuation to leave the hyphen at the end of the line:
Though in a different context, where – is a minus sign, and meant to be pronounced, it might need to go at the start of the next line. Greater-than and less-than signs have the same ambiguity, as they are not pronounced when they surround a single word as in an html tag, but are if they are unmatched or surround multiple words separated by spaces. Incidentally, surrounding da-da in angle brackets causes the dash to be pronounced where it otherwise wouldn’t be.

| Text | Pronunciation |
|---|---|
| U.S or u.s | Pronounced you dot es (this way, domain names such as angelastic.com are pronounced correctly.) |
| U.S. or u.s. | Pronounced you es |
| US | Pronounced you es, unless in a capitalised sentence such as ‘TAKE US AWAY’, where it’s pronounced ‘us’ |
I also discovered what I’m pretty sure is a bug, and I’ve reported it to Apple. If two carriage returns (not newlines) are followed by any integer, then a dot, then a space, the number is pronounced ‘zero’ no matter what it is. You can try it with this file; download the file, open it in TextEdit, select the entire text of the file, then go to the Edit menu, Speech submenu, and choose ‘Start Speaking’. Quite a few haiku were missed or spuriously found due to that bug, but I happened to find it when trimming out harmless whitespace.
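For anyone wanting to check their own text for the pattern that triggers this, a short Python sketch follows. The regular expression is my reading of the description above (two carriage returns, an unsigned integer, a dot, a space); the exact conditions inside the speech engine are not known.

```python
import re

# Two carriage returns (not newlines), then digits, a dot, and a space.
ZERO_BUG = re.compile(r"\r\r(\d+)\. ")


def numbers_at_risk(text):
    """Return the numbers that would reportedly be read out as 'zero'."""
    return ZERO_BUG.findall(text)
```

For example, `numbers_at_risk("Intro\r\r42. Answer")` finds the 42, while the same text with newlines instead of carriage returns finds nothing.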
Apart from that bug, it’s all very clever. Note how even without the correct punctuation, it pronounces the ‘dr’s and ‘st’s in this sentence correctly:
the dr who lives on rodeo dr who is better than the dr I met on the st john’s st turnpike
However, it pronounces the second ‘st’ as ‘saint’ in the following:
the dr who lives on rodeo dr who is better than the dr I met in the st john’s st john
This is not just because it knows there is a saint called John; strangely enough, it also gets this one wrong:
the dr who lives on rodeo dr who is better than the dr I met in the st john’s st park
I could play with this all day, or all night, and indeed I have for the last couple of days, but now it’s your turn. Download the new Haiku Detector and paste your favourite novels, theses, holy texts or discussion threads into it.
If you don’t have a Mac, you’ll have to make do with a few more haiku from the New Scientist special issue on the brain which I mentioned in the last post:
Being a baby
is like paying attention
with most of our brain.
But that doesn’t mean
there isn’t a sex difference
in the brain,” he says.
They may even be
a different kind of cell that
just looks similar.
It is easy to
see how the mind and the brain
We like to think of
ourselves as rational and
It didn’t seem to
matter that the content of
these dreams was obtuse.
I’d like to thank the people of the xkcd Time discussion thread for writing so much in so many strange ways, and especially Sciscitor for exporting the entire thread as text. It was the test data set that kept on giving.
I’ve been thinking of getting my robot choir (an app I wrote to make my Mac’s speech synthesis sing) to sing Jonathan Coulton covers for a while, but as many of his songs involve robots, singing them with a robot voice forces a change of perspective. I rewrote ‘Better’ to be from the perspective of a robot whose partner is becoming human, rather than a human whose partner is becoming a robot. Here’s a rough recording of it using the Trinoids voice and the karaoke file for the song:
Where did we go?
When was the moment that we came unplugged?
I think I know.
In fact I am sure ’cause I’ve had your chips bugged.
I remember the first big surprise,
the day you came home with your infant-bred eyes.
I looked inside them and lased you a note
but your return signal was smoke.
But it’s not smoke, it’s fire,
and your burning desire
to turn into something
that I don’t require.
You used to be OK
and I liked you that way,
but I don’t think that I like you better.
No I don’t think that I like you better.
Started out small:
some lungs and a heart and your lasers unwired.
Now you’re just six feet tall.
Even when fully charged your organics get tired.
And I’m tired of the evenings I spend
making small talk with your new human friends
and their stupid insistence on blocking my lasers
when they know I know the three laws.
And you climbed the wrong way out
of the uncanny gorge.
You went from bad data
to bad Geordi La Forge.
You used to be OK
and I liked you that way,
but I don’t think that I like you better.
No I don’t think that I like you better.
So that’s how it goes.
Tap my interface once if you still understand.
No data flows.
Wait, are your digits just five on your hand?
I can tell by your insider art
There’s more than a pump in your chimpanzee heart.
I tried to reason, but something’s gone wrong.
Why am I singing a song?
Well, I like to think different, but it’s not quite the same.
If this is a trojan, I know who to blame.
You used to be flawless; now you’re F-ing lawless,
and I don’t think that I like you better.
No I don’t think that I like you better.
Some lines stay close to the original when I perhaps should have struck out and gone with something completely different. If you have any suggestions, let me know; the beauty of robots is I can change the words and make a new recording in seconds.
The tune is based on Spektugalo’s UltraStar file for that song. I had to make some changes to the robot choir to handle the one-beat gaps between notes, and I made a few tweaks to timing after that, which probably messed up more than they fixed. When I started writing this parody, I assumed I had the source tracks of the original song to work with, but it turns out that song is not on JoCo Looks Back, so all I have is the karaoke version with some backing vocals. I’ve turned the volume of my vocals way up, both so they’re easier to understand and to obscure the backing vocals more when the lyrics are not the same. Consider this a demo.
On the subject of cruises, I’ve just had some copies of my They Might Not Be Giants poster printed, and I’ll bring them with me on JoCo Cruise Crazy. If you are going on the cruise (or will just be in the area the night before) and you would like to buy one from me for less than it would cost through Zazzle, let me know and I’ll make sure I bring one for you. I can’t sell them on board the ship due to the cruise line needing a cut, but I can do so at the hotel before the cruise, the cruise port or airport after, or we can work out some kind of trade involving upcharged food or drink on the ship. They are A3 sized (just a tiny bit smaller than 11×17 inches) and printed beautifully on 300gsm silk-coated paper.
When Europeans colonised New Zealand, they brought not only mammals to drive many of the native birds to extinction, but also their religion to exterminate the native theodiversity. This began with Reverend Samuel Marsden on Christmas Day 1814, and there is a Christmas carol about it called Te Harinui. Since it just turned Christmas day about an hour ago in New Zealand, here’s a recording of Te Harinui I just made.
It’s sung by the voice Vicki from my robot choir (an app I wrote to make my Mac sing using the built-in speech synthesis.) It has a couple of little glitches, and I couldn’t get it to pronounce the Māori words exactly right, but otherwise, I think this is the best Vicki has ever sounded. Usually I switch to Victoria because Vicki’s singing sounds weird. I made a couple of tweaks to the time allocated to consonants, and I think they helped. I used the music in the New Zealand Folk Song page, with a few small changes to the ‘glad tidings’ line to make it sound more like how I remember it.
You can see the effect of widespread hemispherism in the fact that the song opens by saying it isn’t snowy, as if being snowy were the default state and any deviation from it must be called out.
Now, I must get some good Christmas sleep.
I felt a bit bad about having to truncate the full-length instrumental that Colleen and Joseph made for JoCo Day is Wunnerful, so, having already taught my robot choir the main melody, I decided to record my own cover of Christmastime is Wunnerful. I was toying with the idea of making it a mashup with Jonathan Coulton’s other Christmas classic, Chiron Beta Prime (since the source tracks for that are available), when I realised that even without modification, Christmastime is Wunnerful is quite amusing to listen to while watching Tom Ellsworth’s music video for Chiron Beta Prime. So I decided to edit that video (with Tom’s permission) to match my cover. Here is the result:
For comparison, here’s the original Chiron Beta Prime video. I didn’t have to change very much, really:
I had to pretty much abandon the ‘daily’ part of Holidailies because I ended up flying to New Zealand, which in itself takes more than a day without internet. But here’s some more holiday for you.
The voices I used were, in order of appearance:
Adult human male: Alex
Standard robots: Zarvox
Festive holiday figure robots for the purposes of augmenting human morale and productivity: Trinoids
Adult human female: Victoria
Human male emulation for the purposes of undetectable redaction: Ralph
Juvenile human: Junior
I also used the bells and ‘Message redacted’ tracks from Chiron Beta Prime, and the ‘Machines’ track from The Future Soon.
I originally wrote Haiku Detector because my friend Gry saw Times Haiku and wondered whether there were any haiku in her Ph. D. thesis. The other day I heard back about the haiku she found. It turns out that even the title of the thesis is a haiku:
studies of the extremes of
Here’s another one, which could be about anything. The last line is a bit of an anticlimax.
As of today, the
origin of this strength is
not well understood.
When I read this one, I wondered if miniball was a mini-golf style version of another ball game:
the MINIBALL would be used
for the same purpose.
are easily seen.
After seeing these, I sent her the as-yet-unreleased new version of Haiku Detector, which can detect haiku made up of several sentences. Having mostly had my name on papers authored by the entire CMS collaboration, I expected her to find a lot of haiku in the author list. But ISOLDE is much smaller, and also this is her thesis that she wrote, not some paper whose author list she got tacked onto. So she got some from references:
Goko, H. Toyokawa,
K. Yamada, T.
and some things with section numbers tacked on:
Open shell nuclei and
This matrix is the
starting point for the Oslo
That last one has so many possibilities. I like to think of it as being about an electronic band called The Oslo Method which released a 45rpm record about The Matrix. Unfortunately, nobody can be told what the haiku is. You have to see it for yourself. And indeed, you can see the other haiku she found on the #MyHaikuThesis tag on Twitter.
I noticed something interesting while writing this post — some of the ‘haiku’ Gry found include gamma (γ) symbols:
The γ-ray strength functions
display no strong enhancement
for low γ energies.
Haiku Detector on her Mac has treated them as having zero syllables, as if they are not pronounced, and I think I recall characters like that not being pronounced in the Princeton Companion to Mathematics. But I just checked on my Mac running Mac OS X Yosemite, and the speech synthesis (which Haiku Detector relies on for syllable counting) pronounces γ as ‘Greek small letter gamma’, so Haiku Detector does not find those erroneous haiku. I think that this might be a new feature in Yosemite.
But here’s where it gets weird: you’d think that it’s just reading ‘Greek small letter gamma’ because that’s the unicode name of the character. I tried with a few emoji and other special characters, and that hypothesis is upheld. But the unicode character named ‘chicken’ (🐔) is pronounced ‘chicken head’. Spooky. Another strange thing is that there is no unicode ‘duck’ character.
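The Unicode names themselves are easy to look up, for instance with Python's standard `unicodedata` module. This small sketch only shows the official character names; it says nothing about what any particular speech engine does with them, and the helper name `unicode_name` is made up for the example.

```python
import unicodedata


def unicode_name(character):
    """Official Unicode name of a single character, in lower case."""
    return unicodedata.name(character).lower()


print(unicode_name("\u03b3"))      # greek small letter gamma
print(unicode_name("\U0001F414"))  # chicken
```

So the official name of 🐔 really is just ‘chicken’, with no ‘head’.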
If you’ve been paying attention, you probably know why I happened to come across those oddities. I’ll have to investigate them later, though; right now I’m in Edinburgh for NSScotland, and it’s about time I looked at some tourism information.
So, Haiku Detector; what now? Maybe look for supersymmetric haiku?
Update: It seems that in Mac OS X 10.8, γ is not pronounced, and 🐔 is pronounced ‘chicken emoji’. Other emoji also have ‘emoji’ in their pronunciations, while still others are not pronounced. I wonder if pronunciations were added (and later edited to remove the ‘emoji’) for certain emoji, and now the default pronunciation has changed from nothing to the unicode name. So ‘🐔’ ended up with the explicit pronunciation ‘chicken head’ while others which were not previously pronounced use their unicode names. So this should be a haiku in Yosemite, though for some reason Haiku Detector does not detect it:
I am learning about four-part harmonies, so I wrote and recorded [mp3] a short song about self-confidence and poop. Anyone with a head and a butt should understand; understand also that I do not condone headbutting buttheads. These are four voices that might accumulate in one’s head as a child grows up and vacillates between self-confidence and self-doubt.
Here are the lyrics:
Soprano: Look how in-control my bowel is. Clearly I know where my towel is.
Alto: What if all I do is shit? How do they put up with it?
Tenor: Push and push and I’ll improve. Know my shit, my bowel will move.
Bass: Everyone poops.
All: If everyone poops…
Soprano & Tenor: Maybe I’m no better than them.
Alto & Bass: Maybe I’m no worse than them.
All: Maybe I am just as good.
It is sung by my robot choir (a program I wrote to make my Mac sing using the built-in speech synthesis), with the voice Princess as the soprano, Victoria as alto, Fred as tenor and Ralph as the bass, unless I’ve misunderstood how the parts are named or which octaves they were meant to be singing in, which is entirely likely after one half-hour lesson on the topic.
I’ve mentioned before that I’m doing music lessons with John Anealio over the internet. A couple of weeks ago I decided I wanted to learn about harmonies. We picked out some chords at random and then decided which notes each voice would sing from them. I checked out what they sounded like using instruments in GarageBand, then I decided I may as well write some words with it, with each voice singing the same sequence of notes over and over. I remember thinking about making them conflicting inner voices, but I’m not sure what made me decide that those inner voices were full of shit. Of course, I can’t tell whether this song is shit, good shit, horse shit, or the shit; when it comes to music, I’m still figuring out how not to soil myself. But it’s about poop, so it ought to entertain someone.
One of these days I’ll find a more convenient way to host podcasts so that I actually bother to put things like this on mine.
This is the fourth in a series called ‘Forms and Formulae’ in which I write about articles in the Princeton Companion to Mathematics using poetic forms covered by articles in the Princeton Encyclopedia of Poetry and Poetics. This post’s mathematics article is entitled ‘The General Goals of Mathematical Research’ and the poetic form is alba, which is a kind of song; I recorded it [direct mp3 link] using my robot choir and some newfound musical knowledge, and there are many notes on that after the lyrics below.
Here are some extracts from the article on the alba, explaining the features that I ended up using:
A dawn song about adulterous love, expressing one or both lovers’ regret over the coming of dawn after a night of love. A third voice, a watchman, may announce the coming of dawn and the need for the lovers to separate. An Occitan alba may contain a dialogue (or serial monologues) between lover and beloved or a lover and the watchman or a combination of monologue with a brief narrative intro.
The alba has no fixed metrical form, but in Occitan each stanza usually ends with a refrain that contains the word alba.
…the arrival of dawn signaled by light and bird’s song…
The watchman plays an important role as mediator between the two symbolic worlds of night (illicit love in an enclosed space) and day (courtly society, lauzengiers or evil gossips or enemies of love)
I based the song on section 8.3 of the article, entitled ‘Illegal Calculations’. In retrospect, using the word alba in each refrain (are these even refrains?) doesn’t make much sense, since I’m not writing in Occitan, and the casual listener will not know that alba means ‘dawn’ in Occitan. But hey, it kind of rhymes with the start of ‘self-avoiding walk’. How can I not rhyme an obscure foreign word with an obscure mathematical concept?
Mathematicians struggle even today to learn about the average distance between the endpoints of a self-avoiding walk. French physicist Pierre-Gilles de Gennes found answers by transforming the problem into a question about something called the n-vector model when the n is zero. But since this implies vectors with zero dimensions, mathematicians reject the approach as non-rigorous. Here we find that zero waking up next to its cherished n-vector model after a night of illicit osculation.
I am just a zero; I am hardly worth a mention.
I null your vector model figure, discarding your dimension,
and every night I’m here with you I fear the break of day,
when day breaks our veneer of proof, and we must go away.
Here by your side
till alba warns the clock.
Fear’s why I hide
in a self-avoiding walk.
Let the transformations of De Gennes show your place.
Never let them say we’re a degenerate case.
When I’m plus-two-n there’s just too many ways to move,
But you’re my sweetest nothing and we’ve got nothing to prove.
Here by your side
till alba warms the clock.
Fear can’t divide;
it’s a self-avoiding walk.
The sun has come; your jig is up. It’s time for peer review.
You think your secret union has engendered something new.
You thought you would both find a proof, but is it you’re confusing
The sorta almost kinda-truths the physicists are using?
That’s not rigorous,
says alba’s voice in shock.
All but meaningless
to the self-avoiding walk.
Zero and N-vector model together:
If you say that our results don’t matter,
then go straight to find a better path.
For as long as you insult our data,
Is it wrong to say you’re really math?
Hey there, Rigorous
at alba poised in shock,
you are just like us,
in a self-avoiding walk.
All voices are built-in Mac text-to-speech voices, some singing thanks to my robot choir (a program I wrote to make the Mac sing the tunes and lyrics I enter, which still needs a lot of work to be ready for anyone else to use). Older voices tend to sound better when singing than the newer ones, and many new voices don’t respond to the singing commands at all, particularly those with non-US accents. So for the introduction I took the opportunity to use a couple of those non-US voices. These are the voices used:
Introduction: Tessa (South African English) and, since I also can’t fine-tune Tessa’s pronunciation of ‘Pierre-Gilles de Gennes’, Virginie (French from France)
N-vector Model: Kathy
Most of the bird noises come from the end of Jonathan Coulton’s ‘Blue Sunny Day’, and I can use them because they’re either Creative Commons licensed or owned by the birds. The two peacock noises are from a recording by junglebunny. Free Birds!
As I mentioned, I’ve been learning about songwriting from John Anealio, and since the Forms and Formulae project sometimes requires me to write songs, I’m putting the new knowledge into practice sooner than I expected. This song uses several musical things I’ve never tried before, which is quite exciting, but it also means I probably didn’t do them very well, because there’s only so much I can learn in a couple of months of half-hour weekly lessons. I welcome friendly criticism and advice. The new things are:
Back in late February, my friend Alice sent an email asking people to cover The Doubleclicks’ Nerdy Birthday Song for Sara Chicazul, who had a birthday on JoCo Cruise Crazy 2 but not on JoCo Cruise Crazy 3. The idea was to put up one per day, so that she could experience the thrill and horror (previously reserved for Mike Phirman) of having a birthday every day. A lot of people did. I don’t usually sing anything more melodic than Chicken Monkey Duck when people can hear me, so I figured I’d dust off my robot choir (a little program I wrote to take text and a tune played on my MIDI keyboard, and turn it into TUNE commands to make the built-in Mac speech synthesis sing) and record a cover that way. It took a fair bit of dusting off, what with a new version of Xcode and of the MIDI framework I used, and I think the metaphorical dust mites gave me cold-like symptoms, which is why I haven’t posted anything for a while. Anyway, today I finally recorded a cover, and here it is. Given that the third thing I ever recorded using my robot choir was my Macs singing Happy Birthday to the London Science Museum, I think I may as well rename my robot choir to ‘The Phirmanator’.
This recording starts off with just the Victoria voice singing, then at the first ‘you’re getting older’, Vicki joins in. I have a cameo saying ‘everybody!’ and then Agnes joins in and all three voices get a gospel choir effect. I added Zarvox (an intentionally robotic voice) at the end, partly because I thought it would be funny, and partly because Vicki sounds awful holding that ‘all’ note and I wanted to make up for her being so much quieter in that part. I noticed part of the way through that I’d used the wrong notes in a few places, so I fixed those, but there are probably others. I don’t know how to make music; I just know how to turn MIDI notes into frequencies. Also, I can barely even play my rainstick, let alone a stringed instrument, so it’s a robocoppella. I timed everything to synch up with the original song, and it sounds kind of nice if you play both together. By itself, well… it sounds like autotune became sentient and killed all the human singers.
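For the curious, turning MIDI notes into frequencies is only a line of arithmetic. Here is a minimal sketch in Python, assuming standard twelve-tone equal temperament with the A above middle C (MIDI note 69) tuned to 440 Hz. The function name and the default tuning are illustrative assumptions, not code from the robot choir itself.

```python
def midi_to_frequency(note: int, a4_hz: float = 440.0) -> float:
    """Convert a MIDI note number to a frequency in Hz.

    Assumes equal temperament: each semitone multiplies the frequency
    by 2**(1/12), and MIDI note 69 is the reference pitch a4_hz.
    """
    return a4_hz * 2.0 ** ((note - 69) / 12)


# A few reference notes: an octave below A4, middle C, and A4 itself.
for name, note in [("A3", 57), ("C4", 60), ("A4", 69)]:
    print(f"{name}: {midi_to_frequency(note):.2f} Hz")
# A3: 220.00 Hz
# C4: 261.63 Hz
# A4: 440.00 Hz
```

Each frequency would then still need to be paired with a phoneme and a duration before the speech synthesizer could sing it, and that step is not shown here.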