Mathematician and linguist by training, programmer by trade, physicist by association, writer by neglecting everything else.
Posted in News on December 3, 2020
Edit: As of 8 January, 2021, @realdonaldtrump is no longer a Twitter user, but he was at the time of this post.
Version 2.0.1 of my iOS app NastyWriter has 184 different insults (plus two extra special secret non-insults that appear rarely for people who’ve paid to remove ads 🤫) which it can automatically add before nouns in the text you enter. “But Angela,” I hear you not asking, “you’re so incredibly nice! How could you possibly come up with 184 distinct insults?” and I have to admit, while I’ve been known to rap on occasion, I have not in fact been studying the Art of the Diss — I have a secret source. (This is a bonus joke for people with non-rhotic accents.)
My secret source is the Trump Twitter Archive. Since NastyWriter is all about adding gratuitous insults immediately before nouns, which Twitter user @realdonaldtrump is such a dab hand at, I got almost all of the insults from there. But I couldn’t stand to read it all myself, so I wrote a Mac app to go through all of the tweets and find every word that seemed to be an adjective immediately before a noun. I used NSLinguisticTagger, because the new Natural Language framework did not exist when I first wrote it.
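The heart of that Mac app can be sketched roughly like this — a simplified reconstruction rather than the app's actual source, with the tweet loading and frequency counting around it omitted:

```swift
import Foundation

// Find every word tagged as an adjective that is immediately followed by a noun,
// using the older NSLinguisticTagger API (the app predates the Natural Language framework).
func adjectiveNounPairs(in text: String) -> [(adjective: String, noun: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = text
    var tokens: [(tag: NSLinguisticTag, word: String)] = []
    tagger.enumerateTags(in: NSRange(text.startIndex..., in: text),
                         unit: .word, scheme: .lexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, tokenRange, _ in
        if let tag = tag, let wordRange = Range(tokenRange, in: text) {
            tokens.append((tag, String(text[wordRange])))
        }
    }
    // Keep only adjectives whose very next token is a noun.
    var pairs: [(String, String)] = []
    for (current, next) in zip(tokens, tokens.dropFirst())
        where current.tag == .adjective && next.tag == .noun {
        pairs.append((current.word, next.word))
    }
    return pairs
}
```

Run over every tweet, this produces the raw list of candidate insults (and the occasional ‘gsfsgh2kpc’) to be reviewed by hand.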
Natural language processing is not 100% accurate, because language is complicated — indeed, the app thought ‘RT’, ‘bit.ly’, and a lot of twitter @usernames (most commonly @ApprenticeNBC) and hashtags were adjectives, and the usernames and hashtags were indeed used as adjectives (usually noun adjuncts) e.g. in ‘@USDOT funding’. One surprising supposed adjective was ‘gsfsgh2kpc’, which was in a shortened URL mentioned 16 times, to a site which Amazon CloudFront blocks access to from my country.
For each purported adjective the app found, I had a look at how it was used before adding it to NastyWriter’s insult collection. Was it really an adjective used before a noun? Was it used as an insult? Was it gratuitous? Were there any other words it was commonly paired with, making a more complex insult such as ‘totally conflicted and discredited’, or ‘frumpy and very dumb’? Was it often in allcaps or otherwise capitalised in a specific way?
But let’s say we don’t care too much about that and just want to know roughly which adjectives he used the most. Can you guess which is the most common adjective found before a noun? I’ll give you a hint: he uses it a lot in other parts of sentences too. Here are the top 35 as of 6 November 2020:
- ‘great’ appears 4402 times
- ‘big’ appears 1351 times
- ‘good’ appears 1105 times
- ‘new’ appears 1034 times
- ‘many’ appears 980 times
- ‘last’ appears 809 times
- ‘best’ appears 724 times
- ‘other’ appears 719 times
- ‘fake’ appears 686 times
- ‘American’ appears 592 times
- ‘real’ appears 510 times
- ‘total’ appears 509 times
- ‘bad’ appears 466 times
- ‘first’ appears 438 times
- ‘next’ appears 407 times
- ‘wonderful’ appears 375 times
- ‘amazing’ appears 354 times
- ‘only’ appears 325 times
- ‘political’ appears 310 times
- ‘beautiful’ appears 298 times
- ‘fantastic’ appears 279 times
- ‘tremendous’ appears 270 times
- ‘massive’ appears 268 times
- ‘illegal’ appears 254 times
- ‘incredible’ appears 254 times
- ‘nice’ appears 251 times
- ‘strong’ appears 250 times
- ‘greatest’ appears 248 times
- ‘true’ appears 247 times
- ‘major’ appears 243 times
- ‘same’ appears 236 times
- ‘terrible’ appears 231 times
- ‘presidential’ appears 221 times
- ‘much’ appears 217 times
- ‘long’ appears 215 times
So as you can see, he doesn’t only insult. The first negative word, ‘fake’, is only the ninth most common, though more common than its antonyms ‘real’ and ‘true’, if they’re taken separately (‘false’ is in 72nd position, with 102 uses before nouns, while ‘genuine’ has only four uses.) And ‘illegal’ only slightly outdoes ‘nice’.
He also talks about American things a lot, which is not surprising given his location. ‘Russian’ comes in 111th place, with 62 uses, so about a tenth as many as ‘American’. As far as country adjectives go, ‘Iranian’ is next with 40 uses before nouns, then ‘Mexican’ with 39, and ‘Chinese’ with 37. ‘Islamic’ has 33. ‘Jewish’ and ‘White’ each have 27 uses as adjectives before nouns, though the latter is almost always describing a house rather than people. The next unequivocally racial (i.e. referring to a group of people rather than a specific region) adjective is ‘Hispanic’, with 25. I’m not an expert on what’s unequivocally racial, but I can tell you that ‘racial’ itself has nine adjectival uses before nouns, and ‘racist’ has three.
“But Angela,” I hear you not asking, “why are you showing us a list of words and numbers? Didn’t you just make an audiovisual word cloud generator a few months ago?” and the answer is, yes, indeed, I did make a word cloud generator that makes visual and audio word clouds. So here is an audiovisual word cloud of all the adjectives found at least twice before nouns in tweets by @realdonaldtrump in The Trump Twitter Archive, with Twitter usernames filtered out even if they are used as adjectives. More common words are larger and louder. Words are panned left or right so they can be more easily distinguished, so this is best heard in stereo.
There are some nouns in there, but they are only counted when used as attributive nouns to modify other nouns, e.g. ‘NATO countries’, or ‘ObamaCare website’.
Posted in NastyWriter on November 28, 2020
I came upon a secret stash of free time, so I finally put finishing touches on the Siri Shortcuts I’d added to NastyWriter, made the app work properly in Dark Mode, added the latest gratuitous insults harvested from Twitter (I’ll write another post about how I did that), and released it. Then somebody pointed out something that still didn’t work in Dark Mode, so I fixed that and a few related things, and released it again. Thus NastyWriter’s version number (2.0.1) is the reverse of what it was before (1.0.2.)
I added Siri Shortcuts to NastyWriter soon after iOS 12 came out, just to learn a bit about them. You can add a shortcut with whatever text you’ve entered, and then run the shortcut whenever you like to get a freshly-nastified version of the same text.
There’s also a ‘Give me an insult’ shortcut (which you can find in the Shortcuts app) which just gives a random insult, surrounded by unpleasant emoji.
As I added these soon after iOS 12 came out, they don’t support parameters, which are new in iOS 13. I may work on that next, so you’ll be able to nastify text on the fly, or nastify the output from another shortcut as part of a longer workflow.
Since Tom Lehrer recently released all his music and lyrics into the public domain, I took this opportunity to update the screenshots of NastyWriter in the App Store to show Tom Lehrer’s song ‘She’s My Girl’ where they had previously shown Shakespeare’s Sonnet 18. You can read a full nastification of this on the NastyWriter tumblr.
A few weeks ago I posted a video of myself talking about the time Steve Wozniak gave me a laptop, and I said:
A few years later, I met Woz, had pizza and learnt to Segway with him, and watched him play Tetris and pranks all through a concert of The Dead, but that will probably be a different 18-minute video.
Well, last week I indeed recorded an 18-minute video about the time I met Woz; the raw video was coincidentally imported into Photos at the same minute of the day as the previous one, and was one second longer than it.
The final video, with turning the camera on and off trimmed out, is two seconds longer than the previous one.
The background is a little blurry, but in the first take the entire picture was blurry, so in comparison, a little artful background blur is fine.
The short version: I met a friend of Woz by complaining by email that the lights were turned off in Woz’s office, and then met that friend in San Francisco when I went there for WWDC 2004. We met Woz, who had flashing lights in his teeth, at a pizza restaurant, and then went to a concert, where we rode Segways and Woz confused people by flashing tooth lights and lasers at them while playing Tetris.
Here’s a playlist with both of my Woz stories. Perhaps this will be the start of a series of 18-minute videos about my ridiculous life, or perhaps not. I don’t have any more Woz stories, but I do have more stories.
People seem to enjoy hearing this story, and Woz’s 70th birthday seems like a good occasion to tell it to more people. I put in a lot of details of varying relevance (and was looking down at notes on my iPad a bit to keep track of them), because it’s my video and I may as well tell it my own way. But if you don’t have eighteen minutes to spare, there’s a short version in the next paragraph (to avoid spoilers.)
The short version: My then-boyfriend left my PowerBook in a phone booth, the PowerBook was held for ransom and not recovered, and meanwhile my sister emailed the Woz (who knew of me from having called me on my birthday half a year earlier) and offered to buy me a replacement.
A few years later, I met Woz, had pizza and learnt to Segway with him, and watched him play Tetris and pranks all through a concert of The Dead, but that will probably be a different 18-minute video.
If you think my life is ridiculous, well, you’re right, but also, you should see Steve Wozniak’s life! (His autobiography, iWoz, would be a great book to read to a cool kid at bedtime.) And check out the events, challenges, and fundraising going on at wozbday.com.
Posted in News on August 1, 2020
For my comprehensive channel trailer, I created a word cloud of the words used in titles and descriptions of the videos uploaded each month. Word clouds have been around for a while now, so that’s nothing unusual. For the soundtrack, I wanted to make audio versions of these word clouds using text-to-speech, with the most common words being spoken louder. This way people with either hearing or vision impairments would have a somewhat similar experience of the trailer, and people with no such impairments would have the same surplus of information blasted at them in two ways.
I checked to see if anyone had made audio word clouds before, and found Audio Cloud: Creation and Rendering, which makes me wonder if I should write an academic paper about my audio word clouds. That paper describes an audio word cloud created from audio recordings using speech-to-text, while I wanted to create one from text using text-to-speech. I was mainly interested in any insights into the number of words we could perceive at once at various volumes or voices. In the end, I just tried a few things and used my own perception and that of a few friends to decide what worked. Did it work? You tell me.
There’s a huge variety of English voices available on macOS, with accents from Australia, India, Ireland, Scotland, South Africa, the United Kingdom, and the United States, and I’ve installed most of them. I excluded the voices whose speaking speed can’t be changed, such as Good News, and a few novelty voices, such as Bubbles, which aren’t comprehensible enough when there’s a lot of noise from other voices. I ended up with 30 usable voices. I increased the volume of a few which were harder to understand when quiet.
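Picking out the usable voices was a matter of querying the system and filtering — something like this sketch, where the exclusion list is illustrative rather than my exact set:

```swift
import AppKit

// Voices to skip: ones whose speed can't usefully change, and novelty voices
// that aren't comprehensible over background hubbub. (Illustrative list only.)
let excluded: Set<String> = ["Good News", "Bad News", "Bubbles", "Bells"]

let usableVoices = NSSpeechSynthesizer.availableVoices.filter { voice in
    let attributes = NSSpeechSynthesizer.attributes(forVoice: voice)
    let name = attributes[.name] as? String ?? ""
    return !excluded.contains(name)
}
```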
I wondered whether it might work best with only one or a few voices or accents in each cloud, analogous to the single font in each visual word cloud. That way people would have a little time to adapt to understand those specific voices rather than struggling with an unfamiliar voice or accent with each word. On the other hand, maybe it would be better to have as many voices as possible in each word cloud so that people could distinguish between words spoken simultaneously by voice, just as we do in real life. In the end I chose the voice for each word randomly, and never got around to trying the fewer-distinct-voices version. Being already familiar with many of these voices, I’m not sure I would have been a good judge of whether that made it easier to get used to them.
Arranging the words
It turns out making an audio word cloud is simpler than making a visual one. There’s only one dimension in an audio word cloud — time. Volume could be thought of as sort of a second dimension, as my code would search through the time span for a free rectangle of the right duration with enough free volume. I later wrote an AppleScript to create ‘visual audio word clouds’ in OmniGraffle showing how the words fit into a time/volume rectangle. I’ve thus illustrated this post with a visual word cloud of this post, and a few audio word clouds and visual audio word clouds of this post with various settings.
However, words in an audio word cloud can’t be oriented vertically as they can in a visual word cloud, nor can there really be ‘vertical’ space between two words, so it was only necessary to search along one dimension for a suitable space. I limited the word clouds to five seconds, and discarded any words that wouldn’t fit in that time, since it’s a lot easier to display 301032 words somewhat understandably in nine minutes than it is to speak them. I used the most common (and therefore louder) words first, sorted by length, and stopped filling the audio word cloud once I reached a word that would no longer fit. It would sometimes still be possible to fit a shorter, less common word in that cloud, but I didn’t want to include words much less common than the words I had to exclude.
I set a preferred volume for each word based on its frequency (with a given minimum and maximum volume so I wouldn’t end up with a hundred extremely quiet words spoken at once) and decided on a maximum total volume allowed at any given point. I didn’t particularly take into account the logarithmic nature of sound perception. I then found a time in the word cloud where the word would fit at its preferred volume when spoken by the randomly-chosen voice. If it didn’t fit, I would see if there was room to put it at a lower volume. If not, I’d look for places it could fit by increasing the speaking speed (up to a given maximum) and if there was still nowhere, I’d increase the speaking speed and decrease the volume at once. I’d prioritise reducing the volume over increasing the speed, to keep it understandable to people not used to VoiceOver-level speaking speeds. Because of the one-and-a-bit dimensionality of the audio word cloud, it was easy to determine how much to decrease the volume and/or increase the speed to fill any gap exactly. However, I was still left with gaps too short to fit any word at an understandable speed, and slivers of remaining volume smaller than my per-word minimum.
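The placement search described above might look like this in Swift — a heavily simplified sketch with made-up parameter values, a naive linear scan along the time axis, and a conservative overlap check in place of the exact gap-filling logic:

```swift
// One word placed in the five-second cloud: when it starts, how long it lasts,
// and how loud it is.
struct PlacedWord {
    let word: String
    let start: Double     // seconds into the cloud
    let duration: Double
    let volume: Double
}

// Try preferred settings first, then lower volume, then higher speed, then both,
// prioritising volume reduction over speed increase for comprehensibility.
func place(_ word: String, baseDuration: Double, preferredVolume: Double,
           in cloud: inout [PlacedWord], cloudLength: Double = 5.0,
           minVolume: Double = 0.2, maxTotalVolume: Double = 1.5,
           maxSpeedup: Double = 1.5) -> Bool {
    let attempts: [(volume: Double, speedup: Double)] = [
        (preferredVolume, 1.0), (minVolume, 1.0),
        (preferredVolume, maxSpeedup), (minVolume, maxSpeedup),
    ]
    for attempt in attempts {
        let duration = baseDuration / attempt.speedup
        var start = 0.0
        while start + duration <= cloudLength {
            // Total volume of every word overlapping this candidate interval
            // (a conservative simplification of the real free-volume search).
            let used = cloud
                .filter { $0.start < start + duration && start < $0.start + $0.duration }
                .map(\.volume).reduce(0, +)
            if used + attempt.volume <= maxTotalVolume {
                cloud.append(PlacedWord(word: word, start: start,
                                        duration: duration, volume: attempt.volume))
                return true
            }
            start += 0.05  // step along the only dimension: time
        }
    }
    return false
}
```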
I experimented with different minimum and maximum word volumes, and maximum total volumes, which all affected how many voices might speak at once (the ‘hubbub level’, as I call it). Quite late in the game, I realised I could have some voices in the right ear and some in the left, which makes it easier to distinguish them. In theory, each word could be coming from a random location around the listener, but I kept to left and right — in fact, I generated separate left and right tracks and adjusted the panning in Final Cut Pro. Rather than changing the logic to have two separate channels to search for audio space in, I simply made my app alternate between left and right when creating the final tracks. By doing this, I could increase the total hubbub level while keeping many of the words understandable. However, the longer it went on for, the more taxing it was to listen to, so I decided to keep the hubbub level fairly low.
The algorithm is deterministic, but since voices are chosen randomly, and different voices take different amounts of time to speak the same words even at the same number of words per minute, the audio word clouds created from the same text can differ considerably. Once I’d decided on the hubbub level, I got my app to create a random one for each month, then regenerated any where I thought certain words were too difficult to understand.
In my visual word clouds, I kept the algorithm case-sensitive, so that a word with the same spelling but different capitalisation would be counted as a separate word, and displayed twice. There are arguments for keeping it like this, and arguments to collapse capitalisations into the same word — but which capitalisation of it? My main reason for keeping the case-sensitivity was so that the word cloud of Joey singing the entries to our MathsJam Competition Competition competition would have the word ‘competition’ in it twice.
Sometimes these really are separate words with different meanings (e.g. US and us, apple and Apple, polish and Polish, together and ToGetHer) and sometimes they’re not. Sometimes these two words with different meanings are pronounced the same way, other times they’re not. But at least in a visual word cloud, the viewer always has a way of understanding why the same word appears twice. For the audio word cloud, I decided to treat different capitalisations as the same word, but as I’ve mentioned, capitalisation does matter in the pronunciation, so I needed to be careful about which capitalisation of each word to send to the text-to-speech engine. Most voices pronounce ‘JoCo’ (short for Jonathan Coulton, pronounced with the same vowels as ‘go-go’) correctly, but would pronounce ‘joco’ or ‘Joco’ as ‘jocko’, with a different vowel in the first syllable. I ended up counting any words with non-initial capitals (e.g. JoCo, US) as separate words, but treating title-case words (with only the initial letter capitalised) as the same as all-lowercase, and pronouncing them in title-case so I wouldn’t risk mispronouncing names.
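That capitalisation rule is simple enough to show directly — a sketch of the two halves of it, grouping words for counting and choosing a form to send to the speech engine:

```swift
// Words with non-initial capitals (e.g. JoCo, US) stay distinct; everything
// else collapses to lowercase for counting purposes.
func groupingKey(for word: String) -> String {
    let hasNonInitialCapital = word.dropFirst().contains { $0.isUppercase }
    return hasNonInitialCapital ? word : word.lowercased()
}

// For speech, preserve non-initial capitals exactly, and title-case the
// collapsed words so names aren't mispronounced.
func pronunciationForm(for word: String) -> String {
    let tail = word.dropFirst()
    if tail.contains(where: { $0.isUppercase }) { return word }  // e.g. JoCo, US
    return word.prefix(1).uppercased() + tail.lowercased()
}
```

So ‘competition’ and ‘Competition’ count as one word spoken as ‘Competition’, while ‘JoCo’ keeps its vowels safe from the ‘jocko’ treatment.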
A really smart version of this would get the pronunciation of each word in context (the same way my rhyming dictionary rhyme.science finds rhymes for the different pronunciations of homographs, e.g. bow), group them by how they were pronounced, and make a word cloud of words grouped entirely by pronunciation rather than spelling, so ‘polish’ and ‘Polish’ would appear separately but there would be no danger of, say, ‘rain’ and ‘reign’ both appearing in the audio word cloud and sounding like duplicates. However, which words are actually pronounced the same depends on the accent (e.g. whether ‘cot’ and ‘caught’ sound the same) and the text normalisation of the voice — you might have noticed that some of the audio word clouds in the trailer have ‘aye-aye’ while others have ‘two’ for the Roman numeral ‘II’.
Similarly, a really smart visual word cloud would use natural language processing to separate out different meanings of homographs (e.g. bow🎀, bow🏹, bow🚢, and bow🙇🏻♀️) and display them in some way that made it obvious which was which, e.g. by using different symbols, fonts, styles, colours for different parts of speech. It could also recognise names and keep multi-word names together, count words with the same lemma as the same, and cluster words by semantic similarity, thus putting ‘Zoe Keating’ near ‘cello’, and ‘Zoe Gray’ near ‘Brian Gray’ and far away from ‘Blue’. Perhaps I’ll work on that next.
I’ve recently been updated to a new WordPress editor whose ‘preview’ function gives a ‘page not found’ error, so I’m just going to publish this and hope it looks okay. If you’re here early enough to see that it doesn’t, thanks for being so enthusiastic!
Posted in News on July 14, 2020
A few months ago I wrote an app to download my YouTube metadata, and I blogged some statistics about it and some haiku I found in my video titles and descriptions. I also created a few word clouds from the titles and descriptions. In that post, I said:
Next perhaps I’ll make word clouds of my YouTube descriptions from various time periods, to show what I was uploading at the time. […] Eventually, some of the content I create from my YouTube metadata will make it into a YouTube video of its own — perhaps finally a real channel trailer.
— Me, two and a third months ago
TL;DR: I made a channel trailer of audiovisual word clouds showing each month of uploads:
It seemed like the only way to do justice to the number and variety of videos I’ve uploaded over the past thirteen years. My channel doesn’t exactly have a content strategy. This is best watched on a large screen with stereo sound, but there is no way you will catch everything anyway. Prepare to be overwhelmed.
Now for the ‘too long; don’t feel obliged to read’ part on how I did it. I’ve uploaded videos in 107 distinct months, so creating a word cloud for each month using wordclouds.com seemed tedious and slow. I looked into web APIs for creating word clouds automatically, and added the code to my app to call them, but then I realised I’d have to sign up for an account, including a payment method, and once I ran out of free word clouds I’d be paying a couple of cents each. That could easily add up to $5 or more if I wanted to try different settings! So obviously I would need to spend many hours programming to avoid that expense.
I have a well-deserved reputation for being something of a gadget freak, and am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand. Ten seconds, I tell myself, is ten seconds. Time is valuable and ten seconds’ worth of it is well worth the investment of a day’s happy activity working out a way of saving it.
— Douglas Adams in ‘Last Chance to See…’
I searched for free word cloud code in Swift, downloaded the first one I found, and then it was a simple matter of changing it to work on macOS instead of iOS, fixing some alignment issues, getting it to create an image instead of arranging text labels, adding some code to count word frequencies and exclude common English words, giving it colour schemes, background images, and the ability to show smaller words inside characters of other words, getting it to work in 1116 different fonts, export a copy of the cloud to disk at various points during the progress, and also create a straightforward text rendering using the same colour scheme as a word cloud for the intro… before I knew it, I had an app that would automatically create a word cloud from the titles and descriptions of each month’s public uploads, shown over the thumbnail of the most-viewed video from that month, in colour schemes chosen randomly from the ones I’d created in the app, and a different font for each month. I’m not going to submit a pull request; the code is essentially unrecognisable now.
In case any of the thumbnails spark your curiosity, or you just think the trailer was too short and you’d rather watch 107 full videos to get an idea of my channel, here is a playlist of all the videos whose thumbnails are shown in this video:
It’s a mixture of super-popular videos and videos which didn’t have much competition in a given month.
Of course, I needed a soundtrack for my trailer. Music wouldn’t do, because that would reduce my channel trailer to a mere song for anyone who couldn’t see it well. So I wrote some code to make an audio version of each word cloud (or however much of it could fit into five seconds without too many overlapping voices) using the many text-to-speech voices in macOS, with the most common words being spoken louder. I’ll write a separate post about that; I started writing it up here and it got too long.
The handwritten thank you notes at the end were mostly from members of the JoCo Cruise postcard trading club, although one came with a pandemic care package from my current employer. I have regaled people there with various ridiculous stories about my life, and shown them my channel. You’re all most welcome; it’s been fun rewatching the concert videos myself while preparing to upload, and it’s always great to know other people enjoy them too.
I put all the images and sounds together into a video using Final Cut Pro 10.4.8. This was all done on my mid-2014 Retina 15-inch MacBook Pro, Sneuf.
Posted in Haiku Detector on May 10, 2020
Since I wrote a little app to download much of my YouTube metadata, it was obvious that I needed to feed it through another little app I wrote: Haiku Detector. So I did. In all of my public YouTube descriptions put together, with URLs removed, there are 26 172 sentences, and 436 detected haiku.
As is usually the case, a few of these ‘haiku’ were not really haiku because the Mac speech synthesis pronounces them wrong, and thus Haiku Detector counts their syllables incorrectly. A few more involved sentences which no longer made sense because their URLs had been removed, or which were partial sentences from song lyrics which looked like full sentences because they were on lines of their own. Most of the rest just weren’t very interesting.
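For the curious: Haiku Detector asks the Mac speech synthesis how it would pronounce each word, which is exactly why mispronunciations produce false positives. A toy stand-in for the idea, with a naive vowel-group count instead of the synthesiser, might look like this:

```swift
// Rough syllable approximation: count runs of vowels, with a crude silent-e
// adjustment. The real app gets (usually) better counts from speech synthesis.
func approximateSyllables(in word: String) -> Int {
    let vowels = Set("aeiouy")
    var count = 0
    var previousWasVowel = false
    for character in word.lowercased() {
        let isVowel = vowels.contains(character)
        if isVowel && !previousWasVowel { count += 1 }
        previousWasVowel = isVowel
    }
    if word.lowercased().hasSuffix("e") && count > 1 { count -= 1 }
    return max(count, 1)
}

// A sentence is haiku-shaped if its words can be split into 5-7-5 syllables;
// here the caller has already split it into three candidate lines.
func looksLikeHaiku(lines: [String]) -> Bool {
    guard lines.count == 3 else { return false }
    let syllables = lines.map { line in
        line.split(separator: " ")
            .map { approximateSyllables(in: String($0)) }
            .reduce(0, +)
    }
    return syllables == [5, 7, 5]
}
```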
There were quite a lot of song lyrics which fit into haiku, which suggest tunes to which other haiku can be sung, if the stress patterns match up. I’m not going to put those here though; there are too many, and I could make a separate post about haiku in Jonathan Coulton lyrics, having already compiled a JoCorpus for rhyme.science to find rhymes in. So here are some other categories of haiku I liked. For lack of a better idea, I’ll link the first word of each one to the video it’s from.
Apologies about my camerawork
Also, there’s a lot
of background noise so the sound
isn’t very good.
There was a little
too much light and sound for my
poor little camera. 🙂
But hey, if I’d brought
my external microphone,
it would have got wet.
I’m so sad that I
had to change batteries or
something part-way through. 😦
Who do I look like,
Joe Covenant in Glasgow
Now the guitar is
out of tune and my camera
is out of focus.
Performers being their typical selves
they get around to singing
the song Cinnamon.
Aimee Mann asks John
Roderick to play one of
his songs (which he wrote.)
But first, he gives us
a taste of what he’s really
famous for: tuning.
And now he’s lost his
voice, so it’s going to be
great for everything.
Cody Wymore can’t
do a set without Stephen
Sondheim in it.
Cody horns in on
it anyway by adding
a piano part.
He pauses time for
a bit so nobody knows
he was unprepared.
It’s about being
in a room full of people
and feeling alone.
Paul and Storm:
Why does every new
verse of their song keep taking
them so goddamn long?
Little did I know
that four other people would
throw panties at Paul.
We’re gonna bring the
mood down a little bit, but
maybe lift it up!
Meanwhile, they have to
fix up the drums because I
guess they rocked too hard.
Zoe and Brian Gray:
It’s For the Glory
of Gleeble Glorp, which isn’t
Zoe Gray has to
follow Brian Gray’s songs from
He’s here to perform
for us an amazing act
of léger de main.
Travis gets up on
stage and holds a small doll’s head
in a creepy way.
which brings us to Jonathan Coulton:
He loves us and is
very glad to be with us.
This is Creepy Doll.
remarks on the lax rhyming
in God Save The Queen.
Jonathan will use
Jim’s capo, and he will give
it back afterwards.
Jonathan did not
know this was going to be
a cardio set.
That guy Paul has been
seeing every goddamned day
for the last two months.
talks about samples and tells
us what hiphop is.
It’s not because she’s
a lady, but because she’s
She feels like she should
get a guitar case, even
without a guitar.
Jon Spurney rocks out
on the guitar solo, as
he is wont to do.
at about 6:38,
we get to the point.
The ship’s IT guy:
He has been very
glad to meet us, but he’s not
sad to see us leave.
Red Team Leader:
Red Leader has some
announcements to make before
the final concert.
The Red Team didn’t
mind, because we’re the team that
All the JoCo Cruise performers in the second half of the last show:
Let’s bring Aimee Mann
back out to the stage to join
the Shitty Bar Band.
We now get into
the unrehearsed supergroup
section of the show.
JoCo Cruise hijinks
This is the last show,
unless we’re quarantined on
the ship for a while!
Half of those palettes
were 55-gallon drums
of caveat sauce.
This pun somehow leads
to a sad Happy Birthday
for Paul Sabourin.
Paul Sabourin points
out Kendra’s Glow Cloud dress in
the front row (all hail!)
They talk about why
they did note-for-note covers
instead of new takes.
So by Friday night,
they’d written this musical
about JoCo Cruise.
A plan to take over the world:
Here’s how it’s going
to work: first we’re going to
have a nice dinner.
And once we have our
very own cruise ship, we shall
dominate the seas.
An actual cake
which is not a lie. It was
delicious and moist.
It was delicious
and moist. This is Drew’s body
given up for us.
Questions and answers:
What do you do when
you reach the limits of your
When she reaches the
limits of her knowledge, she
says she doesn’t know.
the green people with
buttons who are aliens
wanting to probe you
Wash your hands! Do you
need to take your life jackets
to the safety drill?
What about water,
though? Where do you sign up for
the specialty lunch?
Calls to action
All this and more can
be real if you book yourself
a berth on that boat.
It was supported
by her Patreon patrons.
You could be one too!
If you want to hear
him sing more covers this way,
back this Kickstarter:
That will do for now. Next perhaps I’ll make word clouds of my YouTube descriptions from various time periods, to show what I was uploading at the time. Or perhaps I’ll feed the descriptions into the app I wrote to create the data for rhyme.science, see what the most common rhymes are, and write a poem about them, as I did with Last Chance to See.
Eventually, some of the content I create from my YouTube metadata will make it into a YouTube video of its own — perhaps finally a real channel trailer. But what will I write in the description and title, and will I have to calculate the steady state of a Markov chain to make sure it doesn’t affect the data it shows?
Posted in Recipes on May 3, 2020
I don’t know how ajvar is usually used, and I’m not even sure I pronounce it correctly, but many years ago I discovered that it makes a great base for nachos, or just a great nacho topping by itself, so with this recipe, I may offend Balkan and Mexican chefs alike. Quantities are all approximate… use as much of each thing as you feel like.
1 jar ajvar (spicy or mild, depending on how spicy you want your nachos and how many other ingredients you’ll be adding)
1 or 2 large packets of corn chips (you probably need more than you think. I prefer nacho cheese flavour.)
1 small container of sour cream or crème fraîche (you probably need less than you think)
plenty of grated cheese (I find Sbrinz cheese is great for nachos.)
1 or 2 onions (optional)
1 can red kidney beans (optional)
some kind of hot sauce, to taste (optional)
Chop and fry the onion(s), if using, in a large frying pan. Empty the ajvar into the pan. Swill a small amount of liquid (from the kidney bean can if you’re using the beans, otherwise water) around the ajvar jar to dislodge any remaining ajvar, and pour that into the pan. Drain the rest of the liquid from the kidney beans (if using) and empty them into the pan. Stir and heat the mixture to a good eating temperature. Add hot sauce to taste, if using.
To serve, put a few large serving spoonfuls of the mixture onto a plate. Cover it with a layer of grated cheese, and if necessary, microwave briefly to melt the cheese. Add a dollop of sour cream. Serve the corn chips on the side so they stay crunchy and are less messy to handle.
To eat, scoop up some ajvar mixture, cheese, and a little of the sour cream with a corn chip, and put it in your mouth. You probably know how to do the rest.
Serves 2 or 3, as a main dish, if all optional ingredients are used.
I’ve developed a bit of a habit of recording entire concerts of musicians who don’t mind their concerts being recorded, splitting them into individual songs, and uploading them to my YouTube channel with copious notes in the video descriptions. My first upload was, appropriately, the band featured in the first image on the web, Les Horribles Cernettes, singing Big Bang. I first got enough camera batteries and SD cards to record entire concerts for the K’s Choice comeback concert in Dranouter in 2009, though the playlist is short, so perhaps I didn’t actually record that entire show.
I’ve also developed a habit of going on a week-long cruise packed with about 25 days of entertainment every year, and recording 30 or so hours of that entertainment. So my YouTube channel is getting a bit ridiculous. I currently have 2723 publicly-visible videos on my channel, and 2906 total videos — the other 183 are private or unlisted, either because they’re open mic or karaoke performances from JoCo Cruise and I’m not sure I have the performer’s permission to post them, or they’re official performances that we were requested to only share with people that were there.
I’ve been wondering just how much I’ve written in my sometimes-overly-verbose video descriptions over the years, and the only way I found to download all that metadata was using the YouTube API. I tested it out by putting a URL with the right parameters in a web browser, but it’s only possible to get the data for up to 50 videos at a time, so it was clear I’d have to write some code to do it.
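For anyone curious what such a request looks like: the YouTube Data API v3 serves playlist items 50 at a time, and each response carries a nextPageToken you feed into the next request until it stops appearing. Here’s a sketch of building one page’s request URL in Swift (the API key and playlist ID shown are placeholders, not real credentials):

```swift
import Foundation

/// Build a YouTube Data API v3 `playlistItems` request URL for one page
/// of up to 50 videos. Pass the previous response's `nextPageToken` to
/// fetch the following page; omit it for the first page.
func playlistItemsURL(playlistID: String, apiKey: String, pageToken: String? = nil) -> URL {
    var components = URLComponents(string: "https://www.googleapis.com/youtube/v3/playlistItems")!
    var items = [
        URLQueryItem(name: "part", value: "snippet"),
        URLQueryItem(name: "playlistId", value: playlistID),
        URLQueryItem(name: "maxResults", value: "50"),  // the API's per-page maximum
        URLQueryItem(name: "key", value: apiKey),
    ]
    if let token = pageToken {
        items.append(URLQueryItem(name: "pageToken", value: token))
    }
    components.queryItems = items
    return components.url!
}

// Placeholder values, for illustration only:
let url = playlistItemsURL(playlistID: "UUxxxxxxxx", apiKey: "YOUR_API_KEY")
```

Paste a URL like that into a browser (as I did to test) and you get one 50-video page of JSON back; a little loop over nextPageToken gets the rest.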
Late Friday evening, after uploading my last video from JoCo Cruise 2020, I set to writing a document-based CoreData SwiftUI app to download all that data. I know my way around CoreData and downloading and parsing JSON in Swift, but haven’t had many chances to try out SwiftUI, so this was a way I could quickly get the information I wanted while still learning something. I decided to only get the public videos, since that doesn’t need authentication (indeed, I had already tried it in a web browser), so it’s a bit simpler.
By about 3 a.m., I had all the data, stored in a document and displayed rather simply in my app. Perhaps that was my cue to go to bed, but I was too curious. So I quickly added some code to export all the video descriptions in one text file and all the video titles in another. I had planned to count the words within the app (using enumerateSubstrings byWords or enumerateTags, of course… we’re not savages! As a linguist I know that counting words is more complicated than counting spaces.) but it was getting late and I knew I wanted the full text for other things, so I just exported the text and opened it in Pages. The verdict:
- 2723 public videos
- 33 465 words in video titles
- 303 839 words in video descriptions
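In case you want to count words like a civilised person too, the non-savage approach looks something like this (a minimal sketch, not the code from my app):

```swift
import Foundation

/// Count words using Foundation's linguistically-aware enumeration,
/// rather than splitting on spaces. "Can't" is one word; punctuation
/// and multiple spaces don't inflate the count.
func wordCount(in text: String) -> Int {
    var count = 0
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: [.byWords, .substringNotRequired]) { _, _, _, _ in
        count += 1
    }
    return count
}
```

The .substringNotRequired option just tells Foundation not to bother materialising each word’s text, since we only want the tally.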
The next day, I wanted to create some word clouds with the data, but all the URLs in the video descriptions got in the way. I quite often link to the playlists each video is in, related videos, and where to purchase the songs being played. I added some code to remove links (using stringByReplacingMatches on an NSDataDetector configured with the link type, because we’re not savages! As an internet person I know that links are more complicated than any regex I’d write.) I found that Pages counts URLs as having quite a few words, so the final count is:
- At least 4 633 links (this is just by searching for ‘http’ in the original video descriptions, like a savage, so might not match every link)
- 267 567 words in video descriptions, once links are removed. I could almost win NaNoWriMo with the links from my video descriptions alone.
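The link-stripping itself is only a few lines, since NSDataDetector is an NSRegularExpression subclass and inherits its replacement method. Something along these lines (a sketch of the approach, not my app’s exact code):

```swift
import Foundation

/// Remove anything NSDataDetector recognises as a link, which covers far
/// more URL shapes than a hand-rolled regex would.
func removingLinks(from text: String) -> String {
    // The link detector can only fail for invalid type flags, so this is safe here.
    let detector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
    let range = NSRange(text.startIndex..., in: text)
    return detector.stringByReplacingMatches(in: text, options: [],
                                             range: range, withTemplate: "")
}
```

Replacing each match with an empty template leaves the surrounding whitespace behind, which is fine for word counting but worth a trim if you care about tidy output.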
I then had my app export the publish dates of all the videos, imported them into Numbers, and created the histogram shown above. I actually learnt quite a bit about Numbers in the process, so that’s a bonus. I’ll probably do a deeper dive into the upload frequency later, with word clouds broken down by time period to show what I was uploading at any given time, but for now, here are some facts:
- The single day when I uploaded the most publicly-visible videos was 25 December 2017, when I uploaded 34 videos — a K’s Choice concert and a Burning Hell concert in Vienna earlier that year. I’m guessing I didn’t have company for Christmas, so I just got to hang out at home watching concerts and eating inexpertly-roasted potatoes.
- The month when I uploaded the most publicly-visible videos was April 2019. This makes sense, as I was unemployed at the time, and got back from JoCo Cruise on March 26.
So, on to the word clouds I cleaned up that data to make. I created them on wordclouds.com, because Wordle has rather stagnated. Most of my video titles mention the artist name and concert venue and date, so some words end up being extremely common. This huge variation in word frequency meant I had to reduce the size from 0 all the way to -79 for it to fit common words such as ‘Jonathan’. Wordclouds lets you choose the shape of the final word cloud, but at that scale, it ends up as the intersection of a diamond with the chosen shape, so the shape doesn’t end up being recognisable. Here it is, then, as a diamond:
The video descriptions didn’t have as much variation between word frequencies, so I only had to reduce it to size -45 to fit both ‘Jonathan’ and ‘Coulton’ in it. I still don’t know whether there are other common words that didn’t fit, because the site doesn’t show that information until it’s finished, and there are so many different words that it’s still busy drawing the word cloud. Luckily I could download an image of it before that finished. Anyway, at size -45, the ‘camera’ shape I’d hoped to use isn’t quite recognisable, but I did manage a decent ‘YouTube play button’ word cloud:
One weird fact I noticed is that I mention Paul Sabourin of Paul and Storm in video descriptions about 40% more often than I mention Storm DiCostanzo, and I include his last name three times as much. To rectify this, I wrote a song mentioning Storm’s last name a lot, to be sung to the tune of ‘Hallelujah’, because that’s what we do:
We’d like to sing of Paul and Storm.
It’s Paul we love to see perform.
The other member’s name’s the one that scans though.
So here’s to he who plays guitar;
let’s all sing out a thankful ‘Arrr!’
for Paul and Storm’s own Greg “Storm” DiCostanzo!
DiCostanzo, DiCostanzo, DiCostanzo, DiCostanzo
I’m sure I’ll download more data from the API, do some more analysis, and mine the text for haiku (if Haiku Detector even still runs — it’s been a while since I touched it!) later, but that’s enough for now!
A few weeks before JoCo Cruise 2020, I wrote a song to perform at the open mic. It’s a singalong which I figured everyone could relate to, so I hoped people would enjoy it. I came up with the tune myself, and Joey Marianer worked out some ukulele accompaniment. Then we found out there would be no open mic on the cruise, so we performed it at Beth Kinderman’s song circle at MarsCon, though there was a lot of background noise and not much singing along there.
I was signed up to perform in a shadow event called ‘A Bunch of Monkeys Read Some Stuff’ on the cruise, so I also performed it there, along with some short poems I’d written during NanoRhymo 1 and 2, and Global Poetry Writing Month. Words and tweet links of the specific tiny poems are in the video description.
Later in the cruise, Joey hastily organised an especially unofficial open mic, so we performed it there as well. By that time I was slightly more confident about remembering the words:
Here are the lyrics. They contain much haplology, and work best in an accent without the trap-bath split; I had to change the way I pronounce ‘demand’ to sing it, and I didn’t always keep that change consistent through the rest of the song.
We’re close, and I’m finally here with you.
You don’t look like your avatar.
Until I demand all
your names and your handles,
I probably won’t know who you are.
You’ve changed name and gender
your hair, or your shirt
You took off your glasses
your beard or your skirt
You left for three seconds,
your mouth’s now ajar.
I probably don’t know who you are
I probably don’t know who you are.
I probably don’t know who you are!
Your name and your face too,
I just cannot place you.
I probably don’t know who you are.
You’ve just really killed it at open mic.
Your singalong chorus went far,
but nobody says so
when you’re off the stage, so
they probably don’t know who you are.
They snubbed you at dinner
they brought the wrong beer
Regaled you with stories
you told them last year.
They won’t share their stateroom
or give back your car
They probably don’t know who you are
They probably don’t know who you are.
They probably don’t know who you are!
Even if someone knows ya,
there’s prosopagnosia —
they probably don’t know who you are.
You once seemed at least somewhat normative
but each year things get more bizarre.
There’s joy and there’s strife while
you’re changing your lifestyle.
You probably don’t know who you are.
I couldn’t write this part;
It wouldn’t be true.
Just think about things
That are changing for you.
It takes time and patience
To tune a guitar
You probably don’t know who you are
You probably don’t know who you are.
You probably don’t know who you are!
You’re constantly growing
new parts for not knowing.
You probably don’t know who you are.
We probably don’t know who we are.
We probably don’t know who we are!
And we don’t know whether
we’ll find out together.
We probably don’t know who we are.
It’s all based on truth. Every JoCo Cruise I spend an action-packed and sleep-deprived week with people who are, to varying degrees, my friends. It’s a cruise where people’s clothes and makeup are often far more memorable than their faces, so I may or may not recognise my new or old friends each time I see them during that week. The subtle difference between formal night and pyjama day attire in the videos above can’t compare to the costume changes some people go through. I spend the rest of the year connected to many of these friends via the internet, where I learn their full names and/or other handles, but (despite the name of one of the websites) not necessarily the faces which go with those names. Then we meet in person again, a year of growth different.
Sometimes they grow a full beard between cruises, and then once I’ve figured out who they are, shave it off during the cruise (you know who you are. I didn’t.) Sometimes they transition, tell me their new name, and I don’t connect that ‘new’ person with the name and face they had previously until weeks after I get home. Sometimes I accidentally tell people their own origin stories.
I perform at many open mics, and often love the performances as they’re happening, but don’t remember exactly what the performers looked like or who did what. When people come up to me afterwards and praise my performance, I want to do the same for them, but am not sure whether or what they performed.
I wrote the ‘I’ and ‘they’ parts with no particular plan to turn it into something serious at the end, but then a ‘you’ section seemed like the obvious continuation. That part is true for me, too — the most predictable thing about my life is that it will keep getting ever more ridiculous. May you all find a Jim Boggia to help tune your metaphorical guitars, and if not, time and patience.