Posts Tagged YouTube

How to fit 301032 words into nine minutes

A few months ago I wrote an app to download my YouTube metadata, and I blogged some statistics about it and some haiku I found in my video titles and descriptions. I also created a few word clouds from the titles and descriptions. In that post, I said:

Next perhaps I’ll make word clouds of my YouTube descriptions from various time periods, to show what I was uploading at the time. […] Eventually, some of the content I create from my YouTube metadata will make it into a YouTube video of its own — perhaps finally a real channel trailer. 

Me, two and a third months ago

TL;DR: I made a channel trailer of audiovisual word clouds showing each month of uploads:

It seemed like the only way to do justice to the number and variety of videos I’ve uploaded over the past thirteen years. My channel doesn’t exactly have a content strategy. This is best watched on a large screen with stereo sound, but there is no way you will catch everything anyway. Prepare to be overwhelmed.

Now for the ‘too long; don’t feel obliged to read’ part on how I did it. I’ve uploaded videos in 107 distinct months, so creating a word cloud for each month by hand on a website seemed tedious and slow. I looked into web APIs for creating word clouds automatically, and added the code to my app to call them, but then I realised I’d have to sign up for an account, including a payment method, and once I ran out of free word clouds I’d be paying a couple of cents each. That could easily add up to $5 or more if I wanted to try different settings! So obviously I would need to spend many hours programming to avoid that expense.

I have a well-deserved reputation for being something of a gadget freak, and am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand. Ten seconds, I tell myself, is ten seconds. Time is valuable and ten seconds’ worth of it is well worth the investment of a day’s happy activity working out a way of saving it.

Douglas Adams in ‘Last chance to see…’

I searched for free word cloud code in Swift, downloaded the first one I found, and then it was a simple matter of changing it to work on macOS instead of iOS, fixing some alignment issues, getting it to create an image instead of arranging text labels, adding some code to count word frequencies and exclude common English words, giving it colour schemes, background images, and the ability to show smaller words inside characters of other words, getting it to work in 1116 different fonts, export a copy of the cloud to disk at various points as it progressed, and also create a straightforward text rendering using the same colour scheme as a word cloud for the intro… before I knew it, I had an app that would automatically create a word cloud from the titles and descriptions of each month’s public uploads, shown over the thumbnail of the most-viewed video from that month, in colour schemes chosen randomly from the ones I’d created in the app, and a different font for each month. I’m not going to submit a pull request; the code is essentially unrecognisable now.
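Every word cloud starts with a frequency count. The Python sketch below is not the author's Swift code. It counts case-insensitive word frequencies, skips a small list of common words, and maps the most frequent words linearly onto a range of font sizes. The stop-word list, the size range and the function names are illustrative assumptions. A real list of common English words would be much longer.

```python
import re
from collections import Counter

# Illustrative stop words only; a real list of common English words is far longer.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of", "on", "the", "to", "with"}

# Runs of letters (no digits or underscores), allowing internal apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Case-insensitive word counts across all texts, minus stop words."""
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


def font_sizes(counts, min_size=10.0, max_size=80.0, limit=100):
    """Linearly map the `limit` most common words onto [min_size, max_size]."""
    top = counts.most_common(limit)
    if not top:
        return {}
    most, least = top[0][1], top[-1][1]
    spread = (most - least) or 1  # avoid dividing by zero if all counts match
    return {word: min_size + (count - least) * (max_size - min_size) / spread
            for word, count in top}
```

Linear scaling is the simplest choice. When a few words, such as artist names, dominate the titles, a square-root or logarithmic scale would keep the rest of the cloud legible.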

In case any of the thumbnails spark your curiosity, or you just think the trailer was too short and you’d rather watch 107 full videos to get an idea of my channel, here is a playlist of all the videos whose thumbnails are shown in this video:

It’s a mixture of super-popular videos and videos which didn’t have much competition in a given month.

Of course, I needed a soundtrack for my trailer. Music wouldn’t do, because that would reduce my channel trailer to a mere song for anyone who couldn’t see it well. So I wrote some code to make an audio version of each word cloud (or however much of it could fit into five seconds without too many overlapping voices) using the many text-to-speech voices in macOS, with the most common words being spoken louder. I’ll write a separate post about that; I started writing it up here and it got too long.
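The audio version is left for another post, but its stated constraints can be sketched as a greedy scheduler: fit as much of the cloud as possible into five seconds, limit overlapping voices, and speak common words louder. In this Python sketch, every detail is an assumption rather than a description of the trailer's code. It assumes each word's spoken duration is already known (for example, measured from a synthesised recording), uses a fixed number of voice "slots", and sets each word's volume in proportion to its frequency.

```python
def schedule_words(counts, durations, window=5.0, max_voices=3):
    """Greedy sketch: place words from most to least frequent on whichever
    voice slot frees up first, skipping any word that would run past
    `window` seconds. Returns (start_seconds, word, volume) tuples, where
    volume is the word's count relative to the most common word."""
    free_at = [0.0] * max_voices  # time at which each voice slot is next free
    top = max(counts.values())
    schedule = []
    # Most frequent first; ties broken alphabetically for repeatable output.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        slot = min(range(max_voices), key=lambda i: free_at[i])
        start = free_at[slot]
        end = start + durations[word]
        if end > window:
            continue  # too long for the remaining time; try the next word
        free_at[slot] = end
        schedule.append((start, word, count / top))
    return sorted(schedule)


counts = {"jonathan": 4, "coulton": 4, "cruise": 2, "concert": 1}
durations = {"jonathan": 0.8, "coulton": 0.7, "cruise": 0.5, "concert": 4.5}
# 'concert' is skipped: its earliest free slot opens at 0.8 s, so it would end at 5.3 s.
print(schedule_words(counts, durations, max_voices=2))
```

A word that doesn't fit is skipped rather than ending the loop, so shorter, less common words can still fill the gaps.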

The handwritten thank you notes at the end were mostly from members of the JoCo Cruise postcard trading club, although one came with a pandemic care package from my current employer. I have regaled people there with various ridiculous stories about my life, and shown them my channel. You’re all most welcome; it’s been fun rewatching the concert videos myself while preparing to upload, and it’s always great to know other people enjoy them too.

I put all the images and sounds together into a video using Final Cut Pro 10.4.8. This was all done on my mid-2014 Retina 15-inch MacBook Pro, Sneuf.

, , , , , ,


Unintentional Haiku in my YouTube Video Descriptions

Since I wrote a little app to download much of my YouTube metadata, it was obvious that I needed to feed it through another little app I wrote: Haiku Detector. So I did. In all of my public YouTube descriptions put together, with URLs removed, there are 26 172 sentences, and 436 detected haiku.

As is usually the case, a few of these ‘haiku’ were not really haiku because the Mac speech synthesis pronounces them wrong, and thus Haiku Detector counts their syllables incorrectly. A few more involved sentences which no longer made sense because their URLs had been removed, or which were partial sentences from song lyrics which looked like full sentences because they were on lines of their own. Most of the rest just weren’t very interesting.
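Haiku Detector gets its syllable counts from the Mac speech synthesiser. To illustrate the overall check, this Python sketch swaps in a much cruder heuristic: it counts vowel groups and drops a silent final "e". It then tries to break a sentence at word boundaries after 5, 12 and 17 syllables. The heuristic and function names are assumptions. The sketch makes exactly the kind of mistake described above: it counts "camera" as three syllables, so it misses "Now the guitar is out of tune and my camera is out of focus."

```python
import re

# Words made of letters, optionally joined by straight or curly apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def naive_syllables(word):
    """Very rough English syllable estimate: count vowel groups, then drop
    a silent final 'e' (but not '-le'). Often wrong, e.g. 'camera' -> 3."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def find_haiku(sentence):
    """Return three lines of 5/7/5 syllables split at word boundaries,
    or None if the sentence can't be split that way."""
    boundaries = (5, 12, 17)  # cumulative syllables at the end of each line
    lines, current, total = [], [], 0
    for token in sentence.split():
        total += sum(naive_syllables(w) for w in WORD_RE.findall(token))
        current.append(token)
        if len(lines) == 3:
            if total > 17:
                return None  # extra syllables after the third line
            continue
        if total > boundaries[len(lines)]:
            return None  # a word straddles a line break
        if total == boundaries[len(lines)]:
            lines.append(" ".join(current))
            current = []
    if len(lines) != 3:
        return None
    if current:  # trailing syllable-free tokens, e.g. an emoji
        lines[-1] += " " + " ".join(current)
    return lines
```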

There were quite a lot of song lyrics which fit into haiku, which suggest tunes to which other haiku can be sung, if the stress patterns match up. I’m not going to put those here though; there are too many, and I could make a separate post about haiku in Jonathan Coulton lyrics, having already compiled a JoCorpus for an app of mine to find rhymes in. So here are some other categories of haiku I liked. For lack of a better idea, I’ll link the first word of each one to the video it’s from.

Apologies about my camerawork

Also, there’s a lot
of background noise so the sound
isn’t very good.

There was a little
too much light and sound for my
poor little camera. 🙂

But hey, if I’d brought
my external microphone,
it would have got wet.

I’m so sad that I
had to change batteries or
something part-way through. 😦

Who do I look like,
Joe Covenant in Glasgow
in 2008?

Now the guitar is
out of tune and my camera
is out of focus.

Performers being their typical selves

John Roderick:

they get around to singing
the song Cinnamon.

Aimee Mann asks John
Roderick to play one of
his songs (which he wrote.)

Jim Boggia:

But first, he gives us
a taste of what he’s really
famous for: tuning.

And now he’s lost his
voice, so it’s going to be
great for everything.

Cody Wymore:

Cody Wymore can’t
do a set without Stephen
Sondheim in it.

Cody horns in on
it anyway by adding
a piano part.

He pauses time for
a bit so nobody knows
he was unprepared.

It’s about being
in a room full of people
and feeling alone.

Paul and Storm:

Why does every new
verse of their song keep taking
them so goddamn long?

Little did I know
that four other people would
throw panties at Paul.

Ted Leo:

We’re gonna bring the
mood down a little bit, but
maybe lift it up!

Nerf Herder:

Meanwhile, they have to
fix up the drums because I
guess they rocked too hard.

Zoe and Brian Gray:

It’s For the Glory
of Gleeble Glorp, which isn’t
a euphemism.

Zoe Gray has to
follow Brian Gray’s songs from
the Gleebleverse.

Clint McElroy:

He’s here to perform
for us an amazing act
of léger de main.

Travis McElroy:

Travis gets up on
stage and holds a small doll’s head
in a creepy way.

which brings us to Jonathan Coulton:

He loves us and is
very glad to be with us.
This is Creepy Doll.

Jonathan Coulton
remarks on the lax rhyming
in God Save The Queen.

Jonathan will use
Jim’s capo, and he will give
it back afterwards.

Jonathan did not
know this was going to be
a cardio set.

That guy Paul has been
seeing every goddamned day
for the last two months.

MC Frontalot:

MC Frontalot
talks about samples and tells
us what hiphop is.

Jean Grae:

It’s not because she’s
a lady, but because she’s
an alcoholic.

She feels like she should
get a guitar case, even
without a guitar.

Jon Spurney:

Jon Spurney rocks out
on the guitar solo, as
he is wont to do.


at about 6:38,
we get to the point.

The ship’s IT guy:

He has been very
glad to meet us, but he’s not
sad to see us leave.

Red Team Leader:

Red Leader has some
announcements to make before
the final concert.

The Red Team didn’t
mind, because we’re the team that
entertains ourselves.

All the JoCo Cruise performers in the second half of the last show:

Let’s bring Aimee Mann
back out to the stage to join
the Shitty Bar Band.

We now get into
the unrehearsed supergroup
section of the show.

JoCo Cruise hijinks

This is the last show,
unless we’re quarantined on
the ship for a while!

Half of those palettes
were 55-gallon drums
of caveat sauce.

This pun somehow leads
to a sad Happy Birthday
for Paul Sabourin.

Paul Sabourin points
out Kendra’s Glow Cloud dress in
the front row (all hail!)

They talk about why
they did note-for-note covers
instead of new takes.

Make It With You by
Bread, which has even better
string writing than Swift.

So by Friday night,
they’d written this musical
about JoCo Cruise.

A plan to take over the world:

Here’s how it’s going
to work: first we’re going to
have a nice dinner.

And once we have our
very own cruise ship, we shall
dominate the seas.

Some Truth:

An actual cake
which is not a lie. It was
delicious and moist.

It was delicious
and moist. This is Drew’s body
given up for us.

Questions and answers:

What do you do when
you reach the limits of your
own understanding?

When she reaches the
limits of her knowledge, she
says she doesn’t know.

the green people with
buttons who are aliens
wanting to probe you

Wash your hands! Do you
need to take your life jackets
to the safety drill?

What about water,
though? Where do you sign up for
the specialty lunch?

Calls to action

All this and more can
be real if you book yourself
a berth on that boat.

It was supported
by her Patreon patrons.
You could be one too!

If you want to hear
him sing more covers this way,
back this Kickstarter:

That will do for now. Next perhaps I’ll make word clouds of my YouTube descriptions from various time periods, to show what I was uploading at the time. Or perhaps I’ll feed the descriptions into the rhyme-finding app I wrote, see what the most common rhymes are, and write a poem about them, as I did with Last Chance to See.

Eventually, some of the content I create from my YouTube metadata will make it into a YouTube video of its own — perhaps finally a real channel trailer. But what will I write in the description and title, and will I have to calculate the steady state of a Markov chain to make sure it doesn’t affect the data it shows?


, , , , , , ,


Some Statistics About My Ridiculous YouTube Channel

I’ve developed a bit of a habit of recording entire concerts of musicians who don’t mind their concerts being recorded, splitting them into individual songs, and uploading them to my YouTube channel with copious notes in the video descriptions. My first upload was, appropriately, the band featured in the first image on the web, Les Horribles Cernettes, singing Big Bang. I first got enough camera batteries and SD cards to record entire concerts for the K’s Choice comeback concert in Dranouter in 2009, though the playlist is short, so perhaps I didn’t actually record that entire show.

I’ve also developed a habit of going on a week-long cruise packed with about 25 days of entertainment every year, and recording 30 or so hours of that entertainment. So my YouTube channel is getting a bit ridiculous. I currently have 2723 publicly-visible videos on my channel, and 2906 total videos — the other 183 are private or unlisted, either because they’re open mic or karaoke performances from JoCo Cruise and I’m not sure I have the performer’s permission to post them, or because they’re official performances that we were asked to share only with people who were there.

I’ve been wondering just how much I’ve written in my sometimes-overly-verbose video descriptions over the years, and the only way I found to download all that metadata was using the YouTube API. I tested it out by putting a URL with the right parameters in a web browser, but it’s only possible to get the data for up to 50 videos at a time, so it was clear I’d have to write some code to do it.
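The part that needs code is paging through the results. One approach, sketched here in Python rather than the Swift the app was written in, asks the YouTube Data API v3 playlistItems endpoint for 50 items at a time and follows nextPageToken until it disappears. The requests still carry an API key, but no user sign-in. The API key, the uploads playlist ID and the helper names are placeholders. A channel's uploads playlist ID can be looked up through the channels endpoint (contentDetails.relatedPlaylists.uploads). The page fetcher is passed in as a function, so the paging logic can be tried without a network connection.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://www.googleapis.com/youtube/v3/playlistItems"


def page_url(api_key, playlist_id, page_token=None):
    """Build a playlistItems.list request for one page of up to 50 items."""
    params = {"part": "snippet", "playlistId": playlist_id,
              "maxResults": 50, "key": api_key}
    if page_token:
        params["pageToken"] = page_token
    return f"{API_ROOT}?{urlencode(params)}"


def fetch_all(fetch_page):
    """Call fetch_page(token) repeatedly, following nextPageToken to the end."""
    items, token = [], None
    while True:
        page = fetch_page(token)
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            return items


def http_fetcher(api_key, playlist_id):
    """Return a fetch_page function that performs real HTTP requests."""
    def fetch_page(token):
        with urllib.request.urlopen(page_url(api_key, playlist_id, token)) as resp:
            return json.load(resp)
    return fetch_page
```

With a real key and playlist ID, `fetch_all(http_fetcher("YOUR_API_KEY", "UPLOADS_PLAYLIST_ID"))` would return every item's JSON in one list. Injecting the fetcher keeps the loop easy to test and to swap for a cached or rate-limited version.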

Late Friday evening, after uploading my last video from JoCo Cruise 2020, I set to writing a document-based CoreData SwiftUI app to download all that data. I know my way around CoreData and downloading and parsing JSON in Swift, but haven’t had many chances to try out SwiftUI, so this was a way I could quickly get the information I wanted while still learning something. I decided to only get the public videos, since that doesn’t need authentication (indeed, I had already tried it in a web browser), so it’s a bit simpler.

By about 3 a.m., I had all the data, stored in a document and displayed rather simply in my app. Perhaps that was my cue to go to bed, but I was too curious. So I quickly added some code to export all the video descriptions in one text file and all the video titles in another. I had planned to count the words within the app (using enumerateSubstrings byWords or enumerateTags, of course… we’re not savages! As a linguist I know that counting words is more complicated than counting spaces.) but it was getting late and I knew I wanted the full text for other things, so I just exported the text and opened it in Pages. The verdict:

  • 2723 public videos
  • 33 465 words in video titles
  • 303 839 words in video descriptions
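Splitting on spaces goes wrong as soon as punctuation stands on its own. The post mentions Foundation's word enumeration in Swift, and the counts above came from Pages. As a cruder illustration, this Python sketch counts runs of word characters, allowing internal apostrophes and hyphens, and compares the result with a naive whitespace split. The regular expression is an assumption and is no substitute for a proper, locale-aware word segmenter.

```python
import re

# A word: letters/digits/underscore, optionally joined by an apostrophe
# (straight or curly) or a hyphen, e.g. "Storm’s" or "week-long".
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


def naive_count(text):
    """Count whitespace-separated chunks, including lone punctuation."""
    return len(text.split())


def count_words(text):
    """Count regex word matches, ignoring free-standing punctuation."""
    return len(WORD_RE.findall(text))


sample = "Paul & Storm’s new song!"
print(naive_count(sample), count_words(sample))  # prints: 5 4
```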

The next day, I wanted to create some word clouds with the data, but all the URLs in the video descriptions got in the way. I quite often link to the playlists each video is in, related videos, and where to purchase the songs being played. I added some code to remove links (using stringByReplacingMatches with an NSDataDetector with the link type, because we’re not savages! As an internet person I know that links are more complicated than any regex I’d write.) I found that Pages counts URLs as having quite a few words, so the final count is:

  • At least 4 633 links (this is just by searching for ‘http’ in the original video descriptions, like a savage, so might not match every link)
  • 267 567 words in video descriptions, once links are removed. I could almost win NaNoWriMo with the links from my video descriptions alone.
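The links were removed with NSDataDetector, which understands far more link formats than a hand-written pattern. This Python sketch stands in for the two steps above: counting "http" and stripping links before counting words. It uses a deliberately simple regular expression instead of a data detector. That pattern swallows trailing punctuation, such as a closing bracket, and misses links written without a scheme, like "www.example.com". Collapsing whitespace afterwards also flattens line breaks, which is fine for word counting but not for display.

```python
import re

# Crude URL pattern: a scheme followed by everything up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def count_http(text):
    """Count 'http' occurrences, the quick upper-bound check from the post."""
    return text.count("http")


def strip_links(text):
    """Remove URLs, then collapse the leftover runs of whitespace."""
    return " ".join(URL_RE.sub("", text).split())
```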

I then had my app export the publish dates of all the videos, imported them into Numbers, and created the histogram shown above. I actually learnt quite a bit about Numbers in the process, so that’s a bonus. I’ll probably do a deeper dive into the upload frequency later, with word clouds broken down by time period to show what I was uploading at any given time, but for now, here are some facts:

  • The single day when I uploaded the most publicly-visible videos was 25 December 2017, when I uploaded 34 videos — a K’s Choice concert and a Burning Hell concert in Vienna earlier that year. I’m guessing I didn’t have company for Christmas, so I just got to hang out at home watching concerts and eating inexpertly-roasted potatoes.
  • The month when I uploaded the most publicly-visible videos was April 2019. This makes sense, as I was unemployed at the time, and got back from JoCo Cruise on March 26.
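Finding the busiest day and month means grouping dates and counting them. The dates went into Numbers, but the same facts could be computed directly. This Python sketch is one assumption-laden way to do it. It takes ISO 8601 timestamps like the API's publishedAt values and counts them per calendar day and per month. It counts in UTC, so a video published late in the evening in a time zone behind UTC could land on the next day. Converting to a local time zone first would fix that.

```python
from collections import Counter
from datetime import datetime


def busiest(publish_dates):
    """publish_dates: ISO 8601 strings like '2017-12-25T14:03:00Z'.
    Returns ((busiest_day, count), (busiest_month, count)), counted in UTC."""
    days, months = Counter(), Counter()
    for stamp in publish_dates:
        # fromisoformat only accepts a trailing 'Z' from Python 3.11 on.
        moment = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        days[moment.date().isoformat()] += 1
        months[moment.strftime("%Y-%m")] += 1
    return days.most_common(1)[0], months.most_common(1)[0]
```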

So, on to the word clouds I cleaned up that data to make. I created them on Wordclouds, because Wordle has rather stagnated. Most of my video titles mention the artist name and concert venue and date, so some words end up being extremely common. This huge variation in word frequency meant I had to reduce the size from 0 all the way to -79 in order for it to be able to fit common words such as ‘Jonathan’. Wordclouds lets you choose the shape of the final word cloud, but at that scale, it ends up as the intersection of a diamond with the chosen shape, so the shape doesn’t end up being recognisable. Here it is, then, as a diamond:


The video descriptions didn’t have as much variation between word frequencies, so I only had to reduce it to size -45 to fit both ‘Jonathan’ and ‘Coulton’ in it. I still don’t know whether there are other common words that didn’t fit, because the site doesn’t show that information until it’s finished, and there are so many different words that it’s still busy drawing the word cloud. Luckily I could download an image of it before that finished. Anyway, at size -45, the ‘camera’ shape I’d hoped to use isn’t quite recognisable, but I did manage a decent ‘YouTube play button’ word cloud:


One weird fact I noticed is that I mention Paul Sabourin of Paul and Storm in video descriptions about 40% more often than I mention Storm DiCostanzo, and I include his last name three times as much. To rectify this, I wrote a song mentioning Storm’s last name a lot, to be sung to the tune of ‘Hallelujah’, because that’s what we do:

We’d like to sing of Paul and Storm.
It’s Paul we love to see perform.
The other member’s name’s the one that scans though.
So here’s to he who plays guitar;
let’s all sing out a thankful ‘Arrr!’
for Paul and Storm’s own Greg “Storm” DiCostanzo!
DiCostanzo, DiCostanzo, DiCostanzo, DiCostanzo

I’m sure I’ll download more data from the API, do some more analysis, and mine the text for haiku (if Haiku Detector even still runs — it’s been a while since I touched it!) later, but that’s enough for now!



