
How to fit 301032 words into nine minutes


A few months ago I wrote an app to download my YouTube metadata, and I blogged some statistics about it and some haiku I found in my video titles and descriptions. I also created a few word clouds from the titles and descriptions. In that post, I said:

Next perhaps I’ll make word clouds of my YouTube descriptions from various time periods, to show what I was uploading at the time. […] Eventually, some of the content I create from my YouTube metadata will make it into a YouTube video of its own — perhaps finally a real channel trailer. 

Me, two and a third months ago

TL;DR: I made a channel trailer of audiovisual word clouds showing each month of uploads:

It seemed like the only way to do justice to the number and variety of videos I’ve uploaded over the past thirteen years. My channel doesn’t exactly have a content strategy. This is best watched on a large screen with stereo sound, but there is no way you will catch everything anyway. Prepare to be overwhelmed.

Now for the ‘too long; don’t feel obliged to read’ part on how I did it. I’ve uploaded videos in 107 distinct months, so creating a word cloud for each month using wordclouds.com seemed tedious and slow. I looked into web APIs for creating word clouds automatically, and added the code to my app to call them, but then I realised I’d have to sign up for an account, including a payment method, and once I ran out of free word clouds I’d be paying a couple of cents each. That could easily add up to $5 or more if I wanted to try different settings! So obviously I would need to spend many hours programming to avoid that expense.

I have a well-deserved reputation for being something of a gadget freak, and am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand. Ten seconds, I tell myself, is ten seconds. Time is valuable and ten seconds’ worth of it is well worth the investment of a day’s happy activity working out a way of saving it.

Douglas Adams in ‘Last chance to see…’

I searched for free word cloud code in Swift, downloaded the first one I found, and then it was a simple matter of changing it to work on macOS instead of iOS, fixing some alignment issues, getting it to create an image instead of arranging text labels, adding some code to count word frequencies and exclude common English words, giving it colour schemes, background images, and the ability to show smaller words inside characters of other words, getting it to work in 1116 different fonts, export a copy of the cloud to disk at various points during the progress, and also create a straightforward text rendering using the same colour scheme as a word cloud for the intro… before I knew it, I had an app that would automatically create a word cloud from the titles and descriptions of each month’s public uploads, shown over the thumbnail of the most-viewed video from that month, in colour schemes chosen randomly from the ones I’d created in the app, and a different font for each month. I’m not going to submit a pull request; the code is essentially unrecognisable now.
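The word-frequency step mentioned above can be sketched roughly as follows. This is an illustration rather than the app's actual code: the function name `wordFrequencies` and the stop-word set passed to it are hypothetical, and it assumes the text has already been split into words.

```swift
// Counts how often each word occurs, skipping any word whose lowercased
// form appears in `stopWords`. Counting itself is case-sensitive, so
// names such as "Jonathan" keep their capitalisation.
// Returns the most frequent words first; ties are broken alphabetically.
func wordFrequencies(in words: [String],
                     excluding stopWords: Set<String>) -> [(word: String, count: Int)] {
    var counts: [String: Int] = [:]
    for word in words {
        if stopWords.contains(word.lowercased()) { continue }
        counts[word, default: 0] += 1
    }
    return counts
        .map { (word: $0.key, count: $0.value) }
        .sorted { $0.count != $1.count ? $0.count > $1.count : $0.word < $1.word }
}
```

The sorted result is the kind of list a word cloud layout would use to decide which words to draw largest.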

In case any of the thumbnails spark your curiosity, or you just think the trailer was too short and you’d rather watch 107 full videos to get an idea of my channel, here is a playlist of all the videos whose thumbnails are shown in this video:

It’s a mixture of super-popular videos and videos which didn’t have much competition in a given month.

Of course, I needed a soundtrack for my trailer. Music wouldn’t do, because that would reduce my channel trailer to a mere song for anyone who couldn’t see it well. So I wrote some code to make an audio version of each word cloud (or however much of it could fit into five seconds without too many overlapping voices) using the many text-to-speech voices in macOS, with the most common words being spoken louder. I’ll write a separate post about that; I started writing it up here and it got too long.

The handwritten thank you notes at the end were mostly from members of the JoCo Cruise postcard trading club, although one came with a pandemic care package from my current employer. I have regaled people there with various ridiculous stories about my life, and shown them my channel. You’re all most welcome; it’s been fun rewatching the concert videos myself while preparing to upload, and it’s always great to know other people enjoy them too.

I put all the images and sounds together into a video using Final Cut Pro 10.4.8. This was all done on my mid-2014 Retina 15-inch MacBook Pro, Sneuf.


Some Statistics About My Ridiculous YouTube Channel


I’ve developed a bit of a habit of recording entire concerts of musicians who don’t mind their concerts being recorded, splitting them into individual songs, and uploading them to my YouTube channel with copious notes in the video descriptions. My first upload was, appropriately, the band featured in the first image on the web, Les Horribles Cernettes, singing Big Bang. I first got enough camera batteries and SD cards to record entire concerts for the K’s Choice comeback concert in Dranouter in 2009, though the playlist is short, so perhaps I didn’t actually record that entire show.

I’ve also developed a habit of going on a week-long cruise packed with about 25 days of entertainment every year, and recording 30 or so hours of that entertainment. So my YouTube channel is getting a bit ridiculous. I currently have 2723 publicly-visible videos on my channel, and 2906 total videos — the other 183 are private or unlisted, either because they’re open mic or karaoke performances from JoCo Cruise and I’m not sure I have the performer’s permission to post them, or they’re official performances that we were requested to only share with people that were there.

I’ve been wondering just how much I’ve written in my sometimes-overly-verbose video descriptions over the years, and the only way I found to download all that metadata was using the YouTube API. I tested it out by putting a URL with the right parameters in a web browser, but it’s only possible to get the data for up to 50 videos at a time, so it was clear I’d have to write some code to do it.

Late Friday evening, after uploading my last video from JoCo Cruise 2020, I set to writing a document-based CoreData SwiftUI app to download all that data. I know my way around CoreData and downloading and parsing JSON in Swift, but haven’t had many chances to try out SwiftUI, so this was a way I could quickly get the information I wanted while still learning something. I decided to only get the public videos, since that doesn’t need authentication (indeed, I had already tried it in a web browser), so it’s a bit simpler.
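For readers curious what fetching 50 videos at a time involves, here is a minimal sketch of the paging loop. It assumes the YouTube Data API v3 `playlistItems` endpoint and an API key; the post does not say which endpoint the app used, and the names `VideoPage`, `pageURL` and `allItems` are made up for this example.

```swift
import Foundation

// One page of a `playlistItems` response, reduced to the fields of interest.
struct VideoPage: Decodable {
    struct Item: Decodable {
        struct Snippet: Decodable {
            let title: String
            let description: String
        }
        let snippet: Snippet
    }
    let nextPageToken: String?   // absent on the last page
    let items: [Item]
}

// Builds the request URL for one page. `uploadsPlaylistID` and `apiKey`
// are placeholders supplied by the caller.
func pageURL(uploadsPlaylistID: String, apiKey: String, pageToken: String?) -> URL? {
    var components = URLComponents(string: "https://www.googleapis.com/youtube/v3/playlistItems")
    var query = [
        URLQueryItem(name: "part", value: "snippet"),
        URLQueryItem(name: "maxResults", value: "50"),  // the per-request limit mentioned above
        URLQueryItem(name: "playlistId", value: uploadsPlaylistID),
        URLQueryItem(name: "key", value: apiKey),
    ]
    if let pageToken = pageToken {
        query.append(URLQueryItem(name: "pageToken", value: pageToken))
    }
    components?.queryItems = query
    return components?.url
}

// Keeps requesting pages until a response has no `nextPageToken`.
// `fetch` is injected so the loop can be tried without a network connection.
func allItems(fetch: (String?) throws -> Data) throws -> [VideoPage.Item] {
    var items: [VideoPage.Item] = []
    var token: String? = nil
    repeat {
        let page = try JSONDecoder().decode(VideoPage.self, from: fetch(token))
        items += page.items
        token = page.nextPageToken
    } while token != nil
    return items
}
```

In a real app the `fetch` closure would load the URL returned by `pageURL` for each token, for example with `URLSession`; passing it in as a closure keeps the paging logic separate from the networking.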

By about 3 a.m., I had all the data, stored in a document and displayed rather simply in my app. Perhaps that was my cue to go to bed, but I was too curious. So I quickly added some code to export all the video descriptions in one text file and all the video titles in another. I had planned to count the words within the app (using enumerateSubstrings byWords or enumerateTags, of course… we’re not savages! As a linguist I know that counting words is more complicated than counting spaces) but it was getting late and I knew I wanted the full text for other things, so I just exported the text and opened it in Pages. The verdict:

  • 2723 public videos
  • 33 465 words in video titles
  • 303 839 words in video descriptions
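The in-app word count that was planned but skipped might have looked something like this. It is a sketch using Foundation's `enumerateSubstrings(in:options:_:)` with the `.byWords` option; the function name `wordCount` is an assumption, and its results will not necessarily match what Pages reports.

```swift
import Foundation

// Counts words using Foundation's word-boundary enumeration instead of
// splitting on spaces, so punctuation and repeated whitespace are not
// mistaken for words.
func wordCount(_ text: String) -> Int {
    var count = 0
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .byWords) { _, _, _, _ in
        count += 1
    }
    return count
}
```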

The next day, I wanted to create some word clouds with the data, but all the URLs in the video descriptions got in the way. I quite often link to the playlists each video is in, related videos, and where to purchase the songs being played. I added some code to remove links (using stringByReplacingMatches with an NSDataDetector with the link type, because we’re not savages! As an internet person I know that links are more complicated than any regex I’d write). I found that Pages counts URLs as having quite a few words, so the final count is:

  • At least 4 633 links (this is just by searching for ‘http’ in the original video descriptions, like a savage, so might not match every link)
  • 267 567 words in video descriptions, once links are removed. I could almost win NaNoWriMo with the links from my video descriptions alone.
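The link-removal step described above can be sketched as follows, assuming an Apple platform where `NSDataDetector` is available. The function name `removingLinks` is made up for this example, and the app's real code may differ.

```swift
import Foundation

// Deletes everything that NSDataDetector recognises as a link,
// leaving the surrounding text untouched.
func removingLinks(from text: String) throws -> String {
    let detector = try NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
    let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)
    return detector.stringByReplacingMatches(in: text,
                                             options: [],
                                             range: wholeText,
                                             withTemplate: "")
}
```

Because the detector is a subclass of `NSRegularExpression`, it inherits `stringByReplacingMatches`, so no hand-written URL pattern is needed.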

I then had my app export the publish dates of all the videos, imported them into Numbers, and created the histogram shown above. I actually learnt quite a bit about Numbers in the process, so that’s a bonus. I’ll probably do a deeper dive into the upload frequency later, with word clouds broken down by time period to show what I was uploading at any given time, but for now, here are some facts:

  • The single day when I uploaded the most publicly-visible videos was 25 December 2017, when I uploaded 34 videos — a K’s Choice concert and a Burning Hell concert in Vienna earlier that year. I’m guessing I didn’t have company for Christmas, so I just got to hang out at home watching concerts and eating inexpertly-roasted potatoes.
  • The month when I uploaded the most publicly-visible videos was April 2019. This makes sense, as I was unemployed at the time, and got back from JoCo Cruise on March 26.
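The histogram and the facts above came from Numbers, but the same per-month tally could be sketched in code. This assumes the publish dates are ISO 8601 strings such as "2017-12-25T10:00:00Z", which is an assumption about the export format; the function names are hypothetical, and months are taken from the UTC timestamp rather than local time.

```swift
// Groups ISO 8601 timestamps by their "yyyy-MM" prefix and counts
// the uploads in each month.
func uploadsPerMonth(_ timestamps: [String]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for timestamp in timestamps {
        counts[String(timestamp.prefix(7)), default: 0] += 1
    }
    return counts
}

// The month with the most uploads; ties go to the earlier month.
// Returns nil when there are no timestamps.
func busiestMonth(_ timestamps: [String]) -> (month: String, count: Int)? {
    uploadsPerMonth(timestamps)
        .map { (month: $0.key, count: $0.value) }
        .sorted { $0.count != $1.count ? $0.count > $1.count : $0.month < $1.month }
        .first
}
```

The number of keys in the dictionary returned by `uploadsPerMonth` is the number of distinct months with at least one upload.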

So, onto the word clouds I cleaned up that data to make. I created them on wordclouds.com, because wordle has rather stagnated. Most of my video titles mention the artist name and concert venue and date, so some words end up being extremely common. This huge variation in word frequency meant I had to reduce the size from 0 all the way to -79 in order for it to be able to fit common words such as ‘Jonathan’. Wordclouds lets you choose the shape of the final word cloud, but at that scale, it ends up as the intersection of a diamond with the chosen shape, so the shape doesn’t end up being recognisable. Here it is, then, as a diamond:

[Image: word cloud of video titles, in a diamond shape]

The video descriptions didn’t have as much variation between word frequencies, so I only had to reduce it to size -45 to fit both ‘Jonathan’ and ‘Coulton’ in it. I still don’t know whether there are other common words that didn’t fit, because the site doesn’t show that information until it’s finished, and there are so many different words that it’s still busy drawing the word cloud. Luckily I could download an image of it before that finished. Anyway, at size -45, the ‘camera’ shape I’d hoped to use isn’t quite recognisable, but I did manage a decent ‘YouTube play button’ word cloud:

[Image: word cloud of video descriptions, in the shape of a YouTube play button]

One weird fact I noticed is that I mention Paul Sabourin of Paul and Storm in video descriptions about 40% more often than I mention Storm DiCostanzo, and I include his last name three times as much. To rectify this, I wrote a song mentioning Storm’s last name a lot, to be sung to the tune of ‘Hallelujah’, because that’s what we do:

We’d like to sing of Paul and Storm.
It’s Paul we love to see perform.
The other member’s name’s the one that scans though.
So here’s to he who plays guitar;
let’s all sing out a thankful ‘Arrr!’
for Paul and Storm’s own Greg “Storm” DiCostanzo!
DiCostanzo, DiCostanzo, DiCostanzo, DiCostanzo

I’m sure I’ll download more data from the API, do some more analysis, and mine the text for haiku (if Haiku Detector even still runs — it’s been a while since I touched it!) later, but that’s enough for now!
