Posts Tagged statistics
I’ve developed a bit of a habit of recording entire concerts of musicians who don’t mind their concerts being recorded, splitting them into individual songs, and uploading them to my YouTube channel with copious notes in the video descriptions. My first upload was, appropriately, the band featured in the first image on the web, Les Horribles Cernettes, singing Big Bang. I first got enough camera batteries and SD cards to record entire concerts for the K’s Choice comeback concert in Dranouter in 2009, though the playlist is short, so perhaps I didn’t actually record that entire show.
I’ve also developed a habit of going on a week-long cruise packed with about 25 days of entertainment every year, and recording 30 or so hours of that entertainment. So my YouTube channel is getting a bit ridiculous. I currently have 2723 publicly-visible videos on my channel, and 2906 total videos — the other 183 are private or unlisted, either because they’re open mic or karaoke performances from JoCo Cruise and I’m not sure I have the performer’s permission to post them, or they’re official performances that we were requested to only share with people that were there.
I’ve been wondering just how much I’ve written in my sometimes-overly-verbose video descriptions over the years, and the only way I found to download all that metadata was using the YouTube API. I tested it out by putting a URL with the right parameters in a web browser, but it’s only possible to get the data for up to 50 videos at a time, so it was clear I’d have to write some code to do it.
Late Friday evening, after uploading my last video from JoCo Cruise 2020, I set to writing a document-based CoreData SwiftUI app to download all that data. I know my way around CoreData and downloading and parsing JSON in Swift, but haven’t had many chances to try out SwiftUI, so this was a way I could quickly get the information I wanted while still learning something. I decided to only get the public videos, since that doesn’t need authentication (indeed, I had already tried it in a web browser), so it’s a bit simpler.
By about 3a.m, I had all the data, stored in a document and displayed rather simply in my app. Perhaps that was my cue to go to bed, but I was too curious. So I quickly added some code to export all the video descriptions in one text file and all the video titles in another. I had planned to count the words within the app (using enumerateSubstrings byWords or enumerateTags, of course… we’re not savages! As a linguist I know that counting words is more complicated than counting spaces.) but it was getting late and I knew I wanted the full text for other things, so I just exported the text and opened it in Pages. The verdict:
- 2723 public videos
- 33 465 words in video titles
- 303 839 words in video descriptions
The next day, I wanted to create some word clouds with the data, but all the URLs in the video descriptions got in the way. I quite often link to the playlists each video is in, related videos, and where to purchase the songs being played. I added some code to remove links (using stringByReplacingMatches with an NSDataDetector with the link type, because we’re not savages! As an internet person I know that links are more complicated than any regex I’d write.) I found that Pages counts URLs as having quite a few words, so the final count is:
- At least 4 633 links (this is just by searching for ‘http’ in the original video descriptions, like a savage, so might not match every link)
- 267 567 words in video descriptions, once links are removed. I could almost win NaNoWriMo with the links from my video descriptions alone.
I then had my app export the publish dates of all the videos, imported them into Numbers, and created the histogram shown above. I actually learnt quite a bit about Numbers in the process, so that’s a bonus. I’ll probably do a deeper dive into the upload frequency later, with word clouds broken down by time period to show what I was uploading at any given time, but for now, here are some facts:
- The single day when I uploaded the most publicly-visible videos was 25 December 2017, when I uploaded 34 videos — a K’s Choice concert and a Burning Hell concert in Vienna earlier that year. I’m guessing I didn’t have company for Christmas, so I just got to hang out at home watching concerts and eating inexpertly-roasted potatoes.
- The month when I uploaded the most publicly-visible videos was April 2019. This makes sense, as I was unemployed at the time, and got back from JoCo Cruise on March 26.
So, onto the word clouds I cleaned up that data to make. I created them on wordclouds.com, because wordle has rather stagnated. Most of my video titles mention the artist name and concert venue and date, so some words end up being extremely common. This huge variation in word frequency meant I had to reduce the size from 0 all the way to -79 in order for it to be able to fit common words such as ‘Jonathan’. Wordclouds lets you choose the shape of the final word cloud, but at that scale, it ends up as the intersection of a diamond with the chosen shape, so the shape doesn’t end up being recognisable. Here it is, then, as a diamond:
The video descriptions didn’t have as much variation between word frequencies, so I only had to reduce it to size -45 to fit both ‘Jonathan’ and ‘Coulton’ in it. I still don’t know whether there are other common words that didn’t fit, because the site doesn’t show that information until it’s finished, and there are so many different words that it’s still busy drawing the word cloud. Luckily I could download an image of it before that finished. Anyway, at size -45, the ‘camera’ shape I’d hoped to use isn’t quite recognisable, but I did manage a decent ‘YouTube play button’ word cloud:
One weird fact I noticed is that I mention Paul Sabourin of Paul and Storm in video descriptions about 40% more often than I mention Storm DiCostanzo, and I include his last name three times as much. To rectify this, I wrote a song mentioning Storm’s last name a lot, to be sung to the tune of ‘Hallelujah’, because that’s what we do:
We’d like to sing of Paul and Storm.
It’s Paul we love to see perform.
The other member’s name’s the one that scans though.
So here’s to he who plays guitar;
let’s all sing out a thankful ‘Arrr!’
for Paul and Storm’s own Greg “Storm” DiCostanzo!
DiCostanzo, DiCostanzo, DiCostanzo, DiCostanzo
I’m sure I’ll download more data from the API, do some more analysis, and mine the text for haiku (if Haiku Detector even still runs — it’s been a while since I touched it!) later, but that’s enough for now!