So I leave my bags behind (Galilee Song parody, now actually sung!) and another new version of Seddit


Hey look, Joey Marianer sang the parody song lyrics from my last post! Check there for the lyrics and the aviation incidents referenced.

There are some more song parody lyrics, but first, a word from my sponsor: me. Just like last time, I’ve released a new version of Seddit, my text-to-speech-focussed Reddit client for macOS and iOS. This has a feature I’ve wanted to add for a while — the possibility to select multiple voices, and read each user’s posts and comments in a different one. The variety makes it easier to keep paying attention when listening for a long time, and having each user consistently use the same voice should make it easier to follow conversations.
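For the curious, here is one way the "same user, same voice" behaviour could work. This is a minimal sketch in Python and not Seddit's actual code: the function name `voice_for_user` and the example voice names are made up for illustration, and I am assuming a stable hash of the username is an acceptable way to pick a voice.

```python
import hashlib


def voice_for_user(username: str, voices: list[str]) -> str:
    """Pick a voice for a user so the same user always gets the same voice.

    Hypothetical helper for illustration only; not Seddit's real implementation.
    """
    if not voices:
        raise ValueError("at least one voice must be selected")
    # Python's built-in hash() is randomised per process for strings,
    # so use a cryptographic digest to get an index that is stable across launches.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[index]


# Example voice names, assumed for illustration.
selected_voices = ["Samantha", "Daniel", "Karen"]
print(voice_for_user("some_redditor", selected_voices))
```

With this approach two users can still land on the same voice when there are more users than voices, and changing the list of selected voices will reshuffle who gets which one.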

I made some other changes in this version too. Here’s a full list of them:

Features

  • Added the possibility to have each user’s posts and comments spoken in a different voice
  • Added settings for whether to read out the subreddit name, and the date and time, for each post
  • Added the option to load no comments — this was for Joey, who wanted to try listening to short story subreddits while obeying the “don’t read the comments” rule of the internet.

Bug fixes

  • Fixed a bug whereby turning off the ‘Say “Link” instead of reading out URLs’ setting had no effect
  • Fixed a bug where comments that weren’t loaded would be read as “comment by unknown user”. Comments that aren’t loaded due to the comment depth settings are also no longer displayed.
  • Fixed a potential crash when opening the app if posts had been deleted on another device

On the subject of text-to-speech, nine or ten years ago I read a book and a bunch of papers on speech synthesis in order to write a term paper for my Web Development for Linguistics degree. The term paper was longer than the text of my thesis, because my thesis also included source code for a web site and a Mac app. Anyway, from this book I learnt about PSOLA (Pitch Synchronous Overlap and Add), which is used to change the pitch and duration of sounds for text-to-speech, as one might do to change prosody, or create a robot choir.
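As a rough reminder of the idea, here is a heavily simplified time-domain sketch in Python. It is written under loud assumptions and is not how any real synthesiser does it: the pitch period is assumed to be constant and already known, the epochs (pitch marks) are simply placed at every multiple of that period instead of being detected, and only pitch is changed, not duration. The function names are hypothetical.

```python
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def psola_pitch_shift(signal: list[float], period: int, factor: float) -> list[float]:
    """Change pitch by `factor` while keeping duration (simplified TD-PSOLA).

    Assumes a constant pitch period of `period` samples, with epochs at
    every multiple of `period`. Real systems have to find the epochs.
    """
    if period <= 0 or factor <= 0:
        raise ValueError("period and factor must be positive")
    n = len(signal)
    window = hann(2 * period)
    # Analysis epochs: centres of the two-period grains we cut from the input.
    epochs = list(range(period, n - period, period))
    out = [0.0] * n
    weight = [0.0] * n
    if not epochs:
        return out
    # Synthesis epochs are spaced closer together for higher pitch,
    # further apart for lower pitch.
    new_period = max(1, round(period / factor))
    t = period
    while t < n - period:
        # Reuse the analysis grain whose epoch is nearest in time,
        # which is what keeps the overall duration unchanged.
        nearest = min(epochs, key=lambda e: abs(e - t))
        for i in range(2 * period):
            out[t - period + i] += signal[nearest - period + i] * window[i]
            weight[t - period + i] += window[i]
        t += new_period
    # Normalise by the summed windows so overlapping grains do not change loudness.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, weight)]


# A toy "voiced" sound: a fundamental with a period of 20 samples plus one harmonic.
tone = [math.sin(2 * math.pi * k / 20) + 0.5 * math.sin(4 * math.pi * k / 20)
        for k in range(400)]
octave_up = psola_pitch_shift(tone, period=20, factor=2.0)
```

Changing duration instead would mean repeating or skipping grains while keeping their spacing, which this sketch leaves out.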

Newer voices don’t use PSOLA so much, as (to put it simply) they have more samples of actual speech in different situations, so they don’t need to modify samples for the sake of prosody. Note, this is ‘newer voices’ as of a decade or two ago; I don’t know whether the latest crop of ML-based voices do things the same way. Anyway, I assume this is why the newer macOS voices don’t support the TUNE format I used for my robot choir.

At the time, I wrote an utterly silly partial parody of Lola, by The Kinks, about PSOLA. I thought maybe I’d finish it or maybe even make it less silly[why?], but I never did, and now I don’t remember enough about how PSOLA works to fully understand what I originally wrote. So here is that draft. It really doesn’t scan, but I hope it doesn’t scan in amusing ways:

I was trying to synthesise some prosody,
but my source and filter were mixed up just like granola
G-R-A-N-O-L-A, granola.

So I found a new way to make it sound rad
It’s called pitch-synchronous overlap and add, that is PSOLA
P-S-O-L-A PSOLA. Pso-pso-pso-P-SOLA.

Well I didn’t want to sound like a smallpox blight
So I really took care to get my epochs right
for PSOLA. Pso-pso-pso-P-SOLA.

If you’re not dumb then you’ll soon understand
How I speak like a woman then sound like a man
It’s P-SOLA. Pso-pso-pso-P-SOLA. Pso-pso-pso-P-SOLA.

[It doesn’t look like I wrote anything for the bridge (is that a bridge?) of the song, so just pretend it keeps going roughly like before]

It was used to make synthesized speech sound natural
But now there’s some super-sized features that fill that role-uh
R-O-L-E hyphen U-H role-uh

So that’s my guess if you’re wondering why r-
ecent voices don’t sing in my robot choir:
No PSOLA.


