Archive for May, 2015
I subjected Haiku Detector to some serious stress-testing with a 29MB text file (that’s 671481 sentences, containing 16810 haiku, of which some are intentional) a few days ago, and kept finding more things that needed fixing or could do with improvement. A few days in a nerdsniped daze later, I have a new version, and some interesting tidbits about the way Mac speech synthesis pronounces things. Here’s some of what I did:
- Tweaked the user interface a bit, partly to improve responsiveness after 10000 or so haiku have been found.
- Made the list of haiku stay scrolled to the bottom so you can see the new ones as they’re found.
- Added a progress bar instead of the spinner that was there before.
- Fixed a memory issue.
- Changed a setting so that it should work on Mac OS X 10.6, as I said here it would. I didn’t have a 10.6 system to test it on, though, and it turns out it does not run on one; I think 10.7 (Lion) is the lowest version it will run on.
- Added some example text on startup so that it’s easier to know what to do.
- Made it a Developer ID signed application, because now that I have a bit more time to do Mac development (since I don’t have a day job; would you like to hire me?), it was worth signing up to the paid Mac Developer Program again. Once I get an icon for Haiku Detector, I’ll put it on the App Store.
- Fixed a few bugs and made a few other changes relating to how syllables are counted, which lines certain punctuation goes on, and which things are counted as haiku.
That last item is more difficult than you’d think, because the Mac speech synthesis engine (which I use to count syllables for Haiku Detector) is very clever, and pronounces words differently depending on context and punctuation. Going through the words until the right number of syllables for a given line of the haiku is reached can produce different results depending on which punctuation you keep. A sentence or group of sentences that is pronounced with 17 syllables as a whole might not contain words that add up to 17 syllables; or it might, but only if a given punctuation mark is kept at the start of one line or at the end of the previous one. There are therefore many cases where the speech synthesis says the syllable count of each line is wrong but the sum of the words is correct, or vice versa, and I had to decide which of those to keep. I’ve made better decisions in this version than in the last one, but I may well change things in the next version if it gives better results.
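To make the line-splitting step concrete, here is a minimal Python sketch of a greedy 5-7-5 split. It is not Haiku Detector’s actual code: it assumes every word already comes with a syllable count (the counts below are assigned by hand, standing in for what the speech synthesis engine would report per word), ignores punctuation entirely, and the function name is made up for illustration. It fills each line word by word and gives up if a line overshoots its target or words are left over.

```python
from typing import List, Optional, Tuple

LINE_TARGETS = (5, 7, 5)  # syllables per haiku line


def split_into_lines(words: List[Tuple[str, int]]) -> Optional[List[List[str]]]:
    """Greedily pack (word, syllables) pairs into 5-7-5 lines.

    Returns the three lines as lists of words, or None if the words
    cannot be split exactly on line boundaries.
    """
    lines: List[List[str]] = []
    i = 0
    for target in LINE_TARGETS:
        line: List[str] = []
        total = 0
        # Keep adding words until this line reaches its syllable target.
        while total < target and i < len(words):
            word, syllables = words[i]
            line.append(word)
            total += syllables
            i += 1
        if total != target:  # overshot the target, or ran out of words
            return None
        lines.append(line)
    # Leftover words mean the text is longer than one haiku.
    return lines if i == len(words) else None


if __name__ == "__main__":
    # Hand-assigned syllable counts (an assumption, not engine output).
    sentence = [("At", 1), ("a", 1), ("minimum,", 3),
                ("a", 1), ("menu", 2), ("displays", 2), ("a", 1), ("list", 1),
                ("of", 1), ("menu", 2), ("items.", 2)]
    for line in split_into_lines(sentence):
        print(" ".join(line))
```

A per-word check like this is only half the story: as described above, the engine may pronounce the whole sentence with a different total than the sum of its words, so a real detector has to reconcile the two.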
Here are some interesting examples of words which are pronounced differently depending on punctuation or context:
| Text | Pronunciation |
|---|---|
| ooohh | Pronounced with one syllable, as you would expect |
| ooohh. | Pronounced with one syllable, as you would expect |
| ooohh.. | Spelled out (Oh oh oh aitch aitch) |
| ooohh… | Pronounced with one syllable, as you would expect |
| H H | Pronounced aitch aitch |
| H H H | Pronounced aitch aitch aitch |
| H H H H H H H H | Pronounced aitch aitch aitch |
| Da-da-de-de-da | Pronounced with five syllables, roughly as you would expect |
| Da-da-de-de-da- | Pronounced dee-ay-dash-di-dash-di-dash-di-dash-di-dash. The dashes are pronounced for anything with hyphens in it that also ends in a hyphen, despite the fact that when splitting Da-da-de-de-da-de-da-de-da-de-da-de-da-da-de-da-da into a haiku, it’s correct punctuation to leave the hyphen at the end of the line. |
In a different context, though, where – is a minus sign meant to be pronounced, it might need to go at the start of the next line. Greater-than and less-than signs have the same ambiguity: they are not pronounced when they surround a single word, as in an HTML tag, but they are if they are unmatched or surround multiple words separated by spaces. Incidentally, surrounding da-da in angle brackets causes the dash to be pronounced where it otherwise wouldn’t be.
| Text | Pronunciation |
|---|---|
| U.S or u.s | Pronounced you dot es (this way, domain names such as angelastic.com are pronounced correctly) |
| U.S. or u.s. | Pronounced you es |
| US | Pronounced you es, unless in a capitalised sentence such as ‘TAKE US AWAY’, where it’s pronounced ‘us’ |
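Returning to the Da-da-de-de-da- example above, the choice of where the joining hyphen goes can be shown with a small Python sketch. It assumes every hyphen-separated piece is exactly one syllable (true for this nonsense string, not for real words), and the function name and `hyphen_at_end` flag are hypothetical, made up for illustration rather than taken from Haiku Detector. With `hyphen_at_end=True` the hyphen stays at the end of each line, as in ordinary word-break punctuation; with `False` it moves to the start of the next line, as you might want if it were a pronounced minus sign.

```python
from typing import List, Tuple


def split_hyphenated(text: str,
                     targets: Tuple[int, ...] = (5, 7, 5),
                     hyphen_at_end: bool = True) -> List[str]:
    """Split a hyphen-joined run of one-syllable pieces into haiku lines.

    Assumes each hyphen-separated piece is exactly one syllable.
    """
    pieces = text.split("-")
    if len(pieces) != sum(targets):
        raise ValueError("syllable count does not match the line targets")
    lines = []
    start = 0
    for target in targets:
        lines.append("-".join(pieces[start:start + target]))
        start += target
    # Re-attach the hyphens that joined one line to the next.
    for n in range(len(lines) - 1):
        if hyphen_at_end:
            lines[n] += "-"
        else:
            lines[n + 1] = "-" + lines[n + 1]
    return lines


if __name__ == "__main__":
    for line in split_hyphenated("Da-da-de-de-da-de-da-de-da-de-da-de-da-da-de-da-da"):
        print(line)
```

With the default setting the first line comes out as Da-da-de-de-da-, which is exactly the kind of hyphen-terminated text that the speech engine reads out with the dashes, so counting syllables per line and checking the punctuation are in tension.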
I also discovered what I’m pretty sure is a bug, and I’ve reported it to Apple. If two carriage returns (not newlines) are followed by any integer, then a dot, then a space, the number is pronounced ‘zero’ no matter what it is. You can try it with this file; download the file, open it in TextEdit, select the entire text of the file, then go to the Edit menu, Speech submenu, and choose ‘Start Speaking’. Quite a few haiku were missed or spuriously found due to that bug, but I happened to find it when trimming out harmless whitespace.
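For checking your own text, a short Python sketch can flag places matching the trigger described above: two carriage returns (`\r`, not newlines), then an integer, a dot and a space. It only encodes the pattern as described; it does not explain or work around the behaviour, and the function names, sample sentence and file name are hypothetical.

```python
import re
from typing import List

# Two carriage returns, then an integer, a dot and a space.
TRIGGER = re.compile(r"\r\r(\d+)\. ")


def find_trigger_numbers(text: str) -> List[int]:
    """Return the numbers that appear in the suspected trigger pattern."""
    return [int(m.group(1)) for m in TRIGGER.finditer(text)]


def write_test_file(path: str) -> None:
    """Write a tiny sample file containing the pattern, to open in TextEdit."""
    # newline="" writes line endings exactly as given, with no translation.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("Before the gap.\r\r42. This number may be read as zero.\n")


if __name__ == "__main__":
    sample = "First paragraph.\r\r12. Second paragraph."
    print(find_trigger_numbers(sample))  # [12]
```

Calling `write_test_file("zero-bug-test.txt")` produces a file you can open in TextEdit and read aloud with Edit > Speech > Start Speaking.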
Apart from that bug, it’s all very clever. Note how even without the correct punctuation, it pronounces the ‘dr’s and ‘st’s in this sentence correctly:
the dr who lives on rodeo dr who is better than the dr I met on the st john’s st turnpike
However, it pronounces the second ‘st’ as ‘saint’ in the following:
the dr who lives on rodeo dr who is better than the dr I met in the st john’s st john
This is not just because it knows there is a saint called John; strangely enough, it also gets this one wrong:
the dr who lives on rodeo dr who is better than the dr I met in the st john’s st park
I could play with this all day, or all night, and indeed I have for the last couple of days, but now it’s your turn. Download the new Haiku Detector and paste your favourite novels, theses, holy texts or discussion threads into it.
If you don’t have a Mac, you’ll have to make do with a few more haiku from the New Scientist special issue on the brain which I mentioned in the last post:
Being a baby
is like paying attention
with most of our brain.
But that doesn’t mean
there isn’t a sex difference
in the brain,” he says.
They may even be
a different kind of cell that
just looks similar.
It is easy to
see how the mind and the brain
We like to think of
ourselves as rational and
It didn’t seem to
matter that the content of
these dreams was obtuse.
I’d like to thank the people of the xkcd Time discussion thread for writing so much in so many strange ways, and especially Sciscitor for exporting the entire thread as text. It was the test data set that kept on giving.
I’ve been sitting on some improvements to Haiku Detector for a while, and it’s about time I released the new version. I had been planning to put this version on the App Store, but I’m waiting to hear back from somebody about an icon for it. So for now, you can download it without going through the store. It should work on Mac OS X 10.6 or later.
This version finds haiku made up of multiple sentences rather than only those made of 17-syllable sentences. I also fixed the bug which caused it to crash occasionally when dealing with very long texts. To celebrate, I’ll go through some of the same texts I did when I first released Haiku Detector, and see what new haiku are discovered. To start with, John Scalzi’s Old Man’s War. This version of Haiku Detector finds 304 haiku in it. Sometimes, sentences can be included in more than one haiku:
“I’m sorry. My sense
of humor was surgically
removed as a child.”
“My sense of humor
was surgically removed as
a child.” “Oh,” I said.
“Oh,” I said. “That was
a joke,” she said, and stood up,
extending her hand.
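Finding multi-sentence haiku, and letting one sentence appear in several of them as above, can be sketched as a sliding window over per-sentence syllable counts. This is a minimal sketch, not Haiku Detector’s actual code: it assumes the counts are already known and positive, uses a hypothetical function name, and only checks the 17-syllable total; each candidate would still need to be split into 5-7-5 lines.

```python
from typing import List, Tuple

HAIKU_SYLLABLES = 17


def multi_sentence_candidates(counts: List[int]) -> List[Tuple[int, int]]:
    """Find runs of consecutive sentences whose syllables total 17.

    counts[i] is the syllable count of sentence i. Returns (start, end)
    index pairs with end exclusive. Runs may overlap, so one sentence
    can belong to several candidate haiku.
    """
    found = []
    for start in range(len(counts)):
        total = 0
        for end in range(start, len(counts)):
            total += counts[end]
            if total == HAIKU_SYLLABLES:
                found.append((start, end + 1))
            if total >= HAIKU_SYLLABLES:
                break  # adding more sentences can only overshoot
    return found


if __name__ == "__main__":
    # Toy counts: short, long, short, long sentences alternating.
    print(multi_sentence_candidates([3, 14, 3, 14]))  # [(0, 2), (1, 3), (2, 4)]
```

Each middle sentence here shows up in two overlapping candidates, which is the same effect as the three overlapping haiku above.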
Here are some of my favourites of the multi-sentence haiku:
She asked, still without
actually looking up
at me. “Pardon me?”
“Okay,” I said. “Mind
if I ask you a question?”
“I’m married,” she said.
“Well, she doesn’t have
to live with you, now does she.”
“How was the cookie?”
“Our friend Thomas would
make it to mile six before
his heart imploded.”
This one sounds like it could be a metaphysical statement about what consciousness is in general:
Your consciousness is
perceiving the small time lag
between there and here.
“I would not presume
to assume, Master Sergeant!”
‘Presume to assume’?
My wife’s out here, sure.
But she’s happy to live her
new life without me.
“Let me see.” Silence.
The familiar voice again.
“Get this log off him.”
“The question now is
what is really going on.”
“Any thoughts on it?”
I think this one is my favourite:
I can just be me.
But I think you could love me
if you wanted to.
I found a lot of new haiku in the CMS paper announcing the discovery of the Higgs boson, but they were all combinations of names from the stupendous author list. Since I included some from New Scientist last time, here are some from the issue of New Scientist that I am currently reading, a special issue on the human brain:
are allowing us to see
the brain in action.
The sound waves broke up
the synchronous firing,
ending the seizure.
Sometimes an experiment
The ancient Greeks knew
about thought experiments
These two go together:
Does that mean we should
revise our definition
the same one had been used since
I have many ideas for improving Haiku Detector, and I’d still like to see if I can detect the best-sounding haiku using linguistic tagging, but before that I’m thinking of rewriting the whole thing in Swift as a learning exercise. Since I don’t have a day job at the moment, I have a bit of free time if I strategically ignore sections of my to-do list. Actually, on that note, here are some particularly obvious haiku from the Mac OS X and iOS Human Interface Guidelines:
At a minimum,
a menu displays a list
of menu items.
A picker displays
a set of values from which
a user picks one.
That will do for now. I hope you enjoy playing with the new version of Haiku Detector.
Recently I had the honour of being a fan juror for the Logan Whitehurst Memorial Awards for Excellence in Comedy Music (Logan Awards for short.) It was great to finally have an important reason to listen to comedy music for several full days, and a response to the eye-rolling of my friends when I mentioned yet another funny song, although deciding which songs to vote for was pretty tough. As a juror I had to listen to or watch all the songs and music videos nominated by the general public, and choose my favourite five nominees in each category. I can’t tell you which ones I voted for, but the finalists (chosen based on the votes of all the jurors, with ties broken by Dr. Demento) have been announced. Since the page on the Logan Awards site doesn’t link to the songs in question, I thought I’d link to them here. In alphabetical order:
Outstanding Parody Song
- The Boobles — Have Natural Es (a parody of The Beatles’ ‘Act Naturally’)
- Carrie Dahlby feat. Wyngarde — Almost Parent Time (a parody of Mike Reno and Ann Wilson’s ‘Almost Paradise’)
- Devo Spice (feat. Power Salad) — Snack Bar (a parody of Macklemore’s ‘Thrift Shop’)
- Kyle Kallgren & Tony Goldmark — Kill the Mouse (a parody of ‘The Mob Song’ from Disney’s Beauty and the Beast. You might also like to see the video in which it appeared, though that was irrelevant to the judging in this category)
- ‘Weird Al’ Yankovic — Word Crimes (a parody of Robin Thicke’s ‘Blurred Lines’)
Outstanding Original Comedy Song
- the great Luke Ski — Fake Adult
- Mikey Mason — Last Day at Work
- Mikey Mason — Settle
- Power Salad — Amazon Drone
- Worm Quartet — Fueled by Angst
Outstanding Comedy Music Video
- CBS Follies — Bitch in Business
- Epic Rap Battles of History — Sir Isaac Newton vs. Bill Nye
- Rhett and Link — I’m on Vacation
- ‘Weird Al’ Yankovic — Tacky
- ‘Weird Al’ Yankovic — Word Crimes
Here’s a YouTube playlist of the music video finalists:
The winners will be announced at FuMPFest on 5–7 June in Wheeling, Illinois. If you like funny music and are anywhere near there, I recommend going; it sounds like great fun. Many comedy musicians will be there, including guests of honour The Arrogant Worms. I saw some of the same performers and fans at the MarsCon 2014 dementia track, and it was a blast.
I’d like to give the other nominees a bit of publicity, but it’s difficult to do that without people making inferences about my votes, so here is a YouTube playlist of all the videos nominated in open nominations for the ‘Outstanding Comedy Music Video’ category, sorted in ascending order of views because the ones with the fewest views need the publicity more.
If you’re a Paul and Storm fan, you might be wondering why no songs from their 2014 album Ball Pit are in the finals. Some of their songs were nominated in the open nominations, but they became ineligible for the award when Paul Sabourin joined the jury. In honour of Paul’s noble sacrifice, I present to you a nominated video of this song of theirs which would have made the judging harder for me, if only because of all the freeze-framing to see the details:
If you’d like to hear more comedy music, consider subscribing to The FuMP podcasts, where you can get several comedy songs a week for free. You can find even more funny music on the Mad Music Archive, the Dr. Demento Show, or Songs About Science & Math. Also, check out the Logan Whitehurst website to find out about the awards’ namesake and buy his music.