Posts Tagged OmniGraffle
A while ago I found a post about Surprising shared word etymologies, where the author had found words with common origins (according to Etymological Wordnet) which had the most dissimilar meanings (according to GloVe: Global Vectors for Word Representation). I loved the post, but my main takeaway from it was The History of English Podcast, linked in the Further reading section. I immediately started listening to that, in reverse order (that’s just the easiest thing to do in the Apple Podcasts app; back when podcasts were in iTunes, I used to listen to all my podcasts on shuffle, so if you like order, this is an improvement), starting from Episode 148. I’ve since finished it and started listening to something else before I go back for the newer episodes. I was in it for the English, but I also learnt a lot more history than I expected to.
Back in October, after hearing about yet another absurd list of words all derived from the same root word (I think in this case it was bloom, flower, phallus, bollocks, belly, flatulence, bloat, fluid, bladder, blow, and blood, from episode 62), I decided I couldn’t just listen to these ridiculous linguistic family trees any more; I had to see them. As you might have seen in previous posts, my go-to for creating that kind of diagram is using AppleScript to control OmniGraffle. So I wrote an AppleScript to make tree diagrams showing words that are all derived from the same root word(s) as a given word. Before I bore you with the details, I’ll show you a little example. This is what it gave when I asked for the English word ‘little’.
The root word is in a blue oval, the words in the same language as the one I asked about (in this case, English) are in brown rounded rectangles, and the words in other languages are in black rectangles. I thought about having a different colour and shape for each language, and a legend, but decided to keep things simple for now.
The script also generates a simple image description, which I’ve used in the caption. I intended it for use as alt text, but some of these diagrams are difficult to read at the size shown, so even people who don’t use screen readers can benefit from the description. You can also click on any diagram for a full-sized pdf version.
It doesn’t describe the entire structure of the tree (I’m trying not to get distracted researching nice ways to do that for arbitrary trees!) but it’s probably better than nothing. It only lists the words in the language you asked about (assuming that is English), since English screen readers likely wouldn’t read the other ones correctly anyway. It might be cool to autogenerate sound files using text-to-speech in voices made for the other languages and attach those to the nodes to enrich the experience when navigating through them in OmniGraffle or some other format it can export, but that’s a project for another day.
On the subject of accessibility, I’m happy that the History of English Podcast provides transcripts, so I can easily find the episodes relevant to some of these diagrams.
Simplifying the diagrams
Sometimes the diagrams get crowded when a lot of words are derived from another word in the same language, or a lot of other languages derived words from the same word. I wrote a second script to group words into a single node if they’re all derived from the same word, don’t have any words derived from them, and are all in the same kind of shape as the word they’re derived from. That last constraint means that if you searched for an English word, English words all derived from the same English word will be grouped together, and non-English words all derived from the same non-English word will be grouped together, but English words derived from a non-English word (or vice versa) are not, because I think they are more interesting and less obvious.
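The grouping rule can be sketched in a few lines of Python. This is not the actual script (which is AppleScript manipulating OmniGraffle shapes); here a hypothetical `children` dict and `shape` dict stand in for the diagram, and the sample words are made up.

```python
def group_leaves(children, shape):
    """Merge sibling leaves that share their parent's shape into one node.

    children: hypothetical dict mapping each word to the words derived from it.
    shape: hypothetical dict mapping each word to a shape class, e.g. "target"
           for words in the searched language and "other" for the rest.
    Returns a new children dict where grouped leaves become a single
    newline-separated label (drawn in the parent's shape).
    """
    grouped = {}
    for parent, kids in children.items():
        # A leaf has nothing derived from it; it must also match the parent's shape.
        mergeable = [k for k in kids
                     if not children.get(k) and shape[k] == shape[parent]]
        kept = [k for k in kids if k not in mergeable]
        if len(mergeable) > 1:
            kept.append("\n".join(mergeable))  # several leaves -> one node
        else:
            kept.extend(mergeable)  # a lone leaf stays as it is
        grouped[parent] = kept
    return grouped
```

English words derived from a non-English word never match their parent’s shape, so they stay as separate nodes, as described above.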
It’s actually quite satisfying to watch this script at work, as it deletes extra nodes and puts the text into a single node, so I made a screen recording of it doing this to the diagram of the English word ‘pianoforte’. I’m almost tempted to add pleasant whooshing sound effects as it sweeps through removing nodes.
Words and their etymologies
The data from Etymological Wordnet comes as a tab-separated-values file. AppleScript is best at telling other applications what to do, not doing complicated things itself, so I left all the tsv parsing up to Numbers, and had my script communicate with Numbers to get the data. The full data has too many rows for Numbers to handle, but I only needed the rows with the type rel:etymology, so I created a file with just those rows using this command:
grep 'rel:etymology' etymwn.tsv > etymology.tsv
then opened the resulting etymology.tsv file in Numbers and saved it as a Numbers file. This means missing out on a few etymological links (some of which are mentioned below), but it’s good enough for most words.
The file simply relates the words in the first column to the words they are derived from in the third column.
Each word is listed with a language abbreviation, a colon, then the word. The readme that comes with the Etymological Wordnet data says, ‘Words are given with ISO 639-3 codes (additionally, there are some ISO 639-2 codes prefixed with “p_” to indicate proto-languages).’ However, I found that not all of the protolanguage codes used were in ISO 639-2, so I ended up using ISO 639-5 data for protolanguages and ISO 639-3 data for the other languages, both converted to Numbers files and accessed the same way as the etymology data.
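For anyone who would rather skip Numbers, here is a minimal Python sketch of the same loading step, assuming each line has exactly three tab-separated columns as described above. The function names are hypothetical, not part of the published scripts.

```python
def load_etymology(lines):
    """Map each word to the list of words it is derived from.

    Each line is: word <TAB> relation <TAB> origin. Only rel:etymology rows
    are kept, which does the same job as the grep command above.
    """
    origins = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "rel:etymology":
            continue  # skip other relations and malformed rows
        word, _, origin = fields
        origins.setdefault(word, []).append(origin)
    return origins


def split_entry(entry):
    """Split an entry such as "eng: little" into ("eng", "little")."""
    code, _, word = entry.partition(":")  # the language code comes before the first colon
    return code, word.strip()
```

Passing an open file works too, since iterating over a file yields its lines: `load_etymology(open("etymology.tsv", encoding="utf-8"))`. The language code from `split_entry` is what you would look up in the ISO 639-3 or 639-5 tables.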
The script starts by finding the ultimate root word(s) of whatever word you entered. It finds the word(s) your word is immediately derived from, then the word(s) those were derived from, and so on, until it gets to a word that doesn’t have any further origin. Some words have multiple origins, because they’re compound words or homographs, or were simply influenced by multiple words, so sometimes the script ends up with several ultimate root words. This part of the script ignores origins that have hyphens in them, because they’re likely common prefixes or suffixes, and if you’re looking up ‘coagulate’, you’re unlikely to want every single word derived from a Latin word with the prefix ‘co-’.
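The upward search can be sketched like this in Python, assuming an `origins` dict mapping each word to the words it came from (the shape produced by loading the rel:etymology rows). It is an illustration of the idea, not a translation of the AppleScript; the seen-set is an added guard against loops in the data.

```python
def ultimate_roots(word, origins):
    """Follow origins upward until reaching words with no further origin.

    Origins containing a hyphen (likely prefixes or suffixes, such as
    "lat: co-") are skipped. A seen-set stops cycles in the data from
    looping forever.
    """
    roots, seen, stack = [], set(), [word]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = [p for p in origins.get(current, []) if "-" not in p]
        if parents:
            stack.extend(parents)  # keep climbing
        else:
            roots.append(current)  # nothing further back: an ultimate root
    return roots
```

A word whose only recorded origins are hyphenated ends up as its own root, which keeps lookups like ‘coagulate’ from pulling in everything with ‘co-’.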
For each of the root words, the script finds all words derived from it, and all words derived from those, and so on, and adds them to the diagram.
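Going back down is the same walk on the reversed relation. Here is a hypothetical Python sketch that inverts the `origins` map and collects (origin, derived word) edges breadth-first from the roots:

```python
from collections import defaultdict, deque


def descendant_edges(roots, origins):
    """Collect (origin, derived) edges for everything descended from roots.

    origins maps each word to the words it is derived from; reversing it
    gives, for each word, the words derived from it. A word reached by two
    routes gets one edge per route but is only expanded once.
    """
    derived = defaultdict(list)
    for word, parents in origins.items():
        for parent in parents:
            derived[parent].append(word)

    edges, seen, queue = [], set(roots), deque(roots)
    while queue:
        current = queue.popleft()
        for child in derived[current]:
            edges.append((current, child))
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return edges
```

Each edge then becomes a line between two shapes in the diagram.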
In case you want to try making your own trees, I’ve put the AppleScripts and the Numbers sheets used for this in a git repository. It turns out the version history is not terribly useful without tools to diff AppleScript, since compiled scripts are not plain text. It is possible to save AppleScript as plain text, but I didn’t do that from the beginning, so the existing version history is of limited use. It looks like AS Source Diff could help.
There are a lot of frustrating things about AppleScript when you’re used to using more modern programming languages. Sometimes that’s part of the fun, and sometimes it’s part of the not-fun.
Trees from Surprising Shared Etymologies
I tried making diagrams of some of the interesting related words mentioned in The History of English Podcast, such as the one with flower, bollocks, phallus and blood mentioned earlier, but the data usually didn’t go back that far. So I tried the ones mentioned in the Surprising shared etymologies post, because I knew they were found in the same data. In several cases I found the links didn’t actually hold up, as the words were descended from unrelated homonyms. I’ve done my best to figure out which parts of these trees are correct, but can’t guarantee I got everything right, so take this information with a grain of research.
“piano” & “plainclothed”
This was a bit of a puzzle, because there is actually no origin given in the data for the English word ‘piano’, although it is given as the origin of many words in other languages. But the example in the ‘datasets’ section of Surprising shared word etymologies shows English: pianoforte, so I used that instead.
I could have added a row to the spreadsheet linking English ‘pianoforte’ with English ‘piano’, and then the many words in other languages that derive from English ‘piano’ would have shown in the diagram as well. Click on the diagram for a pdf version.
“potable” & “poison”
Also potion! According to the data, Latin potio is derived both from Latin poto, and from Latin potus, which is itself derived from Latin poto. The word is its own niece! I had to make a change to the script to ensure there wouldn’t be double connections in this case.
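One way to avoid double connections, sketched here in Python as an assumption rather than the script’s actual fix: if the upward pass and the downward pass can both record the same link (poto to potio, say), merge their edge lists and keep each (origin, word) pair only once.

```python
def merge_edges(*passes):
    """Combine edge lists from several passes, keeping each (origin, word) once.

    dict.fromkeys drops repeats while keeping first-seen order (dicts keep
    insertion order in Python 3.7+), so lines are drawn in a stable order.
    """
    return list(dict.fromkeys(edge for edges in passes for edge in edges))
```

With potio, the direct link from poto and the indirect one via potus are different pairs, so both survive; only exact repeats are dropped.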
“actor” & “coagulate”
Agile and exiguous, too! It’s starting to get a bit complicated.
“estate” & “contrast”
This tree also includes ‘prostate’, but only ‘pro-state’ (meaning favouring the government) derives from English ‘state’ as shown here. Prostate the body part is actually related, but only if we go back to the Proto-Indo-European root *sta-, which is not in the Etymological Wordnet data. Since the data doesn’t distinguish between the two meanings of ‘prostate’, this tree erroneously includes prostatectomy and cryoprostatectomy, a procedure I was happier not knowing about.
If you think it’s surprising that ‘estate’ and ‘contrast’ are related, have a look at other words derived from *sta-. Understand, obstetrics, Taurus, Kazakhstan… if Etymological Wordnet had that data, this tree would resemble Pando.
“pay” & “peace”
This one comes up in episode 59 of the podcast — the word ‘pay’ literally meant ‘make peace’. It’s not too hard to imagine how paying someone would pacify them. The diagram is incorrect though. ‘Peace’ is shown as being derived from Middle English pece. This is actually the source of ‘piece’, but not ‘peace’. As far as I can tell, pece (and therefore also ‘piece’) shouldn’t even be in this tree. The word ‘peace’ is derived from Middle English pees, near the middle of the diagram, so it is still related to ‘pay’.
“cancer” & “cancel” & “chancellor”
As explained in episode 99 of The History of English Podcast, chancellor is just the Parisian French version of the Norman French canceler. The word ‘cancel’ didn’t come from ‘canceler’, though: ‘cancel’ and ‘chancellor’ both come from a word meaning lattice, whether the lattice a chancellor stands behind, or the lattice of lines crossing something out to cancel it. The same word also gave rise to ‘incarcerate’, but that link is not in the data.
As far as I can tell, these are not actually related to the English word ‘cancer’, though. There are two unrelated Latin words ‘cancer’: one meaning ‘lattice’, and the other meaning ‘crab’, which gave its name to cancer because of its crab-like tumours.
“fantastic” & “phenotype”
This also shows that ‘craptastic’ is related to ‘phasor’. Sometimes the best things about these are the lists of derivative slang words.
“college” & “legalize”
Also ‘cull’, ‘legend’, and ‘colleague’.
“lien” & “ligament”
‘Cull’ should not be in this diagram, as it’s related to a different homonym of Latin colligo. See the ‘Limitations’ section below.
“journal” & “journey”
Surprising shared word etymologies says:
While it seems like “journal” and “journey” should be close cousins, their nearest common ancestor is in fact quite old – the Latin “diurnus”, meaning “daily”.
This seems about right from the data, and I’m surprised they didn’t both come from the Old French jor. My dictionary of French etymology doesn’t list the French versions of either word.
This is the tree I get if I start from the word ‘journal’. If I start with ‘journey’, it shows that Latin diurnum is also given as an origin of Old French jor, but this adds a lot of complication to the tree and only one extra English word, ‘abatjour’.
“educate” & “subdue”
I’m not sure how they got these two, to be honest. They may indeed be related, if, as etymonline says, subdue came from the same root as subduce, and subduce and educate came from Proto-Indo-European *deuk- (or *dewk-, as wiktionary spells it). There’s a lot about other words from that root (not including ‘subdue’) in episode 85 of the podcast.
I don’t know how they got this from the Etymological Wordnet data, though. Etymological Wordnet was extracted from an older version of wiktionary, and it doesn’t have very many Proto-Indo-European roots. The post says that ‘subdue’ comes from the Latin subduco, meaning ‘lead under’. But even looking at all the data (not just the rows with ‘rel:etymology’), ‘subdue’ is only linked to other English words. Perhaps they were looking at ‘subduce’ instead.
The post also says they both come from Latin duco. If I look at all the data, I can get to Latin duco from ‘educate’ (via Latin educatio and educo). But looking more closely at that link on wiktionary (the source of Etymological Wordnet’s data), it seems there are two meanings of Latin educo, one coming from Latin duco and one coming from Latin dux, and it’s the dux origin that seems more relevant to education. However, Proto-Indo-European *deuk- is also the hypothetical source of dux, so that’s how it relates to subdue.
I’m getting a bit lost following these words around wiktionary and etymonline. I believe they’re related, but I’m not sure if they’re related via Latin duco, and I haven’t a clue how the relationship was found in the Etymological Wordnet data (I should probably read and/or run their ruby code to find out), so I can’t generate even an erroneous family tree of it.
Limitations
Did you notice that the word ‘cull’ shows up in both the tree for ‘college’ and the one for ‘ligament’? Does that mean that ‘ligament’ is also related to ‘college’? Nope. The issue here is that the Latin colligo has two distinct meanings with different origins, one via Latin ligo, and one via Latin lego. ‘Cull’ derives from the ‘bring together’ meaning of colligo, which derives from lego, so it’s actually not related to ‘ligament’. Only one origin for colligo is shown on each of these two trees, because neither ‘college’ nor ‘ligament’ is derived from colligo, so the script only reached colligo when coming down from one of the ultimate root words, rather than when going up from the search word. But if we create a tree starting with the word ‘cull’, it gets both origins, and the resulting tree makes it look like ‘college’ and ‘ligament’ are related.
Since the data only has plain text for each word, there’s no way for the script to know for sure that colligo isn’t one word with multiple origins (like ‘fireside’ is), but actually two separate words with different origins. And there’s no way for it to know which origin for colligo happens to be the one that ultimately gave rise to ‘cull’.
A trivial example
I’ll leave you with a tree I found while looking for a trivial example to show at the beginning. Here’s the tree for ‘trivial’. There are many more related words given in episode 37 of The History of English Podcast.
Perhaps I will not post something interesting every day for the rest of the month, but I should at least try.
Today I watched this video from the Virtual Linguistics Campus:
After that, I intended to analyse some sentences myself, but I got sidetracked thinking of simple ways to make diagrams like the ones in the video. It looks like there are apps and LaTeX packages to do something like it, but just for fun, I modified the AppleScript I wrote for diagramming monduckens to turn text like this:
Clause(Adverb(Perhaps) NP(Noun(you)) VP(Auxiliary(will) Adverb(never) Verb(find) NP(Determiner(a) Noun(job)) PP(Preposition(as) NP(Determiner(a) Noun(linguist))))) Clause(Conjunction(but) Noun(you) VP(Auxiliary(should) Adverbial(at least) Verb(try)))
into a tree like this in OmniGraffle:
Note that I am not sure if this is strictly correct (I think the adverbial ‘at least’ could have been broken into words, and the conjunction perhaps shouldn’t have been included in the second clause), but it’s how it is in the video. Redone with only rectangles (which is an option when running the script) and using the same Tree nester script that the monducken diagrams used, this can then be turned into a rather oversized and misaligned version of the sentence with rectangles around the constituents:
I didn’t have a lot of time, so it’s pretty crude as yet, but it would be fairly simple to adjust the settings of the shapes to be more like what’s in the video. I’m posting it now in order to continue with Holidailies.
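Turning the bracketed text into a tree is a small recursive-descent parse. Here is a minimal Python sketch, assuming (as in the example sentence) that a node is either `Label(words)` with no further parentheses inside, or `Label(child child ...)`; the function name and the (label, children) tuple format are my own choices, not what the AppleScript uses.

```python
def parse_forest(text):
    """Parse 'Label(child child ...)' notation into (label, children) tuples.

    A node whose parentheses contain no further '(' is a leaf; its text,
    which may contain spaces as in 'Adverbial(at least)', becomes a single
    string child.
    """
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_node():
        nonlocal pos
        open_paren = text.index("(", pos)
        label = text[pos:open_paren].strip()
        pos = open_paren + 1
        close_paren = text.index(")", pos)
        if "(" not in text[pos:close_paren]:
            # Leaf: everything up to the next ')' is the word(s).
            pos = close_paren + 1
            return (label, [text[open_paren + 1:close_paren]])
        children = []
        skip_spaces()
        while text[pos] != ")":
            children.append(parse_node())
            skip_spaces()
        pos += 1  # consume this node's closing ')'
        return (label, children)

    nodes = []  # the input may hold several top-level trees, e.g. two clauses
    skip_spaces()
    while pos < len(text):
        nodes.append(parse_node())
        skip_spaces()
    return nodes
```

For example, `parse_forest("NP(Determiner(a) Noun(job))")` gives `[("NP", [("Determiner", ["a"]), ("Noun", ["job"])])]`, which maps directly onto shapes and connecting lines.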
While we’re on the subject of grammar, The Doubleclicks have just covered a Tom Lehrer song about adverbs. I get this song in my head every single time I answer a ‘how’ question with an L-Y adverb, so I am very happy about the cover.
Below is a flow chart to help people determine which pronouns to use in English. Click for a full-sized pdf version. I did not originally intend for it to be Zork-themed, but the first example sentence I thought of for ‘you’ happened to be ‘You are likely to be eaten by a grue.’ I hope the resulting colours don’t make it too hard to read, though perhaps that would help people remember it. If I’d realised it was going to end up so Zorky, I would have made the questions and answers read more like a text adventure. It’s just as well I didn’t, since that would probably have made it less comprehensible to the non-native English speakers who are most likely to need help with pronouns.
I am not a linguist, so if you spot something I’ve got wrong or missed out (apart from interrogative, relative and possessive pronouns, details of gender-neutral singular pronouns and informal plurals of ‘you’, and other cases where additional people are named separately, which I omitted for the sake of simplicity), let me know. My original plan was to do such flow charts for all four Swiss languages, with English for comparison and maybe Māori for its interesting system of pronouns, but this one took long enough, and it ought to be the easiest for me. I also intended to have clearer and more interesting example sentences, and simpler-to-understand questions about subjects and objects, but I’m already a few weeks late due to visitors and travel, and I’m busy preparing for a cruise, so this will have to do for now.
The match photo is by Sebastien Ritter. I used it to keep grues away and to ensure the diagram would be illuminating at least in some sense.
First, check out Vi Hart‘s video about the Thanksgiving turduckenen-duckenen:
Okay, there are monkeys instead of turkeys, and the mathematics isn’t quite as explicit, but it’s pretty similar, don’t you think? Now, let’s imagine that Mike Phirman is actually singing the recipe for a fractal turducken, or rather, monducken. You can imagine all the monkeys are turkeys if you’d rather eat the result than present it to some pretty thing to please them. (Note: Please do not kill any actual monkeys.) Monkeys, like birds, belong in trees, so I wrote an AppleScript to draw binary trees in OmniGraffle based on the text of the song. You can try it for yourself if you like; all you need is a Mac, OmniGraffle, and a text file containing some words. See the bottom of this post for links and instructions.
If Mike’s reading the binary tree recipe layer by layer, like the first example in Vi’s video, one possible tree for the first stanza of Chicken Monkey Duck looks like this, where the orange ovals are monkeys, blue hexagons are chickens and green clouds are ducks. You can click it (or any other diagram in this post) for a scalable pdf version where you can read the words:
I added numbers so you can easily tell the chickens, monkeys and ducks apart and see which way to read the tree. It’s simple enough now, but the numbers will be useful for reading later trees which are not in such a natural reading order. This is called a breadth-first traversal of the tree, in case you’re interested. Now, what do birds and monkeys do in trees? They nest! So I wrote another script that will take any tree-like diagram in OmniGraffle and draw what it would look like if the birds, monkeys, or whatever objects they happen to be (the drawing is pretty abstract) were nested inside each other, just like the quails inside the chickens inside the ducks inside the turkey. This is what the monducken described by the first stanza of Chicken Monkey Duck, in the tree structure shown above, would look like:
The Monducken script allows using a different shape for each animal as redundant coding for colourblind people, even though it already chooses colours which most colourblind people should be able to distinguish. But that makes the nested version look a little messy, so here’s the above diagram using only ovals:
If you named this particular recipe in the other way, going down the left side of the tree and then reading each branch in turn in what is known as a pre-order traversal, it would be called a Monenmonenduckduckmon-monmonducken-enenmonduckmon-enmonduck-enduckmonducken-enmonen-duckenenmon-monenmon. It doesn’t sound nearly as nice as Turduckenailailenailail-duckenailailenailail because Mike Phirman didn’t take care to always put smaller animals inside large ones. I’m not holding that against him, because he didn’t realise he was writing a recipe, and besides, it’s his birthday. For reasons I’m not sure I can adequately explain, it’s always his birthday.
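Here is a small Python sketch of re-reading a recipe in a different traversal order. It assumes the breadth-first list fills a binary tree level by level, left to right, so the two animals inside position i sit at positions 2*i + 1 and 2*i + 2 (the usual array layout for a complete binary tree); that is one possible tree, not necessarily the exact one my script draws, and the letters stand in for animals.

```python
def traverse(animals, order="pre"):
    """Re-read a breadth-first list of animals in pre-, in- or post-order."""
    result = []

    def visit(i):
        if i >= len(animals):
            return  # no animal at this position
        if order == "pre":
            result.append(animals[i])  # name the container first
        visit(2 * i + 1)  # everything in the left branch
        if order == "in":
            result.append(animals[i])  # name it between its two branches
        visit(2 * i + 2)  # everything in the right branch
        if order == "post":
            result.append(animals[i])  # name it after what goes inside
    visit(0)
    return result
```

For the list A to G, pre-order gives A B D E C F G, in-order gives D B E A F C G, and post-order gives D E B F G C A.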
But what if I completely misunderstood the song, and his recipe is already describing the fractal monducken as a pre-order traversal, always singing a bird or monkey immediately before the birds and monkeys inside it? Well, don’t worry, I added a ‘pre-order’ option to the script, so you can see what that would look like. Here’s the tree:
and here’s how the actual birds/monkeys would look if you cut them in some way that showed all the animals, dyed them the correct colours, and looked through something blurry (here’s the version with different shapes):
Okay, but that’s only the first stanza. What if we use the whole song? If we pretend the recipe is breadth-first, this just means all the extra monkeys and birds will be at the bottom levels of the tree, so the outer few layers of our monducken will be the same, but they’ll have a whole lot of other things inside them:
Here’s a close-up. Isn’t it beautiful?
If the entire song were treated as a pre-order monducken recipe, we’d still have the same monkey on the outside, but the rest would be quite different:
We could also read the birds and monkeys from left to right, as Vi did in her video. That’s what’s called an in-order tree traversal. But as delicious as they are mathematically, none of these orderings make much sense from a culinary perspective. Even if the monkeys were turkeys, it’s obvious that a nice big goose should be the outer bird. Vi suggested that herself. Of course, we could put the goose on the outside simply by reversing the song so it started with goose. But it would be much more fun and practical to pretend that Mike is naming the two inner birds before the one that contains them. This is called a post-order traversal, because you name the containing bird after the two birds or monkeys it will contain. It makes sense for a recipe. First you prepare a monkey (or turkey) and a chicken, then you immediately prepare a chicken and put them into it. You don’t have your workspace taken up with a whole lot of deboned birds you’re not ready to put anything into yet. Here’s one way the recipe could be done:
Note that no matter what kind of traversal we use, there are actually several ways the recipe could be interpreted. If Mike says ‘monkey chicken chicken’ you know you should take a monkey and a chicken and put them in a chicken. But if the next words are ‘monkey chicken’, do you take that stuffed chicken and a monkey and put them inside a chicken? Do you debone the monkey and the chicken and wait for the next bird to find out what to put them into? What if there’s no next bird? What if there’s only one more bird (let’s say a duck) and you end up with a stuffed chicken, a stuffed duck, and nothing to stuff them into? You’d have to throw one of them out, because obviously your oven only has room for one monducken. Assuming you want two things in each thing, and you don’t know how long the song’s going to be, the best way to minimise this kind of problem is to always take your latest stuffed thing and the next, unstuffed thing, and put them inside the thing after that. The worst that’ll happen is you’ll have to throw out one unstuffed bird or monkey. But then you end up with a really unbalanced monducken, with a whole lot of layers in one part and lonely debonely birdies floating around in the rest.
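The ‘latest stuffed thing plus the next animal, into the animal after that’ strategy is easy to sketch in Python. Here a stuffed animal is a hypothetical tuple (container, inner1, inner2); the depth helper shows how lopsided the result gets.

```python
def greedy_stuffing(animals):
    """Stuff the latest stuffed thing and the next animal into the one after.

    Returns (monducken, leftover): monducken is a nested tuple
    (container, inner1, inner2) or a bare animal name; leftover is the one
    unstuffed animal that may have to be thrown out, or None.
    """
    if not animals:
        return None, None
    stuffed, rest = animals[0], list(animals[1:])
    while len(rest) >= 2:
        inner, container = rest[0], rest[1]
        stuffed = (container, stuffed, inner)
        rest = rest[2:]
    leftover = rest[0] if rest else None
    return stuffed, leftover


def depth(monducken):
    """Count layers from the outside to the deepest animal."""
    if isinstance(monducken, str):
        return 1
    return 1 + max(depth(monducken[1]), depth(monducken[2]))
```

‘Monkey chicken chicken monkey chicken’ gives a chicken holding a monkey and a stuffed chicken. With seven animals the result is four layers deep, whereas a balanced monducken of seven animals needs only three.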
It helps to have a robot chef on hand to figure out how many full layers of monducken you can make without it being too asymmetric. Mine makes the trees completely balanced as deeply as possible, and then does whatever was easiest to program with the remaining birds and monkeys. In this case it was easiest for my program to stuff a whole lot of extra animals into that one monkey on the left. This is what it looks like, with the varied shapes this time. Luckily, geese are rectangular, so they fill your oven quite efficiently:
I like how you can see the explosion of duck radiating out from the inner left, engulfing all the other birds and monkeys before itself being swallowed by a goose. Such is life.
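How many full layers fit is just arithmetic: a completely balanced monducken with d layers holds 2**d - 1 animals, so n animals give the largest d with 2**d - 1 ≤ n. Here is a Python sketch under the post-order reading, assuming the balanced part uses the first 2**d - 1 animals. It simply returns the leftovers rather than stuffing them into the left monkey as my script happens to, and the names are hypothetical.

```python
def full_layers(n):
    """Largest d such that a balanced tree of d layers (2**d - 1 animals) fits in n."""
    return (n + 1).bit_length() - 1


def balanced_from_postorder(animals):
    """Build the largest balanced monducken from a post-order list.

    In post-order each container is named after its two halves, so the last
    animal of a slice holds the first half and the second half of the rest.
    Returns (monducken, leftovers) with monducken as (container, inner1, inner2).
    """
    used = 2 ** full_layers(len(animals)) - 1

    def build(items):
        if len(items) == 1:
            return items[0]
        half = (len(items) - 1) // 2
        return (items[-1], build(items[:half]), build(items[half:-1]))

    monducken = build(animals[:used]) if used else None
    return monducken, animals[used:]
```

Ten animals give three full layers (seven animals) and three leftovers for the robot chef to deal with however it likes.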
If you would like to make diagrams like this yourself, there are two AppleScripts you can use. Both of them require OmniGraffle 5 for Mac, and if you want to make trees with more than 20 nodes you’ll probably need to register OmniGraffle.
The first is Monducken diagrammer, which you can download either as a standalone application (best if you don’t know what AppleScript is) or source code (if you want to tweak and critique my algorithms, or change it to use OmniGraffle Professional 5 instead of OmniGraffle 5). Because it’s AppleScript, it works by telling other applications what to do, rather than doing things itself. So when you run it, TextEdit will ask you to open the text file you want to turn into a tree. Once you’ve opened one, OmniGraffle will start up (you may need to create a new document if it’s just started up) and ask you two things. First it will ask what kind of tree traversal the text file represents. Then it will ask you what kinds of shapes you want to use in your tree. You can select several shapes using the shift and command keys, just as you would when selecting multiple items in just about any list on your Mac. Then you can sit back and watch as it creates some shapes and turns them into a tree.
The other one is Tree nester (standalone application/source code). You should have an OmniGraffle document open with a tree-like diagram in it before you run this (I suggest a tree generated using Monducken diagrammer; it has not been tested on anything else, and will probably just duplicate most of the shapes that aren’t trees, or end up in an infinite loop if there’s a loopy tree). It won’t ask any questions; it’ll just create a new layer in the front OmniGraffle document and draw nested versions of any trees into that layer.
If you’re looking at the source code, please bear in mind that I wrote most of this while on a train to Cologne last weekend, based on some code I wrote a while ago to draw other silly diagrams, and I really only dabble in AppleScript, and I forgot about the ‘outgoing lines’ and ‘incoming lines’ properties until I’d almost finished, so it probably isn’t the best quality AppleScript code. Not the worst either though. I welcome any tips.