The English version of the Latin script is dull. It rarely uses diacritics and my perception of it is that it only uses basic letters. That is, it adds no new letters to the basic set, but is this true? The alphabet one grows up with is always likely to seem basic, with other languages either missing out letters we see as essential or adding extra superfluous ones. The question is therefore, how common is it to use a twenty-six letter Latin alphabet? What do I mean by “how common”? Am I counting this by languages, number of speakers or number of readers and writers? Is there a relevant historical context?
The alphabet we use now is neither identical to the one the Roman empire used nor the same as the variants used to write English through history. The Roman alphabet itself was as follows:
ABCDEFGHIKLMNOPQRSTVXYZ
That’s twenty-three letters, with two at the end, Y and Z, which were only used in foreign words, actually Greek. K was also rare because initially C was always hard. Q looks at first like a redundant letter as well, but in fact wasn’t, because the Latin “QV” was simultaneously articulated: it was a K and a W pronounced at the same time. I won’t harp on about Latin pronunciation here, but V was always pronounced as U or W are in English. It is a little odd that Latin didn’t just use a Q on its own instead of “QV”, but it didn’t for some reason, and the letter it was adapted from, qoph, stands for a uvular plosive which is nothing like the pronunciation of Q in Latin.
Latin itself introduced three digraphs later on: Æ, Œ and a third “AV” digraph which I can’t find the Unicode for. “AV” was, as far as I know, lost without trace with the Roman Empire and has never been used in any other language, but the first two throve. All of them seem to represent a gradual shift in the pronunciation of Latin vowels from diphthongs to pure vowels, which writers seemed to want to represent in spelling. The first two were common in modern English up until quite recently, and the first, known as “æsc” from the name of a rune which represented the same sound, was used in Anglo-Saxon all the way through to the Norman Conquest and beyond, since early Mercian texts in the post-Conquest period persisted with Anglo-Saxon spelling for some time. In English it represented a short “A” sound as in the Southern English English pronunciation of “pant”, and its longer counterpart as in “man”. Later on it was used to write words such as “encyclopædia” or “pæony”, but this was a Latinate approach not connected to its earlier use. There was probably a short period between the change in Midland English spelling and the adoption of Latinate words into English when it wasn’t used. As for Œ, this was used in French and was adopted into English as well, though it is not as common as Æ. I’ve seen older alphabets in English-language encyclopædias and dictionaries which are not used for alphabetical order in the book itself but which place Æ, Œ and also the ampersand, &, at the end of the alphabet in that order. There’s a sense in which & is also a digraph, standing for “ET”, but it’s now perceived as an ideogram for a conjunction. Although this ordering didn’t influence alphabetical order in English, something like it is found in Swedish, Finnish, Danish and Norwegian, which also put similar letters to these at the end of the alphabet. In the case of Danish and Norwegian, the letters are Æ and Ø, whereas Swedish and Finnish follow the German convention of using Ä and Ö. All four also have Å, which comes after Æ and Ø in Danish and Norwegian but before Ä and Ö in Swedish and Finnish, although in Finnish it is of only marginal use.
As a child I used to find the existence of extra letters in non-English Latin alphabets exciting and exotic, but of course had I grown up speaking, reading and writing, say, Danish, they would’ve just seemed normal to me. In fact the French use of Œ never really seemed odd or foreign so much as a mildly taunting reminder that there were stranger and more exciting languages out there than that of the people living fifty-seven kilometres southeast of me, regularly spoken by tourists on the streets of my home city and appearing in various signs and notices about the place. One of the oddities about living in East Kent is that far from being part of a gentle blending of culture and language into another milieu, the area digs its heels in and insists on being even more English than anywhere else. It’s no accident that Nigel Farage stood for election in Thanet South (apparently the constituency’s official name is “South Thanet”). It’s a bit like Gibraltar really.
Scots is very slightly different, although not as she is spelt. It occasionally uses a Z in place and proper names such as Menzies to represent what has now become a /j/ sound, or consonantal “Y” for an English speller. This is from the letter yogh, Ȝ, which represented the “ch” sound as in “loch” and the H as in “human”, sounds which English spelling now writes as “gh”. Yogh evolved from the letter G as written in the Anglo-Saxon script known as Insular Half-Uncial, but didn’t become a letter in its own right until after the Norman Conquest. The Normans were generally unkeen on letters which didn’t exist in the French alphabet at the time, which is why it changed to “GH”. In general they tended to add an H to letters in English to create a new combination for a sound absent in French, by analogy with their own “CH”.
W is more or less a foreign letter for all Romance languages so far as I know. French uses it for loanwords such as “weekend” and “wagon”, and since many of these are from German it’s often pronounced as /v/. Italian lacks quite a few letters found in our alphabet: J, K, W, X and Y. Presumably to an Italian-language reader English looks exotic for using these letters, but Italian isn’t a particularly widely spoken language, so the weight of population is against deeming them unusual or “extra”. As for H pronounced as an actual sound rather than as an indication of a different pronunciation, only Romanian now has a strongly pronounced version among the Romance languages, although I did once own a Spanish dictionary published in the twentieth century CE which reported that older Spanish speakers faintly pronounced it, so I presume it only disappeared from the language in the nineteenth. That might also apply to Castilian rather than Spanish overall.
H is more or less what English has instead of diacritics, and is used for similar purposes in Irish and Gàidhlig when those are written in the same script as English. Gàidhlig has a short alphabet compared to English:
ABCDEFGHILMNOPRSTU
(Plus the letters using a grave accent.) The traditional Irish script was based on uncial and therefore differed from English in various ways.
As a dot was often used to replace H, that letter was practically non-existent in Irish, making the alphabet even shorter. English also used to use this script if you go back far enough, sans dots, although for some reason it dotted the Y but not the I. This gave Irish only seventeen letters for a time. Q-Celtic letters are named after trees incidentally.
The shortest alphabet of all, Latin or otherwise, is used by Rotokas, a Papuan language spoken on the island of Bougainville:
A E G I K O P R S T U V
Perhaps surprisingly, this includes two redundant letters as T and S can replace each other, and V is sometimes written B. The Hawaiian alphabet is also quite short:
AEHIKLMNOPUW
However, Hawaiian also has the ʻokina, ʻ, representing the glottal stop, which does count as a letter. Another short alphabet, this time in an unusual order, is used by Samoan:
AEIOUFGLMNPSTVHKR
H, K and R are only used in foreign loanwords and R is often pronounced L. Samoan G has an odd history. When the first printing press was being sent over to Samoa, there was a storm and many of the N’s got washed overboard, so G was used for the “ng” sound which would otherwise have been spelt that way. So the story goes, anyway.
To people who grew up with any of these alphabets, English would surely seem quite exotic, as our alphabet is roughly twice as long as the Rotokas and Hawaiian ones, but at least in the case of Hawaiian, English is so dominant that this is unlikely to happen. Samoan is sometimes in a similar situation as it’s spoken in American Samoa.
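For anyone who wants to check the letter counts, here is a rough Python sketch. The letter strings are copied from the alphabets quoted above; the names `ALPHABETS`, `ENGLISH` and `compare` are just labels I made up for the illustration, and the ʻokina and accented letters are deliberately left out, so treat the numbers as a first approximation.

```python
# Alphabets as quoted in this post. The keys are only labels.
ALPHABETS = {
    "Roman": "ABCDEFGHIKLMNOPQRSTVXYZ",
    "Gàidhlig": "ABCDEFGHILMNOPRSTU",   # grave-accented letters not counted
    "Rotokas": "AEGIKOPRSTUV",
    "Hawaiian": "AEHIKLMNOPUW",         # the ʻokina is not counted here
    "Samoan": "AEIOUFGLMNPSTVHKR",      # H, K and R are loanword-only
}
ENGLISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def compare(alphabet: str, reference: str = ENGLISH) -> tuple[int, str]:
    """Return the number of letters and the reference letters the alphabet lacks."""
    missing = "".join(ch for ch in reference if ch not in alphabet)
    return len(set(alphabet)), missing


for name, letters in ALPHABETS.items():
    count, missing = compare(letters)
    ratio = len(ENGLISH) / count
    print(f"{name}: {count} letters; English is {ratio:.2f} times as long; lacks {missing}")
```

On these strings the Roman alphabet comes out at twenty-three letters, lacking J, U and W, and both Rotokas and Hawaiian at twelve, so English is a little over twice as long as either.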
In European languages, almost every one using the Latin script also uses diacritics, even including Welsh and Gàidhlig, although at least one orthography for Cornish doesn’t really. Dutch and Flemish largely don’t either, beyond an occasional diæresis or acute accent. Dutch does, however, view “IJ” as a single letter and capitalises it as one, sticking it between Y and Z. Flemish just uses a Y for this. Frisian does use diacritics, but seems to consider I and Y as variants of the same letter, and uses C only in the combination CH. Scots, as well as using Z and Y occasionally for yogh in proper names, also properly avoids apostrophes in many places where English would put them, because for Scots they’re not historically correct. Scots is descended from Northumbrian rather than Mercian and not all the letters deemed to be missing in English spelling are in fact missing in Scots, so some uses of apostrophes are culturally imperialist.
When I first started to learn German at thirteen, I was excited to be able to use a language with an extra letter, namely the Eszett, ß. However, it isn’t really compulsory, is absent in Swiss German and amounts to a digraph. It’s a long S followed by a final one and, like the ʻokina, traditionally lacks separate capital and lowercase forms, making it look a bit weird when something is written in all-caps.
Old English had three other letters it has since lost as well as æsc: Þ, Ð and Ƿ. The first, þorn, represents voiceless “TH” and was a worthwhile letter of the alphabet although it’s easily confused with P. The difference is that it has an ascender and a descender. Ð, which is ð in lowercase, is often interchangeable with Þ in Old English manuscripts but is used to represent the voiced “TH” sound. Finally, ƿynn, like þorn, is an adapted rune, but represents /w/. The oddity about the two runic letters is that they were both so similar to P, and I imagine this is one reason why scribes stopped using them, although there was also suspicion around the use of runes as they were perceived as pagan by the Normans. Even so, the loss of Þ and Ð is unfortunate.
There are two diacritics which I do in fact use in English. One is the diæresis, which marks out the second vowel in a pair when they don’t form a diphthong but are pronounced as separate syllables. There was a trend in the mid-twentieth century to use this diacritic instead of a hyphen in words where two vowels ended up next to each other, which seems like a neater solution and also saves space, but it has gone out of fashion. The other thing I do is use a grave accent over an E in a past participle ending when I would pronounce the “-ed” as a separate syllable, which is a poetic device rather than something used in prose. I also use diacritics in some French loanwords into English. All of this is because I find it quite sad that we don’t really use them. I feel that our failure to do so is a kind of insular assertion of English, not British, identity which is pretty pathetic and spurious, and I’d prefer to join the rest of Europe and employ them. It’s notable that Irish, Gàidhlig and Welsh are all fine with them but English isn’t.
Afrikan languages which adopted the Latin alphabet have a basis for their spelling expressed as “vowels as in Italian, consonants as in English” or some such. This certainly applies to (Ki)Swahili, as in the word “jambo”. This is probably one cause of what I think of as the notorious “Afrika” with a K issue, which I came down on the K side of rather than the C side (“seaside”?). Because English could in theory completely abandon C were it not for its occurrence in CH, many Afrikan languages which use Latin script only use K for /k/. I’ve been into this already: although it doesn’t feel quite right in terms of its rational justification, I do it out of respect for those who claim it’s imperialist to spell it with a C, because I see them as having specialised experience. Some Afrikan languages which do use the Latin alphabet, such as Yoruba, strictly Yorùbá, do use diacritics, in this case to represent tones, but also dots under the letters to indicate different pronunciations. In the case of Yorùbá, the situation is complicated. What I’ve just described is the approach taken in Nigeria, but in Benin Ɛ and Ɔ are used instead of dotted E and O, and Odùduwà the divine king is said to have revealed a script to Tolúlàṣẹ Ògúntósìn in Benin, so that’s also used. This is typical of West Afrika in that there are more than two dozen recently invented scripts in the region.
Counting languages written using the Latin script by numbers of speakers, which may be much larger than numbers of writers, the dozen most spoken languages of this kind are: English, Spanish, French, Portuguese, Indonesian, German, Turkish, Vietnamese, Hausa, (Ki)Swahili, Italian and Nigerian Pidgin. Yorùbá is the fourteenth, incidentally. That’s nearly fifteen hundred million as a first language, but many of them are more widely used as second languages. Of these, English, Nigerian Pidgin, Indonesian and Swahili lack diacritics, counting the use of the hooks on implosive letters in Hausa as diacritics. Turkish uses the dot over the I as a diacritic, and it’s hard to know what to do with that, because if the dot is accepted as one then English does have diacritics after all, over I and J. Turkish romanisation was quite unusual generally, though, and is a bit of an anomaly. Vietnamese uses what strikes me as an excessive amount of them. The total number of first language speakers of English, Indonesian, Nigerian Pidgin and Swahili is 479 million, roughly a third of the total, which puts English in the “unusual” category for not using diacritics.
The next question is, does English have any unusual letters by number of first language alphabet users in this group? In other words, do any of our letters count as exotic? In toto, the letters missing from at least one of these languages which are present in English, including those used only in foreign loanwords in them, are: C (only as CH in Swahili), F, J, K, Q, V, W, X, Y and Z. Vietnamese is phonologically unusual as a language, and it alone excludes F. Of the other letters, W is not used for native words in six of them, Q and X in three, and J and K in two. This means that over a thousand million language users in this group have no native use for W. As for the others, K is not used by 297 million, Q is eschewed by forty-seven million, J by 141 million and X by 130 million. In terms of how “European” these absences are, Q is only absent in the non-European versions unless Turkish is counted as a European language, and K is only absent in European languages.
Therefore, the best candidate for an unusual or “extra” letter in English seems to be W. It has no native use in half of the twelve most spoken first languages using the Latin alphabet, and those six account for the majority of the group’s speakers. Yet I can, as a native English user, look at the alphabet and not see W as exotic, even though it kind of is. In fact it’s so exotic that even the Old English alphabet lacked it.
There’s also a kind of “core” of “normal” letters, though some languages lack these too. These are: A, B, D, E, F, G, H, I, K, L, M, N, O, P, R, S, T and U. The surprises here are L and R, as many languages lack one or the other.
What, then, is English like as a written language aside from its weird orthography? Well, it’s remarkably unadorned with accents and the like, it uses one slightly unusual letter, W, and it tends to use H instead of diacritics. Don’t even get me or anyone else started on the spelling, but if someone with no reading knowledge of the Latin alphabet were to attempt to recognise written English, they should look for a language which uses C not followed by H, and also W, but lacks accents. If they did this, the chances are they would be left with a choice between English, Dutch, Flemish and Indonesian.
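The recognition test in the previous paragraph can be roughed out in Python. This is only a sketch of my own rule of thumb, and the function name `looks_like_english_candidate` is invented for the purpose. It assumes that “accents” means anything Unicode splits off as a combining mark, so letters such as Ø or ß, which Unicode doesn’t decompose, would slip through unnoticed.

```python
import re
import unicodedata


def looks_like_english_candidate(text: str) -> bool:
    """Apply the three tests: a C not followed by H, a W, and no accents."""
    lowered = text.lower()
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # combining() returns a nonzero class for combining marks such as accents.
    has_accents = any(unicodedata.combining(ch) for ch in decomposed)
    # A "c" that is not immediately followed by "h".
    has_bare_c = re.search(r"c(?!h)", lowered) is not None
    has_w = "w" in lowered
    return has_bare_c and has_w and not has_accents


print(looks_like_english_candidate("The quick cat went walking"))
print(looks_like_english_candidate("Où est le wagon, chéri ?"))
```

A short sample can easily fail the test just by not happening to contain a C or a W, so it only makes sense on a reasonably long stretch of text, and passing it still leaves the choice between several languages.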
Writing this has made me think about West Afrika a lot, so now you’ve got something on that coming.
