John Wilkins’ Dream Of Universal Writing

Not all of the keys on this keyboard produce letters. Obviously there are control characters, but leaving those aside, if you were to read some of the characters on here, even if you knew no English, then provided you’ve been able to hear, a voice would pop up in your head saying them in your own language. For instance, I could type “1984”, and the chances are you would hear something like “neunzehnhundertvierundachtzig” or “dix-neuf quatre-vingt-quatre” (sorry about the inverted commas). When I typed that, I thought the first thing, but you may have thought “nineteen eighty-four”, for example. There is a sense in which these characters transcend language while continuing to represent concepts, which are expressed differently in different languages. Just to doodle vaguely on the keyboard, I can produce a list of such signs with ideographic rather than alphabetic meaning, thus:

¦, ¬, £, $, €, &, *, -, +, =, @, #, ~, /.

Not all of these are going to mean something to everyone who reads them, and some of them will mean substantially different things, but I read the above list as follows:

Not both . . . and . . ., not, pounds, dollars, euros, and, multiplied by, minus, plus, equals, at, number, roughly, divided by.

A couple of the signs have more than one meaning, and there’s also a substantial number of other signs, widely used in technical contexts, which have meaning for me but probably not for most people:

℞, , →, ↔, ∨, ⊕, ∀, ∃, ⊢, ⟡, □.

These are: prescription (I think of it as a Demotic eye of Horus but that may be incorrect), with, materially entails that, if and only if, and/or, either . . . or . . . , for all of, there exists, it is a theorem that, contingently, necessarily.
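Since the two kinds of “or” in that list are easily confused, here’s a minimal Python sketch of the difference between inclusive ∨ (“and/or”) and exclusive ⊕ (“either . . . or . . .”), just a truth table using standard operators:

```python
# Truth table contrasting inclusive "or" with exclusive "or" (XOR).
for p in (False, True):
    for q in (False, True):
        inclusive = p or q   # true unless both are false
        exclusive = p != q   # true when exactly one is true
        print(f"p={p} q={q} inclusive={inclusive} exclusive={exclusive}")
```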

There’s a third set I don’t know how to type on a computer, taken from Blissymbols, a system of signs now used by people with communication difficulties but which seems to have been initially invented by Charles Bliss in 1949 as an international language. Since I use these only in handwriting, I have no idea if they’re in the Unicode character set, and even if they were, it would defeat the object of typing or writing less, because it would take something like eight keystrokes to type the character which means “see”, for example. Having just tried to type that, it turns out this computer doesn’t have any of them in its character set, annoyingly.
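For what it’s worth, one can at least ask Python’s Unicode database whether a character with a given formal name exists. This is a minimal sketch; “BLISSYMBOL EYE” is a hypothetical name I’ve made up for illustration, not a real Unicode designation:

```python
import unicodedata

# Look up characters by formal Unicode name; lookup() raises KeyError
# for names the database doesn't contain.
for name in ("LATIN SMALL LETTER A", "BLISSYMBOL EYE"):  # second name is invented
    try:
        char = unicodedata.lookup(name)
        print(f"{name}: U+{ord(char):04X} {char}")
    except KeyError:
        print(f"{name}: not in this Unicode database")
```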

Chinese characters, of course, are still ideographic, although not everyone realises they also have a phonetic element. Nonetheless, in days of yore Chinese writing was considered practically magical because something written in hanzi (漢字) could be read all over China regardless of the “dialect” of Chinese one spoke. In fact they aren’t dialects but separate languages, and beyond China the same characters were used to write Korean, Japanese and Vietnamese. In principle they could be used to write almost any spoken language, although of course they aren’t, and 國語 (Mandarin) is particularly well-suited to being written in 漢字 because it’s tonal, isolating and purports to be monosyllabic. A language which inflects more, such as Japanese, is harder to write this way, which I presume is why Japanese also uses hiragana, katakana, emoji and romaji. Unlike Japanese, Chinese has one syllable per character, and in a sense one word per character too. If you’re familiar with 漢字 you’ll probably have noticed that I always use the traditional forms of the characters rather than the simplified ones, partly because I’m more familiar with them, but also because they more closely represent the meaning visually. I find simplified Chinese characters quite confounding. As I’ve said, 漢字 are partly phonetic, which makes them more suitable for 國語 than for other languages, since homophones in 國語 may not be homophones in 廣東話 (Cantonese) or 客家話 (Hakka), to take two examples, because these are in reality different languages. This means that were the characters adapted for English, say, we would find ourselves using the same phonetic element to represent the “-s” and “-en” endings, or mutation plurals such as “teeth” and “feet”, as we would use to write unrelated words like “door” or “gate”, which would be less transparent than a system designed for English.

The West was quite taken with 漢字 when we first encountered them, which, surprisingly, was not through Marco Polo, who never mentioned them, and this is one reason why some people think his account is based on hearsay. We had had ideographic scripts in Europe, but they had largely been replaced by alphabetic ones, and in the early Mediæval period the spelling of English and French, for example, was more phonetic than it would later become. Chinese characters proved to be a major influence on the thought of John Wilkins, but not the only one.

West Afrika has a very large number of scripts, as I’ve mentioned previously. Many of them are said to have come to their promoters in a dream or vision, but it’s also claimed that this is a way of avoiding the admission that they were supposed to be secret and someone decided to reveal them to the world, and so needed a plausible back story. This may be true, because the characters used for writing Vai, for example, also crop up in Suriname. Now, these sorts of stories might be interpreted in a racist kind of way: surely nothing like that would ever happen here in Merrie England? Well, not so, partly because Merrie England was in many ways politically quite reminiscent of the kind of politics currently practised in some parts of Afrika, and probably still is to an extent even today. The idea of orishas revealing a script to select individuals in West Afrika in fact has a close parallel in Tudor England in the form of the Enochian language, alleged to have been revealed by angels to John Dee in 1583 CE and having a distinctive script, which looks like this:

[image: the Enochian alphabet]

It’s easy to see from this that the Enochian alphabet is closely based on the English use of the Latin alphabet at the time. However, the Enochian language itself, constructed or not, is clearly most unlike English in vocabulary and grammar. Even the pattern of a script claimed to have been revealed by spirits while actually being a secret script already in existence applies to Enochian, because it seems to be based on an earlier script called Transitus Fluvii (“crossing of the river”) used by Heinrich Cornelius Agrippa in 1533:

[image: the Transitus Fluvii alphabet]

Hebrew square script was the basis of this one, as can be seen from the English letters below the chart. There were a couple of other occult scripts in Renaissance England, including Celestial and Malachim, and since Hebrew was at the time believed to be the original human language, all of them bear some resemblance to it.

Wilkins’ idea was somewhat more modern, conscious, open and practical. John Wilkins was a seventeenth-century polymath and cleric, and one of the founders of the Royal Society. He also proposed a decimal metric system which didn’t catch on, although it’s interesting to contemplate that if the English-speaking world had adopted it first, the US, Liberia and the “U”K might all be metric now, and maybe France would be holding out with its own system instead. I should mention in passing that in Anglo-Saxon times there was also a decimal-based measuring system. Two advantages of the metric system are that it’s international and that it has a logical structure and organisation, and it’s also notable that it tends to be used by the scientific community. All of these things were also intended to be true of Wilkins’ Real Character or Philosophical Language. His aim was to design an artificial language which was logically structured and based on concepts, which would enable natural philosophers and diplomats to communicate effectively with each other, and he propounded this in detail in his 1668 tome ‘An Essay towards a Real Character and a Philosophical Language’. An example of it can be seen above, in the form of the Lord’s Prayer written in that script. It can be seen to be cursive, and appears to break each word down into separate concepts, each represented by a character. It also has its similarities with Arabic script and with Pitman shorthand.

Real Character is not phonetic, and could therefore in principle be used for any language. Wilkins had found the arbitrary nature of Latin grammar an unnecessary encumbrance to communication and wished to dispose of it as inefficient. His system divides concepts into forty genera, further divided into differences, themselves divided into species. It can already be seen from the use of the words “genera” and “species” that the work had some influence on future classification systems such as that of Linnæus. Wilkins also came up with a scheme whereby each category could be replaced by a syllable or sound, enabling the language to be pronounced. This made it a priori, like some other constructed languages such as Solresol, and unlike a posteriori conlangs like Volapük, Esperanto and Interlingua, which build on already existing languages; the a priori approach steepens the start of the learning curve.
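To make the compositional idea concrete, here’s a toy Python sketch built around the example Borges quotes: “de” is the genus of elements, “deb” its first difference (fire), and “deba” a species of that difference (a flame). The tables are illustrative stubs of mine, not Wilkins’ actual ones:

```python
# Toy model of Wilkins' Real Character: a word is read off as
# genus + difference + species, each encoded by part of the spelling.
GENERA = {"de": "element"}
DIFFERENCES = {"b": "fire"}   # a consonant selects the difference
SPECIES = {"a": "flame"}      # a vowel selects the species

def gloss(word: str) -> str:
    genus, diff, spec = word[:2], word[2:3], word[3:4]
    parts = [GENERA[genus]]
    if diff:
        parts.append(DIFFERENCES[diff])
    if spec:
        parts.append(SPECIES[spec])
    return " > ".join(parts)

for w in ("de", "deb", "deba"):
    print(w, "=", gloss(w))   # de = element; deb adds fire; deba adds flame
```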

Wilkins’ scheme didn’t catch on, of course. One reason was that speaking or understanding the spoken language would be very difficult, because single-letter (or single-phoneme) differences could completely change the meaning of a word. The question also arises, as raised by Jorge Luis Borges, who wrote on this matter at length, of whether there can even be a universal classification system of this kind. I’m not sure this last point matters too much for practical purposes, as almost any classification system organised along logical lines is better than none. We have gender in many European languages distinguishing fairly arbitrarily between types of concept, sometimes on its own as with the Danish en øre vs. et øre, meaning a unit of currency and “ear” respectively, and these genders do appear to descend from an originally logical scheme involving attributes, objects and agents which later became obscured. There are also plenty of other languages with dozens of genders, where this probably helps as well, but in the case of the Real Character the divisions would possibly be less arbitrary. That said, even for first-language speakers of languages which use grammatical gender, research has shown there is a psychological influence. For instance, speakers of languages in which the word for “bridge” is feminine are more likely to see bridges as elegant and beautiful, whereas those for whom it’s masculine will often describe them as sturdy and protective. Were we to accept Wilkins’ version of Real Character, we might be allowing ourselves to become trapped in one person’s particular way of dividing up the world, and more creative and useful thoughts might be stifled by that.

Nonetheless, Wilkins has had an influence on today’s world. Roget’s Thesaurus, for example, initially composed in 1805, divides concepts up in a similar way, although the two schemes probably differ considerably. There’s also the Dewey Decimal System, used for classifying library materials, and the aforementioned Linnæan taxonomy. Even Esperanto does something rather similar to Real Character in the distinctive way it breaks words down into morphemes in order to reduce the number of words one needs to learn to become fluent.
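As an illustration of that morpheme-stacking, here’s a small Python sketch composing a few real Esperanto words mechanically; the glossing logic is mine, not any official algorithm:

```python
# Compose Esperanto words from a prefix, a root and suffixes.
ROOTS = {"lern": "learn", "varm": "warm", "san": "healthy"}
PREFIXES = {"mal": "opposite of"}
SUFFIXES = {"ej": "place for", "ul": "person", "o": "(noun)", "a": "(adjective)"}

def build(prefix, root, *suffixes):
    word = (prefix or "") + root + "".join(suffixes)
    gloss = ([PREFIXES[prefix]] if prefix else []) + [ROOTS[root]] \
            + [SUFFIXES[s] for s in suffixes]
    return word, " + ".join(gloss)

print(build(None, "lern", "ej", "o"))  # lernejo: a school
print(build("mal", "varm", "a"))       # malvarma: cold
print(build("mal", "san", "ul", "o"))  # malsanulo: a sick person
```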

There is, however, another aspect to Wilkins’ book which has a perhaps surprising connection. One of the sections of Part 2, chapter 4, deals with botany and breaks plants down into species remarkably similarly to how they’re classified today, a classification which has since been broadly corroborated by DNA sequencing. This contrasts with Culpeper and Gerard, both of whom had perfectly valid classification systems of their own, oriented not so much around taxonomy as around use and flowering and fruiting times. I won’t go into this in any depth as it impinges on one of my other blogs. However, there is another, probably earlier, work which is somewhat reminiscent of this: the apparently not-very famous (thanks for that everyone!) Voynich Manuscript.

Again, here I run up against my demarcation problem. I said before that I think the supposèd mystery of the Voynich Manuscript isn’t really a mystery at all, but unfortunately it happens to be deeply wedged in the crevasses between my blogs and there isn’t anywhere appropriate to write about it. However, I will say a couple of things about it here, partly because the last time I mentioned it I got a fair number of hits on the post, but also because it’s relevant to Wilkins. One of the suggestions as to the nature of the writing in that manuscript, referred to as “Voynichese”, is that it is in fact a conceptually-based script similar to Real Character, although inconsistent in its application of characters to concepts and rather amateurishly done, as is the rest of the codex, which in fact is a pretty massive clue as to what it is. It seems weird how something can stare so many people in the face like this and yet not be noticed. Radiocarbon dating places the materials whereof the MS is composed in the early fifteenth century, so it pre-dates John Dee’s Enochian, Wilkins’ Real Character and indeed Transitus Fluvii, which was in any case not a conceptually-based script – a Begriffsschrift if you like (!). There’s a fair bit of botany in the MS too, of course (I mean, there would be). But this is not the interesting bit. Voynichese words are highly regular in structure. Some characters only occur at the beginning of words, some only in the middle and some only at the end of them. The same is, incidentally, true of many other scripts, including Devanagari, Hebrew and Arabic, although only Arabic has all three of those features, so this isn’t a clinching argument for it being a conceptually-organised script. No words have fewer than two letters or more than ten.

If you run the above paragraph through Gender Guesser, you might get an interesting result! Or you might not.
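Back to Voynichese word structure: the positional claims above are easy to check mechanically. Here’s a minimal Python sketch run over a toy word list loosely based on common EVA transliterations of Voynich words, not a real transcription of the manuscript:

```python
# Find characters restricted to word-initial or word-final position.
words = ["qokeedy", "qokaiin", "daiin", "chedy", "shedy", "okaiin"]

initial, medial, final = set(), set(), set()
for w in words:
    initial.add(w[0])
    final.add(w[-1])
    medial.update(w[1:-1])

print("only word-initial:", initial - medial - final)  # e.g. 'q'
print("only word-final:  ", final - initial - medial)  # e.g. 'y', 'n'
print("word lengths:", sorted({len(w) for w in words}))
```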

Many people believe the MS is a hoax, but this is kind of beside the point. It may also be automatic writing, in which case it might be expected to have similar characteristics to glossolalia. I don’t know if this has been tested. Glossolalia and xenolalia are big topics in themselves which I shall reserve for another post.

To conclude, then, Real Character was a valiant and appealing effort to which John Wilkins applied his considerably honed intellect, and although it was doomed to failure, it has still left its mark on the world in numerous ways. But was he pre-empted, centuries earlier, by an anonymous author?

PS: Ironically, without gaming the Gender Guesser algorithm, this text only comes out at 53% male!

A Profusion Of Characters

The chromatic Benin-Edo Script

Yesterday’s post was on the question of whether the English version of the Latin alphabet would look odd to someone who couldn’t read English. In other words, to see ourselves as others see us. Today’s is about scripts more widely, particularly those of West Afrika and South Asia, but elsewhere as well.

The geographic and population-related distribution of different scripts is very uneven. In terms of the number of written languages, Latin is obviously the most widely-used script, and I’m guessing that’s also true in terms of population: some fifteen hundred million people speak the twelve most commonly spoken languages written in Latin script. This is, I’m guessing, followed by Chinese, which as well as being used on its own to write the Chinese “dialects” (actually languages), also gets used to write Japanese in part and has historically been used to write many other languages. After that is probably Arabic script, which is actually used in China too, followed by Cyrillic, at least in terms of number of languages, and then probably Devanagari, the script used to write Hindi. After that, Hebrew square script is used to write several languages, though Hebrew itself dominates there, and historically Greek has been used for Greek, Coptic and Gothic and is also ancestral to Latin and Cyrillic. Beyond that, I’m not aware of scripts which are widely used. Hangul, the Korean script, is also used to write the Austronesian language Cia-Cia, and has also been used for Hokkien in Taiwan. That said, it seems that the majority of remaining scripts, of which there are many, are each associated with a single language.

There’s a lot of politics in script choice, and Arabic in particular, which works excellently for the Arabic language itself and also for other Semitic tongues and those which have borrowed lots of Semitic words, is often applied to languages for which it’s unsuitable. The distinctive feature of Semitic, and possibly other Afro-Asiatic, languages is that they have roots based on three consonants which are then modified grammatically by vowels, and sometimes by other consonants. Hence salaam – peace, islam – surrender, muslim – person who has surrendered, for example. The closely related Hebrew language does the same thing. There is also an issue with how Arabic script is read: studies have shown that readers take longer to read each letter than they do with Latin, because many of the letter shapes are similar and are often distinguished only by dots above or below them. The official adoption of Arabic script is often a political statement.
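This root-and-pattern morphology is easy to illustrate mechanically. In the Python sketch below, the slots marked C are filled in order by the root consonants; the patterns are simplified transliterations for illustration (long vowels written as doubled letters), not rigorous Arabic grammar:

```python
# Interdigitate a triconsonantal root with vowel patterns.
def apply_pattern(root, pattern):
    out, i = "", 0
    for ch in pattern:
        if ch == "C":
            out += root[i]
            i += 1
        else:
            out += ch
    return out

root = ("s", "l", "m")  # the root shared by salaam, islam, muslim
for pattern in ("CaCaaC", "iCCaaC", "muCCiC"):
    print(pattern, "->", apply_pattern(root, pattern))
# CaCaaC -> salaam, iCCaaC -> islaam, muCCiC -> muslim
```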

In the former Soviet Union, Cyrillic was used in most places, the exceptions being the Jewish Autonomous Oblast in the Far East of the country and the Baltic and Caucasian states. There seems to have been a deliberate policy of introducing differences into the scripts of neighbouring Turkic-speaking nations in particular, in order to prevent communication between them. One which particularly sticks in my mind actually used an ampersand for a particular vowel. Mongolian is also officially written in Cyrillic, although it has used a number of other scripts, including the traditional vertical script, another non-cursive vertical script, and ‘Phags Pa, which was also used to write Tibetan. The fact that Cyrillic is used to write such diverse languages means that in theory it could be used for a huge range of tongues which have never been written in it, and this brings me to one particular idée fixe of mine: that the Q-Celtic languages should be written in Cyrillic, because like Russian and other Slavic languages they have palatalised and non-palatalised versions of many of their consonants, and it would work better than the current orthography of any of them. Unfortunately, if you look at place names in Ireland, for example, written in Cyrillic, they just seem to be transliterations of the English versions of those words.

The Americas have their own systems of writing, even though Latin script dominates, and historically they fall into two categories. One is the pre-Columbian writing systems used by the Mayans and Aztecs, with an honourable mention for the quipu system of knotted threads used by the Inca. Runes may also have been used by Nordic people in North America around the end of the first Christian millennium. There may be others, but I’m not aware of them. These are not used much today, although they do appear in some very limited situations, such as on flags and in commercial logos: the Mexican flag, for instance, includes a glyph from the Nahuatl script.

The other category consists of scripts which were consciously invented to represent languages, and includes Cree syllabics, used for instance to write Inuktitut, Ojibwe and Cree, and the remarkable Cherokee syllabary invented by Sequoyah, a Native American who was until that point illiterate. It reminds me of clichéd old Western “WANTED” posters, but is a valid script in itself; I assume it looks that way because of the kind of fonts which were popular at the time.

There appears to be a connection between the Cherokee syllabary and the Vai one, shown above. Vai is spoken in Liberia and Sierra Leone, and these two countries have particular histories which may be relevant. Liberia was set up as the goal of the “Back To Africa” movement, whereby freed slaves went to Afrika to found a nation. Eleven percent of the population is descended from American slaves, and the official language is English. It has an oddly American atmosphere to it, but when the ex-slaves colonised it there were tribes already there, who were enslaved by the ex-slaves and not considered citizens of the country. Sierra Leone, next door, was also founded by former slaves, this time loyalists who had fought on the British side in the American War of Independence and ended up in Nova Scotia, along with some Black Londoners who were resettled there. There was also Sengbe Pieh, who led the slave rebellion aboard the Amistad, the one which resulted in a court case in the US and the rebels’ return to Afrika. Hence both of these countries have a peculiar relationship with the West and the US compared to other Afrikan nations. The Vai syllabary was in use by 1833, and prior to that a number of Cherokee had emigrated to Liberia, one of whom, Austin Curtis, had married into a Vai family and become a chief, so it’s possible that the inspiration was Cherokee. The script was actually first promulgated by Momolu Duwalu Bukele, but he claimed that he received it in a vision, and this is where things get a bit confusing.

There are many claims that different West Afrikan scripts, of which there are a couple of dozen, were received by divine revelation or in a vision, but it’s also claimed that many of them have been found to have ancient origins long before they were used by their apparent inventors, and it isn’t clear whether the scripts are ancient, invented consciously or received in visions, which makes their origins obscure. It’s also notable that there’s a remarkably large number of them. There was probably a strong motivation to promote indigenous West Afrikan scripts in order to refute the idea that Black people were inferior, either by inventing the writing in response to this or by presenting discoveries of ancient scripts to demonstrate that ancient Afrikans south of the Sahara were not illiterate. Having said that, I don’t consider it problematic that a script could appear fully-formed in someone’s mind, because the same kind of thing happens to me in other situations, and I don’t think I’m unusual in that respect. Vai script was also alleged to have been secret, and Bukele may have invented the dream explanation as a cover story; it was apparently used by Afrikan slaves in Suriname before it was supposèdly invented or revealed to Bukele, although then again the writing in Suriname is said to be due to spirit possession. The whole thing is very confusing, at least to an outsider. All of this is very interesting, but it means that the scripts need to be considered in their own right rather than in terms of their origins, to avoid confusion. That said, there are a number of scripts in England which are said to have resulted from the same phenomenon, such as Celestial and Enochian.

One of the most striking scripts is illustrated at the top of this post: Benin-Edo. Edo was the main language of the Benin Empire, founded in 1180 CE in what is now southern Nigeria. I haven’t yet been able to find out much about the script, but it doesn’t appear to be ancient; it certainly existed by 1999 CE, though.

An incomplete list of West Afrikan scripts includes: Yorùbá Holy Script, Bassa, N’Ko, Nsibidi, Mende Kikakui, Bamum and Shümom. One of the issues with writing these languages in the Latin alphabet is that the systems so far devised for them, which I would guess were invented by Christian missionaries, don’t do justice to their phonology. Although there are several widely-spoken and major exceptions to this tendency, most languages which originated in Afrika south of the Sahara are tonal, and this is often not well represented in the Latin orthographies. Moreover, there are a number of sounds in West Afrikan languages, such as the doubly articulated “gb” and “kp” and the prenasalised stops such as “ngk” and “mb” (my representation, not part of the actual spelling), which contrast with the corresponding consonant clusters to form completely different words. That is, there can be an N followed by a D or a single “ND” sound, and they’re two different things, as the sketch below illustrates. Syllables consisting of a vowel alone can also be poorly represented. Hence there are a number of factors involved in the use of widely varying scripts in West Afrika. I also wonder whether they constitute an important part of the tribes’ identity, and this brings me to South Asia.
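Before moving on to South Asia, here’s a toy Python sketch of that “N plus D versus ND” problem: with an inventory containing both a prenasalised /ⁿd/ and plain /n/ and /d/, a naïve Latin spelling like “nda” has two legitimate phonemic parses. The inventory and the example string are invented for illustration:

```python
# Enumerate every way of segmenting a spelling into known phonemes.
INVENTORY = {"nd", "mb", "gb", "kp", "n", "d", "m", "b", "g", "k", "p", "a"}

def parses(s):
    if not s:
        return [[]]
    results = []
    for length in (2, 1):
        if length > len(s):
            continue
        head, tail = s[:length], s[length:]
        if head in INVENTORY:
            results += [[head] + rest for rest in parses(tail)]
    return results

print(parses("nda"))  # [['nd', 'a'], ['n', 'd', 'a']] - two readings, one spelling
```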

The indigenous South Asian scripts are almost all descended from Brahmi. In Northern India and Nepal these scripts are often characterised by a horizontal line joining the letters or characters in a word together, said to be because they were originally written on palm leaves, the line representing a vein in the monocotyledonous leaf with its parallel venation. In South India and elsewhere, and in a few scripts in North India, the characters are discrete, being neither cursive nor joined by the line. South Asian scripts are abugidas: each consonantal character includes an intrinsic vowel, often schwa but sometimes, as with Bengali, a short O, and its absence or replacement has to be specifically shown, either with a cancelling sign or with a vowel sign placed to the left, right, above or below the consonant. Some also have conjunct consonants, which merge two or three letters together. Gurmukhi, used to write Punjabi, uses the line but lacks conjunct consonants and is particularly clearly written. Gujarati is one exception to the use of the line in a North Indian language, using separate characters without it. In South India the letters are always separated. The scripts extended well beyond India and were modified along the way. Particularly notable is the Tibetan script, which although written left to right horizontally like the others, also tends to pile letters up vertically, and whose spelling is far from phonetic. Southeast Asia, with the exception of Vietnam, uses Brahmi-derived scripts as well, including the apparently longest alphabet of all, Khmer, used to write Cambodian, with sixty-three letters, although this claim seems to have been rescinded and there are now said to be only thirty-three. Khmer has a lot of letters with the same sound because they were used in Sanskrit and have since fallen together.
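Here’s a minimal Python sketch of those abugida mechanics using Devanagari and the standard unicodedata module: the bare consonant carries its inherent vowel, a vowel sign overrides it, and the virama cancels it, producing a conjunct:

```python
import unicodedata

ka, ta = "\u0915", "\u0924"  # DEVANAGARI LETTER KA, DEVANAGARI LETTER TA
vowel_i = "\u093F"           # DEVANAGARI VOWEL SIGN I (displays to the left)
virama = "\u094D"            # DEVANAGARI SIGN VIRAMA (cancels the inherent vowel)

samples = [
    (ka, "ka: bare consonant, inherent a"),
    (ka + vowel_i, "ki: vowel sign replaces the inherent vowel"),
    (ka + virama + ta, "kta: virama joins ka and ta into a conjunct"),
]
for text, note in samples:
    names = " + ".join(unicodedata.name(c) for c in text)
    print(f"{text}  {note}  [{names}]")
```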

The situation in South Asia, particularly India, seems to be that every language deserves its own script, which again I would attribute to some kind of identity politics. Not all South Asian languages have a long literary tradition, though. For instance, Burushaski, a language isolate spoken in Pakistan, may once have had a written form which died out, and is now written in both Latin and Arabic scripts. Ol Cemet’, an alphabet as opposed to an abugida or an abjad (the category to which Arabic and Hebrew belong), was invented in 1925 and is used to write Santali, a Munda language.

This isn’t intended to be an exhaustive survey of the writing systems of the world so much as a sketch of the situation, and the pattern resembles the distribution of languages themselves, although not in the same areas. Most of the Islamic world uses Arabic, most of the Russian- and Slavic-influenced world Cyrillic, and South Asia has a plethora of scripts of its own. East Asia has a number of Chinese-influenced scripts, with the exception of Hangul, which is logically organised as opposed to having evolved without conscious design. West Afrika in particular has a large number of scripts, due to a variety of factors. The Americas have had scripts which are now extinct and are now dominated by Latin, but also have some non-Latin scripts which were consciously designed. Finally, most of Europe, Afrika and Oceania use the Latin script, though Arabic and Tifinagh are used in North Afrika, and the first was historically used further south. An 80:20-type rule can be applied to all this: the majority of languages use a small number of scripts, but a fair proportion also have their own. It’s notable, though, that outside South Asia, which has both a large number of languages and a large number of scripts, often one per language, most scripts are used by only one or two languages, and they do not correspond to areas of great linguistic diversity. It’s also clearly a lot easier for a constructed script to be adopted than a constructed language such as Esperanto.