Is Cyberspace Haunted?

Loab – An explanation may be forthcoming

I may have mentioned this before on here, but there used to be a popular “spooky” non-fiction book called ‘The Ghost Of 29 Megacycles’. This was about the practice of listening to static on analogue radio and apparently hearing the voices of the dead. A similar technique is known as Electronic Voice Phenomenon, a more general version of the same in which people listen out for voices on audio tape or other recording media. It’s notable that this is a highly analogue process. It’s no longer a trivial task to detune a television or radio and get it to display visual static or produce audio static so that one can do this. Audiovisual media nowadays are generally very clean and don’t lend themselves to it. One saddening thing to me is that we now have a TV set which will display pretend static to communicate to us that we haven’t set it up properly. It isn’t honest. There is no real static; it’s just some video file stored on the hardware somewhere which tries to tell the user there’s an unplugged connection or something. You can tell this because it loops: the same pixels are the same colours in the same places every few frames. I find this unsettling because it implies that the world we live in is kind of a lie, and because we haven’t really got control over the nuts and bolts of much technology any more. There’s a revealing, temporally asymmetric attitude here, committing oneself to the belief that in that respect the past and future are qualitatively different. It’s important to acknowledge this sometimes, but one can also bring it about via the force of that potentially negative belief. However, the demise of the analogue has not led to the demise of such connections, although it long seemed to have done so.

Most people would probably say that we are simply hearing, or in some cases seeing, things which aren’t really there. Others might say, of course, that this is a way to access the Beyond, so to speak, and interpret the voices or other experiences in those terms. If that’s so, the question arises as to whether it’s the medium which contains this information or whether the human mind contacts it directly via a random-seeming visual or sonic mess, having been given the opportunity to do so. Other stimuli fix the attention on specific, organised and definite details too much for this to happen easily. There’s no scope for imagination, or rather for free association.

Well, recently this has turned out no longer to be so. Recently, artificial intelligence has been advancing scarily fast. That’s not hyperbole. It is actually quite frightening how rapidly software has been gaining ground on human cognition. Notable improvements occur within weeks rather than years or decades, and one particular area where this is happening is in image generation. This has consequences for the “ghost of 29 megacycles” kind of approach to, well, I may as well say séances, but this is going to take a bit of explaining first.

Amid considerable concern for human artists and their intellectual property, it’s now possible to go to various websites, type in what you want to see and have a prodigious, furiously cogitating set of servers give you something like that in a couple of minutes. For example, sight unseen I shall now type in “blue plastic box in a bookcase” and show you a result from Stable Diffusion:

That didn’t give me exactly what I wanted but it did show a blue plastic box in a bookcase. Because I didn’t find a way to specify that I only wanted one blue plastic box, it also gave me two others. I’ll give it another try: “A tree on a grassy hill with a deer under it”:

The same system can also respond to images plus text as input. In my case, this has led to an oddity. As you know, I am the world’s whitest woman. However, when I give Stable Diffusion’s sister Diffuse The Rest, which takes photos plus descriptions, a photo of myself with a description such as “someone in a floral skater dress with curly hair, glasses and hoop earrings”, it will show me that all right, but “I” will be a Black woman more often than not. This is not so with many other inputs without a photo of me. This is what I get when I type the same description into Stable Diffusion itself:

This is obviously a White woman. So are all the other examples I’ve tried on this occasion, although there is a fair distribution of ethnicity. There are worrying biasses, as usual, in the software. For instance, if you ask for a woman in an office, you generally get something like this:

If you ask for a woman on a running track, this is the kind of output that results:

This is, of course, due to the fact that the archive of pictures on which the software was trained carries societal biasses therewith. However, for some reason it’s much more likely to make me Black than White if I provide it with a picture of myself and describe it in neutral terms. This, for example, is supposed to be me:

The question of how it might be addressed arises though. Here is an example of what it does with a photo of me:

You may note that this person has three arms. I have fewer than three, like many other people. There’s also a tendency for the software to give people too many legs and digits. I haven’t tried and I’m not a coder, but it surprises me that there seems to be no way to filter out images with obvious flaws of this kind. Probably the reason for this is that these AI models are “black boxes”: they’re trained on images and arrive at their own rules for how to represent them, and in the case of humans the number of limbs and digits is not part of that. It is in fact sometimes possible to suggest they give a body extra limbs by saying something like “hands on hips” or “arms spread out”, in which case they will on occasion continue to produce images of someone with arms in a more neutral position as well as arms in the explicitly requested ones.

In order to address this issue, it would presumably be necessary to train the neural network on images with the wrong and right number of appendages. The problem is, incidentally, the same as the supernumerary blue boxes in the bookcase image, but in most situations we’d be less perturbed by seeing an extra box than an extra leg.

I have yet to go into why the process is reminiscent of pareidolia based on static or visual snow and therefore potentially a similar process to a séance. The algorithm used is known as a Latent Diffusion Model. This seems to have replaced the slightly older method of Generative Adversarial Networks, which employed two competing neural networks to produce better and better pictures by judging each other’s outputs. Latent Diffusion still uses neural networks, which are models of simple brains based on how brains are thought to learn. Humans have no access to what happens internally in these networks, so the way they are actually organised is quite mysterious. Many years ago, a very simple neural network was trained to do simple arithmetic and then explored. It was found to contain a circuit which had no connections to any nodes outside itself and was therefore thought to be redundant, but on its being removed, the entire network ceased to function. That network was many orders of magnitude less complex than today’s. In these cases, the network was trained on a database of pictures ranked by humans for beauty and associated with descriptions, called the LAION-5B dataset. The initial picture, which may be blank, has “snow” added to it in the form of pseudorandom noise (true randomness may be impossible for conventional digital devices to achieve alone). The algorithm then uses an array of GPUs (graphics processing units, as used in self-driving cars, cryptocurrency mining and video games) to progressively remove that noise until the picture begins to look more like the target as described textually and/or submitted as an image. It does this in several stages. Also, just as a JPEG is a compressed version of a bitmap image, relying in that case on small squares described via overlapping trig functions, so are the noisy images compressed in order to fit in the available storage space and so that they get processed faster.
The way I think of it, and I may be wrong here, is that it’s like getting the neural network to “squint” at the image through half-closed eyes and try to imagine and draw what’s really there. This compressed image form is described as a “latent space”, as the actual space of the image, or possibly the multidimensional space used to describe it as found in Generative Adversarial Networks, is a decompressed version of what’s actually used directly by the GPUs.
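The way I picture that loop can be put into a toy program. This is emphatically not the real algorithm – a genuine latent diffusion model uses a trained neural network to predict the noise at each step, whereas here the “denoiser” is just a nudge towards a known target – but it shows the shape of the process: start from pseudorandom snow and refine it in stages.

```python
import random

# Toy illustration of the denoising loop behind latent diffusion.
# NOT the real thing: real models use a trained network to predict
# the noise at each step; here the "denoiser" simply nudges the
# latent towards a known target, to show the shape of the loop.

def add_noise(latent, amount, rng):
    """Forward process: bury the latent under pseudorandom snow."""
    return [x + rng.gauss(0, amount) for x in latent]

def denoise(latent, target, step=0.2):
    """Stand-in for the learned denoiser: move towards the target."""
    return [x + step * (t - x) for x, t in zip(latent, target)]

rng = random.Random(42)
target = [0.1, 0.5, 0.9, 0.3]            # the "image" we want, as a tiny latent
latent = add_noise([0.0] * 4, 1.0, rng)  # start from pure noise

for _ in range(50):                      # several refinement stages
    latent = denoise(latent, target)

print(all(abs(x - t) < 0.01 for x, t in zip(latent, target)))  # True
```

Run it with a different seed and it still converges, which is the point: the structure comes from the guidance, not from the starting snow.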

If you don’t understand that, it isn’t you. It was once said that if you can’t explain something simply, you don’t understand it, and that suggests I don’t. That said, one thing I do understand, I think, is that this is a computer making an image fuzzy like a poorly-tuned television set and then trying to guess what’s behind the fuzz according to suggestions such as an image or a text input. This process is remarkably similar, I think, to a human using audio or visual noise to “see” things which don’t appear to be there, and therefore is itself like a séance.

This seems far-fetched of course, but it’s possible to divorce the algorithm from the nature of the results. The fact is that if a group of people is sitting there with a ouija board, they are ideally sliding the planchette around without their own conscious intervention. There might be a surreptitious living human guide or a spirit might hypothetically be involved, but the technique is the same. The contents of the latent space are genuinely unknown and the details of events within the neural network are likewise mysterious. We, as humans, also tend to project meaning and patterns onto things where none exist.

This brings me to Loab, the person at the top of this post, or rather the figure. The software used to discover this image has not been revealed, but seems to have been Midjourney. The process whereby she (?) was arrived at is rather strange. The initial input was Marlon Brando, the film star. This was followed by an attempt to make the opposite of Marlon Brando. This is a technique where, I think, the location in the latent space furthest from the initial item is found, like the antipodes but in a multidimensional space rather than on the surface of a spheroid. This produced the following image:

The phenomenon of apparently nonsense text in these images is interesting and more significant than you might think. I’ll return to it later.

The user, whose username is Supercomposite on Twitter, then tried to find the opposite of this image, expecting to arrive back at Marlon Brando. They didn’t. Instead they got the image shown at the top of this post, in other words this:

(Probably a larger image in fact but this is what’s available).
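If the “antipodes” idea above is right, the opposite operation is just negating a point in a many-dimensional space. Here’s a toy sketch of that – the vector is entirely made up, and real embeddings have hundreds of dimensions:

```python
import math

# Toy sketch of "opposite" as the antipodal point in an embedding
# space. Assumption: a prompt lives somewhere in the space as a
# vector, and its opposite is the negation, like the antipodes of
# a point on a sphere but in many dimensions.

def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def opposite(v):
    return [-x for x in v]

brando = normalise([0.3, -0.7, 0.2, 0.5])  # made-up embedding
anti = opposite(brando)

# The opposite of the opposite is the original point again.
roundtrip = opposite(anti)
print(roundtrip == brando)  # True
```

Which makes it all the stranger that taking the opposite twice did not lead back to Marlon Brando; whatever the software was actually doing, it can’t have been anything this simple.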

It was further found that this image tended to “infect” others and make them more horrific to many people’s eyes. There are ways of producing hybrid images via this model, and innocuous images from other sources generally become macabre when combined with this one. Also, there’s a tendency for Loab, as she was named, to “haunt” images in the sense that you can make an image from an image and remove all the references to Loab in the description, and she will unexpectedly recur many generations down the line like a kind of jump scare. Her presence also sometimes makes images so horrendous that they are not safe to post online. For instance, some of them are of screaming children being torn to pieces.

As humans, we are of course genetically programmed to see horror where there is none because if we instead saw no horror where there was some we’d probably have been eaten, burnt to death, poisoned or drowned, and in that context “we” refers to more than just humans. Therefore a fairly straightforward explanation of these images is that we are reading horror into them when they’re just patterns of pixels. We create another class of potentially imaginary entities by unconsciously projecting meaning and agency onto stimuli. Even so, the human mind has been used as a model for this algorithm. The images were selected by humans and humans have described them, and perhaps most significantly, rated them for beauty. Hence if Marlon Brando is widely regarded as handsome, his opposite’s opposite, rather than being himself, could be ugliness and horror. It would seem to make more sense for that to be simply his opposite, or it might not be closely related to him at all. A third possibility is that it’s a consequence of the structure of a complex mind-like entity to have horror and ugliness in it as well as beauty. There are two other intriguing and tempting conclusions to be drawn from this. One is that this is a real being inhabiting the neural network. The other is that the network is in some way a portal to another world in which this horror exists.

Loab is not alone. There’s also Crungus:

These are someone else’s, from Craiyon, which is a fork of Dall-E Mini. Using that, I got these:

Using Stable Diffusion I seem to get two types of image. One is this kind of thing:

The other looks vaguely like breakfast cereal:

Crungus is another “monster”, though he looks quite cartoonish. I can also understand why crungus might be a breakfast cereal, because the word sounds like “crunch”. In fact I can easily imagine going down the shop, buying a box of crungus, pouring it out and finding a plastic toy of a Crungus in it. There’s probably a tie-in between the cereal and a TV animation. Crungus, however, has an origin. Apparently a video game from 2002 included a Crungus as an easter egg: a monster based on the original DOOM monster the Cacodemon, itself based on artwork which looked like this:

Hence there is an original out there which the AI probably found, although I have to say it seems very appropriately named, and if someone were asked to draw a “Crungus”, they’d probably produce a picture a bit like one of these.

It isn’t difficult to find these monsters. Another one which I happen to have found is “Eadrax”:

Eadrax is the name of a planet in ‘The Hitch-Hiker’s Guide To The Galaxy’ but reliably produces fantastic monsters in Stable Diffusion. This seems to be because Google will correct the name to “Andrax”, an ethical hacking platform which uses a dragon-like monster as its mascot or logo. An “eadrax” seems to be a three-dimensional version of that flat logo. But maybe there’s something else going on as well.

There’s a famous experiment in psychology where people whose spoken languages were Tamil and English were asked which one of these shapes was “bouba” and which “kiki”:

I don’t even need to tell you how that worked out, do I? What happens if you do this with Stable Diffusion? Well, “kiki” gets you this, among many other things:

“Bouba” can generate this:

I don’t know about you, but to me the second one looks a lot more like a “bouba” than the first looks like a “kiki”. What about both? Well, it either gets you two Black people standing together or a dog and a cat. I’m quite surprised by this because it means the program doesn’t know about the experiment. It doesn’t, however, appear to do what the human mind does with these sounds. “Kiki and Bouba” does this:

Kiki is of course a girl’s name. Maybe Bouba is a popular name for a companion animal?

This brings up the issue of the private vocabulary latent space diffusion models use. You can sometimes provoke such a program into producing text. For instance, you might ask for a scene between two farmers talking about vegetables with subtitles or a cartoon conversation between whales about food. When you do this, and when you get actual text, something very peculiar happens. If you have typeable dialogue between the whales and use this as a text prompt, it can produce images of sea food. If you do the same with the farmers, you get things like insects attacking crops. This is even though the text seems to be gibberish. In other words, the dialogue the AI is asked to imagine actually seems to make sense to it.

Although this seems freaky at first, what seems to be happening is that the software is taking certain distinctive text fragments out of captions and turning them into words. For instance, the “word” for birds actually consists of a concatenation of the first part, i.e. the more distinctive one, of scientific names for bird families. Some people have also suggested that humans are reading things into the responses by simply selecting the ones which seem more relevant, and another idea is that the concepts associated with the images are just stored nearby. That last suggestion raises other questions for me, because it seems that that might actually be a description of how human language actually works mentally.
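That first explanation – distinctive caption fragments glued together into “words” – can be illustrated with a toy example. The bird family names below are real scientific names, but the pseudoword the snippet builds is my own invention, not one of the model’s actual tokens:

```python
# Toy sketch of how a "secret word" could arise: concatenate the
# distinctive first fragments of scientific bird-family names.
# The family names are real; the resulting pseudoword is invented
# here purely for illustration.

families = ["Vireonidae", "Fringillidae", "Corvidae"]

# Take the distinctive opening of each name, as the model seems
# to have done with fragments of captions.
fragments = [name[:4].lower() for name in families]
pseudoword = "".join(fragments)

print(pseudoword)  # "virefrincorv"
```

A string like that means nothing to us, but every fragment of it points at birds in the training captions, which is all the model needs.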

Examples of “secret” vocabulary include the words vicootes, poploe vesrreaitas, contarra ccetnxniams luryea tanniouons and placoactin knunfdg. Here are examples of what these words do:

Vicootes
Poploe vesrreaitas
contarra ccetnxniams luryea tanniouons
placoactin knunfdg

The results of these in order tend to be: birds, rural scenes including both plants and buildings, young people in small groups and cute furry animals, including furry birds. It isn’t, as I’ve said, necessarily that mysterious, because the words are often similar to parts of other words. For instance, the last one produces fish in many cases, though apparently not on Stable Diffusion, and here seems to have produced a dog because the second word ends with “dg”. It produces fish because the placoderms and the Actinopterygii are prominent groups of fish.

It is often clear where the vocabulary comes from, but that doesn’t mean it doesn’t constitute a kind of language because our own languages evolve from others and take words and change them. It can easily be mixed with English:

A flock of vicootes in a poploe vesrreaitas being observed by some contarra ccetnxiams luryea tanniouons who are taking their placoactin knunfg for a walk.

This has managed to preserve the birds and the rural scene with vegetation, but after that it seems to lose the plot. It often concentrates on the earlier part of a text more than the rest. In other words, it has a short attention span. The second part of this text gets me this:

Contarra ccetnxiams luryea tanniouons taking their placoactin knunfg for a walk.

I altered this slightly but the result is unsurprising.

Two questions arise here. One is whether this is genuine intelligence. The other is whether it’s sentience. As to whether it’s intelligent, I think the answer is yes, but perhaps only to the extent that a roundworm is intelligent. This is possibly misleading and raises further questions. Roundworms are adapted to what they do very well but are not going to act intelligently outside of that environment. The AIs here are adapted to do things which people do to some extent, but not particularly generally, meaning that they can look a lot more intelligent than they actually are. We’re used to seeing this happen with human agency more directly involved, so what we experience here is a thin layer of humanoid behaviour particularly focussed on the kind of stuff we do. This also suggests that a lot of what we think of as intelligent human behaviour is actually just a thin, specialised veneer on a vast vapid void. But maybe we already knew that.

The other question is about sentience rather than consciousness. Sentience is the ability to feel. Consciousness is not. In order to feel, at least in the sense of having the ability to respond to external stimuli, there must be sensors. These AIs do have sense organs because we interact with them from outside. I have a strong tendency to affirm consciousness because a false negative is likely to cause suffering. Therefore I believe that matter is conscious and therefore that that which responds to external stimuli is sentient. This is of course a very low bar and it means that I even consider pocket calculators sentient. However, suppose that instead consciousness and sentience are emergent properties of systems which are complex in the right kind of way. If digital machines and their software are advancing, perhaps in a slow and haphazard manner, towards sentience, they may acquire it before being taken seriously by many, and we also have no idea how it would happen, not just because sentience as such is a mystery but largely because we have no experience of that emergence taking place before. Therefore we can look at Loab and the odd language and perhaps consider that these things are just silly and it’s superstitious to regard them as signs of awareness, but is that justified? The words remind me rather of a baby babbling before she acquires true language, and maybe the odd and unreliable associations they make also occur in our own minds before we can fully understand speech or sign.

Who, then, is Loab? Is she just a collaborative construction of the AI and countless human minds, or is she actually conscious? Is she really as creepy as she’s perceived, or is that just our projection onto her, our prejudice perhaps? Is she a herald of other things which might be lurking in latent space or might appear if we make more sophisticated AIs of this kind? I can’t answer any of these questions, except perhaps to say that yes, she is conscious because all matter is. What she’s actually doing is another question. A clockwork device might not be conscious in the way it “wants” to be. For instance, it’s possible to imagine a giant mechanical robot kept going by teams of people inside it, but is the consciousness of the individual members of that project separate from any consciousness the automaton might have? It’s conceivable that although what makes up Loab is conscious, she herself is not oriented correctly to express that consciousness.

A more supernaturalistic explanation is that Midjourney (I assume) is a portal and that latent space represents a real Universe or “dimension” of some kind. It would be hard to reconcile this idea with a deterministic system if the neural net is seen as a kind of aerial for picking up signals from such a world. Nonetheless such beliefs do exist, as a ouija board is actually a very simple and easily described physical system which nevertheless is taken as picking up signals from the beyond. If this is so, the board and planchette might be analogous to the neural net and the movement of the hands on the planchette, which is presumably very sensitive to the neuromuscular processes going on in the arms and nervous systems of the human participants, to the human artists, the prompt, the computer programmers and the like, and it’s these which are haunted, in a very roundabout way. I’m not in any way committing myself to this explanation. It’s more an attempt to describe how the situation might be compared to a method of divination.

I’ve mentioned the fact there are artists involved a few times, and this brings up another probably unrelated concern. Artists and photographers, and where similar AIs have been applied to other creative genres the likes of poets, authors and musicians, have had their work used to train it, and therefore it could be argued that they’re owed something for this use. At the other end, bearing in mind that most of the images in this post have been produced rapidly on a free version of this kind of software and that progress is also extremely fast, there are also images coming out the other end which could replace what artists are currently doing. This is an example of automation destroying jobs in the creative industries, although at the same time the invention of photography was probably thought of in a similar way and reports of the death of the artist were rather exaggerated. Instead it led to fine art moving in a different direction, such as towards cubism, surrealism, impressionism and expressionism. Where could human art go stimulated by this kind of adversity? Or, would art become a mere hobby for humans?

“They Wouldn’t Easily Let Themselves Become Greenlanders”

In the last post I mentioned the Sumerian sexagesimal system. Quite remarkably, the Sumerian language used base 60 to count. Although not all their number words survive, many of their names for numbers up to five dozen are simple. That is, they don’t have a smaller scale structure to their words like English, for example, has. We have a slight tendency towards vigesimal in the fact that the teens are named differently than the twenties, thirties and so forth, so we have secondary structure in our own numerical vocabulary. Sumerian also has this, but it doesn’t break up sexagesimal. The numbers 7 and 9 translate literally as “five-two” and “five-four”, but this is sporadic and doesn’t reflect a system, although it may have done so in prehistoric times.
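For comparison, here’s a minimal sketch of reading a number off in base 60, each “digit” being a value from nought to fifty-nine, so that 3661 comes out as one sixty-squared, one sixty and one unit:

```python
def to_base60(n):
    """Split a non-negative integer into base-60 "digits"."""
    digits = []
    while n:
        digits.append(n % 60)
        n //= 60
    return digits[::-1] or [0]  # zero gets a single zero digit

print(to_base60(3661))  # [1, 1, 1]
print(to_base60(59))    # [59]
```

A Sumerian needed a separate basic numeral for each of those sixty possible digit values, which is the point about the sheer size of their numerical vocabulary.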

It’s hard to imagine a widespread modern language which used that many basic numeral words. These traces suggest that the Sumerian system used to be smaller, but in some ways this complexity is typical of what might be called “primitive” languages. The trend in most languages is from complexity to simplicity, but this leads to a quandary.

If you assume the Whig conception of history, which is of general progress towards the current social order, you’re presented with a depressing view of the past if progress is synonymous with improvement. We can look back on a decline of overt racism, sexism, homophobia and other identity-based prejudice, better conditions for workers, more tolerance, increasing care for the vulnerable and the like, and will be confronted with the idea that the past was utterly appalling in all sorts of ways. This is not actually how things happened. For instance, whereas Georgian Britain had the slave trade, the death penalty for homosexual acts and a general contempt for the needs of the poor, it was also less puritanical than the Victorian era, and in some ways that made it a better place to live. This is a major oversimplification of course, but it definitely isn’t a case of a terrible past trending towards a good present across the board.

A similar illusion afflicts the conception of language change. A lot of the time it really feels like languages are all becoming simpler and easier with the passage of time. Taking English as an example, we now have fewer strong verbs, we use “have” rather than “be” as the auxiliary for all past participles used in the perfect tense, we don’t use “thou” any more and many of our consonants in clusters have become silent, such as “knight” and “know”. The same process seems to take place in almost any familiar, widely spoken language you can think of. Latin is generally much more complex than any of the modern Romance languages, North Indian languages and Romani are way simpler than Sanskrit and Greek has become much simpler grammatically than it was in the Bronze Age. Sometimes this trend seems to be completely across the board, and it leads to a very odd apparent conclusion: that prehistoric languages were so complicated that it’s hard to imagine children being able to learn them at all.

We can only trace most languages back a little way into the Neolithic. Before that, the nature of the languages spoken is highly mysterious. The oldest traceable languages are probably the Afro-Asiatic ones, which may be descended from a parent language spoken eighteen thousand years ago, in the Upper Palæolithic. Further back lies the Nostratic hypothesis, which attempts to link a large number of language families together, but this is not accepted by mainstream linguists. It is very tempting though, and it leads to a language which looks very much like some Caucasian languages in form. It should be noted that the Caucasian languages do not form a single family, but they are nonetheless characterised by extremely complex grammar and many consonant clusters and types of consonant, sometimes with very few vowels. The extinct Ubykh, for example, had seven dozen and two consonants but only two vowels, a number of consonants exceeded only by some click languages.

Types of languages can be classified in various ways. One is word order, so for instance English is SVO, Subject-Verb-Object, Hebrew, Arabic and the surviving Celtic languages are VSO and Latin, Sanskrit and Turkish are SOV. However, a more relevant way of addressing types of language is in the complexity of their grammar. Languages can be analytic, fusional, agglutinative or polysynthetic. English is very close to being analytic. Its words vary very little and it often expresses cases, tenses and other inflections with prepositions and auxiliary verbs, which are approaching particles as with “should of” instead of “should have” and “gotta” for “must” or “have to”. However, English is not completely analytic (also known as isolating), because for example it still has mutation plurals such as “teeth” and participles formed by adding suffixes. Mandarin Chinese is closer to this state, with even plural pronouns being expressed using a separate “word”, which is in fact a bound morpheme but is very regular, being the same for their words for “we”, “you” and “they”. Chinese tends to be thought of as a purely monosyllabic language but it’s also been stated that the mean number of syllables in Mandarin is “almost exactly two”. This is because of things like words for “insect” which can only be used together, the plural marker for pronouns being considered a separate word and the tendency to think of separate vowels as diphthongs. Nonetheless Mandarin and the other Chinese “dialects”, which are of course really separate languages, are particularly good examples of analytic languages.

Fusional languages have affixes and other changes in word form which tend to express more than one idea with a single change. Most Indo-European languages are fusional. English, for example, expresses both plurality and possession by adding an S on the end of nouns. It’s easier to illustrate this in other Indo-European languages. German “der” and “die” are fair examples of this. The former is used both for the feminine and plural genitive and the latter for the feminine nominative and accusative and the plural nominative and accusative. As well as being fairly characteristic of Indo-European languages, there’s also a tendency for non-Indo-European languages not to be fusional. The trend from fusional to analytic is evident in English in its current state, with relatively few but some strong verbs. Fusionality probably makes languages harder to learn as second languages whether or not one’s first language is fusional.

Somewhat similar to fusional languages are agglutinative ones. Turkish and Finnish are agglutinative, and so is Esperanto. Agglutinative languages have separate morphemes for each expressed idea whose forms change little when they’re added to words. Finnish is quite agglutinative but tends to weaken some of its suffixes, changing double consonants to single. Agglutinative languages may be able to express entire phrases in single words, as with the Finnish “tottelemattomuudestansa” – “because of your disobedience” (that may be misspelt). Agglutination and fusion are both features found in languages which are generally considered to be in the other class, so for example the Indo-European Armenian uses agglutination with nouns but not elsewhere and general word-building in English and many other fusional languages is also agglutinative, with something as simple as “everyday” being an example. Agglutinative languages can be seen as descended from analytical languages but tending to run their words together in a way which has become enshrined in grammar.

The final class of language is known as polysynthetic, and these are what I’m mainly going to talk about today. Polysynthetic languages have entire sentences as one word. There are other languages which are able to do this, and clearly one-word sentences exist in English for example, such as “go”, “yes”, “hello” and so forth, but in polysynthetic languages these are the norm. The title of this post, “they wouldn’t easily let themselves become Greenlanders”, is a single word in Iñupiaq, one of the languages of the Inuit. Incidentally, I’m not going into the politics of why they’re sometimes called “Esquimaux” here because it’s more complex than might at first appear. For reasons which might be connected to sociolinguistic features of polysynthetic languages, most first-language speakers of English, Castilian, German and the like are more likely to quote references to such words than be able to form them themselves, because these are not common second languages. The above phrase is from an edition of the Encyclopædia Britannica and its original Iñupiaq is lost to me. One that I can retrieve, by copying unfortunately, is the Mohawk example of “tkanuhstasrihsranuhwe’tsraaksahsrakaratattsrayeri” – “the praising of the evil of the liking of the finding of the house is right”. This fifty-letter word is said to be one of any number of Mohawk words of unlimited length, because Mohawk is a polysynthetic language. This, by the way, is why it cannot be true that the Inuit have many words for “snow”. It’s more that they have many words, some of which mention snow, but they could equally well be said to have lots of words for anteaters, animals for which the Arctic is not renowned. In fact I’m not sure they have a straightforward way of referring to anteaters, but I hope you take my point. And the problem here is that knowledge of polysynthetic languages outside their communities is usually sparse.

There’s some controversy as to what constitutes a polysynthetic language. One important aspect is polypersonal agreement in verbs. Swahili and other Bantu languages have this. The Swahili verb inflects for the object as well as the subject, so “nimekuosha” means “I washed you” and “sikukuosha” means “I didn’t wash you” – four or five words in English, but in Swahili a complete sentence with subject and object in a single word. However, what Swahili does not do is incorporate nouns into the verb phrase, and it’s probably this which makes a language truly polysynthetic. It’s easy to understand how it could happen. Just as Latin has “amo, amas, amat” and the like, where “-at” refers to “she/he/it/this/that” and probably a lot else, so could a language, instead of in a sense incorporating pronouns, actually use nouns as part of the verbal inflection, and that’s the point at which it counts as a polysynthetic language. Incidentally, although I contrast “polysynthetic” with agglutinative and fusional here, using the last two to refer only to non-polysynthetic languages, polysynthetic languages can in fact themselves be fusional or agglutinative, or at least tend towards being one or the other.
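To make that morphology concrete, here’s a minimal sketch of how “nimekuosha” decomposes. The segmentation follows the standard textbook analysis (ni- “I”, -me- perfect, -ku- “you” as object, -osha “wash”); the little helper function and its names are my own invention for illustration, not any real parsing tool.

```python
# Sketch: morpheme-by-morpheme breakdown of the Swahili example above.
# The glosses are standard analyses; the function itself is illustrative only.

GLOSSES = {
    "ni": "I (subject)",
    "me": "perfect tense",
    "ku": "you (object)",
    "osha": "wash",
}

def gloss(morphemes):
    """Join morphemes into the surface word and pair each with its gloss."""
    word = "".join(morphemes)
    parts = [(m, GLOSSES[m]) for m in morphemes]
    return word, parts

word, parts = gloss(["ni", "me", "ku", "osha"])
print(word)  # prints "nimekuosha" – "I (have) washed you"
for morpheme, meaning in parts:
    print(f"  -{morpheme}-: {meaning}")
```

Note that the negative “sikukuosha” swaps ni- for si- and adds a second -ku- (the negative past marker), so it wouldn’t be segmentable from the table above alone.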

Now, one of the surprising things about polysynthetic languages is that whereas there are globalised and industrialised nations with official languages of all sorts of typology, there are absolutely no countries at all with a main official polysynthetic language. Examples of the others are easy to find. Malaysia, Indonesia and China have isolating languages. European nations mainly have fusional languages. Finland, Turkey, Georgia and Hungary have agglutinative languages. Many of these nations have no indigenous polysynthetic languages in any case, but some have. There are in fact no polysynthetic languages at all which are widely spoken in terms of area or numbers, although there have been in the past. The exception to all this is Nahuatl, the language of the Aztec Empire, which currently has 1.5 million speakers. Apart from that, only Navajo and Cree are spoken by more than a hundred thousand people, and in fact “spoken” may be the operative word here because most polysynthetic languages have few literate speakers. It’s also notable that those three examples are all spoken in North (and Central) America, and at one point in the nineteenth Christian century it was thought that polysynthesis was a distinctive characteristic of American languages.

This last point might conceivably be why there are so few widely-spoken examples. If polysynthesis really was a feature of the Americas, the genocide visited by White people on Native Americans could explain this distribution. There are also many Australian aboriginal languages of this type, but again a similar process took place on that continent. Many Papuan languages are polysynthetic, but in this case it could be simply due to the wide variety of languages spoken in Papua. In Eurasia, most such languages are spoken in Eastern Siberia, including Ainu, which is a special case and also a potentially informative case study. The only European examples are the Northwest Caucasian languages. They also seem to be absent from Afrika, at least insofar as Bantu languages are not considered under this heading due to not incorporating nouns. What is going on here? The pattern seems to be that marginalised, low-population indigenous peoples such as the Ainu, Iroquois, Inuit, Australian Aborigines and the peoples of the Amazon and Siberia tend to speak polysynthetic languages, live in small groups isolated from the rest of the world, and tend to be conquered by larger powers – particularly Westerners, but not always: the Japanese, Chinese and Arabs have also done this. By contrast, all the powerful nations speak other types of language. Why?

Linguistic complexity is associated with small, isolated and stable communities with dense social networks, i.e. where everyone knows everyone else. The density of a social network can be measured by dividing the number of links between people in a community by the number of possible links. Where the result is high, the network is dense. Such groups are socially cohesive, stable and have few external contacts. Languages associated with these are relatively rarely learnt by outsiders, but before I go further on that, what exactly constitutes an outsider?
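That density measure can be sketched in a few lines. The tiny community below, and all the names in it, are entirely invented for illustration; the function is mine, not from any network-analysis package.

```python
from itertools import combinations

def network_density(people, links):
    """Actual links divided by possible links: n people have n*(n-1)/2 possible pairs."""
    n = len(people)
    possible = n * (n - 1) // 2
    actual = sum(1 for pair in combinations(people, 2) if frozenset(pair) in links)
    return actual / possible

# An invented community: four people with four of the six possible links.
people = ["Ann", "Bea", "Cal", "Dee"]
links = {frozenset(p) for p in [("Ann", "Bea"), ("Ann", "Cal"),
                                ("Bea", "Cal"), ("Cal", "Dee")]}
print(network_density(people, links))  # 4 of 6 possible links, ≈ 0.67
```

In a maximally dense network – everyone knowing everyone else – the result is 1; as links disappear it falls towards 0.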

There are three orders of social network involvement. The first order consists of people directly linked to each other; this is the core. The second order comprises people who are linked to members of the first order but are not themselves core members. The third order is of people with no direct connection to the first-order zone. This seems to contradict the idea of six degrees of separation, but in fact there could be exponential growth in the sizes of the possible zones four to six.
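That exponential growth can be sketched numerically. The figure of 45 genuinely new acquaintances per step is an assumption picked purely for illustration – real networks overlap heavily – but it shows how a few further zones can cover everyone:

```python
# Naive reachability estimate: if each step of acquaintance adds an
# assumed 45 genuinely new contacts, zone sizes grow exponentially.
NEW_CONTACTS = 45  # illustrative assumption, not an empirical figure

reachable = [NEW_CONTACTS ** zone for zone in range(1, 7)]
for zone, size in enumerate(reachable, start=1):
    print(f"zone {zone}: ~{size:,} people")

# Zone six alone comes to 45**6 = 8,303,765,625 – more than the world's
# population, which is why six degrees of separation remains plausible
# even if only three orders are socially meaningful.
```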

As stated above, I’ve had to raid reference works to come up with examples of words from polysynthetic languages because I’m not familiar with even one of them. I’m conscious of the occasional word from Nahuatl used in English, and Iñupiaq words like igloo, anorak, umiak and kayak, but no words from within the languages themselves which haven’t already been borrowed internationally. This is not just me. It’s because their speakers just don’t speak much to outsiders. In the case of the Inuit, this may be due to living in a very hostile environment in small groups, and the same applies in Siberia and the Amazon. Elsewhere, it isn’t as obvious what’s going on.

One suggestion is that second-language learners lead to a language becoming simpler for first-language learners too, because there are certain things the former just can’t manage. If that’s so, it means that the use of second languages is far more normal than it seems to most first-language English speakers today. It’s also possible that intermarriage outside such communities would lead to something a little like creolisation, only not to the extent of borrowed vocabulary or outright mixture of languages.

Social networks often have “hub” people, who link together people who don’t know each other. Sarada and I became aware of a mutual acquaintance soon after we met who is definitely one of these, and I think I’ve experienced the influence such people have over language. A few years ago, I used to edit a newsletter and used the word “planet” to refer to Earth, in order to give a sense of unity and vary my use of language. Our hub acquaintance found this usage peculiar, probably due to their world view, which separated the mundane from the celestial and reflected their negative view of science. It became very difficult to insist on retaining the use of the word, which contrasted with how it was used outside our community but matched its use in, for instance, associated pressure groups such as Greenpeace and Friends of the Earth, and also in the Green Party. I think this was probably a minor example of how hub people, sometimes inadvertently, exert pressure to keep language use in a particular form – a pressure which can be both innovative and conservative – and I suspect similar forces are at play here.

The lack of literacy, and its possible imposition from without, may also be a factor. If a whole community doesn’t write or read at all, it may not divide utterances into words in the same way as a literate one would. English has been strongly influenced by widespread literacy, which has changed the pronunciation of certain words to be more in line with their spelling, such as “again” and “often”. If foreigners came into a linguistic group and decided where the word divisions fell, they might make different decisions than the native speakers would. In the case of Nahuatl, the very nature of its writing was quite unlike literacy as other cultures practise it. In fact, pre-Columbian Nahuatl has often been considered not to have a writing system at all, in spite of the fact that it had paper and books with pages. It used ideograms and was also partly based on wordplay (as “bee leaf” for “belief” would be in English), similar to how proto-writing worked in Bronze Age Mesopotamia. Aztec codices are reminiscent in some ways of graphic novels. In parts of Siberia, letters were written in pictures representing the situation, such as a crossed-out vertical spear to express not being able to see someone because they were beyond the horizon. In a culture using this kind of graphical communication, it seems possible that people didn’t particularly think of themselves as using words at all.

The same situation is likely to be familiar to a hearing parent using spoken language with a young hearing child. Their early language use is unlike mature speech in various ways, including using phrases which they don’t analyse into words; only later does this analysis emerge. For instance, one of our children used to refer to an untidy scene as a “what a mess”, and to a ball as a “make a ball”, and a child of one of our friends used to say “hat on” and even “hat on on” for “hat”. Many English speakers will be aware of young children saying “wassat” as if it’s one word. If the influence of the idea of short, separate words were entirely absent, it’s easy to imagine a whole culture continuing to do this. Another example is our daughter saying “Llater” for “see you later”, with a voiceless “Welsh” LL at the start. This can extend into adulthood even in English. One quoted example is “azeniayuenionya?” – “has any of you any on you?” – as a request for loose tobacco. In a way, maybe it’s we who misunderstand the nature of spoken language, imagining we’re saying the latter when we’re really saying the former. Another one is my text-speak-like “cuinabit” – “see you in a bit”.

Celtic and Sanskrit are both known for their tendency to merge spoken words into each other, such that the unit of speech for a speaker of those languages may not actually be the word so much as the phrase. It’s also been suggested that French is on its way to becoming a polysynthetic language. It contains clitics, which are word-like morphemes that depend on a neighbouring full word and therefore cannot occur on their own. There has long been a major discrepancy between the spelling and pronunciation of French, and it shares with modern Celtic languages a connection between consecutive words in speech. Phrases such as «je te l’ai dit» and «je ne sais pas» come across differently in speech than in writing, and considered as single words they are “zheteledi” and “zhensepa” in my made-up-on-the-spur-of-the-moment orthography. Coming across French anew as if it were an unknown language, one might regard “di” as the stem of the verb, “-e-” as the sign of the perfect tense, “-l-” as a third person objective prefix, “-te-” as the second person objective and “zhe-” as the first person subjective. “Zhensepa” also has the “n-pa” circumfix which indicates the negative, and there are others such as “n-zhame” for “never” and so forth. This makes French a much more exotic-seeming language than the boring old so-called “Standard Average European” paradigm into which it tends to get forced.

French has liaison between “words”, which links them together, as with «mes amis», where the only audible indication of the plural is outside what we would think of as the noun. It also has obligatory elision. In fact, many of the structures of French, once one ignores the written language, are quite similar to those of Bantu languages such as Swahili. There is some overlap in the areas where French and Bantu languages are spoken, and it’s interesting to speculate how first-language Bantu speakers, such as those in the Congo, perceive their French when they learn it. It’s possible that it tends to be refracted unnecessarily through a lens of European-ness. Conversely, is there a way of looking at, say, Swahili which makes it seem more like written French? However, neither French nor Swahili has noun incorporation so far as I can tell, though it’s very difficult to view French without the filter of its written form.

The Ainu language is spoken in Northern Japan and previously in the Russian territory north of it called Sakhalin. It’s a moribund language which occupies an interesting position in linguistic typology. Ainu is completely unrelated to Japanese, and probably to any other language, but it does resemble other languages spoken in the area, such as Chukchi, in that it was previously polysynthetic. Ainu has gone from being polysynthetic to agglutinative. The yukar, the Ainu sagas, are written in the former form, which could be seen as the classical form of the language. Modern Ainu has similar syntax to Japanese, but it’s difficult to tell how strongly it was influenced by it because the two languages are both isolates and have been spoken in close proximity for centuries. It has only two native speakers now, although some Ainu have learnt it as a second language, and around three hundred people can understand it to some extent. This extreme endangerment means that it no longer occupies the usual position of a polysynthetic language, with an inner circle which doesn’t communicate much with the outside world or have much contact with other languages; instead, Ainu has been re-learnt by a lot of native Japanese speakers. It probably goes without saying that like many other minority languages it’s been subject to persecution and attempts at eradication, and it was only recognised by the Japanese government in 2019.

After all that, the question then arises of whether prehistoric languages, when everyone was a hunter-gatherer, would have been polysynthetic. The trend from complexity to simplicity would certainly seem to suggest so, but it also appears to be a cycle. If that’s so, it’s possible to imagine prehistoric languages going through such a cycle, so that at any one time there would’ve been speakers of polysynthetic, agglutinative, fusional and analytic languages, perhaps coming into contact with each other. However, they’re definitely most common among hunter-gatherers in Western recorded history. There does seem to be a kind of Turkish-like typology which crops up repeatedly in human spoken language which suggests to me that left to ourselves we’d all end up speaking Turkish, though not literally – Quechua and Aymara are also similar in this way for example. However, perhaps this question can be answered by looking at the kind of societies people lived in during the last Ice Age. The word “Ice Age” (see what I did there?) might suggest a lifestyle like the Inuit and indigenous peoples of Siberia, but simply because people live that way in those conditions now doesn’t mean they did twenty thousand years ago. The question of behavioural modernity arises here, but ignoring that for the sake of not veering wildly off-topic, at the time we became a separate species, which oddly is much earlier than the time we stopped being able to interbreed with Neanderthals and Denisovans, there seem to have been between one and three hundred thousand of us. Early in the last Ice Age, a volcanic eruption seems to have caused a global famine among us which reduced our population to somewhere between one and ten thousand. By the end of the Ice Age it had grown to somewhere between one and ten million.
These low numbers suggest to me that language change would’ve been slower, because, for example, the three hundred thousand people who speak Icelandic would still be able to understand and be understood by their ancestors of over a millennium ago, and if Australia is anything to go by, languages were spoken in extremely small groups – the people on the north side of Sydney Harbour used to speak a different language from the people on the south side. However, it may be misleading to compare the situations of hunter-gatherers in recent times to those of the Old Stone Age, because their societies have existed for just as long as ours and they now live in areas which are harsher than, for example, the Mediterranean was a millennium or so after the last Ice Age. This is all assuming that people did live in small, isolated groups, when they may well not have done. Presumably there is archæological and palæontological evidence relevant to all this.

Finally, there is a rather depressing connotation to the phrase “they wouldn’t easily let themselves become Greenlanders”. Greenland, more properly known as Kalaallit Nunaat, has by far the highest rate of people killing themselves in the world, at 83 per 100,000 per annum. The next highest is Lesotho at 72, followed by Guyana at 40.3 and Eswatini at 29.4. It’s the leading cause of death there among young men, and eight percent of the population die by their own hands. Several factors are likely to be involved, such as insomnia caused by twenty-four hour daylight and perhaps also twenty-four hour darkness, and depression and alcoholism are also more common in the Arctic generally. However, a major contributory cause is likely to be the culture shock between Inuit and Danish lifestyles. When you consider that the end of a lifestyle involving close-knit relationships and isolation from Western influence is likely to lead to a lot of stress, dysfunctional home environments and something resembling unemployment, it’s hardly surprising that these polysynthetic language speakers wouldn’t easily let themselves become Greenlanders. Maybe they shouldn’t have to.