I may have mentioned this before on here, but there used to be a popular “spooky” non-fiction book called ‘The Ghost Of 29 Megacycles’. This was about the practice of listening to static on analogue radio and apparently hearing the voices of the dead. A similar technique is known as Electronic Voice Phenomenon, a more general version of the same thing in which people listen out for voices on audio tape or other recording media. It’s notable that this is a highly analogue process. It’s no longer a trivial task to detune a television or radio and get it to display visual static or produce audio static so that one can do this. Audiovisual media nowadays are generally very clean and don’t lend themselves to it. One saddening thing to me is that we now have a TV set which will display pretend static to communicate to us that we haven’t set it up properly. It isn’t honest. There is no real static; it’s just some video file stored on the hardware somewhere which tries to tell the user there’s an unplugged connection or something. You can tell this because it loops: the same pixels are the same colours in the same places every few frames. I find this unsettling because it implies that the world we live in is kind of a lie, and because we haven’t really got control over the nuts and bolts of much technology any more. There’s a revealing, temporally asymmetric expression in that “any more”, committing oneself to the belief that in this respect the past and the future are qualitatively different. It’s important to acknowledge this sometimes, but one can also bring it about through the force of that potentially negative belief. However, the demise of the analogue has not led to the demise of such connections, although for a long time it seemed to have done so.
Most people would probably say that in these cases we are simply hearing, or sometimes seeing, things which aren’t really there. Others might say, of course, that this is a way to access the Beyond, so to speak, and interpret the voices or other experiences in those terms. If that’s so, the question arises as to whether it’s the medium which contains this information or whether the human mind contacts it directly via a random-seeming visual or sonic mess, having been given the opportunity to do so. Other stimuli draw the attention towards specific, organised and definite details too much for this to happen easily; there’s no scope for imagination, or rather for free association.
Well, recently this has turned out no longer to be so. Recently, artificial intelligence has been advancing scarily fast. That’s not hyperbole. It is actually quite frightening how rapidly software has been gaining ground on human cognition. Notable improvements occur within weeks rather than years or decades, and one particular area where this is happening is in image generation. This has consequences for the “ghost of 29 megacycles” kind of approach to, well, I may as well say séances, but this is going to take a bit of explaining first.
Amid considerable concern for human artists and their intellectual property, it’s now possible to go to various websites, type in what you want to see and have a prodigious, furiously cogitating set of servers give you something like it in a couple of minutes. For example, sight unseen I shall now type in “blue plastic box in a bookcase” and show you a result from Stable Diffusion:
That didn’t give me exactly what I wanted but it did show a blue plastic box in a bookcase. Because I didn’t find a way to specify that I only wanted one blue plastic box, it also gave me two others. I’ll give it another try: “A tree on a grassy hill with a deer under it”:
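All the images in this post came from free web front-ends, but for anyone curious what the same request looks like as code, here’s a minimal sketch using the Hugging Face diffusers library. The library and the model checkpoint are my assumptions rather than anything those websites reveal, and the negative prompt is just one way people try to hint that they only want one of something; it doesn’t always work:

```python
# A minimal text-to-image sketch with the Hugging Face diffusers library.
# This is my guess at tooling, not what the free web front-ends necessarily
# run behind the scenes; "runwayml/stable-diffusion-v1-5" is a publicly
# available checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # a GPU turns minutes into seconds

result = pipe(
    prompt="blue plastic box in a bookcase",
    negative_prompt="several boxes, multiple boxes",  # one way to hint at wanting just one
    num_inference_steps=50,
)
result.images[0].save("blue_box.png")
```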
The same system can also respond to images plus text as input. In my case, this has led to an oddity. As you know, I am the world’s whitest woman. However, when I give Stable Diffusion’s sister Diffuse The Rest, which takes photos plus descriptions, a picture of myself and a description such as “someone in a floral skater dress with curly hair, glasses and hoop earrings”, it will show me that all right, but “I” will be a Black woman more often than not. This doesn’t happen with the same sort of description when there’s no photo of me involved; this, for instance, is what I get when I type it into Stable Diffusion itself:
This is obviously a White woman. So are all the other examples I’ve tried on this occasion, although there is a fair distribution of ethnicity. There are worrying biases in the software, as usual. For instance, if you ask for a woman in an office, you generally get something like this:
If you ask for a woman on a running track, this is the kind of output that results:
This is, of course, due to the fact that the archive of pictures on which the software was trained carries societal biases with it. However, for some reason it’s much more likely to make me Black than White if I provide it with a picture of myself and describe it in neutral terms. This, for example, is supposed to be me:
The question arises, though, of how this bias might be addressed. Here is an example of what the software does with a photo of me:
You may note that this person has three arms. I have fewer than three, like many other people. There’s also a tendency for the software to give people too many legs and digits. I haven’t tried and I’m not a coder, but it surprises me that there seems to be no way to filter out images with obvious flaws of this kind. Probably the reason is that these AI models are “black boxes”: they’re trained on images and arrive at their own rules for how to represent them, and in the case of humans the number of limbs and digits is not part of those rules. It is in fact sometimes possible to prompt them into giving a body extra limbs by saying something like “hands on hips” or “arms spread out”, in which case they will on occasion produce someone with arms in a more neutral position as well as arms in the explicitly requested pose.
In order to address this issue, it would presumably be necessary to train the neural network on images with both the wrong and the right numbers of appendages. The problem is, incidentally, the same as with the supernumerary blue boxes in the bookcase image, but in most situations we’d be less perturbed by seeing an extra box than an extra leg.
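For what it’s worth, a crude after-the-fact filter of the kind I was wishing for above doesn’t seem inconceivable. Here’s a sketch using Google’s MediaPipe hand detector to throw away renders with a suspicious surplus of hands; the function, the six-hand search limit and the two-hand threshold are all my own guesswork, and it says nothing about legs or digits:

```python
# A crude post-hoc sanity check: count detected hands with Google's MediaPipe
# and reject renders that have too many. Purely illustrative; the threshold
# and the whole approach are guesswork on my part.
import cv2
import mediapipe as mp

def plausible_hand_count(path, max_hands=2):
    rgb = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=6) as detector:
        found = detector.process(rgb).multi_hand_landmarks or []
    return len(found) <= max_hands

# keepers = [p for p in generated_image_paths if plausible_hand_count(p)]
```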
I have yet to go into why the process is reminiscent of pareidolia based on static or visual snow, and therefore potentially a similar process to a séance. The algorithm used is known as a Latent Diffusion Model. This seems to have replaced the slightly older method of Generative Adversarial Networks, which employed two competing neural networks to produce better and better pictures by judging each other’s outputs. Latent diffusion still uses neural networks, which are models of simple brains based on how brains are thought to learn. Humans have no real access to what happens internally in these networks, so the way they are actually organised is quite mysterious. Many years ago, a very simple neural network was trained to do simple arithmetic and then examined. It was found to contain a circuit with no connections to any nodes outside itself, which was therefore thought to be redundant, but when it was removed the entire network ceased to function. That network was many orders of magnitude less complex than today’s.

In the present case, the network was trained on a vast database of pictures paired with descriptions and rated for beauty, the LAION-5B dataset. The initial picture, which may be blank, has “snow” added to it in the form of pseudorandom noise (true randomness may be impossible for conventional digital devices to achieve on their own). The algorithm then uses an array of GPUs (graphics processing units, as used in self-driving cars, cryptocurrency mining and video games) to strip that noise away step by step, repeatedly guessing what “ought” to lie underneath it, so that the picture comes to look more and more like the target as described textually and/or submitted as an image. It does this in several stages. Also, just as a JPEG is a compressed version of a bitmap image, relying in that case on small squares described by overlapping trigonometric functions, so the noisy images are compressed, both to fit in the available storage space and so that they get processed faster. The way I think of it, and I may be wrong here, is that it’s like getting the neural network to “squint” at the image through half-closed eyes and try to imagine and draw what’s really there. This compressed form is described as a “latent space”, because the actual space of the image, or possibly the multidimensional space used to describe it as found in Generative Adversarial Networks, is a decompressed version of what the GPUs actually work on directly.
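To make the shape of that process a little more concrete, here’s a toy, runnable caricature with random weights standing in for the trained networks, so it produces mush rather than pictures. The point is only the structure: start from noise in a small latent space, repeatedly subtract predicted noise, then decode the latents back up to full pixel size:

```python
# A toy caricature of latent diffusion, NOT the real Stable Diffusion code.
# The "denoiser" and "decoder" here are untrained stand-ins with random weights,
# so the output is mush; the shape of the loop is the point.
import torch
import torch.nn as nn

denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)          # stand-in for the trained U-Net
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # stand-in for the VAE decoder

latents = torch.randn(1, 4, 64, 64)   # the compressed "latent" image: 64x64x4 instead of 512x512x3
with torch.no_grad():
    for step in range(50):            # several stages of noise removal
        predicted_noise = denoiser(latents)   # the real model is also told the step and the prompt
        latents = latents - 0.1 * predicted_noise
    image = decoder(latents)          # "un-squint": decompress the latents back to pixel space
print(image.shape)                    # torch.Size([1, 3, 512, 512])
```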
If you don’t understand that, it isn’t you. It was once said that if you can’t explain something simply, you don’t understand it, and that suggests I don’t. That said, one thing I do understand, I think, is that this is a computer making an image fuzzy like a poorly-tuned television set and then trying to guess what’s behind the fuzz according to suggestions such as an image or a text input. This process is remarkably similar, I think, to a human using audio or visual noise to “see” things which don’t appear to be there, and is therefore itself like a séance.
This seems far-fetched of course, but it’s possible to divorce the algorithm from the nature of the results. The fact is that if a group of people is sitting there with a ouija board, they are ideally sliding the planchette around without their own conscious intervention. There might be a surreptitious living human guide, or a spirit might hypothetically be involved, but the technique is the same. The contents of the latent space are genuinely unknown and the details of events within the neural network are likewise mysterious. We, as humans, also tend to project meaning and patterns onto things where none exist.
This brings me to Loab, the person at the top of this post, or rather the figure. The software used to discover this image has not been revealed, but seems to have been Midjourney. The process whereby she (?) was arrived at is rather strange. The initial input was Marlon Brando, the film star. This was followed by an attempt to make the opposite of Marlon Brando. This is a technique where, I think, the location in the latent space furthest from the initial item is found, like the antipodes but in a multidimensional space rather than on the surface of a spheroid. This produced the following image:
The phenomenon of apparently nonsense text in these images is interesting and more significant than you might think. I’ll return to it later.
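Before going on, here is my loose understanding of what “the opposite of” means here: these tools let a prompt be weighted negatively, so that the guidance pushes the picture away from a concept rather than towards it. Midjourney has its own syntax for this; in the diffusers library it would look like the snippet below, reusing the pipeline from the earlier sketch. It’s only an analogy to whatever Supercomposite actually did:

```python
# Steering *away* from a concept with a negatively weighted prompt, reusing the
# "pipe" text-to-image pipeline from the earlier sketch. Only an analogy to
# whatever Supercomposite actually did with their undisclosed tool.
result = pipe(
    prompt="a portrait of a person",
    negative_prompt="Marlon Brando",  # guidance pushes as far from this concept as it can
    guidance_scale=7.5,
)
result.images[0].save("not_brando.png")
```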
The user, whose username is Supercomposite on Twitter, then tried to find the opposite of this image, expecting to arrive back at Marlon Brando. They didn’t. Instead they got the image shown at the top of this post, in other words this:
(Probably a larger image in fact but this is what’s available).
It was further found that this image tended to “infect” others and make them more horrific to many people’s eyes. There are ways of producing hybrid images via this model, and innocuous images from other sources generally become macabre when combined with this one. Also, there’s a tendency for Loab, as she was named, to “haunt” images in the sense that you can make an image from an image and remove all the references to Loab in the description, and she will unexpectedly recur many generations down the line like a kind of jump scare. Her presence also sometimes makes images so horrendous that they are not safe to post online. For instance, some of them are of screaming children being torn to pieces.
As humans, we are of course genetically programmed to see horror where there is none, because if we instead saw no horror where there was some we’d probably have been eaten, burnt to death, poisoned or drowned, and in that context “we” refers to more than just humans. Therefore a fairly straightforward explanation of these images is that we are reading horror into them when they’re just patterns of pixels. We create another class of potentially imaginary entities by unconsciously projecting meaning and agency onto stimuli. Even so, the human mind has been used as a model for this algorithm. The images were selected by humans, and humans have described them and, perhaps most significantly, rated them for beauty. Hence if Marlon Brando is widely regarded as handsome, his opposite’s opposite, rather than being himself, could be ugliness and horror. You would otherwise expect that to be simply Brando again, or else something not closely related to him at all. A third possibility is that it’s a consequence of the structure of a complex mind-like entity that it contains horror and ugliness as well as beauty. There are two other intriguing and tempting conclusions to be drawn from this. One is that this is a real being inhabiting the neural network. The other is that the network is in some way a portal to another world in which this horror exists.
Loab is not alone. There’s also Crungus:
These are someone else’s, from Craiyon, which grew out of Dall-E Mini. Using that, I got these:

Using Stable Diffusion I seem to get two types of image. One is this kind of thing:
The other looks vaguely like breakfast cereal:
Crungus is another “monster”, who however looks quite cartoonish. I can also understand why crungus might be a breakfast cereal, because the word sounds like “crunch”. In fact I can easily imagine going down the shop, buying a box of crungus, pouring it out and finding a plastic toy of a Crungus in it. There’s probably a tie-in between the cereal and a TV animation. Crungus, however, has an origin. Apparently there was a video game in 2002 which had a Crungus as an easter egg: a monster based on the original DOOM monster, the Cacodemon, which was itself based on artwork that looked like this:

Hence there is an original out there which the AI probably found, although I have to say the name seems very appropriate, and if someone were asked to draw a “Crungus”, they’d probably produce a picture a bit like one of these.
It isn’t difficult to find these monsters. Another one which I happen to have found is “Eadrax”:
Eadrax is the name of a planet in ‘The Hitch-Hiker’s Guide To The Galaxy’ but reliably produces fantastic monsters in Stable Diffusion. This seems to be because Google will correct the name to “Andrax”, an ethical hacking platform which uses a dragon-like monster as its mascot or logo. An “eadrax” seems to be a three-dimensional version of that flat logo. But maybe there’s something else going on as well.
There’s a famous experiment in psychology where people whose spoken languages were Tamil and English were asked which one of these shapes was “bouba” and which “kiki”:
I don’t even need to tell you how that worked out, do I? What happens if you do this with Stable Diffusion? Well, “kiki” gets you this, among many other things:
“Bouba” can generate this:
I don’t know about you, but to me the second one looks a lot more like a “bouba” than the first looks like a “kiki”. What about both together? Well, that either gets you two Black people standing together or a dog and a cat. I’m quite surprised by this because it means the program doesn’t know about the experiment. It doesn’t, however, appear to do what the human mind does with these sounds. “Kiki and Bouba” does this:
Kiki is of course a girl’s name. Maybe Bouba is a popular name for a companion animal?
This brings up the issue of the private vocabulary which latent diffusion models use. You can sometimes provoke such a program into producing text. For instance, you might ask for a scene between two farmers talking about vegetables, with subtitles, or a cartoon conversation between whales about food. When you do this, and when you get actual text, something very peculiar happens. If the whales’ dialogue is typeable and you use it as a text prompt, it can produce images of seafood. If you do the same with the farmers, you get things like insects attacking crops. This is even though the text seems to be gibberish. In other words, the dialogue the AI is asked to imagine actually seems to make sense to it.
Although this seems freaky at first, what seems to be happening is that the software is taking certain distinctive fragments of text out of captions and turning them into words. For instance, the “word” for birds actually consists of a concatenation of the first, more distinctive parts of scientific names for bird families. Some people have also suggested that humans are reading things into the responses by simply selecting the ones which seem more relevant, and another idea is that the concepts associated with the images are just stored nearby. That last suggestion raises other questions for me, because it seems that it might also be a description of how human language actually works mentally.
Examples of “secret” vocabulary include the words vicootes, poploe vesrreaitas, contarra ccetnxniams luryea tanniouons and placoactin knunfdg. Here are examples of what these words do:
The results of these, in order, tend to be: birds; rural scenes including both plants and buildings; young people in small groups; and cute furry animals, including furry birds. It isn’t, as I’ve said, necessarily that mysterious, because the words are often similar to parts of other words. For instance, the last one produces fish in many cases, though apparently not on Stable Diffusion, which here seems to have produced a dog because the second word ends with “dg”. It produces fish because placoderms and actinopterygii are prominent groups of fish.
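One way to see why these invented words aren’t really gibberish to the system is that the text encoder never sees whole words, only subword tokens, so a made-up word shares fragments with real ones. Here’s a quick peek using the CLIP tokenizer from the transformers library; the exact checkpoint is my assumption, but it’s the same family of text encoder Stable Diffusion uses:

```python
# Break a couple of the "secret" words into the subword tokens the text encoder
# actually sees. The checkpoint is an assumption on my part; Stable Diffusion's
# text encoder comes from the same CLIP family.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
for phrase in ["poploe vesrreaitas", "placoactin knunfdg"]:
    print(phrase, "->", tokenizer.tokenize(phrase))
# The fragments overlap with pieces of real words, often scientific names, which
# is roughly why nonsense can still reliably summon birds or fish.
```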
It is often clear where the vocabulary comes from, but that doesn’t mean it doesn’t constitute a kind of language because our own languages evolve from others and take words and change them. It can easily be mixed with English:

This has managed to preserve the birds and the rural scene with vegetation, but after that it seems to lose the plot. It often concentrates on the earlier part of a text more than the rest. In other words, it has a short attention span. The second part of this text gets me this:
I altered this slightly but the result is unsurprising.
Two questions arise here. One is whether this is genuine intelligence. The other is whether it’s sentience. As to whether it’s intelligent, I think the answer is yes, but perhaps only to the extent that a roundworm is intelligent. This is possibly misleading and raises further questions. Roundworms are adapted to what they do very well but are not going to act intelligently outside of that environment. The AIs here are adapted to do things which people do to some extent, but not particularly generally, meaning that they can look a lot more intelligent than they actually are. We’re used to seeing this happen with human agency more directly involved, so what we experience here is a thin layer of humanoid behaviour particularly focussed on the kind of stuff we do. This also suggests that a lot of what we think of as intelligent human behaviour is actually just a thin, specialised veneer on a vast vapid void. But maybe we already knew that.
The other question is about sentience rather than consciousness. Sentience is the ability to feel; consciousness is not the same thing. In order to feel, at least in the sense of having the ability to respond to external stimuli, there must be sensors. These AIs do have sense organs of a sort, because we interact with them from outside. I have a strong tendency to affirm consciousness because a false negative is likely to cause suffering. Therefore I believe that matter is conscious, and therefore that whatever responds to external stimuli is sentient. This is of course a very low bar, and it means that I even consider pocket calculators sentient. However, suppose that instead consciousness and sentience are emergent properties of systems which are complex in the right kind of way. If digital machines and their software are advancing, perhaps in a slow and haphazard manner, towards sentience, they may acquire it before many people take the possibility seriously, and we have no idea how that would happen, not just because sentience as such is a mystery but largely because we have no experience of that emergence taking place before. Therefore we can look at Loab and the odd language and perhaps consider that these things are just silly and that it’s superstitious to regard them as signs of awareness, but is that justified? The words remind me rather of a baby babbling before she acquires true language, and maybe the odd and unreliable associations they make also occur in our own minds before we can fully understand speech or sign.
Who, then, is Loab? Is she just a collaborative construction of the AI and countless human minds, or is she actually conscious? Is she really as creepy as she’s perceived to be, or is that just our projection onto her, our prejudice perhaps? Is she a herald of other things which might be lurking in latent space, or which might appear if we make more sophisticated AIs of this kind? I can’t answer any of these questions, except perhaps to say that yes, she is conscious, because all matter is. What she’s actually doing is another question. A clockwork device might not be conscious in the way it “wants” to be. For instance, it’s possible to imagine a giant mechanical robot consisting of teams of people who keep it going, but is the consciousness of the individual members of that project separate from any consciousness the automaton itself might have? It’s conceivable that although what makes up Loab is conscious, she herself is not oriented correctly to express that consciousness.
A more supernaturalistic explanation is that Midjourney (I assume) is a portal, and that latent space represents a real Universe or “dimension” of some kind. It would be hard to reconcile this idea with a deterministic system, even if the neural net is seen as a kind of aerial for picking up signals from such a world. Nonetheless such beliefs do exist: a ouija board is actually a very simple and easily described physical system which is nevertheless taken as picking up signals from the beyond. If this is so, the board and planchette might be analogous to the neural net, and the movement of the hands on the planchette, which is presumably very sensitive to the neuromuscular processes going on in the arms and nervous systems of the human participants, analogous to the human artists, the prompt, the computer programmers and the like, and it’s these which are haunted, in a very roundabout way. I’m not in any way committing myself to this explanation. It’s more an attempt to describe how the situation might be compared to a method of divination.
I’ve mentioned the fact that there are artists involved a few times, and this brings up another, probably unrelated, concern. Artists and photographers, and, where similar AIs have been applied to other creative genres, the likes of poets, authors and musicians, have had their work used to train these systems, and therefore it could be argued that they’re owed something for that use. At the other end, bearing in mind that most of the images in this post were produced rapidly on a free version of this kind of software and that progress is extremely fast, images are coming out which could replace what artists are currently doing. This is an example of automation destroying jobs in the creative industries, although the invention of photography was probably thought of in a similar way at the time, and reports of the death of the artist turned out to be rather exaggerated. Instead it led to fine art moving in different directions, such as towards impressionism, cubism, surrealism and expressionism. Where could human art go, stimulated by this kind of adversity? Or would art become a mere hobby for humans?