The Way In

Backing the losers

I’ve a tendency to back losers. For instance, Prefab Sprout and The The were my favourite bands when I was in my late teens and both were markedly unsuccessful. In keeping with this trend, I used to have a Jupiter Ace computer and I’ve alluded to this a few times on this blog. Jupiter Cantab, the company which designed and made it, had a total of I think five employees, seemed to work out of a garage and a suburban house in Cambridgeshire and had a turnover of around five thousand pounds. They went bust maybe a year or two after releasing the Ace in October 1982 CE and had to sell off all their old stock, a happy thing for me as it meant I could acquire a new computer. Its hardware was very basic even for late ’82, but its firmware decidedly was not. Unlike practically every other home computer of the time, the Ace used not BASIC but FORTH as its native programming language. Perversely, I considered writing a BASIC interpreter for it for a while but it seemed a bit silly so I didn’t.

FORTH, unlike BASIC as it was at the time, was considered a “proper” programming language. It has two distinctive features. One is that it uses a data structure known as the stack, which is a list of numbers in consecutive locations in memory presented to the user as having a top and a bottom. Words in the FORTH language usually take data off the top of the stack, operate on them and may leave one or more results on it. This makes the syntax like Latin, Turkish or Sanskrit rather than English, since instead of writing “2+2”, you write “2 2 +”, which leaves 4 on the stack. The other feature is that rather than writing single programs the user defines words, so for example to print out the character set one writes:
: CHARSET ( This is the defined word and can be whatever the user wants except for control characters ) 255 32 DO I EMIT LOOP ;
If one then types in CHARSET and presses return (or ENTER in the Ace's case), it will print out every character the Ace can display except for the graphics characters whose codes are below 32.

What you see above is the output from the Ace when you type in VLIST, i.e. list every word in its vocabulary. I think there are a total of about 140 words. All of them fit in 8K and show that FORTH is a marvellously compact language compared to BASIC or in fact most other programming languages. For instance, the ZX81’s BASIC has around forty-one words. FORTH on the Ace, and in general, was so fast that the cheapest computer faster than it, the IBM PC, cost more than a hundred times as much. For instance, in order to produce sound it was possible, as well as using the word BEEP, to define a word that counted from 0 to 32767 between vibrations and still produce a respectable note by moving the speaker in and out. By contrast, the ZX81 would take nearly ten minutes to count that high and had no proper sound anyway. This is a somewhat unfair comparison, but it illustrates the gulf between the speed of this high-level language and that of BASIC.

Whittling Down The Vocab

As I’ve said, FORTH consists of words defined in terms of other words and therefore some people object to calling code written in it “programs”, preferring “words”. The fact that this definitional process was core to the language immediately made me wonder what would constitute a minimal FORTH system. There are quite a few words easy to dispense with in the vocabulary listed above. For instance, the word SPACES prints whatever number of spaces is indicated by the number on top of the stack, so 32 SPACES prints 32 spaces. However, this word could’ve been defined by the user, thus:
: SPACES 0 DO SPACE LOOP ;

The DO-LOOP control structure takes two numbers off the stack, a limit and a starting value, and executes the code between DO and LOOP once for each count from the start up to one below the limit. It can be taken a lot further than that though. SPACE and CR are two words with a similar structure: each prints out a single character. SPACE unsurprisingly prints out a space. CR does a carriage return. Both characters are part of the standard ASCII character set, and the word for printing the ASCII character indicated by the number on top of the stack is EMIT, so they can be defined thus:
: SPACE 32 EMIT ;

: CR 13 EMIT ;

Hence three words are already shown to be unnecessary to the most minimal FORTH system, and the question arises: what is the smallest set of words such a system actually needs?

The Ace had already added quite a lot of words which are not part of standard FORTH-79, examples being all the floating-point words, PLOT, BEEP, CLS, VIS and INVIS, and omitted others which are easily defined. Some words are trivial to define, such as 2+, 1- and 1+. Others are a bit less obvious: PICK can be used to replace DUP and OVER, while SWAP and ROT are special cases of ROLL and so can be defined in those terms. . , that full stop, which prints a number, can be replaced by the number formatting words <#, # and #> . You can continue to whittle it down until you have a very small number of words along with the software which accepts input and definitions, and you’re done. In fact, if you know the hardware well enough you can make it even smaller because, with the Jupiter Ace for example, you know where the display is stored in RAM, how the stacks work (there are actually two, because practically all computers implicitly use a stack for subroutines), and when it comes down to it it’s even possible to define words which accept machine code, the numbers computers actually use which represent simple instructions like adding two numbers together or storing one somewhere.
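
A sketch of those redefinitions, assuming FORTH-79’s one-based PICK and ROLL (ANS FORTH counts from zero, so the numbers would each be one lower there):

: DUP  1 PICK ;  ( n -- n n )
: OVER 2 PICK ;  ( a b -- a b a )
: SWAP 2 ROLL ;  ( a b -- b a )
: ROT  3 ROLL ;  ( a b c -- b c a )

PICK copies an item up to the top of the stack while ROLL physically moves it, which is why SWAP has to be defined through ROLL rather than PICK.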

This is about as far as I got until recently, when I managed to join together two ideas I hadn’t previously connected.

Logic To Numbers

As you probably know, my degrees are in philosophy and my first degree is in the analytical tradition, which is the dominant one in the English-speaking world. It’s very common for philosophy degrees to be rubbished by the general public and even within philosophy, continental and analytical philosophers are often hostile to each other. What may not be appreciated is that much of philosophy actually closely resembles mathematics and by extension, computer science. When the department at my first university was closed down, some of it merged with computing. It also turns out, a little surprisingly, that one of my tutors, Nicholas Measor, was a significant influence on the theory of computing, having helped develop modal mu calculus, which is concerned with completeness of systems and temporal logic. He wrote a paper called “Duality and the Completeness of the Modal mu-Calculus” in the ’90s. This has kind of caused things to fall into place for me.

The Dedekind-Peano axioms for the set of natural numbers are central to the theoretical basis of arithmetic. They go as follows:

  1. 0 is a natural number.
  2. For every natural number x, x=x.
  3. For every natural number x equal to y, y is equal to x.
  4. For all natural numbers x, y, z, if x=y and y=z then x=z.
  5. For all a and b, if b is a natural number and a=b, then a is a natural number.
  6. Let S(n) be “the successor of n”. Then for every natural number n, S(n) is a natural number.
  7. For all natural numbers m and n, if S(m)=S(n) then m=n.
  8. For every natural number n, S(n)=0 is false.
  9. If K is a set such that 0 is in K, and for every natural number n, n being in K implies that S(n) is in K, then K contains every natural number.

You can then go on to define addition, subtraction, multiplication and inequalities. Division is harder to define because this is about integers, and dividing one integer by another may lead to fractions, decimals and so forth. I’ve known about all this since I was an undergraduate but hadn’t given it much thought. It is, incidentally, possible to take this further and define negative, real and presumably imaginary, complex and hypercomplex numbers this way, but the principle of knowing that that’s possible is enough really.
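
For instance, addition and multiplication fall out of the successor function alone, by the standard recursive definitions:

a + 0 = a
a + S(b) = S(a + b)
a × 0 = 0
a × S(b) = (a × b) + a

So 2 + 2 is S(S(0)) + S(S(0)) = S(S(S(0)) + S(0)) = S(S(S(S(0)) + 0)) = S(S(S(S(0)))), which is 4.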

Dyscalculic Programming

If you have a language which can express all of these axioms, you have a system which can do most arithmetic. And this is where I had my epiphany, just last week: you could have a programming language which didn’t initially use numbers at all, because numbers could be defined in terms of these axioms instead. It would be difficult to apply this to FORTH, because it uses sixteen-bit signed binary integers as its only data type, but I don’t think it’s impossible, and that would mean there could be a whole earlier and more primitive programming language which doesn’t initially even use numbers. This is still difficult and peculiar, because all binary digital computers so far as I know use sequences of zeros and ones, making this rather theoretical. It’s particularly hard to see how to marry this with FORTH.

Proof Assistants

Well, it turns out that such programming languages do exist and that they occupy a kind of nebulous territory between what are apparently called “proof assistants” and programming languages. Some can be used as both, others just as the former. A proof assistant is a language somewhat similar to a programming language, but one which helps the user and computer together arrive at proofs. I have actually used one of these without realising that that was what I was doing, back in the ’80s, when the aforementioned Nicholas Measor wrote an application for the VAX whose purpose was to assist the user in proving sequents in symbolic logic. It was called “Citrus” after the philosopher E. J. Lemmon, who incidentally died 366 days before I was born. My approach to this was to prove them myself, then just go along to the VAX in the computer basement and type in what I’d proven, although it was admittedly helpful on more than one occasion.

While using this, I mused that it was somewhat like a programming language except that it wasn’t imperative but declarative, and wondered how one might go about writing something like that. I also considered the concept of expressive adequacy, also known as functional completeness, in this setting, once again in connection with FORTH, realising that if the Sheffer stroke were to be included as a word in FORTH, a whole host of definitions could easily provide any bitwise function. It was also borne in upon me that all the logics I’d come across so far were entirely extensional, and that it might even be a distinctive feature of logic and mathematics per se that they are completely extensional in form. However, I understand that there are such things as intensional logics, and I suppose modality might be seen in that way, although I always conceive of it in terms of possible world semantics and multivalent truth values.
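
To make the Sheffer stroke point concrete, here’s a sketch assuming a bitwise NAND word as the sole primitive (the Ace actually provided AND, OR and XOR, so NAND itself is the assumption here):

: NOT ( x -- ~x ) DUP NAND ;
: AND ( x y -- x&y ) NAND NOT ;
: OR  ( x y -- x|y ) NOT SWAP NOT NAND ;
: XOR ( x y -- x^y ) OVER OVER NAND SWAP OVER NAND ROT ROT NAND NAND ;

Every one of the sixteen two-input truth functions can be reached this way, which is exactly what functional completeness means.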

It goes further than that though. I remember noticing that ALGOL 60 lacks input/output facilities in its standard, which made me wonder how the heck it was supposed to be used. However, it turns out that if you are sufficiently strict with yourself you can absolutely minimise I/O and do everything inside the compiler except for some teensy little bit of interaction with the user, and this instinct, if you follow it, is akin to functional programming, a much later idea which enables you to separate the gubbins from how it looks to the user. And there are purely functional languages out there, and at this point I should probably try to express what I mean.

From Metaphysics To Haskell

Functional programming does something rather familiar. Once you consider the possibility that programming can dispense with numbers as basic features, the emphasis shifts to operators and functions, and they become “first-class citizens”. This, weirdly but then again not so weirdly, is exactly what happens in category theory. Haskell is the absolutely paradigmatic functional language, and it’s been said that when you think you’re programming in it, it feels like you’re actually just doing maths. This approach lacks what you’d think would be a crucial feature of operating a computer, just as ALGOL 60 can’t actually print or read input, and such things are known as “side-effects” in functional programming. If a function does anything other than take values, perform an operation on them and return the result, that’s a side-effect. Avoiding side-effects makes it easier to formally verify a program, which is where it links back to the mu calculus.

I’ve now mentioned Haskell, which brings me a bit closer to the title of this post, and now I’m going to have to talk about monads. Monads are actually something like three different things, and it’s now occurred to me that if you put an I at the start rather than an M you get “ionad”, which gives me pause, but this is all quite arcane enough.

Firstly, Leibniz’s metaphysics prominently features the concept of monads. In 1714, he brought out a ninety-paragraph book called ‘The Monadology’ setting out his beliefs. It wasn’t originally his idea, but he developed it more than others did. Leibniz’s monads are indivisible units within reality which have no smaller parts and are entirely self-contained, though not physical like atoms. Anything that changes within a monad has to arise within itself – “it has to really want to change”. Since monads don’t interact, there’s an arrangement called “pre-established harmony” whereby things in each monad are destined to coincide appropriately. I mean, I think this is all very silly, and it arises from Descartes and the problem of the interaction of soul and body, but it’s still a useful concept and got adopted into maths, specifically into category theory. In that field, it’s notoriously and slightly humorously defined thus: “in concise terms, a monad is a monoid in the category of endofunctors of some fixed category”, and this at least brings us to the functor. A functor is a mapping between categories. Hence two different fields of maths might turn out to have identical relationships between their elements. It’s a little like intersectionality in sociopolitical terms, in that for example racism and sexism are different in detail but are both forms of marginalisation, the difference being that intersectionality is, well, intersectional, meaning that different kinds of oppression do interact, so it isn’t quite the same as either a monad or a functor.

Finally, in Haskell a monad is – er. Okay, well at this point I don’t really know what a monad is in Haskell, but the general idea behind Haskell was originally that it was safe and also useless, because you could never get anything into or out of a program written in it. This isn’t entirely true, because it does do work in a thermodynamic sense: if you take a computer switched on but doing nothing and you run a Haskell program on it which does something, it does get at least slightly warmer. That is, it does stuff to the data already inside it which you can never find out about, but it’s entirely self-contained and does its own thing. So that’s all very nice for it, but rather frustrating, and just now I don’t know how to proceed with this exactly, except that I can recognise that the kind of discipline one places oneself under by not knowing how one is going to get anything on the screen, out of the speakers or off the keyboard, trackball, mouse or joystick has the potential of making one’s programming extremely pure, if that’s the word: operating in an extremely abstract manner.

Home Computing In The 1960s

I do, however, know how to proceed with what I was thinking about earlier. There is some tiny vocabulary of FORTH, perhaps involving a manner of using a language which defines numbers in the terms outlined above, which would be simple enough to run on a very simple computer, and this is where things get theoretical, because according to Alan Turing any general-purpose computer, no matter how simple, can do anything any other computer can do, given enough time and resources. This is the principle of the universal Turing machine. Moreover, the Turing machine is equivalent in power to a formal language referred to as the Lambda Calculus.

Underneath the user interface of the Jupiter Ace operates the Z80A microprocessor. This has 694 instructions, which is of course quite a bit more than the 140 words of the Jupiter Ace’s primitive vocabulary. Other processors have fewer instructions, but all are “Turing-complete”, meaning that given enough time and memory they can solve any computing problem. In theory a ZX81 could run ChatGPT, just very, very, very slowly and with a very big RAM pack. So the question arises of how far down you can strip a processor before it stops working, and this is the nub of where I’m going, because actually you can do it with a single instruction, and there are even choices as to which instruction’s best.

The easiest one to conceive of is “subtract and branch if negative”. This is a machine whose every instruction consists of two operands in memory. One of them points at a number, which the machine subtracts from the number it already has in mind. If the result turns out to be negative, it jumps to the address given by the other operand; otherwise it just moves on to the next instruction and repeats the operation. It would also save space on a chip if the working values were stored in memory rather than on the chip itself, so I propose that the program counter and accumulator, i.e. where the data are stored, live in the main memory.
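
One possible encoding, sketched in modern FORTH rather than anything the Ace could run as-is: each instruction is a pair of cells, the address of the operand to subtract and the address to jump to if the running total goes negative. The accumulator and program counter are themselves just memory locations, as proposed above.

4096 CONSTANT SIZE
CREATE MEM SIZE CELLS ALLOT
: M@ ( n -- x ) CELLS MEM + @ ;   ( fetch cell n of the emulated memory )
: M! ( x n -- ) CELLS MEM + ! ;   ( store to cell n )
VARIABLE PC   VARIABLE ACC
: STEP ( -- )
  PC @ M@ M@                ( fetch the operand the instruction points at )
  ACC @ SWAP - DUP ACC !    ( subtract it from the accumulator )
  0< IF PC @ 1+ M@ PC !     ( negative: jump to the second operand )
  ELSE 2 PC +! THEN ;       ( otherwise fall through to the next pair )

Repeating STEP until some agreed halt address comes up is the whole machine; everything else is programming.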

Moreover, I’ve been talking about a single instruction but in fact that actual instruction can be implied. It doesn’t need to exist explicitly in the object code of the memory. Instead it can be assumed and the processor will do what that instruction demands anyway, so in a way this is a zero instruction set CPU.

What this very simple computer does is run a program that emulates a somewhat more complex machine which runs the stripped down FORTH natively. This is where it gets interesting, because the very earliest microprocessors, the 4004 and 4040, needed more transistors than this machine would, and it would’ve been entirely feasible to put this on a single silicon wafer in 1970. This is a microcomputer like those found in the early ’80s for which the technology and concepts existed before the Beatles split up.

This is of course a bit of a mind game, though not an entirely useless one. What I’ve basically discovered is that I already have a lot of what I need to know to do this task, but it’s on a level which is hard to apply to the problem in hand. But it is there. This is the way in.

Wordle

You must surely know Wordle, but just in case you don’t, it’s a daily game where you have to guess a five-letter word in six goes. I won’t post an example in case it’s one you haven’t done. If you get the right letter in the wrong place, you get a yellow tile. The right letter in the right place gets you green. The wrong letter completely yields a black tile.

This is very similar to the old computer game Moo and the board game Mastermind. I’d even go so far as to say it basically is Moo, but with words rather than numbers or patterns of colours. Mastermind works like this. One player sets up a row of four coloured pegs, possibly including blanks, which is hidden from the other. The other then has to guess it by putting down their own sets of pegs along something like twenty holes, depending on the size of the board, and the first player places black or white pegs in a 2×2 grid at the side of the guess row, indicating right pegs in the right place (black) and right colours in the wrong place (white). The colours of the guesses seem to have changed, as indicated by this illustration:

Photo taken by User:ZeroOne

Completely wrong guesses are indicated by blanks. Mastermind came out in 1970, and in my own child mind there was a clear association with the quiz programme; I actually thought it was a tie-in at the time, but the two have little in common and the shared name seems to have been coincidental. I have a tendency to be silly with games, and even more so as a child, and I remember one of my “patterns” being completely empty. I won with that one.

Moo is kind of the same game. Invicta, the company which made the “board” game, actually branded a calculator-like device in 1977 on which one could play that game:

By MaltaGC – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=114645559

Moo was written for the TITAN computer system in the 1960s. I first came across it in a BASIC programming language primer in 1981 CE. This version has the computer generate a four-digit positive integer which the user then has to guess. Right guesses in the right place are “bulls” and right ones in the wrong place are “cows”. It’s been proven that any four-digit sequence needs at most seven goes to get it right. Before being computerised, Bulls & Cows was a paper and pencil game.

Wordle is a little different, but not very. There is a four-letter version which was apparently also a pencil and paper game and that variant is out there online too. I imagine the issue with that is that a lot of profanities would be used.

Now, I could make myself out to be an expert on Wordle but I’m really not. I’m sure there are strategies but so far I haven’t worked one out much. Because you only get one go a day, it’s hard to practise. Certain combinations of letters are more likely in English generally. For instance, if there’s a Q there will very probably be a U, meaning that Q is only likely to be found in the first three positions. A while back I actually studied form for the first hundred or so Wordle answers, tallying which letters were most likely to occur in which positions. This is not completely random, because five-letter English words will inevitably have certain features. For instance, no word in English, so far as I know, begins with “MK” or “TK”, but many begin with “SL” and “TH”. Based on this selection, it came to appear that the most likely word was “SRORE”, which is plainly wrong. A more likely word is “SHIRE”. However, there is actually a definitive list of all Wordle answers, and that word was used on 22nd January 2022. No, I haven’t looked closely at that list although I do know what the last word is.

Wordle has a finite lifespan. It will end on 20th October 2027 and there are a total of 2 314 words. However, there is a much larger number of words allowed in the sequence leading up to the answer. I have the number 12 000 in my head but maybe not. This is the biggest difference between Wordle on the one hand and Moo and Mastermind on the other. The other two permit any combination, meaning that nothing needs to be stored. To use modern jargon, the patterns in the other two are procedurally generated. There is an algorithm which determines which symbols occur where, or rather, it’s completely random; in computerised versions it’s more likely to be pseudo-random. Not so with Wordle, which needs a series of stored words. Or does it? Is there a way to determine meaningful English five-letter words? It’s already clear that few if any of them end in Q, but my intuition tells me that at best there would be a substantial number of nonsense words if you tried to do this, which rules out that approach.

It might be thought that the game’s reliance on a set number of words would make it a creature of the age of cheap information storage, but this is only partly true. The ASCII version of an uncompressed list of 2 314 five-letter words is only 11K plus a couple of hundred bytes, and even that’s a lazy way of storing them, because only two dozen and two characters are in use, which means five bits per character suffice, reducing it to just over 7K. This is with no real compression algorithm, but of course there can easily be one, because of common sequences of vowels and consonants, or both, and the non-occurrence of letters in particular places. For instance, it’s rare for a Y to occur immediately before another letter in the middle of a word and rare for an I to occur at the end. However, because of the list of permissible words this is not the whole story, and if that list runs to 12 000 the storage needed will be several times larger, at around 37K.
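
For the record, the arithmetic behind those figures:

2 314 words × 5 letters × 8 bits = 92 560 bits ≈ 11.3K
2 314 words × 5 letters × 5 bits = 57 850 bits ≈ 7.1K
12 000 words × 25 bits = 300 000 bits ≈ 36.6K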

I will now appear to digress.

In 1987, the big hit computer game was of course Tetris, also known, a little irritatingly, as TETЯIS. At the time, the kind of home computer you might find which was somewhat up-market but nonetheless just about affordable for some people was the Amiga 500, with a 68000 CPU, 512K of RAM and a screen resolution comparable to that of a PAL TV. Nonetheless, there is now a version of Tetris for the ZX81, and it would’ve been feasible to write one for the very first mass-market microcomputers, particularly the Apple II in low-resolution graphics mode. This brings to mind the oddity that whereas some inventions depend heavily on a series of predecessors leading up to shortly before their own appearance, others are just waiting for someone to think of them. The form factors of PC cases are, er, a case in point. It would’ve been entirely feasible for a Georgian cabinet maker to have churned out wooden and metal PC cases, although it would’ve been a while before anything suitable could’ve been put in them, so there would’ve been no market.

End of apparent digression.

Wordle is an example of this low-dependency type of game. In 1755, a couple of people could’ve taken Samuel Johnson’s dictionary and used it to play it. Even in computer form it could’ve existed quite a long time ago. The limiting factor is the storage space needed for the list of possible intermediate words. There are a total of 12 972 words in its dictionary, whereof only a small fraction are permissible as answers. It’s possible, using modern compression algorithms, to get this down to 43K, but that doesn’t mean an old computer would have had the storage space to process such algorithms, which might also be very slow. However, even without working very hard it’s feasible to get this down to 40K, meaning that the entire dictionary could be held in the RAM of an eight-bit computer. That computer would of course have to do other things than just hold words in its memory. Dispensing with that requirement, it would also be possible to take the same approach as early spell-checking algorithms and have the machine check words for feasibility. For instance, it could disallow any instance of five identical letters or unpronounceable consonant clusters, or, as Commodore 64 Scrabble used to do, simply trust the user not to use nonsense words. With the latter approach, only 12K of RAM would be required for the list of right answers.

Here’s a possible 16K ZX81 implementation of Wordle. There are 2314 words in the dictionary, compressed to a packed five-bit per letter form, taking up less than 12K. There is no error-checking for forbidden words. A correct letter in the correct place is shown as white on black, a correct letter in the wrong place as black on white flashing with white on black, which would have to be done through software, and a completely incorrect letter is black on white. The program asks for the date when you start, converts it to a position value for a compressed series of strings (maximum string length on the ZX81 was 4096, so three strings would be needed) and loads that four-byte value into a short string literal, clearing the last seven bits which would belong to the next word. Every word the user inputs is appropriately compressed and compared five bits at a time with the string. This is then displayed on screen with only the flashing letters stored by position in the display file. If all letters are correct, the appropriate response is generated from an array depending on how many turns the user has had.
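
The fiddliest part of any implementation is the tile-scoring, mainly because of repeated letters. A sketch of that logic in FORTH (not the ZX81 BASIC the paragraph imagines; byte arrays and the FORTH nested-loop indices I and J are assumed):

CREATE ANSWER 5 ALLOT   CREATE GUESS 5 ALLOT
CREATE RESULT 5 ALLOT   CREATE USED 5 ALLOT
: A@ ( i -- c ) ANSWER + C@ ;
: G@ ( i -- c ) GUESS + C@ ;
: SCORE ( -- )   ( fills RESULT: 2=green, 1=yellow, 0=black )
  5 0 DO 0 RESULT I + C!  0 USED I + C! LOOP
  5 0 DO   ( first pass: exact matches )
    I G@ I A@ = IF 2 RESULT I + C!  1 USED I + C! THEN LOOP
  5 0 DO   ( second pass: right letter, wrong place )
    RESULT I + C@ 0= IF
      5 0 DO
        USED I + C@ 0=  J G@ I A@ = AND IF
          1 RESULT J + C!  1 USED I + C! LEAVE THEN
      LOOP THEN LOOP ;

The USED array stops one letter in the answer from colouring two tiles in the guess, which is the bug most naive implementations have.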

I think it’s clear that if this is feasible on a 16K ZX81 it would also be feasible on practically any computer (except maybe the MC-10) manufactured since 1982, and in most cases colour could be used. This is not a difficult game to implement, even in BASIC, although it seems to lend itself more to BCPL, C, FORTH or Assembler. It’s just eminently doable, and it even existed in some form on computers back to about 1968.

As to strategy, I have little idea. There’s little opportunity to practise with only one word a day, and in that way it’s a bit of a leveller. I have developed very rudimentary tricks. For instance, I tend to move a letter one place in either direction if it’s the right one in the wrong place, I start with a small list of possible words (CRANE, ADIEU, POINT or SOARE) and I avoid impossible consonant clusters, which I have to anyway because they aren’t in the dictionary. I can’t actually implement this algorithm myself because it would involve looking at the word list and therefore cheating, but obviously it could be done quite straightforwardly.

Bye for now.

The Apple Mannikin

Back in the 1970s, computer graphics were at a relatively primitive stage. A lot of them were just wireframe, and this very style became iconic of high technology and the futuristic. The 1979 Disney film ‘The Black Hole’ was notable for having the longest CGI sequence in a feature film up until that time, at around a minute and a half. Here it is:

In the cinema, that looked pretty impressive to me at the time, as I’m sure it did to others. However, CGI as we’re familiar with it today also existed, as with NASA’s sequence illustrating the Voyager missions, which was updated with textures taken from the mission itself. Then there was Sunstone, also from 1979:

A few years later, there was ‘The Works’ in 1984:

However, by then they should’ve known better, because changes were taking place in mathematics which were reaching some kind of climax by that point, namely research into fractals.

I don’t really understand calculus, but I probably inaccurately think of it in two ways: trying to work out where a wiggly line will go next, and finding the slope of a curve at a particular point, with the emphasis on “point”. It’s where my understanding of maths runs out and therefore a bit of a locked gate for me because of what lies beyond in terms of its practical applications, which I can’t access. Nonetheless I am aware that in 1872, Karl Weierstrass announced his discovery of a function expressed by a wiggly line on a graph which was spiky everywhere, no matter how close you zoomed in on it. This is of course the Weierstrass Function, and looks like this:

The zoomed in bit is to show that it’s spiky on every level. Although it’s a line, there’s no curved or straight stretch anywhere along its length where it isn’t changing direction, no matter how small the difference between the values of x is. This is referred to as “nowhere differentiable”. The function can be expressed thus:

f(x) = Σ aⁿcos(bⁿπx), summing n from zero to infinity, where 0 < a < 1 and b is a positive odd integer with ab > 1 + 3π/2; the roughness is governed by α = −(the natural logarithm of a divided by the natural logarithm of b). There are plenty of functions with isolated corners or jumps, but this one is continuous, with a value at every point, and yet smooth nowhere. Sometimes it seems like the nineteenth century consists largely of the eighteenth century status quo and simplicity being overturned at every point, just as the seventeenth century feels like a time of rising sophistication after the relative calm of the sixteenth, preceded by the complexity of the Middle Ages, and so on, which of course makes sense from a Marxist and Kuhnian perspective (note the singular).

This was the first of a series of curves, infinite really, which became known as fractals. The standard, and wrong, way of describing a fractal is that it’s self-similar. There are many self-similar fractals, such as the Koch Snowflake:

This starts out as a triangle, to whose sides spikes are added, making a partly concave dodecagon, to whose sides spikes are added, making a four-dozen-sided shape, and so forth ad infinitum. The above shape, partly blurred by the fact that it isn’t a vector image due to the difficulty of using vector graphics on WordPress, has seven iterations and therefore 12 288 sides, or it would have if it was actually drawn as opposed to being a raster image. And we’re back to computer graphics. However, most fractals are not self-similar in that way. The coastline of this island is fractal. The shorter the ruler used to measure it, the longer it gets, and you could be reduced to measuring between the grains of sand on a beach or the bumps on a cliff face, at which point the tides and whether something counts as wet come into consideration, but it isn’t self-similar. There aren’t lots of “little Britains” just off our coast which themselves have littler Britains off theirs and so on, appealing though the idea might be.
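
(The side count is easy to check: each iteration turns every side into four, so counting the triangle as the first iteration, seven of them give 3 × 4⁶ = 12 288 sides.)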

A fractal is actually a shape with a non-integral number of dimensions. Whereas a square has two dimensions, a cube three and a line one, it’s useful to consider dimensionality as having values in between whole numbers. The Koch Snowflake, for example, has about 1.262 dimensions, and Great Britain 1.21. The reason the number of dimensions a fractal has is not integral is that the “size” of some shapes, such as the measure polytopes of the line segment, square, cube and tesseract, can be thought of as their measure to the power of the number of dimensions they have, and this is a whole number in those cases but not in the case of fractals. The Koch Snowflake is a wiggly line which meets itself, but it comes close to filling the area around the perimeter of a roughly hexagonal shape, so it’s neither one-dimensional – it isn’t a line – nor two-dimensional – it isn’t a hexagon or a star – but somewhere in between. However, although these ideal Platonic shapes are self-similar, most fractals are not, but that doesn’t stop them from having a fractional or irrational number of dimensions.

The real world is not like the smoothness seen in computer graphics, particularly earlier ones. The three videos at the start of this post are all coolly mathematical and, while difficult to produce, involve simple shapes with simple textures. With the aid of fractals, it became easy to generate this kind of picture:

This image dates from around 1982. In ‘The Works’, there is some kind of bumpy terrain and I’m not sure how this was generated. As far as I know, fractal landscape generation was first used in a feature film in ‘Star Trek II’, in 1982:

The structure of this clip is quite interesting because it goes from old-style wire frame models through textured rendering of three-dimensional objects and ends with the mapping of a fractally-generated surface. At the end of the Voyager missions to Saturn in late 1980, it was mentioned that the CGI people who had produced the videos of the mission and mapped the textures taken by the Voyagers’ cameras onto models of the planets and moons had left to work on ‘Star Trek II’. I presume this is what they went on to do. Incidentally, this disbanding of the team working on the Voyager projects, which was related to the six-year gap between the Saturn and Uranus encounters, shows the difficulty the kind of societies which send rockets into space have with achieving long-term projects. They couldn’t just keep these people on the payroll for six years while they did nothing, so we get this clip but at what cost? What else didn’t we get and who else was “let go”?

This is a “making of” video of the same:

A further tangential detail: the star field is as seen from ε Indi. Alnitak, Alnilam and Mintaka are seen as lined up near the beginning of the clip, indicating their relatively great distance, and as the commentary mentions, the Sun is visible as part of Ursa Major near the end. The constellation of Indus is opposite that of Ursa Major in the sky – it’s a Southern constellation – and ε Indi is almost twelve light years away. This particular sequence is a milestone in the development of CGI.

Raster scan CGI on flat displays is often quite rationally organised at a fairly low level, in that the screen is seen as a rectangular array of pixels like a graph, with the origin either at a corner or the centre. This means that the famous Mandelbrot Set image – the Apfelmännchen or “apple mannikin” as it’s known in German – is effectively a graph with the X axis running horizontally along the middle of the picture. It’s often difficult to remember that this X axis at the centre is in fact the real number line. These are the actual axes of that graph:

Perhaps surprisingly, zero is near one side of the cardioid (heart shape) whereas intuition would suggest it was at the bottom of Seahorse Valley where the circle and cardioid meet. It can be seen from this graph that the set is based on some kind of calculation involving real numbers, but what about the vertical axis?

The vertical axis represents the so-called imaginary numbers. These are numbers based on a concept which originally arose when it was realised that the square root of minus one seemed to be impossible. Since signs cancel out in multiplication, -1 x -1 is 1, just as 1 x 1 is, so no real number multiplied by itself can give minus one, and the only option appears to be to invent a second axis and think of numbers as existing on a plane as coördinates. These are known as complex numbers. They have both a real and an imaginary part. The word “imaginary” is used for want of a better term, as in fact these numbers are just as real as “real” numbers. There are also hypercomplex numbers such as quaternions and octonions which are a generalisation of this idea from the plane to space and hyperspace. On the whole, all of these numbers can be added, subtracted and the like, but the operations concerned don’t always have the same properties as those on real numbers. For instance, real number multiplication is commutative: 4 x 5 = 5 x 4. Octonion multiplication is not, and this is crucial because for reasons I won’t go into here, it leaves open the possibility that there is an omniscient observer – it prevents Bell’s Theorem from being a proof of atheism.
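
The arithmetic itself follows from the single rule that i × i = -1:

(a + bi)(c + di) = (ac − bd) + (ad + bc)i

so squaring gives (x + yi)² = x² − y² + 2xyi, which is exactly the pair of calculations that turns up in the plotting code below.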

The formula used to generate the Mandelbrot Set is quite simple, but before I get to that I’m briefly going to wallow in complex numbers for a bit. Complex numbers are combinations of real and imaginary numbers, more specifically the sum of a real number and the product of a real number and i, the square root of minus one. They are useful, and here are two examples:

AC current is generated by a rotary dynamo and its voltage and current therefore vary as a sine wave – that’s the dynamo spinning. Capacitors and inductors alter this variation; a capacitor, for example, makes the current lead the voltage. The impedance, which is the opposition of a circuit to a current when voltage is applied, can be calculated using complex numbers, and power consumption can be reduced by doing these calculations to design efficient AC circuits.
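
In the usual engineers’ notation (the standard convention rather than anything specific to this example, with j for the imaginary unit): Z = R + jX, where the reactance X = ωL − 1/(ωC) for an inductance L and a capacitance C at angular frequency ω.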

An object with a rest mass can never move at the velocity of light because it would have infinite mass. However, if mass can be validly expressed by an imaginary number, this would not be a problem, and therefore if tachyons – particles which only move faster than light – exist, they would have to have imaginary mass.

Hence complex numbers do have real world applications. They’re not only pure mathematics.

The Mandelbrot Set can be plotted on a plane without explicitly using complex numbers. There are programming languages, such as FORTRAN, which have complex numbers as a data type just as they do real and integer numbers, and in FORTH and other threaded interpretative languages there are ways of defining words which can perform operations on complex numbers as two values on the stack, but this isn’t necessary to plot the Mandelbrot Set, although it is effectively what’s happening when it’s done. The formula for the set is zₙ₊₁ = zₙ² + c, where z and c are complex numbers and n is an integer counting the iterations. On a computer screen, each pixel is used as an example of c. This is a QBASIC program to generate the Mandelbrot Set in full:

DECLARE SUB InitPalette ()
DECLARE SUB DrawFractal ()
DECLARE SUB SaveScreen (filename AS STRING, pw AS INTEGER, ph AS INTEGER)

SCREEN 13
CLS
LINE (0, 0)-(320, 200), 15, BF

CONST w = 320
CONST h = 200

CALL InitPalette
CALL DrawFractal
CALL SaveScreen("MANDELBR.RAW", w, h)

REM This is fractal rendering code, the other functions are to make it look nicer
SUB DrawFractal
    DIM sx AS SINGLE
    DIM sy AS SINGLE
    DIM x AS SINGLE
    DIM y AS SINGLE
    DIM x2 AS SINGLE
    DIM y2 AS SINGLE

    CONST maxi = 100
    CONST colours = 256

    FOR py = 0 TO h - 1
        REM scale Y to -1:+1
        sy = (py / h) * 2! - 1
        FOR px = 0 TO w - 1
            REM scale x to -2.5:1
            sx = (px / w) * 3.5 - 2.5
            i = 0
            x = 0
            y = 0
            x2 = 0
            y2 = 0
            WHILE (x2 + y2 < 4) AND (i < maxi)
                xt = x2 - y2 + sx
                y = 2 * x * y + sy
                x = xt
                x2 = x * x
                y2 = y * y
                i = i + 1
            WEND
            REM map the iteration count to a palette entry (0-255)
            c = i * (colours - 1) \ maxi
            PSET (px, py), c
        NEXT
    NEXT
END SUB


REM Changes the palette as the default is not pretty
SUB InitPalette
    DIM red AS LONG
    DIM green AS LONG
    DIM blue AS LONG
    DIM colour AS LONG

    FOR i = 0 TO 63
        blue = i
        green = i / 2
        red = i / 3
        colour = blue * 65536 + green * 256 + red
        PALETTE i, colour
        PALETTE i + 128, colour
    NEXT i
    FOR i = 63 TO 0 STEP -1
        blue = i
        green = i / 2
        red = i / 3
        colour = blue * 65536 + green * 256 + red
        PALETTE i + 64, colour
        PALETTE i + 192, colour
    NEXT i
END SUB

SUB SaveScreen (filename AS STRING, pw AS INTEGER, ph AS INTEGER)
    REM parameters renamed from w and h: QBASIC forbids reusing CONST names
    REM dumps the screen as raw bytes, one palette index per pixel
    OPEN filename FOR OUTPUT AS #1
    FOR y = 0 TO ph - 1
        FOR x = 0 TO pw - 1
            PRINT #1, CHR$(POINT(x, y));
        NEXT x
    NEXT y
    CLOSE #1
END SUB

This is in QBASIC, Microsoft’s BASIC for DOS. It’s possible to zoom in by changing the scaling of sx and sy in the control loops, so that the same number of pixels covers a smaller region of the plane. The above code is both fancy and not optimised.

The first time I plotted the set, it was on an Acorn Electron. This was an eight-bit computer based on BBC Micro architecture, and was notably cut down from the glories of its predecessor. One of the ways in which this was done was by providing only four dynamic RAM chips and using them to store bytes in two halves, which meant that every time the CPU interacted with memory the clock speed was effectively halved – it had to fetch or store one byte in two cycles. It was possible to speed the computer up somewhat by fooling the video hardware into thinking it was using a text mode, because this skipped two scan lines per line of text and there was no need for the CPU to halt when the video hardware accessed video RAM, but this spoilt the display. The video modes I used to display the set were MODE 2 and MODE 0. The first of these is an eight-colour (supposèdly sixteen, but half are merely flashing versions of the others) 160 x 256 display, and has the benefit of being colourful while not being very detailed. The other is 640 x 256 in two colours, allowing a lot of detail, and to be honest I prefer it. Doing it on a BBC Micro would, for the reasons just mentioned, be faster than on an Electron, but that’s what I had. However, it’s dead easy to speed it up by only calculating the top or bottom half and mirroring it onto the other half of the display, since the set is symmetrical about the real axis. I think it took about eight hours to do the whole set.

I also did Seahorse Valley, which doesn’t benefit from any kind of reflection hack, and it took twenty hours. Not having a printer, I recorded it on a VHS cassette, which I still have somewhere. There are ways of speeding it up a lot, such as writing it in machine code and using fixed point arithmetic rather than floating point, and not bothering to calculate large blank bits of the picture, but I didn’t do those. Around the same time as I was doing this, a DOS program called FRACTINT was developed which managed to do it using only integer arithmetic and was therefore much faster.

The Mandelbrot Set is among the most complex objects in the Universe, or rather the Multiverse. It’s often claimed that the human brain is this, but in fact this object is far more complex because no matter how far you zoom in, there’s always more. It can also be extended into four dimensions because each point on it can be used to generate a Julia Set, such as this:

This can be thought of as a two-dimensional cross-section of a four-dimensional analogue of the Mandelbrot Set, as it varies continuously according to the location of the point used. One perhaps surprising fact about the set is that it took about fifteen years to prove that it was actually a fractal; throughout the period when it was all the rage and everyone was calling it one, it wasn’t actually known to be one. Also around this time it was proven to be as complex as it possibly could be, in the sense that its boundary has as many dimensions as the plane it’s drawn on.

The Mandelbrot Set itself does not reflect anything in the physical world, or rather the formula used to generate it seems not to have any practical application, but there is another set which is very similar and does represent something real, in the area of magnetism. This formula:

zₙ₊₁ = ((zₙ² + c − 1) / (2zₙ + c − 2))²

is quite reminiscent of the Mandelbrot Set’s, and describes what happens inside a magnetic material when it’s heated to the point where it’s completely demagnetised. A cold magnet is magnetic all the way through. All of its constituent parts which are individually magnetic are lined up. As it’s heated, the random movement of the atoms begins to dislodge the alignments in apparently random places, but these can in fact be predicted using this formula. If you plot this in the same way as the Mandelbrot Set, you get something like this:
From here. Will be removed on request.

There’s a blog mentioning this here.

Because of the visual effectiveness of fractal imaging and the like, it was suggested at some point that the nature of mathematical discovery had changed, because it was now possible to visualise much of what would previously have seemed highly obscure. This has been seen as both bad and good. It’s good in that it makes maths more accessible and appealing, but it may also create a bias towards the kind of maths that lends itself to this kind of thing.

Finally, it occurs to me that my metaphor for consciousness being a property of matter like magnetism could be extended meaningfully to model a dying brain. What if the way consciousness works involves a whole, fully awake and living brain as one of the stable states, the “dark bits” as it were on the Mandelbrot Set, but that on falling asleep, having a seizure, being starved of oxygen or under the influence of drugs it fragments consciousness physically into little areas throughout the brain which are individually conscious but not unified? And what if this could be modelled mathematically? I don’t know where I’m going with this but it sounds promising.

Mathbox?

This is a single-chip pocket calculator from 1976. It’s probably the most basic actual calculator in existence or ever built, because of how it works and its very low specs. It’s integer-only, has only six digits on its display and uses RPN – Reverse Polish Notation. This means you put in the numbers first, hence the ENTER key, then the operation, so for example 4-3 would be typed in as 4 <ENT> 3 <ENT> - <ENT>. There is notably no “=” key because it isn’t necessary. I don’t know if you could put in a series of numbers and have the machine work its way through them, but two advantages of this method are that you don’t need to worry about priority (BODMAS – Brackets, Orders, Division, Multiplication, Addition, Subtraction – the order in which you work out a calculation) and you don’t need brackets because of the absence of priority.

This calculator is so simple that I could almost design it myself. However, since I set my aim even lower, if I designed it, it would be octal or hex rather than decimal. I remember this calculator being reviewed in ‘Which?’ magazine and thinking two things:

  1. It’s crap, and
  2. I could do that.

The fact I can remember that review and had that thought suggests it was on the market for a number of years.

The big difficulty would be how to implement division. Division is achieved by shifting and subtracting, if I recall correctly, just as multiplication is achieved by shifting and adding. Multiplication shifts to the left, i.e. towards more significant bits, and division shifts to the right, i.e. towards less significant ones. If a bit got shifted past the most significant bit in the register holding the result, it would activate the overflow flag and the letter “E” would presumably show on the display.
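
The multiplication half is the easier one to show. A sketch in FORTH rather than the calculator’s actual innards, assuming a non-negative multiplier and the doubling and halving words 2* and 2/ (on older FORTHs, DUP + doubles just as well):

VARIABLE TOTAL
: UMUL ( a b -- a*b )
  0 TOTAL !
  BEGIN DUP WHILE          ( loop while multiplier bits remain )
    DUP 1 AND IF
      OVER TOTAL +! THEN   ( low bit set: add shifted multiplicand )
    SWAP 2* SWAP           ( shift multiplicand left )
    2/                     ( shift multiplier right )
  REPEAT
  2DROP TOTAL @ ;

Division is the same trick run backwards: shift the divisor up, then repeatedly test, subtract and shift down, collecting a quotient bit at each step.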

This “Mathbox” has something in common with the ZX81. Although the home computer in question was very primitive, it was also technologically advanced in terms of hardware, because in its basic form it had only five chips, as opposed to the ZX80, which had twenty-one. A ZX80 is upgradable to a ZX81 simply by changing the system software, so all those chips are functionally equivalent in the two machines. However, this is only possible because of the low system specs. It has 1K of RAM, so it uses two 2114 static RAM chips, each storing four bits of every byte, and all the discrete logic chips are bundled together into the Uncommitted Logic Array. It also uses system software to generate the screen for three-quarters of the time, so it’s very slow, although the 1802-based COMX 35 which came out two years later was even slower, probably because the 1802 CPU is bloody weird. So there’s a trade-off between what can be done cheaply with the technology available at the time and the low specifications which make that possible at all, driving down manufacturing costs. The Mathbox has a single integrated circuit inside which isn’t even set in a neat rectangular or square package but is just in a blob, which apparently is also the case in the Furby, but this makes it cheap. The only other component was a resistor.

One thing I don’t understand about old pocket calculators is whether they performed more advanced mathematical functions in hardware or software. Integer addition and subtraction are straightforward and can easily be done with a few logic gates plus the convention that half of the numbers available are understood to be representations of negative numbers, which allows subtraction to be carried out by complementary addition, i.e. negating a number by inverting its bit pattern and adding one, then adding as normal. Likewise, if there’s no exponent notation or representation involved, “floating” point numbers can be added and subtracted as if they’re fixed point or integer numbers. A fixed point number is something like a monetary value in pounds and pence or dollars and cents: it doesn’t really have a decimal point but you can just pretend it has. You might think you’re working out £89.95 plus fifteen percent VAT (£13.49 rounded down, £13.50 rounded up) but really you’re just adding 8995 and 1350.
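
A worked example of that complementary addition, in eight bits: to compute 5 − 3, negate 3 (00000011 inverts to 11111100, plus one is 11111101) and add it to 5 (00000101). The sum is 100000010; throw away the ninth bit, the carry, and what’s left is 00000010, which is 2.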

One odd thing about this Mathbox, and in fact many calculators, is that they seem to use Binary-Coded Decimal (BCD) rather than actual binary, which is literally a bit of a waste of space in some ways. I shall explain. In binary, the integer 170 is 10101010 and 85 is 01010101. These are easy to remember because they alternate ones and zeros and are complementary. In BCD, each nybble (four-bit value) is confined to the integers between zero and nine, so one is 0001 and so on up to nine, which is 1001. This makes 85 10000101 in this system, and also gives 170 twelve bits as it would be 000101110000. This calculator has six figures, meaning it can represent integers up to 999 999, which is twenty-four bits of BCD, but two dozen bits can represent values up to 16 777 215, so the power of the device is somewhat curtailed. This curtailment also makes the design of the hardware more complex in terms of actual calculations, but easier in terms of producing a display. With an eight-figure display the discrepancy is greater, and with a ten-digit one greater still. A similar handicap also occurs with the use of standard form, which tends to run up to 99 places either side of the decimal point, when the same eight bits could go up to 255. Moreover, using hex or octal wouldn’t induce this problem, so although it might be harder to read, the calculator would be more powerful and somewhat faster.

That’s the real Mathbox then, and it is the size of a matchbox. Coincidentally, the ZX81 was nicknamed a “matchbox”, because it was oddly small for the time and also rather featureless. Such devices are much more common today, such as the Raspberry π and the relatively minute stick PCs which come along from time to time. There’s also the Intel Compute Stick and the Gigabyte Brix GB-BACE-3160, whose 56.1 mm x 107.6 mm x 114.4 mm dimensions I find oddly appealing.

Although they can be said to be boxes which do maths, they don’t visibly do what most people tend to think of as maths. For instance, they decompress video imagery and I imagine have some kind of 3-D hardware acceleration, but although they could do so via an application, they generally wouldn’t respond to someone typing in formulas and equations on a keyboard.

I’d like to talk about two experiences of mine which probably say more about me than anything else, but which illustrate my preferences quite well. In 1968, a statistical software package was released called SPSS, which aimed to automate batch processing of statistical data. When I studied statistics as part of my first degree in ’85/’86, it was fairly usable, with a command line and a strong connection to the equations I held in my head for many different statistical tests. I then dropped out of psychology, mainly in fact due to the emphasis it placed on statistics which still seems totally wrong-headed to me, but it is exceedingly useful to understand how statistics works. I went back to studying psychology in the ‘noughties, when I attempted a psychology degree with the Open University, and found that SPSS was kind of completely sealed off to me. In an attempt to make it more user-friendly, it now resembled nothing so much as a spreadsheet, there wasn’t an obvious way to input equations and I had the same problem with it as I do with spreadsheets. I still feel that if the user is brought up on using the GUI version of SPSS that exists today, she will be distanced from what she’s actually doing when she attempts to carry out statistical operations on data, and it’s vitally important that we don’t get distanced from statistics, not least for civic reasons.

And then there are spreadsheets themselves. I struggle with these because I perceive them as having the same kind of distancing effect as a GUI. I feel like I don’t know what I’m doing when I use a spreadsheet, and that the data are trapped inside the computer in a little virtual box.

Taking this outside the realm of computers for a second, I have once ridden a scooter and once ridden a motorbike, both on private ground, and I felt the same disturbing abstraction from physical reality that I feel with spreadsheets. This is even more pronounced when driving cars, which I can just about manage in terms of the technicalities of driving but which are really not a good idea for me. A push bike has a connection with the ground, which is of course reduced by the artifice of gears, but if you stop pedalling on a flat surface you slow down, if you start pedalling you accelerate, and so on. You can also see everything happening. For this reason I’m not even keen on cable brakes, because they feel like magic to me – I can’t see what’s going on inside them and I still don’t get them. I’d also link this to I-Tal, but not yet.

Getting back to computers, there’s a funny essay about this called ‘Real Programmers Don’t Use PASCAL’, written in 1983 as a parody of ‘Real Men Don’t Eat Quiche’.

As you probably know, I’m technically a sole trader and have had the need to crunch numbers, and have also done so in research projects I’ve pursued for CPD (Continuing Professional Development). Most of this I’ve done using pen and paper because that way I feel more in control. When I’ve felt the need to do the same in a more automated way, I’ve used a programming language called APL – “A Programming Language”. This was formerly known as Iverson Notation, and consists of terse strings of symbols, letters and numbers which can be used to perform remarkably involved calculations on large sets of numbers. It’s also been described as a “write-only language” because although the programmer might understand what the sequent does, the chances are nobody else will. In this respect it resembles a particular style of writing FORTH, another language which appeals strongly to me. However, FORTH can be easily rendered more legible by choosing meaningful names for words, constants and variables, and commenting more extensively within the definitions. The same doesn’t apply to APL. This sequent is taken from the Wikipedia article on the language:

x[⍋x←6?40]

It deals six pseudo-random integers between one and forty, guaranteed not to repeat, and sorts them into ascending order. If I wanted to do that in BASIC, it would take many lines of code. All reserved words, if that’s the right way to describe them, in APL consist of a single character, although many of those are achieved by overstriking – you type a character, backspace back onto it and type another which is then superimposed. APL uses symbols now available in Unicode but usually not in ASCII, so it was necessary to use both a special keyboard and character set for it. One idiosyncrasy of APL is that, like the RPN calculator introduced above, it doesn’t have operator precedence. Sequents are just interpreted from right to left, as I recall.

APL used to have a reputation for using up a lot of memory, as it was oriented towards vector processing and this involved dealing with large data sets. That was by the standards of the day, though, and given that it actually dates from 1957 and wasn’t originally designed as a computer programming language at all, memory is no longer an issue on modern hardware. A replacement ROM and keyboard plus a 64K RAM pack was offered for the ZX81 to enable APL to be used relatively inexpensively (£300 in 1982), but this, like the Arabic language ROM, seems to have been lost to history. On the subject of foreign languages, APL is also a leveller because it isn’t “in English”, unlike most other programming languages. Someone with no written knowledge of the English language would not be at a disadvantage compared to someone who was fluent.

What I’m working up to saying is that this is another possible route to a device which could be called a “Mathbox”. Today’s devices are highly overpowered compared to the kind of machine needed to run APL. They don’t need high-resolution graphics, sound, a pointing device or all sorts of other things to do it. All they need is a keyboard, storage, a power supply and a monitor, and their size would be dictated by the size of the interfaces rather than the hardware. A standard keyboard could be used, wireless or USB, although clearly a specialised APL keyboard would be better. These do exist. A micro-SD card could be used for storage and a mini-HDMI socket for display. I envisage this last providing a green-on-black eighty-column display on a 1920 x 1080 monitor, which would in proportion be 80 x 45 characters, occupying only about 3.5K of video RAM. Many of the functions could be realised in hardware, perhaps via an FPGA. A limited degree of firmware might be desirable but it wouldn’t need to be very extensive, particularly since the machine code could be optimised for APL. It would also be ludicrously fast because it wouldn’t get bogged down using system resources to run a GUI or multitask.

This, then, is the other, modern Mathbox. It has things in common with the original, such as the compromise between usability and simplicity, and is therefore worthy of the name.

There you go then. And to think this all came out of a simple typo I made yesterday.