Backing the losers
I’ve a tendency to back losers. For instance, Prefab Sprout and The The were my favourite bands when I was in my late teens, and both were markedly unsuccessful. In keeping with this trend, I used to have a Jupiter Ace computer, something I’ve alluded to a few times on this blog. Jupiter Cantab, the company which designed and made it, had, I think, a total of five employees, seemed to work out of a garage and a suburban house in Cambridgeshire, and had a turnover of around five thousand pounds. They went bust maybe a year or two after releasing the Ace in October 1982 CE and had to sell off all their old stock, a happy thing for me as it meant I could acquire a new computer. Its hardware was very basic even for late ’82, but its firmware decidedly was not. Unlike practically every other home computer of the time, the Ace used not BASIC but FORTH as its native programming language. Perversely, I considered writing a BASIC interpreter for it for a while, but it seemed a bit silly so I didn’t.
FORTH, unlike BASIC as it was at the time, was considered a “proper” programming language. It has two distinctive features. One is that it uses a data structure known as the stack, which is a list of numbers in consecutive locations in memory, presented to the user as having a top and a bottom. Words in the FORTH language usually take data off the top of the stack, operate on them and may leave one or more results on it. This makes the syntax like Latin, Turkish or Sanskrit rather than English, with the verb coming at the end: instead of writing “2+2”, you write “2 2 +”, which leaves 4 on the stack. The other feature is that rather than writing monolithic programs, the user defines words, so for example to print out the character set one writes:
: CHARSET ( This is the defined word and can be whatever the user wants except for control characters ) 256 32 DO I EMIT LOOP ;
If one then types in CHARSET and presses return (or ENTER, in the Ace’s case), it will print out every character the Ace can display except for the graphics characters, whose codes are below 32.
Typing VLIST lists every word in the Ace’s vocabulary; I think there are a total of about 140. All of them fit in 8K, and they show that FORTH is a marvellously compact language compared to BASIC, or in fact most other programming languages. For comparison, the ZX81’s BASIC has around forty-one words. FORTH on the Ace, and in general, was so fast that the cheapest computer faster than it, the IBM PC, cost more than a hundred times as much. For instance, in order to produce sound it was possible, as well as using the word BEEP, to define a word that counted from 0 to 32767 between vibrations and still produce a respectable note by moving the speaker in and out. By contrast, the ZX81 would take nearly ten minutes to count that high, and had no proper sound anyway. This is a somewhat unfair comparison, but it illustrates the gulf in speed between this high-level language and the BASICs of the day.
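As a rough sketch of the shape of such a word (CLICK here is a hypothetical stand-in for whatever hardware-dependent word actually nudges the speaker, which I won’t invent):

: DELAY 32767 0 DO LOOP ; ( count from 0 to 32767 purely to pass time )
: NOTE 100 0 DO CLICK DELAY LOOP ; ( a hundred speaker nudges with a delay between each )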
Whittling Down The Vocab
As I’ve said, FORTH consists of words defined in terms of other words, and therefore some people object to calling code written in it “programs”, preferring “words”. The fact that this definitional process is core to the language immediately made me wonder what would constitute a minimal FORTH system. There are quite a few words in the vocabulary listed above which are easy to dispense with. For instance, the word SPACES prints whatever number of spaces is indicated by the number on top of the stack, so 32 SPACES prints 32 spaces. However, this word could’ve been defined by the user, thus:
: SPACES 0 DO SPACE LOOP ;
The DO-LOOP control structure takes a limit and a starting value from the top of the stack and executes the code between DO and LOOP once for each count from the start up to one below the limit, so the definition above prints exactly the requested number of spaces. It can be taken a lot further than that, though. SPACE and CR are two words with a similar structure: each prints out a single character. SPACE unsurprisingly prints a space; CR does a carriage return. Both characters are part of the standard ASCII set, and the word for printing the ASCII character indicated by the number on top of the stack is EMIT, so they can be defined thus:
: SPACE 32 EMIT ;
: CR 13 EMIT ;
Hence three words are already shown to be unnecessary to the most minimal FORTH system, and the question arises: what is the smallest set of words such a system actually needs?
The Ace had already added quite a lot of words which are not part of standard FORTH-79, examples being all the floating-point words, PLOT, BEEP, CLS, VIS and INVIS, and omitted others which are easily defined. Some are trivial to define, such as 2+, 1- and 1+. Others are a bit less obvious: PICK can be used to replace DUP and OVER, while SWAP and ROT are both special cases of ROLL and so can be defined in those terms. Even . , that full stop, which prints a number, can be replaced by the number-formatting words <#, # and #>, as sketched below. You can continue to whittle it down until you have a very small number of words along with the software which accepts input and definitions, and you’re done. In fact, if you know the hardware well enough you can make it even smaller because, with the Jupiter Ace for example, you know where the display is stored in RAM and how the stacks work (there are actually two, because practically all computers implicitly use a stack for subroutines), and when it comes down to it it’s even possible to define words which accept machine code, the numbers computers actually use, which represent simple instructions like adding two numbers together or storing one somewhere.
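Here is what those replacements might look like, assuming FORTH-79’s one-based PICK and ROLL (as on the Ace) and its #S word, which converts all the remaining digits of a number; U. is my own name for an unsigned stand-in for the full stop:

: DUP 1 PICK ; ( copy the top stack item )
: OVER 2 PICK ; ( copy the second stack item to the top )
: SWAP 2 ROLL ; ( bring the second stack item to the top )
: ROT 3 ROLL ; ( bring the third stack item to the top )
: U. 0 <# #S #> TYPE SPACE ; ( print the top of the stack as an unsigned number )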
This is about as far as I got until, recently, I managed to join together two ideas that I hadn’t previously connected.
Logic To Numbers
As you probably know, my degrees are in philosophy, and my first degree is in the analytical tradition, which is the dominant one in the English-speaking world. It’s very common for philosophy degrees to be rubbished by the general public, and even within philosophy, continental and analytical philosophers are often hostile to each other. What may not be appreciated is that much of philosophy closely resembles mathematics and, by extension, computer science. When the department at my first university was closed down, some of it merged with computing. It also turns out, a little surprisingly, that one of my tutors, Nicholas Measor, was a significant influence on the theory of computing, having helped develop the modal mu-calculus, which is concerned with the completeness of systems and with temporal logic. He wrote a paper called “Duality and the Completeness of the Modal mu-Calculus” in the ’90s. This has kind of caused things to fall into place for me.
The Dedekind-Peano axioms for the set of natural numbers are central to the theoretical basis of arithmetic. They go as follows:
- 0 is a natural number.
- For every natural number x, x=x.
- For all natural numbers x and y, if x=y then y=x.
- For all natural numbers x, y, z, if x=y and y=z then x=z.
- For all a and b, if b is a natural number and a=b, then a is also a natural number.
- Let S(n) be “the successor of n”. Then for every natural number n, S(n) is a natural number.
- For all natural numbers m and n, if S(m)=S(n) then m=n.
- For every natural number n, S(n)=0 is false.
- If K is a set such that 0 is in K, and for every natural number n, n being in K implies that S(n) is in K, then K contains every natural number.
You can then go on to define addition, subtraction, multiplication and inequalities; addition and multiplication, for instance, fall out of a simple recursive scheme, sketched below. Division is harder to define because this is about integers, and dividing one integer by another may lead to fractions, decimals and so forth. I’ve known about all this since I was an undergraduate but hadn’t given it much thought. It is, incidentally, possible to take this further and define negative, real and presumably imaginary, complex and hypercomplex numbers this way, but knowing in principle that it’s possible is enough really.
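Using nothing but 0 and the successor function S, the definitions run as follows, each operation being specified by what it does to 0 and to a successor:

a + 0 = a
a + S(b) = S(a + b)
a × 0 = 0
a × S(b) = (a × b) + a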
Dyscalculic Programming
If you have a language which can express all of these axioms, you have a system which can do most arithmetic. And this is where I had my epiphany, just last week: you could have a programming language which didn’t initially use numbers at all, because numbers could be defined in terms of these axioms instead. It would be difficult to apply this to FORTH, because it uses sixteen-bit signed binary integers as its only data type, but I don’t think it’s impossible, and it would mean there could be a whole earlier and more primitive programming language which doesn’t initially even use numbers. This is still difficult and peculiar, because as far as I know all binary digital computers use sequences of zeros and ones, making this rather theoretical. It’s particularly hard to see how to marry it with FORTH.
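To show it can be done at all, here is a minimal sketch of the idea in Lean, a modern language chosen purely for illustration (the name N and everything else here is my own, not anything from the Ace or FORTH), building the naturals and their addition without a single numeric literal anywhere:

inductive N where      -- the natural numbers, built from nothing
  | zero : N           -- axiom: zero is a natural number
  | succ : N → N       -- axiom: the successor of a natural number is one too

def add : N → N → N    -- addition by recursion on the second argument
  | a, N.zero   => a
  | a, N.succ b => N.succ (add a b)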
Proof Assistants
Well, it turns out that such programming languages do exist, and that they occupy a kind of nebulous territory between what are apparently called “proof assistants” and programming languages. Some can be used as both, others just as the former. A proof assistant is a language somewhat similar to a programming language, but one which helps the user and the computer arrive at proofs together. I have actually used one of these without realising that that was what I was doing, back in the ’80s, when the aforementioned Nicholas Measor wrote an application for the VAX called “Citrus”, after the philosopher E. J. Lemmon (who incidentally died 366 days before I was born), whose purpose was to help the user prove sequents in symbolic logic. My approach was to prove them myself, then just go along to the VAX in the computer basement and type in what I’d proven, although it was admittedly helpful on more than one occasion. While using it, I mused that it was somewhat like a programming language, except that it was declarative rather than imperative, and wondered how one might go about writing something like that. I also considered the concept of expressive adequacy, also known as functional completeness, in this setting, once again in connection with FORTH, realising that if the Sheffer stroke (NAND) were included as a word in FORTH, a whole host of definitions could easily provide any bitwise function, as sketched below. It was also borne in upon me that all the logics I’d come across so far were entirely extensional, and that it might even be a distinctive feature of logic and mathematics per se that they are completely extensional in form. However, I understand that there are such things as intensional logics, and I suppose modality might be seen that way, although I always conceive of it in terms of possible-world semantics and multivalent truth values.
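For instance, assuming a hypothetical word NAND which combines the top two stack items bitwise with the Sheffer stroke, the familiar bitwise words can all be recovered from it:

: NOT DUP NAND ; ( a NAND a gives not-a )
: AND NAND NOT ; ( the negation of a-NAND-b gives a-and-b )
: OR NOT SWAP NOT NAND ; ( not-a NAND not-b gives a-or-b )

XOR and the rest follow in the same way, so the one primitive really does suffice for expressive adequacy.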
It goes further than that, though. I remember noticing that ALGOL 60 lacks input/output facilities in its standard, which made me wonder how the heck it was supposed to be used. However, it turns out that if you are sufficiently strict with yourself you can absolutely minimise I/O and do everything inside the program itself except for some teensy little bit of interaction with the user, and this instinct, if you follow it, is akin to functional programming, a much later idea which enables you to separate the gubbins from how it looks to the user. And there are purely functional languages out there, so at this point I should probably try to express what I mean.
From Metaphysics To Haskell
Functional programming does something rather familiar. Once you consider the possibility that programming can dispense with numbers as basic features, the emphasis shifts to operators and functions, which become “first-class citizens”. This, weirdly but then again not so weirdly, is exactly what happens in category theory. Haskell is the absolutely paradigmatic functional language, and it’s been said that when you think you’re programming in it, it feels like you’re actually just doing maths. This approach lacks what you’d think would be a crucial feature of operating a computer, input and output, just as standard ALGOL 60 can’t actually print or read; such things are known as “side-effects” in functional programming. If a function does anything other than take its values, perform an operation on them and return the result, that’s a side-effect. Avoiding side-effects makes it easier to formally verify a program, which links it back to the mu-calculus.
I’ve now mentioned Haskell, which brings me a bit closer to the title of this post, and now I’m going to have to talk about monads. Monads are actually something like three different things, and it’s now occurred to me that if you put an I at the start rather than an M you get “ionad”, which gives me pause, but this is all quite arcane enough.

Firstly, Leibniz’s metaphysics prominently features the concept of monads. In 1714 he brought out a ninety-paragraph book called ‘The Monadology’ setting out his beliefs. The idea wasn’t originally his, but he developed it further than others did. Leibniz’s monads are indivisible units within reality which have no smaller parts and are entirely self-contained, though not physical like atoms. Anything that changes within a monad has to arise within itself: “it has to really want to change”. Since monads don’t interact, there’s an arrangement called “pre-established harmony” whereby things in each monad are destined to coincide appropriately. I think this is all very silly, and it arises from Descartes and the problem of the interaction of soul and body, but it’s still a useful concept and got adopted into maths, specifically into category theory. There, it’s notoriously and slightly humorously defined thus: “in concise terms, a monad is a monoid in the category of endofunctors of some fixed category”, and this at least brings us to the functor. A functor is a mapping between categories; hence two different fields of maths might turn out to have identical relationships between their elements. It’s a little like intersectionality in sociopolitical terms, in that, for example, racism and sexism are different in detail but are both forms of marginalisation, the difference being that intersectionality is, well, intersectional, meaning that different kinds of oppression do interact, so it isn’t quite the same as either a monad or a functor.

Finally, in Haskell a monad is... er. Okay, at this point I don’t really know what a monad is in Haskell, but the general idea behind Haskell was originally that it was safe and also useless, because you could never get anything into or out of a program written in it. This isn’t entirely true, because it does do work in a thermodynamic sense: if you take a computer which is switched on but doing nothing and run a Haskell program on it which does something, it gets at least slightly warmer. That is, it does stuff to the data already inside it which you can never find out about; it’s entirely self-contained and does its own thing. So that’s all very nice for it, but rather frustrating, and just now I don’t know how to proceed with this exactly, except that I can recognise that the kind of discipline one places oneself under by not knowing how one is going to get anything onto the screen, out of the speakers or off the keyboard, trackball, mouse or joystick has the potential to make one’s programming extremely pure, if that’s the word: operating in an extremely abstract manner.
Home Computing In The 1960s
I do, however, know how to proceed with what I was thinking about earlier. There is some tiny vocabulary of FORTH, perhaps combined with a way of defining numbers along the lines set out above, which would be simple enough to run on a very simple computer, and this is where things get theoretical, because according to Alan Turing any computer, no matter how simple, can do anything any other computer can do, given enough time and resources. This is the principle of the universal Turing machine. Moreover, the Turing machine is equivalent in computational power to a formal language known as the lambda calculus.
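And the lambda calculus makes the same point about numbers as the axioms above: the Church numerals, sketched here, build the naturals out of nothing but functions:

0 = λf.λx. x
1 = λf.λx. f x
2 = λf.λx. f (f x)
S = λn.λf.λx. f (n f x)

Each numeral n applies a function n times, and S, the successor, just applies it once more.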
Underneath the user interface of the Jupiter Ace operates the Z80A microprocessor. This has 694 opcodes, which is of course quite a bit more than the roughly 140 words of the Jupiter Ace’s primitive vocabulary. Other processors have fewer instructions, but all are “Turing-complete”, meaning that given enough time and memory they can solve any problem any other computer can. In theory a ZX81 could run ChatGPT, just very, very, very slowly and with a very big RAM pack. So the question arises of how far down you can strip a processor’s instruction set before it stops working, and this is the nub of where I’m going, because it turns out you can do it with a single instruction, and there are even choices as to which instruction is best.
The easiest one to conceive of is “subtract and branch if negative”. This is a machine in which each instruction consists of two operands in memory. One of them is a number, which the machine subtracts from the number it already has in mind, the accumulator. If the result is negative, it jumps to the address given by the other operand; otherwise it just moves on to the next pair of operands and repeats the operation. It would also save space on the chip if the working values were stored in memory rather than on the chip itself, so I propose that the program counter and accumulator, i.e. where the data are stored, live in main memory too.
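Here is a rough sketch of one step of such a machine, as an emulator written in something closer to modern ANS FORTH than the Ace’s dialect; MEM, ACC, PC, M@ and STEP are all my own hypothetical names:

CREATE MEM 128 CELLS ALLOT ( the machine’s memory, one cell per address )
VARIABLE ACC ( the accumulator )
VARIABLE PC ( the program counter )
: M@ CELLS MEM + @ ; ( fetch the cell at a given address )
: STEP ( execute one subtract-and-branch-if-negative step )
  ACC @ PC @ M@ - DUP ACC ! ( subtract the operand at PC from the accumulator )
  0< IF PC @ 1+ M@ PC ! ( negative result: jump to the address held at PC+1 )
  ELSE PC @ 2 + PC ! THEN ; ( otherwise step past the two operands )

In the design proper, ACC and PC would themselves be cells of MEM, as proposed above, but keeping them as variables makes the sketch easier to read.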
Moreover, I’ve been talking about a single instruction, but in fact that instruction can be implied: it doesn’t need to exist explicitly in the object code in memory. Since there is only one instruction, the processor can simply assume it and do what it demands to every pair of operands, so in a way this is a zero-instruction-set CPU.
What this very simple computer does is run a program that emulates a somewhat more complex machine which runs the stripped-down FORTH natively. This is where it gets interesting, because the very earliest microprocessors, the 4004 and 4040, needed more transistors than this machine would, and it would’ve been entirely feasible to put it on a single chip of silicon in 1970. This is a microcomputer like those found in the early ’80s, for which the technology and concepts existed before the Beatles split up.
This is of course a bit of a mind game, though not an entirely useless one. What I’ve basically discovered is that I already have a lot of what I need to know to do this task, but it’s on a level which is hard to apply to the problem in hand. But it is there. This is the way in.