The Way In

Backing the losers

I’ve a tendency to back losers. For instance, Prefab Sprout and The The were my favourite bands when I was in my late teens and both were markedly unsuccessful. In keeping with this trend, I used to have a Jupiter Ace computer and I’ve alluded to this a few times on this blog. Jupiter Cantab, the company which designed and made it, had a total of I think five employees, seemed to work out of a garage and a suburban house in Cambridgeshire and had a turnover of around five thousand pounds. They went bust maybe a year or two after releasing the Ace in October 1982 CE and had to sell off all their old stock, a happy thing for me as it meant I could acquire a new computer. Its hardware was very basic even for late ’82, but its firmware decidedly was not. Unlike practically every other home computer of the time, the Ace used not BASIC but FORTH as its native programming language. Perversely, I considered writing a BASIC interpreter for it for a while but it seemed a bit silly so I didn’t.

FORTH, unlike BASIC as it was at the time, was considered a “proper” programming language. It has two distinctive features. One is that it uses a data structure known as the stack, which is a list of numbers in consecutive locations in memory presented to the user as having a top and a bottom. Words in the FORTH language usually take data off the top of the stack, operate on them and may leave one or more results on it. This makes the syntax like Latin, Turkish or Sanskrit rather than English, since instead of writing “2+2”, you write “2 2 +”, which leaves 4 on the stack. The other feature is that rather than writing single programs the user defines words, so for example to print out the character set one writes:
: CHARSET ( This is the defined word and can be whatever the user wants except for control characters ) 256 32 DO I EMIT LOOP ;
If one then types in CHARSET and presses return (or ENTER, in the Ace’s case), it will print out every character the Ace can display except for the graphics characters, whose codes are below 32.
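To make the stack visible, here’s the sort of exchange you might have at the prompt – a sketch from memory rather than a transcript. The word . (“dot”) pops the top number off the stack and prints it:

2 2 + . ( prints 4 )
10 3 - 2 * . ( prints 14 )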

Typing VLIST into the Ace lists every word in its vocabulary – I think there are a total of about 140. All of them fit in 8K, which shows that FORTH is a marvellously compact language compared to BASIC or in fact most other programming languages. For instance, the ZX81’s BASIC has around forty-one words. FORTH on the Ace, and in general, was so fast that the cheapest computer faster than it, the IBM PC, cost more than a hundred times as much. For instance, in order to produce sound it was possible, as well as using the word BEEP, to define a word that counted from 0 to 32767 between vibrations and still produce a respectable note by moving the speaker in and out. By contrast, the ZX81 would take nearly ten minutes to count that high and had no proper sound anyway. This is a somewhat unfair comparison, but it illustrates the gulf in speed between this high-level language and the others.
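That sound experiment would have looked something like this. This is a reconstruction rather than the original code, and the names PAUSE and NOTE are mine; the word that actually moves the speaker depends on the Ace’s hardware, so I’ve left it as a placeholder comment:

: PAUSE 32767 0 DO LOOP ; ( count from 0 to 32767, doing nothing )
: NOTE 200 0 DO ( toggle the speaker here ) PAUSE LOOP ;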

Whittling Down The Vocab

As I’ve said, FORTH consists of words defined in terms of other words, and therefore some people object to calling code written in it “programs”, preferring “words”. The fact that this definitional process is core to the language immediately made me wonder what would constitute a minimal FORTH system. There are quite a few words in the vocabulary which are easy to dispense with. For instance, the word SPACES prints whatever number of spaces is indicated by the number on top of the stack, so 32 SPACES prints 32 spaces. However, this word could’ve been defined by the user, thus:
: SPACES 0 DO SPACE LOOP ;

The DO-LOOP control structure takes a limit and a starting value from the top of the stack and executes the code between DO and LOOP once for each count from the start up to, but not including, the limit. It can be taken a lot further than that, though. SPACE and CR are two words with a similar structure: each prints out a single character. SPACE unsurprisingly prints out a space; CR does a carriage return. Both characters are part of the standard ASCII set, and the word for printing the ASCII character indicated by the number on top of the stack is EMIT, so they can be defined thus:
: SPACE 32 EMIT ;

: CR 13 EMIT ;

Hence three words are already shown to be unnecessary to the most minimal FORTH system, and the question arises of what, then, would constitute such a system. What’s the smallest set of words needed to do this?

The Ace had already added quite a lot of words which are not part of standard FORTH-79 (all the floating point words, PLOT, BEEP, CLS, VIS and INVIS, for example) and omitted others which are easily defined. Some are trivial to define, such as 2+, 1- and 1+. Others are a bit less obvious: PICK can be used to replace DUP and OVER, while SWAP and ROT are special cases of ROLL and so can be defined in those terms. Even . , that full stop, which prints a number, can be replaced by the number formatting words <#, # and #> . You can continue to whittle it down until you have a very small number of words along with the software which accepts input and definitions, and you’re done. In fact, if you know the hardware well enough you can make it even smaller because, with the Jupiter Ace for example, you know where the display is stored in RAM, how the stacks work (there are actually two, because practically all computers implicitly use a stack for subroutines), and when it comes down to it it’s even possible to define words which accept machine code: the numbers computers actually use, which represent simple instructions like adding two numbers together or storing one somewhere.
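To make that concrete, here is how the supposedly primitive stack words fall out of PICK and ROLL, using the FORTH-79 convention (which the Ace follows) that both count stack items from one:

: DUP 1 PICK ; ( copy the top stack item )
: OVER 2 PICK ; ( copy the second item to the top )
: SWAP 2 ROLL ; ( pull the second item to the top )
: ROT 3 ROLL ; ( pull the third item to the top )
: 1+ 1 + ;
: 1- 1 - ;
: 2+ 2 + ;

In practice you’d keep DUP and friends as fast primitives, but for the purpose of counting the minimal vocabulary they’re redundant.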

This is about as far as I got until, recently, I managed to join together two ideas that I hadn’t previously connected.

Logic To Numbers

As you probably know, my degrees are in philosophy, and my first degree is in the analytical tradition, which is the dominant one in the English-speaking world. It’s very common for philosophy degrees to be rubbished by the general public, and even within philosophy, continental and analytical philosophers are often hostile to each other. What may not be appreciated is that much of philosophy actually closely resembles mathematics and, by extension, computer science. When the department at my first university was closed down, some of it merged with computing. It also turns out, a little surprisingly, that one of my tutors, Nicholas Measor, was a significant influence on the theory of computing, having helped develop the modal mu calculus, a fixed-point logic concerned with the completeness of systems and with temporal logic. He wrote a paper called “Duality and the Completeness of the Modal mu-Calculus” in the ’90s. This has kind of caused things to fall into place for me.

The Dedekind-Peano axioms for the set of natural numbers are central to the theoretical basis of arithmetic. They go as follows:

  1. 0 is a natural number.
  2. For every natural number x, x=x.
  3. For all natural numbers x and y, if x=y then y=x.
  4. For all natural numbers x, y, z, if x=y and y=z then x=z.
  5. For all a and b, if b is a natural number and a=b, then a is a natural number.
  6. Let S(n) be “the successor of n”. Then for every natural number n, S(n) is a natural number.
  7. For all natural numbers m and n, if S(m)=S(n) then m=n.
  8. For every natural number n, S(n)=0 is false.
  9. If K is a set such that 0 is in K, and for every natural number n, n being in K implies that S(n) is in K, then K contains every natural number.

You can then go on to define addition, subtraction, multiplication and inequalities. Division is harder to define because this is about integers, and dividing one integer by another may lead to fractions, decimals and so forth. I’ve known about all this since I was an undergraduate but hadn’t given it much thought. It is, incidentally, possible to take this further and define negative, real and presumably imaginary, complex and hypercomplex numbers this way, but the principle of knowing that that’s possible is enough really.
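For example, addition and multiplication need only two clauses apiece, defined by recursion on the second argument using nothing but 0 and S:

a + 0 = a
a + S(b) = S(a + b)

a × 0 = 0
a × S(b) = (a × b) + a

On this basis, 2 + 2 = 4 is the theorem S(S(0)) + S(S(0)) = S(S(S(S(0)))).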

Dyscalculic Programming

If you have a language which can express all of these axioms, you have a system which can do most arithmetic. And this is where I had my epiphany, just last week: you could have a programming language which didn’t initially use numbers at all, because numbers could be defined in terms of these axioms instead. It would be difficult to apply this to FORTH, because it uses sixteen-bit signed binary integers as its only data type, but I don’t think it’s impossible, and that would mean there could be a whole earlier and more primitive programming language which doesn’t initially even use numbers. This is still difficult and peculiar because all binary digital computers, so far as I know, use sequences of zeros and ones, making this rather theoretical. It’s particularly hard to see how to marry this with FORTH.

Proof Assistants

Well, it turns out that such programming languages do exist, and that they occupy a kind of nebulous territory between what are apparently called “proof assistants” and programming languages. Some can be used as both, others just as the former. A proof assistant is a language somewhat similar to a programming language, but one which helps the user and computer together arrive at proofs. I have actually used one of these without realising that that was what I was doing, back in the ’80s, when the aforementioned Nicholas Measor wrote an application for the VAX called “Citrus”, after the philosopher E. J. Lemmon (who incidentally died 366 days before I was born), whose purpose was to assist the user to prove sequents in symbolic logic. My approach to this was to prove them myself, then just go along to the VAX in the computer basement and type in what I’d proven, although it was admittedly helpful on more than one occasion. While using this, I mused that it was somewhat like a programming language except that it wasn’t imperative but declarative, and wondered how one might go about writing something like that. I also considered the concept of expressive adequacy, also known as functional completeness, in this setting, once again in connection with FORTH, realising that if the Sheffer stroke (NAND) were included as a word in FORTH, a whole host of definitions could easily provide any bitwise function. It was also borne in upon me that all the logics I’d come across so far were entirely extensional, and that it might even be a distinctive feature of logic and mathematics per se that they are completely extensional in form. However, I understand that there are such things as intensional logics, and I suppose modality might be seen in that way, although I always conceive of it in terms of possible world semantics and multivalent truth values.
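To show what I mean about the Sheffer stroke, here’s a sketch in FORTH, with made-up names (the MY- prefix avoids trampling the built-in words), and cheating by building NAND itself out of AND and bit inversion, since it isn’t a primitive:

: NAND AND -1 XOR ; ( the Sheffer stroke, bitwise: NOT-AND )
: MY-NOT DUP NAND ; ( a NAND a is NOT a )
: MY-AND NAND DUP NAND ; ( NOT of a NAND b )
: MY-OR DUP NAND SWAP DUP NAND NAND ; ( NOT a, NAND, NOT b )

The point is expressive adequacy: one connective suffices for all the rest.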

It goes further than that though. I remember noticing that ALGOL 60 lacks input/output facilities in its standard, which made me wonder how the heck it was supposed to be used. However, it turns out that if you are sufficiently strict with yourself you can absolutely minimise I/O and do everything inside the program except for some teensy little bit of interaction with the user, and this instinct, if you follow it, is akin to functional programming, a much later idea which enables you to separate the gubbins from how it looks to the user. And there are purely functional languages out there, and at this point I should probably try to express what I mean.

From Metaphysics To Haskell

Functional programming does something rather familiar. Once you entertain the possibility that programming can dispense with numbers as basic features, the emphasis shifts to operators and functions, which become “first-class citizens”. This, weirdly but then again not so weirdly, is exactly what happens in category theory. Haskell is the absolutely paradigmatic functional language, and it’s been said that when you think you’re programming in it, it feels like you’re actually just doing maths. This approach lacks what you’d think would be a crucial feature of operating a computer, just as ALGOL 60 can’t actually print or read input, and such things are known as “side-effects” in functional programming. If a function does anything other than take its arguments, perform an operation on them and return the result, that’s a side-effect. Avoiding side-effects makes it easier to formally verify a program, which is how this links back to the mu calculus.

I’ve now mentioned Haskell, which brings me a bit closer to the title of this post, and now I’m going to have to talk about monads. Monads are actually something like three different things, and it’s now occurred to me that if you put an I at the start rather than an M you get “ionad”, which gives me pause, but this is all quite arcane enough. Firstly, Leibniz’s metaphysics prominently features the concept of monads. In 1714, he brought out a ninety-paragraph book called ‘The Monadology’ setting out his beliefs. It wasn’t originally his idea, but he developed it more than others. Leibniz’s monads are indivisible units within reality which have no smaller parts and are entirely self-contained, though not physical like an atom. Anything that changes within a monad has to arise within itself – “it has to really want to change”. Since monads don’t interact, there’s an arrangement called “pre-established harmony” whereby things in each monad are destined to coincide appropriately. I mean, I think this is all very silly, and it arises from Descartes and the problem of the interaction of soul and body, but it’s still a useful concept and got adopted into maths, specifically into category theory. In that, it’s notoriously and slightly humorously defined thus: “in concise terms, a monad is a monoid in the category of endofunctors of some fixed category”, and this at least brings us to the functor. A functor is a mapping between categories. Hence two different fields of maths might turn out to have identical relationships between their elements. It’s a little like intersectionality in sociopolitical terms, in that for example racism and sexism are different in detail but are both forms of marginalisation, the difference being that intersectionality is, well, intersectional, meaning that different kinds of oppression do interact, so it isn’t quite the same as either a monad or a functor. Finally, in Haskell a monad is – er. Okay, well, at this point I don’t really know what a monad is in Haskell, but the general idea behind Haskell was originally that it was safe and also useless, because you could never get anything into or out of a program written in it. This isn’t entirely true, because it does do work in a thermodynamic sense: take a computer which is switched on but doing nothing, run a Haskell program on it which does something, and it gets at least slightly warmer. That is, it does stuff to the data already inside it which you can never find out about, but it’s entirely self-contained and does its own thing. So that’s all very nice for it, but rather frustrating, and just now I don’t know how to proceed with this exactly, except that I can recognise that the kind of discipline one places oneself under by not knowing how one is going to get anything onto the screen, out of the speakers or off the keyboard, trackball, mouse or joystick has the potential of making one’s programming extremely pure, if that’s the word: operating in an extremely abstract manner.

Home Computing In The 1960s

I do, however, know how to proceed with what I was thinking about earlier. There is some tiny vocabulary of FORTH, perhaps involving a manner of using a language which defines numbers in the terms outlined above, which would be simple enough to run on a very simple computer, and this is where things get theoretical, because according to Alan Turing any computer, no matter how simple, can do anything any other computer can do, given enough time and resources. This is the principle of the universal Turing machine. Moreover, the Turing machine is exactly equivalent in power to a language known as the lambda calculus.

Underneath the user interface of the Jupiter Ace operates the Z80A microprocessor. This has 694 instructions, which is of course quite a bit more than the 140 words of the Jupiter Ace’s primitive vocabulary. Other processors have fewer instructions, but all are “Turing-complete”, meaning that given enough time and memory they can solve any computing problem. In theory a ZX81 could run ChatGPT, just very, very, very slowly and with a very big RAM pack. So the question arises of how far down you can strip a processor before it stops working, and this is the nub of where I’m going, because actually you can do it with a single instruction, and there are even choices as to which instruction’s best.

The easiest one to conceive of is “subtract and branch if negative”. This is a machine whose every instruction has two operands in memory. One of them points to a number, which the machine subtracts from the number it already has in mind. If the result of this subtraction turns out to be negative, it looks at the other operand and jumps to the address it indicates. Otherwise it just moves on to the next instruction and repeats the operation. It would also save space on the chip if the working values were stored in memory rather than on the chip itself, so I propose that the program counter and accumulator, i.e. where the current address and the running result are kept, live in main memory.
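Here’s a sketch of how such a machine could be emulated in modern (ANS-style) FORTH. The encoding is my own guess at one workable scheme, and all the names are mine: each instruction is a pair of cells, the address of the value to subtract followed by the address to jump to on a negative result.

CREATE MEM 64 CELLS ALLOT ( a tiny main memory )
VARIABLE PC VARIABLE ACC ( program counter and accumulator, held in RAM as proposed )
: M@ CELLS MEM + @ ; ( fetch the nth cell of memory )
: STEP ( one subtract-and-branch-if-negative cycle )
  ACC @ PC @ M@ M@ - DUP ACC ! ( subtract the addressed value from the accumulator )
  0< IF PC @ 1+ M@ PC ! ( negative: jump to the branch target )
  ELSE PC @ 2 + PC ! THEN ; ( otherwise step past both operands )

Note that STEP never fetches an opcode: the instruction is implicit, which is exactly the point made below.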

Moreover, I’ve been talking about a single instruction, but in fact the actual instruction can be implied. It doesn’t need to exist explicitly in the object code in memory. Instead it can be assumed, and the processor will do what that instruction demands anyway, so in a way this is a zero instruction set CPU.

What this very simple computer does is run a program that emulates a somewhat more complex machine which runs the stripped-down FORTH natively. This is where it gets interesting, because the very earliest microprocessors, the 4004 and 4040, needed more transistors than this machine would, and it would’ve been entirely feasible to put it on a single silicon chip in 1970. That is a microcomputer like those found in the early ’80s, for which the technology and concepts existed before the Beatles split up.

This is of course a bit of a mind game, though not an entirely useless one. What I’ve basically discovered is that I already have a lot of what I need to know to do this task, but it’s on a level which is hard to apply to the problem in hand. But it is there. This is the way in.

Mathbox?

The Mathbox is a single-chip pocket calculator from 1976, and probably the most basic real calculator ever built, because of how it works and its very low specs. It’s integer-only, has only six digits on its display and uses RPN – Reverse Polish Notation. This means you put in the numbers first, hence the ENTER key, then the operation, so for example 4-3 would be typed in as 4 <ENT> 3 <ENT> - <ENT>. There is notably no “=” key, because it isn’t necessary. I don’t know if you could put in a series of numbers and have the machine work its way through them, but the great advantage of this method is that there’s no operator priority to worry about (BODMAS – Brackets, Orders, Division, Multiplication, Addition, Subtraction – the order in which you work out a calculation), and with no priority you don’t need brackets either.
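Incidentally, this is exactly the convention FORTH uses, which is no doubt part of why its interpreter can be so small. At a FORTH prompt the same sums come out as:

4 3 - . ( prints 1 )
2 3 + 4 * . ( prints 20; infix would need brackets for this )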

This calculator is so simple that I could almost design it myself. However, since I set my aim even lower, if I designed it, it would be octal or hex rather than decimal. I remember this calculator being reviewed in ‘Which?’ magazine and thinking two things:

  1. It’s crap, and
  2. I could do that.

The fact I can remember that review and had that thought suggests it was on the market for a number of years.

The big difficulty would be how to implement division. Division is usually achieved by shifting and subtracting, just as multiplication is achieved by shifting and adding. Multiplication shifts to the left, i.e. towards more significant bits, and division shifts to the right, i.e. towards less significant ones. If a bit got shifted past the most significant bit of the register holding the result, it would activate the overflow flag and the letter “E” would presumably show on the display.
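The multiplication half of that looks something like this in FORTH (ANS-style; the names are mine, and a real sixteen-bit version would watch for overflow as just described):

VARIABLE MCAND ( multiplicand, shifted left each round )
: MUL ( a b -- a*b, by shift-and-add )
  SWAP MCAND ! 0 ( keep the multiplier on the stack, start the product at zero )
  16 0 DO
    OVER 1 AND IF MCAND @ + THEN ( low bit of the multiplier set? add in )
    MCAND @ 2* MCAND ! ( shift the multiplicand left )
    SWAP 1 RSHIFT SWAP ( shift the multiplier right )
  LOOP NIP ; ( drop the spent multiplier, leaving the product )

Division runs the same loop the other way round, subtracting and shifting right, with the remainder left over at the end.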

This “Mathbox” has something in common with the ZX81. Although that home computer was very primitive, it was also technologically advanced in terms of hardware, because in its basic form it had only five chips, as opposed to the ZX80, which had twenty-one. A ZX80 is upgradable to a ZX81 simply by changing the ROM holding the system software, so all those chips are functionally equivalent in the two machines. However, this is only possible because of the low system specs. It has 1K of RAM, so it uses 2 x 2114 static RAM chips, each storing four bits of every byte, and all the discrete logic (the smaller integrated circuits) is bundled together into the Uncommitted Logic Array. It also uses system software to generate the screen for three-quarters of the time, so it’s very slow, although the 1802-based COMX 35 which came out two years later was even slower, probably because the 1802 CPU is bloody weird. So there’s a trade-off between what can be done cheaply with the technology available at the time and the low specifications which make that possible at all, driving down manufacturing costs. The Mathbox has a single integrated circuit inside, which isn’t even set in a neat rectangular or square package but is just a blob, which is apparently also the case in the Furby, but this makes it cheap. The only other component was a resistor.

One thing I don’t understand about old pocket calculators is whether they performed more advanced mathematical functions in hardware or software. Integer addition and subtraction are straightforward and can easily be done with a few logic gates, plus the convention that half of the numbers available are understood to be representations of negative numbers, which allows subtraction to be carried out by complementary addition: to negate a number, you invert the bit pattern and add one. Likewise, if there’s no exponent notation or representation involved, “floating” point numbers can be added and subtracted as if they were fixed point or integer numbers. A fixed point number is something like a monetary value in pounds and pence or dollars and cents: it doesn’t really have a decimal point, but you can just pretend it has. You might think you’re working out £89.95 plus fifteen percent VAT (£13.49 rounded down, £13.50 rounded up), but really you’re just adding 8995 and 1350.
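In FORTH the whole trick is two one-line words (toy names again; real FORTHs spell them NEGATE and - ):

: NEG -1 XOR 1+ ; ( two's complement: invert every bit, then add one )
: MINUS NEG + ; ( a b MINUS leaves a-b, by adding the complement )

So 7 3 MINUS . prints 4, with nothing but an adder doing the work.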

One odd thing about this Mathbox, and in fact many calculators, is that they seem to use Binary-Coded Decimal (BCD) rather than actual binary, which is literally a bit of a waste of space in some ways. I shall explain. In binary, the integer 170 is 10101010 and 85 is 01010101. These are easy to remember because they alternate ones and zeros and are complementary. In BCD, each nybble (four-bit value) is confined to the integers between zero and nine, so one is 0001 and so on up to nine, which is 1001. This makes 85 10000101 in this system, and also gives 170 twelve bits, as it would be 000101110000. This calculator has six figures, meaning it can represent integers up to 999 999, which is twenty-four bits of BCD, but two dozen bits of pure binary can represent values up to 16 777 215, so the power of the device is somewhat curtailed. This curtailment also makes the design of the hardware more complex in terms of actual calculations, but easier in terms of producing a display. With an eight-figure display the discrepancy is greater, and with a ten-digit one greater still. A similar handicap occurs with the use of standard form, which tends to run up to 99 places either side of the decimal point when it could go up to 255. Moreover, using hex or octal wouldn’t induce this problem, so although it might be harder to read, the calculator would be more powerful and somewhat faster.
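Unpacking BCD into binary is cheap, though; for a single two-digit byte it’s one short word (a sketch, ANS-style, name mine):

: BCD>BIN DUP 4 RSHIFT 10 * SWAP 15 AND + ; ( tens digit times ten, plus units )

So 133 BCD>BIN . prints 85, because 133 is 10000101 in binary, i.e. 8 and 5 as BCD digits.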

That’s the real Mathbox then, and it is the size of a matchbox. Coincidentally, the ZX81 was nicknamed a “matchbox”, because it was oddly small for the time and also rather featureless. Such devices are much more common today, such as the Raspberry π and the relatively minute stick PCs which come along from time to time. There’s also the Intel Compute Stick and the Gigabyte Brix GB-BACE-3160, whose 56.1 mm x 107.6 mm x 114.4 mm dimensions I find oddly appealing.

Although they can be said to be boxes which do maths, they don’t visibly do what most people tend to think of as maths. For instance, they decompress video imagery and I imagine have some kind of 3-D hardware acceleration, but although they could do so via an application, they generally wouldn’t respond to someone typing in formulas and equations on a keyboard.

I’d like to talk about two experiences of mine which probably say more about me than anything else, but which illustrate my preferences quite well. In 1968, a statistical software package called SPSS was released, which aimed to automate batch processing of statistical data. When I studied statistics as part of my first degree in ’85/’86, it was fairly usable, with a command line and a strong connection to the equations I held in my head for many different statistical tests. I then dropped out of psychology, mainly in fact due to the emphasis it placed on statistics, which still seems totally wrong-headed to me even though it is exceedingly useful to understand how statistics works. I went back to studying psychology in the ’noughties, when I attempted a psychology degree with the Open University, and found that SPSS had become kind of sealed off to me. In an attempt to make it more user-friendly, it now resembled nothing so much as a spreadsheet; there wasn’t an obvious way to input equations, and I had the same problem with it as I do with spreadsheets. I still feel that if the user is brought up on the GUI version of SPSS that exists today, she will be distanced from what she’s actually doing when she attempts to carry out statistical operations on data, and it’s vitally important that we don’t get distanced from statistics, not least for civic reasons.

And then there are spreadsheets themselves. I struggle with these because I perceive them as having the same kind of distancing effect as a GUI. I feel like I don’t know what I’m doing when I use a spreadsheet, and that the data are trapped inside the computer in a little virtual box.

Taking this outside the realm of computers for a second: I have once ridden a scooter and once ridden a motorbike, both on private ground, and I felt the same disturbing abstraction from physical reality that I feel with spreadsheets. This is even more pronounced when driving cars, which I can just about manage in terms of the technicalities of driving, but which are really not a good idea for me. A push bike has a connection with the ground, which is of course reduced by the artifice of gears, but if you stop pedalling on a flat surface you slow down, if you start pedalling you accelerate, and so on. You can also see everything happening. For this reason I’m not even keen on cable brakes, because they feel like magic to me – I can’t see what’s going on inside them and I still don’t get them. I’d also link this to I-Tal, but not yet.

Getting back to computers, there’s a funny essay about this called ‘Real Programmers Don’t Use PASCAL‘, written in 1983 as a parody of ‘Real Men Don’t Eat Quiche’.

As you probably know, I’m technically a sole trader and have had the need to crunch numbers, and have also done so in research projects I’ve pursued for CPD (Continuing Professional Development). Most of this I’ve done using pen and paper because that way I feel more in control. When I’ve felt the need to do the same in a more automated way, I’ve used a programming language called APL – “A Programming Language”. This was formerly known as Iverson Notation, and consists of terse strings of symbols, letters and numbers which can be used to perform remarkably involved calculations on large sets of numbers. It’s also been described as a “write-only language” because although the programmer might understand what the sequent does, the chances are nobody else will. In this respect it resembles a particular style of writing FORTH, another language which appeals strongly to me. However, FORTH can be easily rendered more legible by choosing meaningful names for words, constants and variables, and commenting more extensively within the definitions. The same doesn’t apply to APL. This sequent is taken from the Wikipedia article on the language:

x[⍋x←6?40]

It deals six distinct pseudo-random integers between one and forty, guaranteed not to repeat, and sorts them into ascending order. If I wanted to do that in BASIC, it would take many lines of code. All reserved words, if that’s the right way to describe them, in APL consist of a single character, although many of those are achieved by overstriking – you type a character, backspace back onto it and type another, which is then superimposed. APL uses symbols now available in Unicode but usually not in ASCII, so it was necessary to use both a special keyboard and a special character set for it. One idiosyncrasy of APL is that, like the RPN calculator introduced above, it doesn’t have operator precedence. Sequents are just interpreted from right to left, as I recall.

APL used to have a reputation for using up a lot of memory, as it was oriented towards vector processing and this involved dealing with large data sets. This is of course by the standards of the day, and since it actually dates from 1957 and wasn’t originally designed as a computer programming language, this is no longer an issue. A replacement ROM and keyboard plus 64K RAMPack was offered for the ZX81 to enable APL to be used relatively inexpensively (£300 in 1982), but this, like the Arabic language ROM, seems to have been lost to history. On the subject of foreign languages, APL is also a leveller because it isn’t “in English”, unlike most other programming languages. Someone with no written knowledge of the English language would not be at a disadvantage compared to someone who was fluent.

What I’m working up to saying is that this is another possible route to a device which could be called a “Mathbox”. Today’s devices are highly overpowered compared to the kind of machine needed to run APL. They don’t need high resolution graphics, sound, a pointing device or all sorts of other things to do it. All they need is a keyboard, storage, a power supply and a monitor, and their size would be dictated by the size of the interfaces rather than the hardware. A standard keyboard could be used, wireless or USB, although clearly a specialised APL keyboard would be better; these do exist. A micro-SD card could be used for storage and a mini-HDMI socket for display. I envisage this last providing a green-on-black eighty-column display on a 1920 x 1080 monitor, which in proportion would be 80 x 45 characters, occupying only about 3.5K of video RAM. Many of the functions could be realised in hardware, perhaps via an FPGA. A limited degree of firmware might be desirable, but it wouldn’t need to be very extensive, particularly since the machine code could be optimised for APL. It would also be ludicrously fast, because it wouldn’t get bogged down using system resources to run a GUI or multitask.

This, then, is the other, modern Mathbox. It has things in common with the original, such as the compromise between usability and simplicity, and is therefore worthy of the name.

There you go then. And to think this all came out of a simple typo I made yesterday.



Subtract One And Branch

In case you’re wondering why I’m not talking about the murder of Sarah Everard and the fallout from that, I try to avoid covering gender politics on this blog because I have another blog devoted to that alone, and I also tend to avoid saying anything (see Why I’ve Gone Quiet) because cis women need to be heard more than I do on these issues and I don’t want to shout them down. In fact the other blog is quite infrequent for the same reason. It doesn’t mean I don’t care or don’t consider it important.

You are in a living room, circa 1950. In one corner sits a bakelite box, lit up from within behind a hessian grille with a design like rays from a rising Sun on the front, and a rectangular panel carrying names like “HILVERSUM” and “LUXEMBOURG”. In the other corner sits another, somewhat similar bakelite box with a series of lamps at the top and a slot at the bottom, into which cards with rectangular windows punched out of them can be inserted. There is a row of push buttons above it. This is the domestic computer. It didn’t exist, but could it have?

In this post I mentioned that some say the last computer which could be “completely understood” was the BBC Micro, released in 1981, and expressed my doubt that this was true, because apart from memory size even an upgraded IBM PC would probably be about as sophisticated. However, this got me thinking about a tendency I have to minimise in terms of IT, and wondering how far it could be taken and still leave useful hardware, which also brings up the question of what counts as useful.

In this other post, I described a fictional situation where, instead of the Apple ][ being one of the first mass market microcomputers, Sinclair ended up bringing out the ZX Spectrum six years early, that is, just after its Z80 CPU was released. That isn’t quite what I said: read the post if you see what I mean. The point is that the specifications of the two computers are very similar, and if the ULA in the Speccy is realised via discrete logic (smaller and simpler integrated circuits), all the hardware necessary to construct a functional equivalent to it, except for the slightly faster microprocessor, was available already, and if someone had had the idea, they could’ve made one. Then a similar mind game arises: how much further back is it possible to go and still manufacture a reasonably small but workable computer? Could you, for example, even push it back to the valve era?

Apologies for already having said things which sound off-putting and obscurantist. As an antidote to this, I want to explain, just in case you don’t know, how digital computers work. Basically there’s a chip which fetches codes and data from memory. The codes tell the computer what to do with the data, and the chip can make decisions about where to look next based on the results of what it’s done with those data. It’s kind of like a calculator on wheels which can read instructions from the path it’s travelling on.

There are two basic design philosophies taken with microprocessors, which for the sake of brevity I’m going to call Central Processing Units, or CPUs. One is to get it to do lots of sophisticated things with a large number of instructions. This is called the CISC approach – Complex Instruction Set Computer. The CPUs in most Windows computers are CISCs. The other is to get it to do just a few things with a small number of instructions, and this is called a RISC – Reduced Instruction Set Computer. Chips in tablets and mobile phones are usually RISCs. In particular they’re probably going to be using one of the CPUs in the ARM series, the Acorn RISC Machine. The Pentium and the like are kind of descended from the very early Intel 8080, which had a large number of instructions, but the ARMs are the spiritual successors of the 6502.

The 6502, the CPU used in the Apple ][ and the BBC Micro (and, in the form of the similar 6510, in the Commodore 64), was designed with simplicity in mind, and this involved using far fewer transistors than the Z80(A) found in the Spectrum. The latter had 694 instructions, many of which could only be accessed by prefix bytes acting a bit like shift keys on a typewriter. By contrast, not taking address modes into consideration, the 6502 only had fifty-six. Processors usually have small internal stores called registers, each a single or double cell of memory, used to hold numbers they’re working on or otherwise using internally. The Z80 had nineteen, I think, but the 6502 only had six. To be honest, I was always a lot more comfortable using the Z80 than the 6502, and as you may be aware this was a constant source of debate between nerds in the early ’80s, and to some extent still is. The 6502 always seemed really fiddly to me. However, it was also faster than its competitor, and comprised fewer transistors because of the corners which had been cut. Unlike the Z80, it used pipelining: it fetched the next instruction while executing the previous one. It was also faster because it had fewer instructions, so it needed less time to work out what to do.

Research undertaken in the 1980s counted the proportions of instructions used in software, and it was found, unsurprisingly, that for all processors a small part of the instruction set was used the majority of the time. This is of course the Pareto principle, which applies to a wide range of phenomena, such as the fact that eighty percent of the income from my clients used to come from twenty percent of them, and that one herb out of five is used in four out of five prescriptions, and so on. In the case of CPUs, this meant that if the jobs done by the rarely used majority of instructions could be performed using the often-used minority, the others could be dispensed with at the cost of taking up more memory. An artificial example of this is that multiplying six by seven can be achieved by adding seven to itself six times, and therefore an integer multiply instruction isn’t strictly necessary. Of course, this process of performing six additions could take longer than the multiplication would have in the first place, but choosing the instructions carefully would lead to an optimal set, all of which would be decoded faster due to the smaller variety.
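As a FORTH word, the artificial example looks like this (it would need ?DO rather than DO if the multiplier could be zero):

: MUL-BY-ADD ( a b -- a*b ) 0 SWAP 0 DO OVER + LOOP NIP ; ( add a to itself b times )

7 6 MUL-BY-ADD . prints 42 after six additions rather than one multiplication.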

The question therefore arises of the lowest possible number of instructions any CPU could have and still be Turing-complete. A Turing-complete machine is a computer which can do anything the theoretical machine Turing thought up in 1936 could do; that machine consisted of an infinitely long strip of tape and a read-write head, whose behaviour can be influenced by the symbol underneath the head. It more or less amounts to a machine which, given enough time, can do anything any digital computer could do. The calculator on wheels I mentioned above is effectively a Turing machine. What this means, for example, is that you could take a ZX80, rewire it a bit and have it do anything today’s most expensive and up-to-date PC could do, but of course very, very slowly. But it can’t be just any digital computer-like machine. For instance, a non-programmable calculator can’t do it, nor can a digital watch. The question is, as I’ve said, which instructions would be needed to make this possible.

There used to be a very simple programming language used in schools called CESIL (I created and wrote most of that article, incidentally – there are various deeply obscure and useless articles on Wikipedia which were originally mine). As you can see from the link, there are a total of fourteen instructions, several of which are just there for convenience, to enable input and output, such as LINE and PRINT. It’s a much smaller number than the 6502’s, but also rather artificial, since in reality it would be necessary to come up with code for proper input and output. Another aspect of redundancy is the fact that there are three JUMP instructions: JUMP, JINEG and JIZERO – unconditional jump, jump if negative and jump if zero. All that’s really needed there is JINEG – jump if negative. It can be ensured that the content of the accumulator register is negative in advance, in which case JINEG can perform the same function as an unconditional jump, and since zero is only one greater than the largest negative integer, subtracting one and then performing JINEG does the same job as JIZERO, provided the accumulator is already known not to be negative. Hence the number of instructions is already down to six, since multiplication and division are achievable by other means, although as constituted CESIL would then be too limited to allow for input and output, because it only uses named variables. It would be possible, though, to ensure that one variable was a dummy which was in fact an interface with the outside world via some peripherals.

It has in fact been determined that a machine can be Turing-complete even if it has only one instruction, namely “Subtract One And Branch If Negative”. Incidentally, with special arrangements it can even be done with Intel’s “MOV” instruction, but it needs extensive look up tables to do this and also requires special address modes. Hence there can be a SISC – Single Instruction Set Computer. It isn’t terribly practical because, for example, to add two numbers in the hundreds this instruction would need to be executed hundreds of times. It depends, of course, on the nature of the data in the memory, and this has an interesting consequence. In a sense, this is a zero instruction set computer (which is actually something different officially) because it can be assumed that every location pointed to by the program counter has an implicit “SOBIN” instruction. The flow and activities of the program are actually determined by a memory location field which tells the CPU where to go next. This means that the initial value of the data in the accumulator needs to be fixed, probably as an entirely set series of bits equivalent to the word length.

This would all make for a very difficult machine to work with, but it is possible. It would be very inefficient and slow, but it would also reduce the number of logic gates needed to a bare minimum. It would simply consist of an arithmetic unit which would in fact be a binary adder because of two’s complement. Negative integers can be represented simply by treating the second half of the interval between zero and two raised to the power of the word length as if those numbers are negative.

Consider a one-bit binary adder with carry (a full adder), built in the usual diagram from XOR gates producing the sums, AND gates producing the carries and an OR gate combining the two carry signals.

It works like this: 0+0=0 with no carry, 0+1=1 with no carry, 1+0=1 with no carry and 1+1=0 with carry. XOR, by the way, is “either – or –”. These connectives describe the inputs and outputs, where one, i.e. a signal, is truth and zero, i.e. no signal, is falsehood. Incidentally, you probably realise this, but this logic is central to analytical philosophy, meaning that there are close links between philosophy, mathematics and computer science, and if you can understand logic you can also understand the basis of much of the other two.
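Since this logic maps straight onto FORTH’s bitwise words, here is the full adder written out, with variables for legibility; the names are all mine:

VARIABLE A VARIABLE B VARIABLE C
: FULL-ADD ( a b carry-in -- sum carry-out )
  C ! B ! A !
  A @ B @ XOR C @ XOR ( sum: one if an odd number of inputs are one )
  A @ B @ AND A @ B @ XOR C @ AND OR ; ( carry: at least two inputs are one )

1 0 1 FULL-ADD leaves a sum of 0 beneath a carry of 1, i.e. binary 10.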

Most processors would do all this in parallel – an eight-bit processor has eight of these devices lined up, enabling any two unsigned integers between zero and two hundred and fifty-five to be added, or, in two’s complement, any two integers between -128 and 127 to be added or subtracted, either way in one go. But this could also be done in series if the carry is stored and the serial format is converted to and from parallel to communicate outside the processor. This reduces the transistor count further. All logic gates can be implemented by NAND gates, or if preferable by NOR gates. A NAND gate can be implemented with two transistors and three resistors, and a NOR gate with the same components connected differently. Transistors could also be replaced by valves or relays, and it would also be possible to reuse some of the components in a similar manner to the serial arrangement with the adder, although I suspect the point would come when the multiplication of other components would mean this was no longer worthwhile.

I can’t really be precise as to the exact number of components required, but clearly the number is very low. Hence some kind of computer of a reasonable size could have been implemented using valves or relays, or discrete transistors. ROM could be realised via fuses, with blown fuses as zeros and intact fuses as ones, and RAM by capacitors, by Williams tubes (cathode ray tubes with plates to feed the static charge back into the cathodes) or by ferrite rings threaded onto wires, all methods which have been used in the past. Extra parts would of course be needed, but it is nonetheless feasible to build such a computer.

I feel like I’m on the brink of being able to draw a logic diagram of this device, but in practical terms it’s beyond me. I haven’t decided on input and output here, but that could be achieved via arrays of switches and flashing lights. And if this could be done with 1950s technology, who knows what the limit would be?