System-level thinking

Unix is widely recognised as an elegant system.  An important part of this elegance is the set of traits shared by the shell, shell scripts and compiled programs.  The command line and shell scripts use the same language.  "Proper" programs and shell scripts behave the same way from the user's point of view.  But I think we could do better still.

First, let's recap what we have.  We have several scripting languages with interactive modes very similar to the shell.  At the same time, these languages have the features necessary for writing larger programs.  However, they differ from "traditional" compiled languages in one crucial respect: they have the toplevel.

Entry points

Some programming languages have few reserved words (C89 has 32), some have many (C++11 has 84, COBOL has approximately 400).  But most of them give special treatment to main or some variation thereof, even where it is not formally a reserved word (in C, main is an ordinary identifier that the runtime happens to look for).  The obvious reason is that this denotes the entry point into the program.  How could it be otherwise?

For a start, it could be the way scripting languages deal with the question: by evaluating any expressions found in the toplevel, outside any functions.  Another possibility is the way libraries deal with it.  Normally, a library makes multiple functions available, and its user can freely choose which one(s) to invoke.
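Both alternatives can be sketched in a few lines of Python.  This is only an illustration: the function names greet and shout, and the entry_points table, are made up for the example.

```python
# Hypothetical module: the names greet, shout and entry_points are illustrative only.

def greet(name):
    return "hello, " + name

def shout(name):
    return greet(name).upper() + "!"

# Toplevel code: evaluated as soon as the file is run or imported; no main needed.
message = greet("world")
print(message)

# Library-style use: the caller freely picks which function to invoke.
entry_points = {"greet": greet, "shout": shout}
chosen = entry_points["shout"]
print(chosen("world"))
```

Nothing in the file is marked as "the" entry point.  The toplevel statements simply run in order, and every function stays available to whoever loads the file.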

Namespaces and virtual memory

At first sight, these two topics have nothing to do with each other.  But each exists twice: once at the level of the OS and its file system, and once in programming languages.  Directories exist mainly for the purpose of creating namespaces.  Automated swapping by the OS gives access to nearly unlimited amounts of "memory", something otherwise matched only by manual handling of files.

Parallel conceptual spaces

The above examples are the result of there being two separate spaces, one on the level of the OS, the other on the level of the programming language.  Programs and functions; variables and files.  Let me invite you to a thought experiment where we unify them.

The first step is very easy: the shell and existing scripting languages are already very similar, and there would be no conceptual difficulty in merging them.  More importantly, this merger allows the less trivial pairs to be folded together as well.  We could have a tree of namespaces no different from the file system.  In place of programs or libraries, we could have collections of functions calling each other.  And instead of hard-to-handle files, we could have data structures written to disk in the same way they are swapped out when memory gets tight.  Each would be just another variable that happens to reside on the disk, not in RAM.
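Python's standard shelve module gives a rough taste of the last point.  It is not the unified system imagined here (there is still an ordinary file underneath), but from the program's point of view the stored structure behaves like a dictionary entry.  The temporary directory and the key "inventory" are assumptions made for the sketch.

```python
import os
import shelve
import tempfile

# Assumption: a temporary directory stands in for the persistent store.
store_dir = tempfile.mkdtemp()
store_path = os.path.join(store_dir, "variables")

# Write: the shelf is used like a dictionary, but its contents live on disk.
with shelve.open(store_path) as disk:
    disk["inventory"] = {"apples": 3, "pears": 5}

# Later, possibly in another process: read the structure back.
# No file format to design and no parsing code to write.
with shelve.open(store_path) as disk:
    inventory = disk["inventory"]

print(inventory["apples"] + inventory["pears"])
```

The program never opens, reads or parses a file by hand; it only assigns to and reads from something that looks like a variable.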

Incidentally, this means the distinction between disk partitions used by the file system and the swap partition could disappear, too.


If our command line also serves as the programming environment, it is very important to get it right.  Beyond the normally desirable language features, it also needs to be interactive.  Probably the best way to do this is to take the interpreter apart into three functions, read, eval and print, running in an infinite loop.  Hence the name for this solution: the read-eval-print loop.

Read takes a string as input and returns the AST (abstract syntax tree) of the code as output.  Eval takes an AST as input and evaluates (executes) it, returning the resulting object.  Print is essentially the inverse of read: it takes an object and prints it on the screen in the same format that read accepts as input.
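The three functions can be sketched in Python for a made-up mini-language of prefix arithmetic, such as (+ 1 (* 2 3)).  The language, the OPERATORS table and the function names are assumptions for illustration.  The functions are called evaluate and show rather than eval and print to avoid shadowing Python's built-ins, and the sketch assumes well-formed input.

```python
import operator

# Hypothetical mini-language: prefix arithmetic such as (+ 1 (* 2 3)).
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def read(text):
    """Turn a string into an AST made of nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        token = tokens[pos]
        if token == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1
        try:
            return int(token), pos + 1
        except ValueError:
            return token, pos + 1  # a symbol, kept as a string

    ast, _ = parse(0)
    return ast

def evaluate(ast):
    """Evaluate an AST: numbers are themselves, lists are operator calls."""
    if isinstance(ast, int):
        return ast
    op, *args = ast
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPERATORS[op](result, value)
    return result

def show(obj):
    """Inverse of read: render an object in the syntax that read accepts."""
    if isinstance(obj, list):
        return "(" + " ".join(show(item) for item in obj) + ")"
    return str(obj)

def repl():
    """The loop itself; skips blank lines and stops at end of input."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if line.strip():
            print(show(evaluate(read(line))))
```

Calling repl() starts the loop.  Note that show(read(text)) gives back the original expression, which is what "inverse of read" means in practice.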

The crucial idea here is that the exchange between these three functions uses the data structures of the programming language.  It is entirely possible to modify an AST programmatically, since it consists of the same kind of lists and arrays as any other collection of data.  This concept is called homoiconicity: the program is expressed in the same kind of structure as the data it operates on.
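As a small sketch of what this buys us, assume an AST represented as a nested Python list in prefix form, for example ["+", 1, ["*", 2, 3]].  The representation and the helper name swap_operator are assumptions for the example.  Rewriting the program then takes nothing more than ordinary list handling.

```python
# Assumption: the AST is a nested list in prefix form, e.g. ["+", 1, ["*", 2, 3]].

def swap_operator(ast, old, new):
    """Return a copy of the AST with operator `old` replaced by `new` throughout."""
    if not isinstance(ast, list) or not ast:
        return ast  # a number, a symbol or an empty list: leave as is
    head, *rest = ast
    if head == old:
        head = new
    return [head] + [swap_operator(item, old, new) for item in rest]

program = ["+", 1, ["*", 2, 3]]
rewritten = swap_operator(program, "*", "+")
print(rewritten)
```

The function that transforms the program is an ordinary list-processing function; it needs no special API for "code" as opposed to "data".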