Instruction sets


Once a substantial amount of software had been written, it was realised that portability would be a good thing.  This idea in turn gave rise to the instruction set architecture.  But since software at that time was still written in assembly (or even machine code), each instruction was designed to do as much work as possible, to avoid placing an undue burden on programmers.  This approach would later be called CISC.

By the mid-70s, however, most software was written in higher-level languages and compiled.  But compilers could rarely use all possible instructions with all possible addressing modes.  Thus another approach - RISC - appeared, characterised by:


Recently we have hit a wall: Dennard scaling has broken down, while Moore's law still holds (though not for much longer).  This led to multicore CPUs and lots of special-purpose circuitry on-die.  Notably, Intel chips have random number generators and encryption built in.  At the same time, these processors are heavily microcoded and do a lot of pipelining, branch prediction, register renaming and superscalar execution - all of which, put together, means the CPU effectively has a JIT compiler inside.  It performs code analysis (the pinnacle being superscalar execution and putting bubbles in the pipeline) and translates the x86 instructions into microcode (micro-operations) for the pipelined execution units.

I must ask: why is this any better than a software compiler creating what would be the microcode ahead of time?  For one thing, the compiler would have the opportunity, and much more time, to analyse the program as a whole, and could thus make better decisions about optimisation.  For another, the chip area freed by throwing out the JIT compiler hardware could be put to other uses.  More cache space would be a good bet, as a nice parallel to RISC processors having more registers.
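To make the idea of ahead-of-time scheduling a bit more concrete, here is a minimal sketch of a greedy list scheduler in Python.  It is a toy, not how any real compiler or CPU works: the instruction format (a destination name plus a list of source names), the unit latency, the issue width and the `program` itself are all assumptions made up for illustration, and each name is assumed to be written only once, so only read-after-write dependencies matter.

```python
def schedule(instructions, width=2):
    """Greedy list scheduling: pack independent instructions into issue bundles.

    instructions: list of (dest, sources) tuples in program order (hypothetical format).
    Returns a list of bundles, each a list of instruction indices issued together.
    Assumes unit latency and that every destination name is written exactly once.
    """
    # For each instruction, record which earlier instructions produce its sources.
    last_writer = {}
    deps = []
    for index, (dest, sources) in enumerate(instructions):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = index

    done = set()
    bundles = []
    remaining = list(range(len(instructions)))
    while remaining:
        # An instruction is ready once all of its producers were issued in earlier bundles.
        ready = [i for i in remaining if deps[i] <= done][:width]
        bundles.append(ready)
        done.update(ready)
        remaining = [i for i in remaining if i not in ready]
    return bundles


# A made-up five-instruction program.
program = [
    ("a", []),          # 0: a = load
    ("b", []),          # 1: b = load
    ("c", ["a", "b"]),  # 2: c = a + b
    ("d", ["a"]),       # 3: d = a * 2
    ("e", ["c", "d"]),  # 4: e = c - d
]
```

With an issue width of 2, `schedule(program)` groups the five instructions into three bundles, `[[0, 1], [2, 3], [4]]`.  This is the kind of decision a superscalar core makes on the fly, here made once, before the program ever runs.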

Going forward

So where does this put us?  First, RISC generally appears to be a good idea.  Secondly, we still have a lot of transistors we don't know what to do with.  Some options:

The last option does not necessarily mean fixed-function hardware (e.g. a random number generator).  Most interestingly, programmability is not out of the question.  Think about it - in optimisation, the rule of thumb is that 1% of the code takes 99% of the runtime.  What if such a hot spot could be implemented not in software, but in hardware?  Going by the example of the random number generator and SHA hashing, a speedup of over an order of magnitude is possible.  Except that here it could be applied wherever profiling shows it to be most appropriate.
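Finding such a hot spot is the profiler's job.  Here is a minimal sketch using Python's standard `cProfile` and `pstats` modules.  The functions `checksum`, `parse_header` and `process` are hypothetical stand-ins invented for this example, and "hottest" is taken to mean the largest own time (excluding callees), which is only one of several reasonable definitions.

```python
import cProfile
import pstats


def checksum(data):
    """Deliberately slow byte-wise checksum: the hypothetical hot spot."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 65521
    return total


def parse_header(data):
    """Cheap bookkeeping that should barely register in the profile."""
    return len(data)


def process(data):
    """Stand-in for a whole program: a little bookkeeping plus one heavy loop."""
    return parse_header(data), checksum(data)


def hottest_function(func, *args):
    """Run func under cProfile and return the name of the function with the largest own time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (call count, primitive calls, own time, cumulative time, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name
```

Calling `hottest_function(process, bytes(200_000))` should point at `checksum`, which would make it the natural candidate to move into hardware in this toy setting.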