An Introduction to Compilers

Most processors only understand a small set of very basic instructions. That set is known as "machine language", and typically only has instructions for simple arithmetic and comparing numbers. Writing programs that do anything "interesting" in machine language is very tedious. It would be like explaining how to play basketball by stating which electrical signals to send through your nervous system to activate the proper muscles.

So computer scientists wrote a program called a compiler. A compiler takes some high level instructions (e.g. "dribble the ball") and translates them into low level instructions that the computer can understand (e.g. "contract this muscle for 0.84 seconds, and then relax it while contracting that muscle"). Now programmers can write programs in the higher level language, and the compiler translates them into something that the computer can understand.

Here's a concrete example: Let's say we want to write a program that'll calculate the square root of a number (let's call it "A") using the trick that Newton invented. The very high level English algorithm might look something like this:

1: Choose a rough approximation for the square root of A and call it G;
2: Divide A by G and then take the average of the quotient with G, call this value G';
3: If G' is sufficiently accurate, stop. Otherwise, let G = G' and go back to step 2.

This is written in English, with the convention that I've put a label before every instruction and separated each instruction with a semicolon. It'll be very useful later on to follow this convention of separating every instruction with a semicolon, but labelling every instruction isn't necessary. I could have rewritten the instructions as follows:

Choose a rough approximation for the square root of A and call it G;
1: Divide A by G and then take the average of the quotient with G, call this value G';
If G' is sufficiently accurate, stop. Otherwise, let G = G' and go back to step 1.

This is the exact same algorithm, except only one of the three instructions is labelled. These instructions, however, are too high level for any existing compiler to understand. If I were to actually write a program that carried out this algorithm, it'd probably look something like this:

real function sqrt(real A) {
  real G, G', tolerance;
  tolerance = 0.0001;          // what counts as "sufficiently accurate"
  G = 1.0;                     // step 1: the rough approximation
  label1:
  G' = ((A / G) + G) / 2.0;    // step 2: average A/G with G
  if ((G - G') < tolerance) {  // step 3: accurate enough?
    return G';
  } else {
    G = G';
    goto label1;
  }
}

This program essentially says the same thing as the "plain-English" algorithm, except it makes explicit a lot of the information that was merely implied in the plain English description. For example, I specified that this algorithm is a function that expects 1 real number (called A) as input, and which will give a real number as its output or answer. Furthermore, I had to explicitly say what I meant by "choose a rough approximation for the square root of A". In this case, I just always approximated the square root to be equal to 1. I also had to say exactly what the criteria were for the root being "sufficiently accurate". In this case, if the approximation improved by less than a margin of 0.0001, then I declare the approximation is close enough, and I just return the answer to whoever it was that asked the question.
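For anyone who wants to play with it, here's the same algorithm transcribed into runnable Python. One small tweak: I take the absolute value of G - G' in the stopping test, since the plain difference can be negative on the very first iteration (when the guess jumps from 1 up past the true root) and would end the loop too early.

```python
def newton_sqrt(a, tolerance=0.0001):
    g = 1.0                           # step 1: a rough first guess
    while True:
        g_next = ((a / g) + g) / 2.0  # step 2: average a/g with g
        if abs(g - g_next) < tolerance:  # step 3: accurate enough?
            return g_next
        g = g_next                    # otherwise refine and repeat

print(newton_sqrt(2.0))  # roughly 1.41421
```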

Code like the sqrt function above is typical of many programming languages, and while a compiler can understand it, it's still too high level for an actual CPU chip made by Intel or AMD to understand. The compiler will read in programs like that one and translate them into a form which the CPU can understand. The result might look something like this:

LOAD  CONST 0.0001, fp_reg1;
STORE fp_reg1,   fp_var1;
LOAD  CONST 1.0, fp_reg1;
STORE fp_reg1,   fp_var2;
LABEL1:
LOAD  fp_var3,   fp_reg1;
LOAD  fp_var2,   fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var2,   fp_reg2;
ADD   fp_reg1,   fp_reg2, fp_reg1;
LOAD  CONST 2.0, fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
STORE fp_reg1,   fp_var4;
LOAD  fp_var2,   fp_reg1;
LOAD  fp_var4,   fp_reg2;
SUB   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var1,   fp_reg2;
JGE   fp_reg1,   fp_reg2, LABEL2;
LOAD  fp_var4,   fp_reg1;
RET   fp_reg1;
LABEL2:
LOAD  fp_var4, fp_reg1;
STORE fp_reg1, fp_var2;
JMP   LABEL1;

Pretty nightmarish, huh? Machine code doesn't let you name your variables. Rather, you have to refer to them by purpose and number. "fp_var" means the value is stored somewhere in memory (these are analogous to the variables used in the previous program), while "fp_reg" is where scratch work is done for calculations. The first line says to load the value 0.0001 into fp_reg1. The second line says to store the value that's in fp_reg1 into fp_var1 (so we can assume that fp_var1 represents the variable "tolerance"). This is necessary because Intel chips don't let you load values directly into variables: first you have to load the value into the scratch work area, and then copy it from the scratch area into the actual variable.

Also note that in the machine code there are two labels, even though in the higher level program there was only one. What's the second label for? Well, it's used in the instruction "JGE", which stands for "Jump if Greater or Equal". Remember in the high level program we had an instruction that basically said "if ((G - G') < tolerance) then do THIS, else do THAT"? This is equivalent to saying "if ((G - G') >= tolerance) then goto label 2; Do THIS; Label 2: do THAT;" (there's no need to jump over the "THAT" part at the end of "THIS", because here "THIS" ends with a RET). Notice that the comparison operator was reversed (to say greater than or equal instead of less than). This is due to the fact that there's no instruction in the Intel chip to say "else".
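You can convince yourself of the reversal trick in miniature. Here's a little Python sketch (made-up names, nothing from the actual machine code) showing that the structured if/else and the reversed-test, jump-style version always agree:

```python
def structured(g, g_prime, tolerance):
    # what the high level language says
    if (g - g_prime) < tolerance:
        return "THIS"
    else:
        return "THAT"

def jump_style(g, g_prime, tolerance):
    # what the machine code does: reverse the test (like JGE) and
    # jump to the else-part; falling through means the original
    # condition was true
    if (g - g_prime) >= tolerance:  # JGE ... LABEL2
        return "THAT"               # LABEL2: the else-part
    return "THIS"                   # fall-through: the then-part
```

Both functions pick the same branch for every input, which is exactly why the compiler is allowed to flip the comparison.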

So if you actually read through the machine code (though I'm sure most of you won't bother), you'll see that it does indeed calculate the square root of a number stored in fp_var3 using Newton's algorithm. I want you to look at the JGE instruction. Notice that if it does perform the jump and goes to "LABEL2", then the next instruction it'll execute is "LOAD fp_var4, fp_reg1;". Notice now that if it doesn't perform the jump, and instead just proceeds to the next instruction, that next instruction is also "LOAD fp_var4, fp_reg1;". That means no matter whether the test is true or false, the same action is performed. So couldn't we save a line of code by moving this instruction before the test, and thus only have it appear once in the program, instead of having it appear twice? The resulting code would look like this:

LOAD  CONST 0.0001, fp_reg1;
STORE fp_reg1,   fp_var1;
LOAD  CONST 1.0, fp_reg1;
STORE fp_reg1,   fp_var2;
LABEL1:
LOAD  fp_var3,   fp_reg1;
LOAD  fp_var2,   fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var2,   fp_reg2;
ADD   fp_reg1,   fp_reg2, fp_reg1;
LOAD  CONST 2.0, fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
STORE fp_reg1,   fp_var4;
LOAD  fp_var2,   fp_reg1;
LOAD  fp_var4,   fp_reg2;
SUB   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var1,   fp_reg2;
LOAD  fp_var4,   fp_reg1;
JGE   fp_reg1,   fp_reg2, LABEL2;
RET   fp_reg1;
LABEL2:
STORE fp_reg1, fp_var2;
JMP   LABEL1;

However, this code is not correct. Notice that the line we've moved up loads a value from variable 4 into register 1, and then the JGE reads from register 1 as part of its comparison. We've changed what the program is comparing! Oops. We can fix that by changing the line so it loads the value into another register, like so:

LOAD  CONST 0.0001, fp_reg1;
STORE fp_reg1,   fp_var1;
LOAD  CONST 1.0, fp_reg1;
STORE fp_reg1,   fp_var2;
LABEL1:
LOAD  fp_var3,   fp_reg1;
LOAD  fp_var2,   fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var2,   fp_reg2;
ADD   fp_reg1,   fp_reg2, fp_reg1;
LOAD  CONST 2.0, fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
STORE fp_reg1,   fp_var4;
LOAD  fp_var2,   fp_reg1;
LOAD  fp_var4,   fp_reg2;
SUB   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var1,   fp_reg2;
LOAD  fp_var4,   fp_reg5;
JGE   fp_reg1,   fp_reg2, LABEL2;
RET   fp_reg1;
LABEL2:
STORE fp_reg1, fp_var2;
JMP   LABEL1;

Now we load whatever was in variable 4 into register 5. This code still isn't correct, though! The whole reason that line loaded the value into register 1 in the first place was so that it could be copied from register 1. If the comparison condition were true, then we were supposed to go to the line after LABEL2, which says "STORE fp_reg1, fp_var2;", thus storing the loaded value into variable 2. If the comparison were false, then we'd perform the instruction "RET fp_reg1;" which returns whatever was in register 1 as the answer to the problem. The desired value is no longer in register 1, but in register 5, so we have to update the references. The code should look something like this:

LOAD  CONST 0.0001, fp_reg1;
STORE fp_reg1,   fp_var1;
LOAD  CONST 1.0, fp_reg1;
STORE fp_reg1,   fp_var2;
LABEL1:
LOAD  fp_var3,   fp_reg1;
LOAD  fp_var2,   fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var2,   fp_reg2;
ADD   fp_reg1,   fp_reg2, fp_reg1;
LOAD  CONST 2.0, fp_reg2;
DIV   fp_reg1,   fp_reg2, fp_reg1;
STORE fp_reg1,   fp_var4;
LOAD  fp_var2,   fp_reg1;
LOAD  fp_var4,   fp_reg2;
SUB   fp_reg1,   fp_reg2, fp_reg1;
LOAD  fp_var1,   fp_reg2;
LOAD  fp_var4,   fp_reg5;
JGE   fp_reg1,   fp_reg2, LABEL2;
RET   fp_reg5;
LABEL2:
STORE fp_reg5, fp_var2;
JMP   LABEL1;

Now this code is correct (I think). What we just did is called "optimizing". A typical compiler translates the high level program verbatim into the low level one, and then, in a second pass, goes over the code it just produced and tries to optimize it, so that it's smaller and faster. In the above example we've saved a line, but I'm sure we could save many more: a lot of the loading and storing is redundant, for example, and a better program could be written which loads all the values into registers once and then works with them there, without storing them back into variables at every step.
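To give a feel for what that second pass looks like, here's a toy peephole optimizer in Python. It represents instructions as tuples in the spirit of the notation above (my own made-up encoding, not any real instruction set) and removes one kind of waste: a LOAD that reloads a variable into the very register that was just STOREd to it.

```python
def drop_redundant_reloads(code):
    """Remove 'LOAD v, r' when it immediately follows 'STORE r, v':
    the register already holds the value, so the reload does nothing.
    Adjacency matters -- a label or any other instruction in between
    would make the removal unsafe, so we only look at direct pairs."""
    out = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (cur[0] == "STORE" and nxt is not None and nxt[0] == "LOAD"
                and cur[1] == nxt[2]    # same register
                and cur[2] == nxt[1]):  # same variable
            out.append(cur)             # keep the STORE, drop the LOAD
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

code = [("STORE", "fp_reg1", "fp_var4"),
        ("LOAD",  "fp_var4", "fp_reg1"),  # redundant: gets dropped
        ("LOAD",  "fp_var2", "fp_reg2")]
```

A real optimizer runs dozens of little passes like this one, each proven (hopefully) not to change the program's meaning.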

All of this was to say that writing compilers is difficult, and writing optimizers for compilers is even more difficult! It's hard to be sure that an optimization you thought of won't cause a bug somewhere because it changed the meaning of the program. That's why a lot of compilers in real life have two options: "Safe" and "Fast". In "Safe" mode, the compiler performs only the optimizations it's absolutely sure will work. In "Fast" mode, it tries its best to produce fast code, but that code might have errors in it, causing your program to crash or do unexpected things, even though the program you typed in, in the high level language, was completely error free: the compiler simply made a mistake while translating it.

Now why did I bother to explain to you what compilers and compiler-optimizers are? In the fall/winter period of 2004, I took a compiler course at McGill University. The teacher held a little competition for fun. She gave us a simple Java compiler that she wrote (that is, a program which takes programs written in the "Java" programming language and translates them into something like the machine code you've seen up there), and then asked us to write an optimizer for it. We were split into groups of two, and the group with the best optimizer would win a t-shirt.

Just today, at 7:15PM, I got an e-mail from the teacher informing me that I had won the competition. Not only did we beat every other group in our class, but we also beat "javac", the Java compiler written by Sun, the company that invented the Java language. The fact that we beat javac came as no surprise to me, because night after night, I pored over the machine code javac produced, trying to figure out what the engineers at Sun were doing. In the process, I noticed a lot of optimization tricks that javac didn't take advantage of, such as detecting when an instruction "doesn't do anything", or opportunities in how it handles strings of text. Javac, for example, won't realize that this program:

integer function(integer A) {
  integer G;
  G = A / 3;
  if (G > 2) {
    G = 1;
  } else {
    G = -1;
  }
  return A;
}

does exactly the same thing as this program:

integer function(integer A) {
  return A;
}

So yes, I knew we had beaten javac. In fact, when my partner and I communicated our progress to each other, we stated it in terms of how many bytes above javac we were. If you just took the compiler the teacher gave, you started at a score of 1134 (lower scores are better), and we spent a few nights edging closer and closer to zero. I remember at one point we were at a score of something like 13, and we spent an hour just to get it down to 11. At that point, we figured we would never beat javac and that we were reaching the limits of optimization. That's when I had the "detect useless instructions" epiphany above, which knocked us down to something like -17. I was overjoyed when I realized that we had finally beaten javac.

Anyway, we pushed on, and our final score, when we submitted it to the teacher, was -293. I remember seeing maybe 10 more points we could have shaved off, 10 more spots where I knew what the optimization was and all I had to do was actually type it in, but we had run out of time.

What I didn't expect was to beat all the other groups. The teacher said that if you can beat javac (and it has happened in the past), then you've got a good chance of winning. However, my partner and I had procrastinated and started late on this contest. While the other groups had a week to work on it, we had spent a weekend. I figured if I got to -293 after 2 days, the people who spent 7 days on it must be at something like -600 or -800. It turns out that of the 12 groups, only 3 beat javac (including us). The 2nd place group was at -258 and the 3rd place group was at -232. The 4th place group falls off quickly at 105 (a positive score), and so on.

In short, I'm very pleased with myself, and now I'm trying to think of a way of elegantly slipping a summary of this post into my résumé.

 