When you write code, the computer can't understand it directly. A compiler translates your code into machine language, and it does so in stages. The sections below go through each stage in order.
CS GCSE §1.6 · CS A-Level Unit 3
🤔 What is a compiler?
A compiler is a program that translates your entire source code (written in a high-level language such as C) into machine code (binary: 1s and 0s) that the CPU can execute. It does this before the program runs.
This is different from an interpreter, which translates and runs code one line at a time (like Python normally does).
🍕 Analogy: A compiler is like translating an entire recipe book from Welsh to English before you start cooking. An interpreter is like having a translator read each step aloud as you cook — slower, but you can start immediately.
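To see the difference in practice, here is a minimal sketch using Python's built-in compile() and exec(). One big assumption to flag: compile() produces Python bytecode, not real machine code, so this only illustrates the "translate everything first" versus "translate as you go" idea. The function names and the three-line program are made up for this example.

```python
# Three lines of a tiny program; the last one contains a deliberate syntax error.
LINES = ["total = 2 + 3", "total = total * 10", "total = = 1"]

def run_compiler_style(lines):
    """Translate the whole program first; run it only if translation succeeds."""
    variables = {}
    try:
        program = compile("\n".join(lines), "<example>", "exec")
    except SyntaxError:
        return None  # translation failed, so not a single line ran
    exec(program, variables)
    return variables.get("total")

def run_interpreter_style(lines):
    """Translate and run one line at a time, stopping at the first bad line."""
    variables = {}
    for line in lines:
        try:
            exec(compile(line, "<example>", "exec"), variables)
        except SyntaxError:
            break  # the earlier lines have already run
    return variables.get("total")

print(run_compiler_style(LINES))     # None
print(run_interpreter_style(LINES))  # 50
```

The compiler-style run spots the error before anything executes, while the interpreter-style run has already worked out total = 50 by the time it reaches the broken line.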
1️⃣
Lexical Analysis (breaking code into tokens)
The lexer (also called a tokeniser) reads your code character by character and breaks it into meaningful pieces called tokens. It's like splitting a sentence into individual words.
📝 Exam note: The lexer also removes whitespace and comments — they're for humans, not the compiler. Each token has a type (keyword, identifier, number, operator) and a value.
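A lexer like the one described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular compiler does it: the token types, the keyword list, and the tokenise function are all assumptions chosen for this example.

```python
import re

# Hypothetical token types for a tiny language; real lexers have many more.
# Order matters: earlier patterns are tried first.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:if|else|while|print)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("BRACKET",    r"[()]"),
    ("COMMENT",    r"#[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("UNKNOWN",    r"."),
]
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenise(source):
    """Return a list of (type, value) pairs, skipping whitespace and comments."""
    tokens = []
    for match in MASTER_PATTERN.finditer(source):
        kind = match.lastgroup
        if kind in ("WHITESPACE", "COMMENT"):
            continue  # these are for humans, so the lexer throws them away
        if kind == "UNKNOWN":
            raise ValueError(f"Unexpected character: {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

print(tokenise("x = 5 + 3  # add two numbers"))
# [('IDENTIFIER', 'x'), ('OPERATOR', '='), ('NUMBER', '5'), ('OPERATOR', '+'), ('NUMBER', '3')]
```

Notice that the comment and all the spaces have disappeared from the output, and every remaining piece carries both a type and a value.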
2️⃣
Syntax Analysis (checking grammar + building a tree)
The parser checks that the tokens follow the language's grammar rules — like checking a sentence makes sense in English. "I ate pizza" ✅ but "pizza ate I" ❌ (wrong order, even though the words are real).
It builds a tree diagram showing how the parts of your code connect. The exam calls this an Abstract Syntax Tree (AST) — think of it as a family tree for your code, showing which parts belong together.
📝 Exam term: AST = Abstract Syntax Tree. "Abstract" means it ignores unnecessary details (like brackets and semicolons) and keeps only the important structure. If the parser finds a syntax error (like a missing bracket), compilation stops here and you see an error message.
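Python can show you its own syntax tree through the standard ast module, which makes a handy way to explore the idea. The sketch below assumes Python 3.8 or newer and a simple assignment such as x = 5 + 3; the two helper functions are made up for this example.

```python
import ast

def describe_assignment(source):
    """Parse one assignment like 'x = 5 + 3' and return (target, left, operator, right)."""
    tree = ast.parse(source)
    assignment = tree.body[0]         # ast.Assign: the root of this statement
    operation = assignment.value      # ast.BinOp: the right-hand side
    return (
        assignment.targets[0].id,     # the name being assigned to
        operation.left.value,         # left operand
        type(operation.op).__name__,  # operator node type, e.g. "Add"
        operation.right.value,        # right operand
    )

def has_valid_syntax(source):
    """Return False if the parser rejects the code with a syntax error."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(describe_assignment("x = 5 + 3"))  # ('x', 5, 'Add', 3)
print(has_valid_syntax("x = (5 + 3"))    # False: the bracket is never closed
```

The tree has an assignment at the top, with the variable x on one branch and an addition on the other; the addition in turn has 5 and 3 as its own branches.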
3️⃣
Semantic Analysis (checking meaning)
The grammar might be correct, but does the code actually make sense? This stage checks for problems of meaning, like using a variable you never created, or trying to add a number to a word.
Stage 2 checks the grammar; Stage 3 checks the meaning. A sentence like "the pizza ate me" is grammatically correct but doesn't make sense, and the same can happen in code.
📝 Exam term: Symbol Table — the compiler keeps a list of every variable name, what type of data it holds (number, text, etc.), and where it's stored in memory. Like a contacts list for your variables.
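Here is a rough sketch of how a symbol table can be used to catch both kinds of problem. It assumes a made-up mini-language where each statement is a tuple, either ("declare", name, type) or ("add", left, right), and it leaves out memory addresses to keep things short.

```python
def check_semantics(statements):
    """Return a list of error messages; an empty list means no problems were found."""
    symbol_table = {}  # variable name -> type name
    errors = []
    for statement in statements:
        kind = statement[0]
        if kind == "declare":            # ("declare", name, type)
            _, name, type_name = statement
            if name in symbol_table:
                errors.append(f"'{name}' is already declared")
            else:
                symbol_table[name] = type_name
        elif kind == "add":              # ("add", left_name, right_name)
            _, left, right = statement
            undeclared = [n for n in (left, right) if n not in symbol_table]
            for name in undeclared:
                errors.append(f"'{name}' is used but never declared")
            if not undeclared and symbol_table[left] != symbol_table[right]:
                errors.append(
                    f"cannot add {symbol_table[left]} '{left}' "
                    f"to {symbol_table[right]} '{right}'"
                )
    return errors

program = [
    ("declare", "score", "number"),
    ("declare", "name", "text"),
    ("add", "score", "name"),   # number + text: a type mismatch
    ("add", "score", "bonus"),  # bonus was never declared
]
for message in check_semantics(program):
    print(message)
# cannot add number 'score' to text 'name'
# 'bonus' is used but never declared
```

Every statement in this program has perfectly good grammar; the errors only show up once the compiler looks each name up in the symbol table.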
4️⃣
Code Generation (translating to machine code)
Now the compiler translates your code into low-level instructions for the CPU. They are shown here as assembly language, which is the human-readable form of machine code. One line of high-level code becomes several CPU instructions, because the CPU can only do very basic things: move a value, add two numbers, store a result.
🔑 What do typical assembly instructions mean?
MOV R0, 5: Move the value 5 into Register 0 (a tiny storage slot in the CPU)
ADD R0, 3: Add 3 to whatever is in Register 0
STR R0, [x]: Store the value from Register 0 into the memory location called "x"
LDR R0, [x]: Load the value of "x" from memory into Register 0
CMP R0, R1: Compare the values in Register 0 and Register 1 (used for if statements and loops)
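📝 Key point: One line of high-level code usually becomes several CPU instructions. For example, x = 5 + 3 can become MOV R0, 5, then ADD R0, 3, then STR R0, [x]. That's why we use high-level languages: writing everything in assembly would take far longer. The CPU can't run high-level code directly, so the compiler does the translation for us.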
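The translation step can be sketched as a small Python function that turns a tree into assembly lines. This is only an illustration under several assumptions: the tree is a tuple like ("assign", "x", ("add", 5, 3)), only addition is supported, and the instruction names follow the simplified assembly used on this page rather than any real CPU's instruction set.

```python
def load_into(register, operand):
    """Return one instruction that puts a number or a variable's value into a register."""
    if isinstance(operand, int):
        return f"MOV {register}, {operand}"  # a literal number
    return f"LDR {register}, [{operand}]"    # a variable stored in memory

def generate_assembly(statement):
    """Translate ("assign", name, expression) into a list of assembly lines."""
    kind, target, expression = statement
    if kind != "assign":
        raise ValueError(f"unsupported statement: {kind}")
    lines = []
    if isinstance(expression, tuple):        # e.g. ("add", 5, 3)
        operator, left, right = expression
        if operator != "add":
            raise ValueError(f"unsupported operator: {operator}")
        lines.append(load_into("R0", left))
        if isinstance(right, int):
            lines.append(f"ADD R0, {right}")
        else:
            lines.append(load_into("R1", right))
            lines.append("ADD R0, R1")
    else:                                    # a plain value or variable, e.g. x = 5
        lines.append(load_into("R0", expression))
    lines.append(f"STR R0, [{target}]")      # save the result to memory
    return lines

for line in generate_assembly(("assign", "x", ("add", 5, 3))):
    print(line)
# MOV R0, 5
# ADD R0, 3
# STR R0, [x]
```

A real code generator handles many more operators and has to juggle a limited number of registers, but the basic idea of walking the tree and emitting a few simple instructions per node is the same.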
5️⃣
Optimisation (making it faster/smaller)
The compiler looks for ways to make the code run faster or use less memory — without changing what it does. Like a teacher marking your essay and saying "you can say the same thing in fewer words".
📝 Exam note: Common optimisations include constant folding (calculating 5+3=8 at compile time instead of runtime), dead code elimination (removing code that never runs), and register allocation (keeping frequently-used values in fast CPU registers).
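Constant folding is the easiest of these to try for yourself. The sketch below assumes expressions are stored as tuples like ("add", 5, 3), with variable names as strings; the fold_constants function and the two supported operators are made up for this example.

```python
import operator

# Operators this sketch knows how to work out at compile time.
FOLDABLE = {"add": operator.add, "mul": operator.mul}

def fold_constants(expression):
    """Replace any sub-expression made only of numbers with its result."""
    if not isinstance(expression, tuple):
        return expression                 # a number or a variable name: nothing to do
    op, left, right = expression
    left = fold_constants(left)           # simplify the branches first
    right = fold_constants(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)  # both sides are known, so calculate now
    return (op, left, right)              # a variable is involved: leave it for runtime

print(fold_constants(("add", 5, 3)))                # 8
print(fold_constants(("add", "y", ("mul", 2, 4))))  # ('add', 'y', 8)
```

In the second example the compiler cannot know what y will be, so it keeps the addition but still replaces 2 * 4 with 8, saving the CPU one calculation every time the program runs.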