A loader is a program that loads executable files (programs) from secondary storage into primary memory (RAM) for execution by the CPU. Its primary function is to take an executable file produced by a compiler or an assembler and place it into memory so that it can be run.
The loader is an essential component of the operating system responsible for managing the execution of
programs.
Basic function of a loader
๐น 1. Allocation
Reserves or allocates memory space in main memory for the program.
Coordinates with the operating system to assign space for code, data, and stack segments.
๐น 2. Linking
Resolves symbolic references (e.g., function calls, variable addresses) between program modules or libraries.
Combines multiple object files and resolves inter-dependencies.
๐น 3. Relocation
Adjusts address-sensitive instructions so that the program can be executed from a memory location different from the one originally specified.
Uses relocation bits to identify which instructions need modification.
๐น 4. Loading
Transfers the executable machine code into the allocated memory.
Initializes registers and control and then passes control to the starting point of the program for execution.
Types of Loader / Loader Schemas
Loader Type
Description
1. Compile and Go Loader
The assembler loads the machine code directly into memory; no object file is created.
A Two Pass Assembler translates an assembly language program into machine code using two passes over the source code:
Pass 1: Analyzes and collects data like labels, symbols, and literal addresses.
Pass 2: Uses the collected data to generate actual machine instructions.
๐งพ Pass 1 of the Assembler
The purpose of Pass 1 is to assign memory locations to each instruction and data-defining pseudo-instruction, thereby defining values for symbols that appear in the label fields of the source program.
Initially, the Location Counter (LC) is set to the starting address of the program (usually relative address 0). The assembler then reads a source statement.
๐ Instruction Processing
The operation-code field is examined to determine whether it is a pseudo-op.
If it is not a pseudo-op, the Machine Op-Code Table (MOT) is searched to match the op-code.
The matching MOT entry provides the length of the instruction (typically 2, 4, or 6 bytes).
The operand field is scanned for literals. If a new literal is found, it is added to the Literal Table (LT) for later processing.
The label field is then examined:
If a symbol is present, it is entered into the Symbol Table (ST) along with the current value of the location counter.
The location counter is incremented by the instruction length.
A copy of the source line is saved for use by Pass 2.
This process is repeated for each source statement.
โ๏ธ Handling Pseudo-Operations (Pseudo-Ops)
Pass 1 handles only those pseudo-ops that either:
Define symbols, or
Affect the location counter
โ USING and DROP
These are simply saved for Pass 2.
Pass 1 does not perform any processing beyond saving the cards.
โ EQU
Pass 1 uses this to define a symbol in the label field.
It evaluates the expression in the operand field to determine the symbolโs value.
โ DS and DC
These pseudo-ops can:
Define symbols, and
Affect the location counter
The operand field is examined to determine the amount of storage required.
Due to alignment requirements (e.g., full words must start at addresses that are multiples of 4), the location counter may be adjusted before assigning the address.
๐ End of Pass 1
When the END pseudo-op is encountered, Pass 1 is terminated.
Before transferring control to Pass 2, the assembler performs housekeeping tasks, such as:
Assigning locations to literals collected during Pass 1.
This is similar to processing DC pseudo-ops.
Reinitializing conditions and data structures required for Pass 2.
๐งพ Pass 2 of the Assembler: Code Generation
After Pass 1 has defined all the symbols, Pass 2 completes the assembly process by:
Generating machine code for each instruction.
Formatting the code for loader compatibility.
Printing an assembly listing that includes both the source statement and the corresponding hexadecimal machine code.
๐ Initialization
The Location Counter (LC) is re-initialized, just like in Pass 1.
Processing starts by reading each source statement saved during Pass 1.
๐ Instruction Processing
Identify Pseudo-ops:
If the statement is a pseudo-op, it is handled separately (explained below).
If it is not a pseudo-op, the Machine Op-Code Table (MOT) is searched to:
Identify the instruction format (e.g., RR, RX),
Get the binary op-code, and
Determine the length of the instruction.
Operand Processing:
RR-format (Register-to-Register):
Both register fields are evaluated and inserted into their respective 4-bit fields.
RX-format (Register and Storage):
Register and index fields are evaluated similarly.
The effective address (EA) is calculated from the operand field.
The base and displacement fields are computed using the Base Table (BT).
Code Generation:
The assembled instruction is formatted for the loader.
Several machine instructions may be grouped into a single output card.
A listing line is printed that includes:
The source statement,
Its assigned memory location,
And the generated machine code in hexadecimal.
The Location Counter is then incremented accordingly.
โ๏ธ Pseudo-op Processing
EQU:
Requires minimal processing.
Symbol definition was already completed in Pass 1.
Simply printed in the assembly listing.
USING and DROP:
Were ignored in Pass 1 but are important in Pass 2.
The operand fields are evaluated.
The corresponding Base Table (BT) entry is:
Marked as available if USING,
Marked as unavailable if DROP.
DS and DC:
As in Pass 1, the location counter is updated based on required storage.
DC (Define Constant) also requires actual code generation in Pass 2:
May involve conversions (e.g., character โ binary, float โ machine format),
May include evaluations of address constants or symbolic values.
๐งพ Handling Literals
At the end of the source program (upon encountering the END pseudo-op):
Remaining literals in the Literal Table (LT) are processed.
Code is generated for each literal, similar to DC.
โ Final Output
All machine instructions are formatted for the loader.
The complete assembly listing is printed.
The program is now ready for loading and execution.
Three Address Code (TAC) is an intermediate code representation used in compilers during the translation of high-level language into machine code. It breaks complex expressions into simple statements that involve at most three operands.
1. Structure:
Each instruction typically has the form:
x = y op z
Where:
x = result (target)
y, z = operands
op = operator (arithmetic, logical, relational, etc.)
2. Features:
Simplifies code generation and optimization.
Uses temporary variables to hold intermediate results.
Makes control flow (like if, goto) and expressions easy to represent.
3. Example:
For the expression:
a + b * c + d
The TAC would be:
t1 = b * c t2 = a + t1 t3 = t2 + d
4. Forms of TAC Representation:
a. Quadruples:
It is structure with consist of 4 fields namely op, arg1, arg2 and result.
Op denotes the operator and arg1 and arg2 denotes the two operands and result is used to store the result of the expression.
Index
Op
Arg1
Arg2
Result
1
*
b
c
t1
2
+
a
t1
t2
3
+
t2
d
t3
Advantages
Easy to rearrange code for global optimization.
One can quickly access value of temporary variables using symbol table.
Disadvantages
Contain lot of temporaries.
Temporary variable creation increases time and space complexity.
b. Triples:
This representation doesnโt make use of extra temporary variable to represent a single operation instead when a reference to another tripleโs value is needed, a pointer to that triple is used.
So, it consist of only three fields namely op, arg1 and arg2.
Index
Op
Arg1
Arg2
1
*
b
c
2
+
a
(1)
3
+
(2)
d
5. Applications:
Intermediate step in code optimization.
Used in target code generation phase of the compiler.
In Summary:
Three Address Code is a simple, flexible, and powerful way to represent high-level expressions and control structures, making it easier for compilers to optimize and generate machine-level code.
(Left recursion eliminate and Left factoring , first and follow, Parsing table , parser the string)
Test whether the given Grammer is LL1 or not and construct LL(1) parsing table.
Two Pass Macro Assembler
Types of Intermediate Code
Types of Intermediate Code
Intermediate Code (IC) is an abstract representation of the source program generated after syntax and semantic analysis. It is independent of both the source and target machine, making the compilation process more flexible and portable.
Code generation is the final phase of a compiler, where intermediate representation (IR) is translated into target machine code. The primary goal is to generate correct, efficient, and optimized code for the target architecture.
Below are the main issues and challenges faced during code generation:
๐น 1. Input to Code Generator
The input is typically an intermediate representation like:
Abstract Syntax Tree (AST)
Three Address Code (TAC)
Control Flow Graph (CFG)
The quality and structure of IR affect the efficiency of the final machine code.
๐น 2. Target Programs
The code generator must output code that conforms to the Instruction Set Architecture (ISA) of the target machine.
This includes:
Instruction formats
Register sets
Memory access model (stack vs. heap)
Calling conventions
๐น 3. Memory Management
Manages allocation and access to:
Local/global variables
Static, stack, and heap memory
Needs to handle:
Memory fragmentation
Alignment
Lifetime of variables
๐น 4. Instruction Selection
Choose the best matching machine instruction for each high-level operation.
Consider:
Speed
Instruction size
Special instruction availability (like inc instead of add 1)
Example: z = x + y; โ May become ADD R1, R2, R3 if x, y in R2 and R3, and z in R1.
๐น 5. Memory Allocation
Assign memory addresses to variables and data.
Types:
Static Allocation (compile time)
Stack Allocation (LIFO, function calls)
Heap Allocation (dynamic memory, malloc)
Efficient allocation reduces memory usage and improves performance.
๐น 6. Register Allocation
Registers are faster than memory.
Goal: Keep frequently used variables in registers.
When registers are full, variables are spilled to memory.
Techniques:
Graph Coloring
Linear Scan Allocation
Register Renaming
๐น 7. Choice of Evaluation Order
Affects both correctness and performance:
Function arguments: Order of evaluation (left-to-right or right-to-left)
Short-circuit logic: A && B, avoid computing B if A is false
Arithmetic associativity: (a + b) + c vs. a + (b + c)
Incorrect ordering can lead to logic errors or performance drops.
๐น 8. Approaches to Code Generation
Syntax-Directed Translation:
Code is generated using grammar-based actions during parsing.
IR-Based Translation:
High-level IR (like TAC) is mapped to machine code using pattern matching.
Template-Based Generation:
Predefined templates for expressions/statements are substituted during code gen.
๐ Example to Illustrate Issues
int sum(int a, int b) { return a + b;}int main() { int x = 5, y = 10, z; z = sum(x, y); return 0;}
Issue
Explanation
Input to Code Generator
IR like: t1 = a + b, z = t1
Target Program
Output must match machine architecture (e.g., RISC-V, x86)
Memory Management
Allocate space for x, y, z, parameters, return value
Instruction Selection
ADD instruction for a + b
Memory Allocation
Place x, y, z in stack or registers
Register Allocation
Assign a, b, z to CPU registers if available
Evaluation Order
Order of argument evaluation in sum(x, y) must be consistent
Code Generation Approach
Likely uses IR-based or template-based translation
A Direct Linking Loader is a relocatable loader that performs both loading and linking at runtime. It reads object modules, links external references, and loads code into memory by resolving relative addresses.
๐ Working of DLL:
The loader does not access the source code directly. Instead, it uses object modules generated by the assembler, which contain metadata required for loading and linking. The object code may use relative addresses, which the loader must convert to absolute addresses during loading.
๐น The assembler provides the loader with:
Length of the object code.
USE Table: List of external symbols used but not defined in the segment.
DEFINITION Table: List of symbols defined in the current segment and referred by other segments.
๐๏ธ Databases Built and Used by DLL
1. External Symbol Dictionary (ESD)
Contains symbol name, segment type (SD, LD, ER), and its relative/absolute address.
Example:
Symbol
Type
Location
JOHN
SD
0
RESULT LD
LD
64
SUM
ER
โ
2. Text (TXT) Records
Contains actual object code, i.e., machine instructions.
Example:
LOAD 1, POINTER ; 1,48LD-ADDR 15, SUM ; 15,56BALR 14, 15 ; call subroutineSTORE 1, RESULT ; 1,52HLT ; halt
3. Relocation and Linkage Directory (RLD) Records
Contains information about where relocation is needed and what external references should be resolved.
Example:
Symbol
Flag
Length
Relative Location
JOHN
+
4
48
SUM
+
4
56
4. END Record
Indicates end of the program and entry/start point for execution.
Code optimization is a phase in compiler design aimed at improving the intermediate code to make the final executable more efficient, without changing the programโs output or logic.
Optimization can target:
Execution speed
Memory usage
Power consumption (in embedded systems)
There are two types:
Machine-independent optimization (applied on IR)
Machine-dependent optimization (applied during or after code generation)
๐น 1. Constant Folding
Description: Compile-time evaluation of constant expressions.
Benefit: Saves runtime computation.
Example:
int x = 3 * 5;
Optimized to:
int x = 15;
๐น 2. Constant Propagation
Description: Replaces variables with known constant values.
Benefit: Simplifies expressions and exposes more optimization opportunities.
In an absolute loader the assembler generates the object file which can be stored on secondary storage instead of being directly loaded into the memory.
Along with each object file, the assembler also gives information about starting address and length of that object file.
Here the programmer does the allocation and linking functions explicitly for the loader
The loader simply does the task of loading the object file into the memory and initiating the execution.
Advantages
Since the assembler does not always reside in the main memory, it leaves more main memory space available to the user.
Disadvantages
The programmer must do memory management since he explicitly does the allocation and linking for the loader.