Difference between Application Software and System Software

| Aspect | System Software | Application Software |
| --- | --- | --- |
| Definition | Software that manages and controls computer hardware. | Software that helps users perform specific tasks. |
| Purpose | Supports the operation of the computer. | Solves user-specific problems or performs desired tasks. |
| Focus | Efficiency and resource management of the system. | Functionality for end-users. |
| Examples | Operating systems, compilers, interpreters, loaders, device drivers | MS Word, Photoshop, Excel, Tally, web browsers |
| User Interaction | Works mostly in the background; minimal direct interaction. | User-centric; interacts directly with the user. |
| Machine Dependency | Machine-dependent. | Generally machine-independent. |
| Portability | Made portable through bootstrapping. | Made portable through cross-compilers. |
| Developer Skill Needed | Requires low-level programming knowledge (e.g., system architecture). | Requires knowledge of high-level programming languages. |


Loader

A loader is a program that loads executable files (programs) from secondary storage into primary memory (RAM) for execution by the CPU. Its primary function is to take an executable file produced by a compiler or an assembler and place it into memory so that it can be run.

The loader is an essential component of the operating system responsible for managing the execution of programs.

Basic function of a loader

🔹 1. Allocation

  • Reserves or allocates memory space in main memory for the program.
  • Coordinates with the operating system to assign space for code, data, and stack segments.

🔹 2. Linking

  • Resolves symbolic references (e.g., function calls, variable addresses) between program modules or libraries.
  • Combines multiple object files and resolves inter-dependencies.

🔹 3. Relocation

  • Adjusts address-sensitive instructions so that the program can be executed from a memory location different from the one originally specified.
  • Uses relocation bits to identify which instructions need modification.

🔹 4. Loading

  • Transfers the executable machine code into the allocated memory.
  • Initializes registers, then passes control to the program's entry point for execution.
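As a rough illustration of allocation, relocation, and loading working together, here is a toy Python sketch. The object-code format — a list of (word, relocation-bit) pairs — and all addresses are invented for the example:

```python
# Toy sketch of a relocating loader. Each object-code word carries a
# relocation bit: 1 means the word is address-sensitive and must be
# adjusted by the load origin. Not a real loader -- just the idea.

def relocate_and_load(object_code, load_origin):
    """Adjust address-sensitive words and return the loaded memory image."""
    memory = {}
    for offset, (word, reloc_bit) in enumerate(object_code):
        if reloc_bit:                       # address-sensitive: add origin
            word = word + load_origin
        memory[load_origin + offset] = word # allocation + loading
    return memory

# Program assembled assuming origin 0; words 0 and 2 hold addresses.
code = [(5, 1), (100, 0), (7, 1)]
image = relocate_and_load(code, load_origin=1000)
```

Loading the same object code at a different origin only changes the words whose relocation bit is set.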

Types of Loader / Loader Schemas

| Loader Type | Description |
| --- | --- |
| 1. Compile-and-Go Loader | The assembler loads the machine code directly into memory; no object file is created. |
| 2. Absolute Loader | Loads programs at specified memory addresses. Allocation and linking are done manually. |
| 3. Relocating Loader | Supports relocation, enabling flexible memory assignment. |
| 4. Direct Linking Loader (DLL) | Uses two passes to resolve external references and relocation. Builds a load module. |
| 5. Dynamic Loading Loader | Loads only required segments at runtime, using an overlay structure for memory optimization. |

Macro


Macro Data Structure

1. Macro Definition Table (MDT)

  • Purpose: Stores the body of all macro definitions (i.e., instructions between MACRO and MEND).

  • Each line of a macro definition is stored in sequence.

Example:

MACRO
INCR &ARG
   ADD AREG, &ARG
MEND

MDT:

| Index | Instruction |
| --- | --- |
| 0 | INCR &ARG |
| 1 | ADD AREG, &ARG |
| 2 | MEND |

2. Macro Name Table (MNT)

  • Purpose: Holds the names of defined macros and their starting index in MDT.

  • Used to quickly locate macro bodies during expansion.

MNT:

| Index | Macro Name | MDT Index |
| --- | --- | --- |
| 0 | INCR | 0 |

3. Macro Definition Table Counter (MDTC)

  • Purpose: Points to the next available entry in the MDT.

  • Updated as macro lines are inserted.

  • Starts from 0 and increments with every new line added.

Example:

  • Before macro insertion: MDTC = 0

  • After above macro: MDTC = 3


4. Macro Name Table Counter (MNTC)

  • Purpose: Points to the next available entry in the MNT.

  • Incremented each time a new macro is defined.

Example:

  • After defining INCR, MNTC = 1

5. Argument List Array (ALA)

  • Purpose: Maintains a list of formal parameters during macro definition and replaces them with actual parameters during macro expansion.

  • Ensures proper parameter substitution.

Macro Call:

INCR NUM1

ALA During Expansion:

| Index | Formal Arg | Actual Arg |
| --- | --- | --- |
| 0 | &ARG | NUM1 |

  • During expansion, &ARG is replaced by NUM1 in the instructions.
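A minimal Python sketch of how the MDT, MNT, and ALA cooperate during expansion. Only the single-parameter case from the example above is handled; a real macro processor supports multiple arguments and nested calls:

```python
# Sketch of macro expansion driven by the MDT/MNT/ALA tables.
MDT = ["INCR &ARG", "ADD AREG, &ARG", "MEND"]   # macro body, line by line
MNT = {"INCR": 0}                               # macro name -> MDT start index

def expand(call_line):
    name, actual = call_line.split()
    if name not in MNT:
        return [call_line]                      # not a macro call
    ALA = {"&ARG": actual}                      # formal -> actual argument
    expanded = []
    i = MNT[name] + 1                           # skip the prototype line
    while MDT[i] != "MEND":
        line = MDT[i]
        for formal, act in ALA.items():
            line = line.replace(formal, act)    # parameter substitution
        expanded.append(line)
        i += 1
    return expanded

result = expand("INCR NUM1")
```

Here `expand("INCR NUM1")` yields the body with &ARG replaced by NUM1, exactly as in the ALA table above.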

Summary Table:

| Structure | Purpose |
| --- | --- |
| MDT | Stores macro body (excluding the MACRO keyword) |
| MNT | Stores macro names and the corresponding MDT index |
| MDTC | Tracks the next available index in MDT |
| MNTC | Tracks the next available index in MNT |
| ALA | Manages argument substitution |

Two Pass Assembler

Two Pass Assembler

A Two Pass Assembler translates an assembly language program into machine code using two passes over the source code:

  • Pass 1: Analyzes and collects data like labels, symbols, and literal addresses.
  • Pass 2: Uses the collected data to generate actual machine instructions.

🧾 Pass 1 of the Assembler

The purpose of Pass 1 is to assign memory locations to each instruction and data-defining pseudo-instruction, thereby defining values for symbols that appear in the label fields of the source program.

Initially, the Location Counter (LC) is set to the starting address of the program (usually relative address 0). The assembler then reads a source statement.

๐Ÿ” Instruction Processing

  • The operation-code field is examined to determine whether it is a pseudo-op.

  • If it is not a pseudo-op, the Machine Op-Code Table (MOT) is searched to match the op-code.

  • The matching MOT entry provides the length of the instruction (typically 2, 4, or 6 bytes).

  • The operand field is scanned for literals. If a new literal is found, it is added to the Literal Table (LT) for later processing.

  • The label field is then examined:

    • If a symbol is present, it is entered into the Symbol Table (ST) along with the current value of the location counter.
  • The location counter is incremented by the instruction length.

  • A copy of the source line is saved for use by Pass 2.

This process is repeated for each source statement.
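The Pass 1 loop above can be sketched in Python. The mnemonics and lengths in the MOT and the `LABEL:` syntax are assumptions for illustration; literal and pseudo-op handling is omitted:

```python
# Minimal sketch of assembler Pass 1: walk the source, keep a location
# counter (LC), and record each label's address in the symbol table.
MOT = {"L": 4, "A": 4, "ST": 4, "BR": 2}        # mnemonic -> length in bytes

def pass1(source):
    symtab, lc = {}, 0
    for line in source:
        parts = line.split()
        if parts[0].endswith(":"):              # label field present
            symtab[parts[0].rstrip(":")] = lc   # define symbol = current LC
            parts = parts[1:]
        lc += MOT[parts[0]]                     # advance location counter
    return symtab

src = ["L R1,X", "LOOP: A R1,Y", "BR LOOP", "DONE: ST R1,Z"]
symtab = pass1(src)
```

After the pass, LOOP and DONE carry the LC values at which they were encountered, ready for Pass 2 to use.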


โš™๏ธ Handling Pseudo-Operations (Pseudo-Ops)

Pass 1 handles only those pseudo-ops that either:

  • Define symbols, or

  • Affect the location counter

✅ USING and DROP

  • These are simply saved for Pass 2.

  • Pass 1 does not perform any processing beyond saving the cards.

✅ EQU

  • Pass 1 uses this to define a symbol in the label field.

  • It evaluates the expression in the operand field to determine the symbol's value.

✅ DS and DC

  • These pseudo-ops can:

    • Define symbols, and

    • Affect the location counter

  • The operand field is examined to determine the amount of storage required.

  • Due to alignment requirements (e.g., full words must start at addresses that are multiples of 4), the location counter may be adjusted before assigning the address.


🔚 End of Pass 1

When the END pseudo-op is encountered, Pass 1 is terminated.

Before transferring control to Pass 2, the assembler performs housekeeping tasks, such as:

  • Assigning locations to literals collected during Pass 1.

    • This is similar to processing DC pseudo-ops.
  • Reinitializing conditions and data structures required for Pass 2.

🧾 Pass 2 of the Assembler: Code Generation

After Pass 1 has defined all the symbols, Pass 2 completes the assembly process by:

  • Generating machine code for each instruction.

  • Formatting the code for loader compatibility.

  • Printing an assembly listing that includes both the source statement and the corresponding hexadecimal machine code.


๐Ÿ Initialization

  • The Location Counter (LC) is re-initialized, just like in Pass 1.

  • Processing starts by reading each source statement saved during Pass 1.


๐Ÿ” Instruction Processing

  1. Identify Pseudo-ops:

    • If the statement is a pseudo-op, it is handled separately (explained below).

    • If it is not a pseudo-op, the Machine Op-Code Table (MOT) is searched to:

      • Identify the instruction format (e.g., RR, RX),

      • Get the binary op-code, and

      • Determine the length of the instruction.

  2. Operand Processing:

    • RR-format (Register-to-Register):

      • Both register fields are evaluated and inserted into their respective 4-bit fields.
    • RX-format (Register and Storage):

      • Register and index fields are evaluated similarly.

      • The effective address (EA) is calculated from the operand field.

      • The base and displacement fields are computed using the Base Table (BT).

  3. Code Generation:

    • The assembled instruction is formatted for the loader.

    • Several machine instructions may be grouped into a single output card.

    • A listing line is printed that includes:

      • The source statement,

      • Its assigned memory location,

      • And the generated machine code in hexadecimal.

    • The Location Counter is then incremented accordingly.
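The core of this flow — MOT lookup plus symbol substitution — can be sketched minimally in Python. The opcodes, symbol addresses, and output format are invented for illustration; base/displacement computation via the Base Table is omitted:

```python
# Hedged sketch of the Pass 2 idea: with symbols already defined by Pass 1,
# each statement becomes machine code via opcode lookup and address
# substitution. All values here are made up.
MOT = {"L": 0x58, "A": 0x5A, "ST": 0x50}        # mnemonic -> binary opcode
SYMTAB = {"X": 100, "Y": 104, "Z": 108}         # produced by Pass 1

def pass2(source):
    code = []
    for line in source:
        mnemonic, operands = line.split(None, 1)
        reg, sym = operands.split(",")
        # emit (opcode, register number, resolved symbol address)
        code.append((MOT[mnemonic], int(reg.lstrip("R")), SYMTAB[sym]))
    return code

out = pass2(["L R1,X", "A R1,Y", "ST R1,Z"])
```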


โš™๏ธ Pseudo-op Processing

  • EQU:

    • Requires minimal processing.

    • Symbol definition was already completed in Pass 1.

    • Simply printed in the assembly listing.

  • USING and DROP:

    • Were ignored in Pass 1 but are important in Pass 2.

    • The operand fields are evaluated.

    • The corresponding Base Table (BT) entry is:

      • Marked as available if USING,

      • Marked as unavailable if DROP.

  • DS and DC:

    • As in Pass 1, the location counter is updated based on required storage.

    • DC (Define Constant) also requires actual code generation in Pass 2:

      • May involve conversions (e.g., character → binary, float → machine format),

      • May include evaluations of address constants or symbolic values.


🧾 Handling Literals

  • At the end of the source program (upon encountering the END pseudo-op):

    • Remaining literals in the Literal Table (LT) are processed.

    • Code is generated for each literal, similar to DC.


✅ Final Output

  • All machine instructions are formatted for the loader.

  • The complete assembly listing is printed.

  • The program is now ready for loading and execution.


Three Address Code

Three Address Code (TAC) is an intermediate code representation used in compilers during the translation of high-level language into machine code. It breaks complex expressions into simple statements that involve at most three operands.


1. Structure:

Each instruction typically has the form:

x = y op z

Where:

  • x = result (target)

  • y, z = operands

  • op = operator (arithmetic, logical, relational, etc.)


2. Features:

  • Simplifies code generation and optimization.

  • Uses temporary variables to hold intermediate results.

  • Makes control flow (like if, goto) and expressions easy to represent.


3. Example:

For the expression:

a + b * c + d

The TAC would be:

t1 = b * c  
t2 = a + t1  
t3 = t2 + d

4. Forms of TAC Representation:

a. Quadruples:

A quadruple is a structure consisting of four fields: op, arg1, arg2, and result.

op denotes the operator, arg1 and arg2 denote the two operands, and result stores the result of the expression.

| Index | Op | Arg1 | Arg2 | Result |
| --- | --- | --- | --- | --- |
| 1 | * | b | c | t1 |
| 2 | + | a | t1 | t2 |
| 3 | + | t2 | d | t3 |
Advantages

  • Easy to rearrange code for global optimization.
  • The value of a temporary variable can be accessed quickly via the symbol table.

Disadvantages

  • Contains a lot of temporaries.
  • Creating temporary variables increases time and space complexity.

b. Triples:

This representation does not use an extra temporary variable for each operation; instead, when a reference to another triple's value is needed, a pointer to (the index of) that triple is used.

So it consists of only three fields: op, arg1, and arg2.

| Index | Op | Arg1 | Arg2 |
| --- | --- | --- | --- |
| 1 | * | b | c |
| 2 | + | a | (1) |
| 3 | + | (2) | d |
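The two tables can be written down directly as data. The small evaluator below is only to show how a triple's index references like (1) resolve; the variable values passed in are assumptions for the demo:

```python
# Quadruples: (op, arg1, arg2, result) -- explicit temporaries.
quads = [("*", "b", "c", "t1"),
         ("+", "a", "t1", "t2"),
         ("+", "t2", "d", "t3")]

# Triples: (op, arg1, arg2) -- a result is referenced by its index.
triples = [("*", "b", "c"),
           ("+", "a", 1),      # 1 refers to triple #1's result (b * c)
           ("+", 2, "d")]      # 2 refers to triple #2's result

def eval_triples(triples, env):
    """Evaluate the triples for a + b * c + d given variable values."""
    vals = {}
    for i, (op, a1, a2) in enumerate(triples, start=1):
        x = vals[a1] if isinstance(a1, int) else env[a1]
        y = vals[a2] if isinstance(a2, int) else env[a2]
        vals[i] = x * y if op == "*" else x + y
    return vals[len(triples)]

r = eval_triples(triples, {"a": 1, "b": 2, "c": 3, "d": 4})
```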

5. Applications:

  • Intermediate step in code optimization.

  • Used in target code generation phase of the compiler.


In Summary:

Three Address Code is a simple, flexible, and powerful way to represent high-level expressions and control structures, making it easier for compilers to optimize and generate machine-level code.


Predictive LL(1) parser (sums based on this)

  • (Left recursion elimination and left factoring, FIRST and FOLLOW, parsing table, parse the string)
  • Test whether the given grammar is LL(1) or not and construct the LL(1) parsing table.
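A minimal table-driven LL(1) parse, sketched in Python for an assumed toy grammar E → T E', E' → + T E' | ε, T → id. The table entries below are written by hand from FIRST/FOLLOW; in an exam you would derive them first:

```python
# LL(1) parsing table: (nonterminal, lookahead) -> production body.
TABLE = {
    ("E",  "id"): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"],
    ("E'", "$"):  [],                 # E' -> eps on FOLLOW(E')
    ("T",  "id"): ["id"],
}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                # start symbol on top
    i = 0
    while stack:
        top = stack.pop()
        if top == tokens[i] == "$":
            return True               # input fully matched
        if top in ("id", "+", "$"):   # terminal: must match input
            if top != tokens[i]:
                return False
            i += 1
        else:                         # nonterminal: expand via the table
            rule = TABLE.get((top, tokens[i]))
            if rule is None:
                return False          # table error entry -> reject
            stack.extend(reversed(rule))
    return False

ok = parse(["id", "+", "id"])
bad = parse(["+", "id"])
```

The stack-driven loop is the same for any LL(1) grammar; only the table changes.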

Two Pass Macro Assembler

Types of Intermediate Code

Intermediate Code (IC) is an abstract representation of the source program generated after syntax and semantic analysis. It is independent of both the source and target machine, making the compilation process more flexible and portable.


1. Syntax Tree

  • A hierarchical tree structure where:

    • Internal nodes represent operators.

    • Leaf nodes represent operands (variables/constants).

  • More concise than a parse tree.

Example (for expression a + b * c):

      +
     / \
    a   *
       / \
      b   c

2. Postfix Notation (Reverse Polish Notation)

  • A linearized form of the syntax tree.

  • Operators follow their operands.

  • No parentheses are needed.

Infix: (a - b) * (c + d)
Postfix: a b - c d + *
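Postfix code is usually produced with an operator stack (the shunting-yard idea). A small sketch, assuming single-character operands and only binary + - * / with parentheses:

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}   # operator precedence

def to_postfix(tokens):
    out, stack = [], []
    for t in tokens:
        if t.isalnum():
            out.append(t)                 # operands go straight to output
        elif t == "(":
            stack.append(t)
        elif t == ")":
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()                   # discard the "("
        else:                             # operator
            while stack and stack[-1] != "(" and PREC[stack[-1]] >= PREC[t]:
                out.append(stack.pop())
            stack.append(t)
    while stack:
        out.append(stack.pop())
    return " ".join(out)

p = to_postfix(list("(a-b)*(c+d)"))
```

This reproduces the example above: operands in order, each operator emitted after both of its operands.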


3. Three Address Code (TAC)

  • Each instruction has at most three operands.

  • Simplifies complex expressions using temporary variables.

Example for a + b * c + d:

t1 = b * c  
t2 = a + t1  
t3 = t2 + d

Issues in Code Generation

Code generation is the final phase of a compiler, where intermediate representation (IR) is translated into target machine code. The primary goal is to generate correct, efficient, and optimized code for the target architecture.

Below are the main issues and challenges faced during code generation:


🔹 1. Input to Code Generator

  • The input is typically an intermediate representation like:

    • Abstract Syntax Tree (AST)

    • Three Address Code (TAC)

    • Control Flow Graph (CFG)

  • The quality and structure of IR affect the efficiency of the final machine code.


🔹 2. Target Programs

  • The code generator must output code that conforms to the Instruction Set Architecture (ISA) of the target machine.

  • This includes:

    • Instruction formats

    • Register sets

    • Memory access model (stack vs. heap)

    • Calling conventions


🔹 3. Memory Management

  • Manages allocation and access to:

    • Local/global variables

    • Static, stack, and heap memory

  • Needs to handle:

    • Memory fragmentation

    • Alignment

    • Lifetime of variables


🔹 4. Instruction Selection

  • Choose the best matching machine instruction for each high-level operation.

  • Consider:

    • Speed

    • Instruction size

    • Special instruction availability (like inc instead of add 1)

Example:
z = x + y; → May become ADD R1, R2, R3 if x, y in R2 and R3, and z in R1.


🔹 5. Memory Allocation

  • Assign memory addresses to variables and data.

  • Types:

    • Static Allocation (compile time)

    • Stack Allocation (LIFO, function calls)

    • Heap Allocation (dynamic memory, malloc)

Efficient allocation reduces memory usage and improves performance.


🔹 6. Register Allocation

  • Registers are faster than memory.

  • Goal: Keep frequently used variables in registers.

  • When registers are full, variables are spilled to memory.

  • Techniques:

    • Graph Coloring

    • Linear Scan Allocation

    • Register Renaming
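Graph coloring can be sketched with a greedy pass over an interference graph (variables are nodes; an edge joins two variables live at the same time). The graph and register count below are assumptions; real allocators use smarter node orderings and spill heuristics:

```python
def color(interference, k):
    """Greedy k-coloring; returns var -> register index, or 'spill'."""
    assignment = {}
    for var in sorted(interference):            # fixed order for determinism
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        # a node with no free color would be spilled to memory
        assignment[var] = free[0] if free else "spill"
    return assignment

# a interferes with b and c; c also interferes with d.
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
regs = color(graph, k=2)
```

With two registers, b and c can share one register and a and d the other, because they never interfere.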


🔹 7. Choice of Evaluation Order

  • Affects both correctness and performance:

    • Function arguments: Order of evaluation (left-to-right or right-to-left)

    • Short-circuit logic: A && B, avoid computing B if A is false

    • Arithmetic associativity: (a + b) + c vs. a + (b + c)

Incorrect ordering can lead to logic errors or performance drops.


🔹 8. Approaches to Code Generation

  • Syntax-Directed Translation:

    • Code is generated using grammar-based actions during parsing.
  • IR-Based Translation:

    • High-level IR (like TAC) is mapped to machine code using pattern matching.
  • Template-Based Generation:

    • Predefined templates for expressions/statements are substituted during code gen.

📌 Example to Illustrate Issues

int sum(int a, int b) {
    return a + b;
}
 
int main() {
    int x = 5, y = 10, z;
    z = sum(x, y);
    return 0;
}

| Issue | Explanation |
| --- | --- |
| Input to Code Generator | IR like: t1 = a + b, z = t1 |
| Target Program | Output must match the machine architecture (e.g., RISC-V, x86) |
| Memory Management | Allocate space for x, y, z, parameters, return value |
| Instruction Selection | ADD instruction for a + b |
| Memory Allocation | Place x, y, z on the stack or in registers |
| Register Allocation | Assign a, b, z to CPU registers if available |
| Evaluation Order | Order of argument evaluation in sum(x, y) must be consistent |
| Code Generation Approach | Likely uses IR-based or template-based translation |

✅ Conclusion

Effective code generation requires solving multiple challenges:

  • Correctness (program logic)

  • Efficiency (execution speed, memory usage)

  • Portability (target machine awareness)


Direct Linking Loader

Definition:

A Direct Linking Loader is a relocatable loader that performs linking, relocation, and loading. It reads object modules, resolves external references, and loads the code into memory, converting relative addresses into absolute addresses.


๐Ÿ” Working of DLL:

The loader does not access the source code directly. Instead, it uses object modules generated by the assembler, which contain metadata required for loading and linking. The object code may use relative addresses, which the loader must convert to absolute addresses during loading.


🔹 The assembler provides the loader with:

  1. Length of the object code.

  2. USE Table: List of external symbols used but not defined in the segment.

  3. DEFINITION Table: List of symbols defined in the current segment and referred by other segments.


๐Ÿ—‚๏ธ Databases Built and Used by DLL

1. External Symbol Dictionary (ESD)

  • Contains symbol name, segment type (SD, LD, ER), and its relative/absolute address.

Example:

| Symbol | Type | Location |
| --- | --- | --- |
| JOHN | SD | 0 |
| RESULT | LD | 64 |
| SUM | ER | - |

2. Text (TXT) Records

  • Contains actual object code, i.e., machine instructions.

Example:

LOAD 1, POINTER   ; 1,48
LD-ADDR 15, SUM   ; 15,56
BALR 14, 15   ; call subroutine
STORE 1, RESULT   ; 1,52
HLT   ; halt

3. Relocation and Linkage Directory (RLD) Records

  • Contains information about where relocation is needed and what external references should be resolved.

Example:

| Symbol | Flag | Length | Relative Location |
| --- | --- | --- | --- |
| JOHN | + | 4 | 48 |
| SUM | + | 4 | 56 |

4. END Record

  • Indicates end of the program and entry/start point for execution.

📌 Example Summary:

Code Fragment:

START
ENTRY RESULT
EXTERNAL SUM
LOAD 1, POINTER
LD-ADDR 15, SUM
BALR 14, 15
STORE 1, RESULT
HLT

Tables Constructed:

a) POINTER DATA:

| Address (TABLE) | Value |
| --- | --- |
| 48 | 0028 |
| 52 | 0001 |
| 56 | ???? (external SUM) |

b) TABLE DC:

1, 7, 9, 10, 13 → Represent instruction opcodes and memory

c) RESULT NUM: 0

d) ASSUM DATA: ADDR(SUM) โ†’ 56 (external)


✅ Advantages of Direct Linking Loader:

  1. Supports multiple procedures and data segments.

  2. Allows independent translation of program modules.

  3. Enables modular design, supports subroutines, data sharing, and access control.

  4. Provides relocation and linking flexibility.


Phases of Compiler


Code Optimization Techniques


Code optimization is a phase in compiler design aimed at improving the intermediate code to make the final executable more efficient, without changing the program's output or logic.

Optimization can target:

  • Execution speed
  • Memory usage
  • Power consumption (in embedded systems)

There are two types:

  • Machine-independent optimization (applied on IR)
  • Machine-dependent optimization (applied during or after code generation)

🔹 1. Constant Folding

  • Description: Compile-time evaluation of constant expressions.

  • Benefit: Saves runtime computation.

Example:

int x = 3 * 5;

Optimized to:

int x = 15;

🔹 2. Constant Propagation

  • Description: Replaces variables with known constant values.

  • Benefit: Simplifies expressions and exposes more optimization opportunities.

Example:

int a = 10;
int b = a + 5;

Optimized to:

int b = 10 + 5;

🔹 3. Common Sub-expression Elimination (CSE)

  • Description: Eliminates duplicate expressions already computed.

  • Benefit: Reduces redundant calculations.

Example:

t1 = a + b;
t2 = a + b;

Optimized to:

t1 = a + b;
t2 = t1;
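Local CSE can be sketched as a single pass that remembers each (op, arg1, arg2) already computed and reuses its temporary. The TAC tuple format here is an assumption for the sketch:

```python
def cse(tac):
    """tac: list of (result, op, arg1, arg2). Returns (optimized, renames)."""
    seen, out, rename = {}, [], {}
    for res, op, a1, a2 in tac:
        # apply earlier renames so repeated expressions compare equal
        a1, a2 = rename.get(a1, a1), rename.get(a2, a2)
        key = (op, a1, a2)
        if key in seen:
            rename[res] = seen[key]     # reuse the earlier temporary
        else:
            seen[key] = res
            out.append((res, op, a1, a2))
    return out, rename

code = [("t1", "+", "a", "b"), ("t2", "+", "a", "b")]
optimized, renames = cse(code)
```

The duplicate `a + b` disappears, and later uses of t2 are redirected to t1 — mirroring the example above.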

🔹 4. Dead Code Elimination

  • Description: Removes code that does not affect program output.

  • Benefit: Reduces program size and increases speed.

Example:

int x = 10;
x = 20;   // x = 10 is dead code

Optimized to:

int x = 20;

🔹 5. Strength Reduction

  • Description: Replaces expensive operations with cheaper equivalents.

  • Benefit: Improves execution time.

Example:

x = a * 2;      // multiplication

Optimized to:

x = a + a;      // faster addition

🔹 6. Loop Invariant Code Motion

  • Description: Moves code that computes the same result in every iteration outside the loop.

  • Benefit: Reduces redundant computations inside loops.

Example:

for (int i = 0; i < n; i++) {
    x = a * b;
    arr[i] = x + i;
}

Optimized to:

x = a * b;
for (int i = 0; i < n; i++) {
    arr[i] = x + i;
}

🔹 7. Copy Propagation

  • Description: Replaces occurrences of variables that are copies of others.

  • Benefit: Enables further optimizations like dead code elimination.

Example:

b = a;
c = b + 1;

Optimized to:

c = a + 1;

🔹 8. Peephole Optimization

  • Description: Looks at small sliding windows ("peepholes") of instructions to perform local optimizations.

  • Benefit: Simple and effective for final-stage optimization.

Example:

MOV R1, R2
MOV R2, R1

Optimized to:

MOV R1, R2   // second MOV is redundant and removed
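The redundant-copy case can be sketched as a two-instruction window over assumed (op, dst, src) tuples; a real peephole optimizer matches many such patterns:

```python
def peephole(code):
    """Drop a MOV that copies back a value the register already holds."""
    out = []
    for instr in code:
        if out:
            p_op, p_dst, p_src = out[-1]
            op, dst, src = instr
            # MOV R1,R2 followed by MOV R2,R1: the second copy is a no-op
            if op == "MOV" and p_op == "MOV" and dst == p_src and src == p_dst:
                continue                 # drop the redundant copy
        out.append(instr)
    return out

code = [("MOV", "R1", "R2"), ("MOV", "R2", "R1"), ("ADD", "R3", "R1")]
opt = peephole(code)
```

Only the second MOV is removed; the first may still be needed by later instructions (here, the ADD).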

Summary Table

| Technique | Purpose | Key Benefit |
| --- | --- | --- |
| Constant Folding | Evaluate expressions at compile time | Faster execution |
| Constant Propagation | Replace variables with constants | Exposes more optimizations |
| Common Sub-expression Elim. | Avoid recomputation | Reduces redundant code |
| Dead Code Elimination | Remove unused code | Smaller, faster code |
| Strength Reduction | Use cheaper operations | Improves performance |
| Loop Invariant Code Motion | Move code out of loops | Reduces loop overhead |
| Copy Propagation | Replace variable copies | Simplifies expressions |
| Peephole Optimization | Small local improvements | Refines final output |

Forward Reference


Absolute Loader

In an absolute loader scheme, the assembler generates an object file that can be stored on secondary storage instead of being loaded directly into memory.

Along with each object file, the assembler also provides the starting address and length of that object file.

Here the programmer performs the allocation and linking functions explicitly for the loader.

The loader simply loads the object file into memory at the specified address and initiates execution.

Advantages

  • Since the assembler need not reside in main memory at load time, more main memory is available to the user program.

Disadvantages

  • The programmer must handle memory management, since allocation and linking are done explicitly for the loader.