Thilan Dissanayaka Low level Development Mar 23

Understanding Assembly Language: Purpose and Structure

Assembly language is a low-level programming language that provides a human-readable representation of a computer's binary instructions. Unlike high-level languages like C, C++, or Python, which are abstracted for ease of use, assembly language closely mirrors the machine code executed by a processor. This article explains why assembly language matters, how an assembly program is structured, and how to build and run a few practical examples.

What is Assembly Language?

At its core, assembly language maps machine code instructions (e.g., binary 10111000 or hexadecimal b8) to mnemonic codes that are easier for humans to understand. For instance, the opcode byte b8 tells an x86 processor to move a 32-bit immediate value into the eax register. In assembly this is written with the mnemonic MOV, so the five bytes b8 01 00 00 00 become mov eax, 0x1. This abstraction makes programming at the machine level more manageable than working with raw binary or hexadecimal values.
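To make the opcode-to-mnemonic mapping concrete, here is a toy decoder written in Python. It is a minimal sketch that assumes 32-bit x86 and understands only two encodings. The first is B8+r, which is mov r32, imm32, where the low three bits of the opcode select the register. The second is CD, which is int imm8. The decode function is a hypothetical illustration. Real disassemblers such as objdump cover the full instruction set.

```python
import struct

# Register order selected by the low 3 bits of opcodes B8..BF (mov r32, imm32).
MOV_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes) -> list[str]:
    """Translate a tiny subset of 32-bit x86 machine code into mnemonics.

    Hypothetical helper: supports only mov r32, imm32 (B8+r) and int imm8 (CD).
    """
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:
            # Opcode byte followed by a 4-byte little-endian immediate.
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov {MOV_REGS[op - 0xB8]}, {imm:#x}")
            i += 5
        elif op == 0xCD:
            # Opcode byte followed by a 1-byte interrupt vector.
            out.append(f"int {code[i + 1]:#x}")
            i += 2
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {i}")
    return out


# Machine code for: mov eax, 0x1 / mov ebx, 0x5 / int 0x80
program = bytes.fromhex("b801000000 bb05000000 cd80")
for line in decode(program):
    print(line)
```

Running it prints the three instructions of the exit program shown later in this article. Note that bb decodes to ebx because it is b8 + 3, and ebx is register number 3 in this encoding.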

Here’s an example of an assembly instruction:

```
int 0x80
```

On 32-bit x86 Linux, this instruction raises software interrupt 0x80, the kernel’s legacy entry point for system calls. We’ll explore system calls later.

Why Learn Assembly Language?

Assembly language is critical in several domains, particularly in performance-critical applications and cybersecurity. Here are key reasons to learn it:

- Reverse Engineering: In cybersecurity and ethical hacking, assembly is indispensable. Reverse engineering involves analyzing a program’s binary without access to its source code. A disassembler translates the binary into assembly instructions, allowing you to infer the program’s logic and structure. Proficiency in assembly enables you to understand and manipulate these instructions effectively.
- Shellcode Development: Shellcode is a sequence of machine instructions used to deliver payloads in exploits. It is injected into a running process as raw machine code, with no compiler, linker, or loader to prepare it. For that reason it is typically written in assembly and then assembled into opcodes (machine code). Writing shellcode in assembly is far more practical than writing the raw opcodes by hand.
- High-Performance Programming: Assembly is used for low-level programming in resource-constrained environments, such as embedded systems or real-time monitoring devices. Hand-written assembly can be tuned to a specific processor. It can outperform compiled code in hot loops or where it exploits instructions the compiler does not use well, although modern optimizing compilers often match it.
- Deep Hardware Understanding: Learning assembly provides insight into how CPUs and hardware operate, enhancing your ability to write efficient code in any language.

Structure of an Assembly Program

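An assembly program is organized into sections that separate data and instructions. Below is a simple program in Intel syntax, written for NASM, followed by its equivalent in AT&T syntax for the GNU assembler (as). Both versions load the exit system call number into eax and the status code into ebx, then call the kernel.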

Example: Intel Syntax

```
global _start

section .data

section .bss

section .text
_start:
    mov eax, 0x1
    mov ebx, 0x5
    int 0x80
```

Example: AT&T Syntax

```
.globl _start
.section .data
.section .bss
.section .text
_start:
    movl $1, %eax
    movl $5, %ebx
    int $0x80
```

Key Differences Between Intel and AT&T Syntax

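- Register Naming: AT&T syntax prefixes registers with % (e.g., %eax), while Intel syntax does not (e.g., eax).
- Immediate Values: AT&T prefixes constants with $ (e.g., $1), while Intel writes them bare (e.g., 0x1).
- Operand Order: In Intel syntax, the destination operand comes first (e.g., mov eax, 0x1), while AT&T places the source first (e.g., movl $1, %eax).
- Memory Addressing: AT&T writes an indirect memory operand as -0x4(%ebp), while Intel writes [ebp - 0x4]. When the operand size is ambiguous, Intel syntax adds a size keyword such as DWORD PTR (GNU as, MASM) or dword (NASM).
- Instruction Suffixes: AT&T appends an operand-size suffix to the mnemonic: b (byte), w (word), l (long, 32 bits), or q (quad, 64 bits), as in movl. GNU as lets you omit the suffix when a register operand already fixes the size. Intel syntax has no suffixes and takes the size from the operands.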
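Most of these differences are mechanical. For simple instructions, you can convert between the two syntaxes by rewriting each operand and reversing the operand order. The Python sketch below does this. intel_to_att is a hypothetical toy that handles only 32-bit registers and integer immediates, with no memory operands. It applies a simplified rule: it adds the l suffix whenever a 32-bit register appears.

```python
import re

# 32-bit general-purpose registers; their presence triggers the "l" suffix here.
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}


def att_operand(op: str) -> str:
    """Rewrite one Intel operand in AT&T form (registers and immediates only)."""
    op = op.strip()
    if op in REGS32:
        return "%" + op  # registers get a % prefix
    if re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", op):
        return "$" + op  # immediates get a $ prefix
    raise ValueError(f"unsupported operand: {op!r}")


def intel_to_att(line: str) -> str:
    """Hypothetical toy converter for simple Intel-syntax instructions."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest.strip() else []
    # Size suffix: "l" (32-bit) if any operand is a 32-bit register.
    if any(o in REGS32 for o in operands):
        mnemonic += "l"
    # AT&T lists the source first, so reverse Intel's destination-first order.
    converted = [att_operand(o) for o in reversed(operands)]
    return f"{mnemonic} {', '.join(converted)}".rstrip()


for instr in ("mov eax, 0x1", "add eax, ecx", "int 0x80"):
    print(f"{instr:<14} -> {intel_to_att(instr)}")
```

For example, intel_to_att("mov eax, 0x1") returns "movl $0x1, %eax". A real translator would also need to handle memory operands and size keywords such as DWORD PTR.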

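Intel syntax is often preferred for its cleaner formatting, and it is the notation used in Intel’s and AMD’s processor manuals. AT&T is the default in GNU tools such as GCC, GDB, and objdump, as well as in many textbooks. Both tools can switch to Intel syntax: use set disassembly-flavor intel in GDB and objdump -M intel.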

Program Sections

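- .data: Stores initialized data, such as variables, constants, or strings. NASM uses these directives, each of which sets the size of one element (a list such as db 1, 2, 3 emits several):
  - DB: Allocates 1 byte.
  - DW: Allocates 2 bytes (word).
  - DD: Allocates 4 bytes (doubleword).
  - DQ: Allocates 8 bytes (quadword).
  - DT: Allocates 10 bytes (used for 80-bit x87 extended-precision values).
- .bss: Reserves space for uninitialized data. The loader fills this space with zeros, so it takes no room in the executable file. Each directive takes a count; for example, buffer resb 64 reserves 64 bytes.
  - RESB: Reserves 1 byte.
  - RESW: Reserves 2 bytes.
  - RESD: Reserves 4 bytes.
  - RESQ: Reserves 8 bytes.
  - REST: Reserves 10 bytes.
- .text: Contains the program’s executable instructions.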
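The sizes above translate directly into the bytes the assembler writes into the object file. The Python sketch below uses the standard struct module to reproduce the bytes that db, dw, dd, and dq emit for an integer value, assuming x86’s little-endian byte order. The emit helper is hypothetical. DT is left out because struct has no 80-bit format.

```python
import struct

# Little-endian struct formats matching the size of each NASM data directive.
# DT (10-byte x87 extended precision) has no struct equivalent, so it is omitted.
DIRECTIVE_FORMATS = {"db": "<B", "dw": "<H", "dd": "<I", "dq": "<Q"}


def emit(directive: str, value: int) -> bytes:
    """Return the bytes a directive such as `dd 0x12345678` places in .data."""
    return struct.pack(DIRECTIVE_FORMATS[directive.lower()], value)


for d in ("db", "dw", "dd", "dq"):
    # Same value, different widths: the extra bytes are zero padding.
    print(d, emit(d, 0x41).hex(" "))

# Little-endian: the least significant byte is stored first.
print("dd 0x12345678 ->", emit("dd", 0x12345678).hex(" "))
```

Because x86 is little-endian, dd 0x12345678 is stored as 78 56 34 12. Keep this in mind when you read a hex dump of a .data section.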

System Calls in Assembly

Assembly programs often interact with the operating system via system calls, which are handled by the kernel. For example, Linux uses system calls to perform tasks like printing to the screen or exiting a program. Each system call has a unique number, and the process involves three steps:

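- Load the System Call Number: Store the system call’s number in the eax register. For the 32-bit int 0x80 interface on Linux, these numbers are listed in the kernel header asm/unistd_32.h. The directory under /usr/include varies by distribution, and asm/unistd.h includes the file that matches the architecture you compile for.
- Load Arguments: Place the arguments, in order, in ebx, ecx, edx, esi, edi, and ebp. The int 0x80 interface accepts at most six register arguments.
- Invoke the Kernel: Execute int 0x80 to transfer control to the kernel, which runs the system call and returns its result in eax. This is the legacy 32-bit interface. Native 64-bit code normally uses the syscall instruction instead. It puts the number in rax and the arguments in rdi, rsi, rdx, r10, r8, and r9, and it uses a different numbering: exit is 60 rather than 1.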
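The three steps amount to a fixed register layout. The Python sketch below records that layout for two Linux x86 interfaces: the 32-bit int 0x80 interface and the 64-bit syscall interface, which uses different registers and call numbers. It only shows which register receives each value; it does not perform a system call. load_registers is a hypothetical helper, and the table lists just two well-known calls, exit and write.

```python
# Register conventions for Linux system calls on x86 (argument order matters).
INT80_REGS = ["ebx", "ecx", "edx", "esi", "edi", "ebp"]    # 32-bit: int 0x80
SYSCALL64_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]  # 64-bit: syscall

# A few well-known call numbers; they differ between the two interfaces.
NUMBERS = {"int80": {"exit": 1, "write": 4}, "syscall64": {"exit": 60, "write": 1}}


def load_registers(interface: str, name: str, *args: int) -> dict[str, int]:
    """Hypothetical helper: show which register receives each value before the trap."""
    if interface == "int80":
        number_reg, arg_regs = "eax", INT80_REGS
    elif interface == "syscall64":
        number_reg, arg_regs = "rax", SYSCALL64_REGS
    else:
        raise ValueError(f"unknown interface: {interface}")
    if len(args) > len(arg_regs):
        raise ValueError("Linux system calls take at most six register arguments")
    regs = {number_reg: NUMBERS[interface][name]}  # step 1: call number
    regs.update(zip(arg_regs, args))               # step 2: arguments in order
    return regs                                    # step 3 would be int 0x80 / syscall


print(load_registers("int80", "exit", 5))      # {'eax': 1, 'ebx': 5}
print(load_registers("syscall64", "exit", 5))  # {'rax': 60, 'rdi': 5}
```

The first line matches the exit example that follows: eax holds 1 and ebx holds the status code.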

Example 1: Exit System Call

The following program exits with a status code of 5:

```
global _start

section .data

section .bss

section .text
_start:
    mov eax, 0x1    ; System call number for exit
    mov ebx, 0x5    ; Exit status code
    int 0x80        ; Invoke kernel
```

After running this program, you can check the exit status in a Linux terminal with echo $?, which outputs 5.

Example 2: Adding Two Numbers

This program adds two numbers and exits with the result as the status code:

```
global _start

section .data

section .bss

section .text
_start:
    mov eax, 0x1    ; Load 1 into eax
    mov ecx, 0x2    ; Load 2 into ecx
    add eax, ecx    ; Add ecx to eax (result in eax)
    mov ebx, eax    ; Copy result to ebx for exit status
    mov eax, 0x1    ; System call number for exit
    int 0x80        ; Invoke kernel
```

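This program adds 1 and 2, leaves the result (3) in eax, and then copies it to ebx for the exit status. The addition overwrote eax, so the program reloads it with 1, the exit system call number, before invoking the kernel. Running echo $? after execution outputs 3.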
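One way to check the walkthrough above is to trace the registers by hand. The Python sketch below models each instruction of Example 2 as an update to a small register dictionary. It is a hypothetical model, not an emulator. It assumes 32-bit wraparound on add and relies on Linux reporting only the low 8 bits of the exit status, which is why sums above 255 do not appear intact in echo $?.

```python
MASK32 = 0xFFFFFFFF  # 32-bit registers wrap around on overflow


def run_add_program(a: int, b: int) -> int:
    """Hypothetical register-level model of Example 2; returns what `echo $?` shows."""
    regs = {"eax": 0, "ebx": 0, "ecx": 0}
    regs["eax"] = a & MASK32                            # mov eax, a
    regs["ecx"] = b & MASK32                            # mov ecx, b
    regs["eax"] = (regs["eax"] + regs["ecx"]) & MASK32  # add eax, ecx
    regs["ebx"] = regs["eax"]                           # mov ebx, eax
    regs["eax"] = 1                                     # mov eax, 0x1 (exit)
    # int 0x80: the kernel keeps only the low 8 bits of the exit status.
    return regs["ebx"] & 0xFF


print(run_add_program(1, 2))      # 3
print(run_add_program(200, 100))  # 44, because 300 does not fit in 8 bits
```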

Assembling and Running an Assembly Program

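To convert assembly code into an executable binary, you use an assembler like NASM. Here’s how to assemble and run the NASM programs above on 64-bit x86 Linux. The output is a 64-bit executable, but the programs still use 32-bit registers and int 0x80. This works because most distributions’ kernels route int 0x80 to the 32-bit system call table (this relies on IA-32 emulation being enabled), so exit is still number 1. To build a genuine 32-bit binary instead, use nasm -f elf32 and ld -m elf_i386. The AT&T version is assembled with GNU as instead (as file.s -o file.o) and linked the same way.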

List Files:
```
ls
# Output: file.asm
```
Assemble the Code:
```
nasm -f elf64 file.asm -o file.o
```
The -f elf64 flag specifies the 64-bit ELF format, and -o file.o names the output object file.
Link the Object File:
```
ld file.o -o file
```
The ld linker creates the executable file.

Run the Program:
```
./file
echo $?
# Output: 5 (for the exit program) or 3 (for the addition program)
```

The echo $? command displays the program’s exit status.
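If you rebuild often, the steps above can be scripted. The Python sketch below wraps them with the standard subprocess module. build_commands, exit_status, and build_and_run are hypothetical names. build_and_run assumes that nasm and ld are on your PATH and that the source file exists in the current directory. exit_status returns the same value echo $? would print.

```python
import subprocess
import sys


def build_commands(source: str, output: str = "file") -> list[list[str]]:
    """Return the NASM and ld command lines from the steps above for `source`."""
    obj = output + ".o"
    return [
        ["nasm", "-f", "elf64", source, "-o", obj],  # assemble to a 64-bit ELF object
        ["ld", obj, "-o", output],                   # link into an executable
    ]


def exit_status(command: list[str]) -> int:
    """Run a program and return its exit status, the value `echo $?` prints."""
    return subprocess.run(command).returncode


def build_and_run(source: str, output: str = "file") -> int:
    """Assemble, link, and run `source`; requires nasm and ld on PATH."""
    for cmd in build_commands(source, output):
        subprocess.run(cmd, check=True)  # raise if assembling or linking fails
    return exit_status(["./" + output])


# Safe demo that needs neither NASM nor a source file: a child process that exits with 5.
print(exit_status([sys.executable, "-c", "import sys; sys.exit(5)"]))  # 5
```

With file.asm in place, build_and_run("file.asm") should return 5 for the exit program or 3 for the addition program.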

Conclusion

Assembly language bridges the gap between human-readable code and machine instructions, making it essential for tasks requiring direct hardware interaction, such as reverse engineering, shellcode development, and high-performance programming. By understanding its structure (sections such as .data, .bss, and .text) and mastering system calls, you can write efficient programs tailored to specific hardware. Whether you use Intel or AT&T syntax, assembly lets you work at the machine level, offering unmatched control and insight into computing processes.