Assembly is a low-level programming language. As you already know, low-level languages are close to the machine and hard for humans to read. We have already written programs in languages like C, C++ and Python. In the compiling C programs article we talked about what happens when we compile a program: the source code is translated into a set of binary instructions. Assembly is simply a human-readable representation of those binary instructions. But why do we need Assembly language?
Suppose there is a CPU instruction 10111000. An instruction like this might move data from one place to another, pop a value off the stack, and so on. Its hexadecimal value is b8. Both forms represent the same CPU instruction, but if you compare them, the hexadecimal value is much easier to remember and use.
We can go one step further and map a simple word to each CPU instruction. Take moving data as an example. Imagine that the hexadecimal code for moving data is b8. We can assign the word MOV to that instruction, so whenever we want to use b8 we write MOV instead of the hexadecimal value. That is far more human-friendly than the binary and hexadecimal representations. (Strictly speaking, b8 moves a value into the eax register. We'll talk more about this later.)
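To make the three representations concrete, here is a small Python sketch that starts from the binary form, converts it to hexadecimal and then looks up a mnemonic. The MNEMONICS table and the describe function are hypothetical and exist only for illustration. A real assembler also has to encode the operands, not just the opcode byte.

```python
# Hypothetical one-entry lookup table: opcode byte -> mnemonic text.
# b8 moves a 32-bit value into eax, as described in the article.
MNEMONICS = {0xB8: "mov eax, <imm32>"}

def describe(bits: str) -> str:
    """Show one opcode byte as binary, hexadecimal and mnemonic."""
    value = int(bits, 2)               # parse the base-2 text into an integer
    hex_text = format(value, "02x")    # two-digit lowercase hexadecimal
    name = MNEMONICS.get(value, "unknown")
    return f"{bits} = 0x{hex_text} = {name}"

print(describe("10111000"))  # 10111000 = 0xb8 = mov eax, <imm32>
```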
Let's see some examples of these opcodes and Assembly instructions.
int 0x80
This instruction interrupts code execution and calls the kernel. On 32-bit Linux this is how a program makes a system call. Here int stands for interrupt, and 0x80 is its argument, the interrupt number. The opcode for this instruction is cd 80.
mov eax, 0x1
This instruction copies the value 0x1 (1 in hexadecimal) into the eax register. Its opcode is b8 01 00 00 00. Here b8 is the opcode for moving a 32-bit value into eax, and the four bytes 01 00 00 00 are the value 1 itself, stored least significant byte first because x86 is little-endian. Let's see another example. The following instruction pushes the value of the edi register onto the stack.
push edi
Its opcode is 57.
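The three encodings above follow simple patterns. mov into a 32-bit register is b8 plus the register number, followed by a 4-byte little-endian value. push is 50 plus the register number. int is cd followed by a one-byte interrupt number. The Python sketch below reproduces the three opcodes from this section. It assumes 32-bit mode and handles only these three instruction forms, and the helper names (mov_imm32, push, int_imm8) are hypothetical.

```python
import struct

# Register numbers used by the "+r" opcode forms (b8+r, 50+r) in 32-bit x86.
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
        "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def mov_imm32(reg: str, value: int) -> bytes:
    # b8+r, then the value as 4 little-endian bytes ("<I" = little-endian uint32)
    return bytes([0xB8 + REGS[reg]]) + struct.pack("<I", value)

def push(reg: str) -> bytes:
    # 50+r, a single byte
    return bytes([0x50 + REGS[reg]])

def int_imm8(vector: int) -> bytes:
    # cd, then the one-byte interrupt number
    return bytes([0xCD, vector])

for text, code in [("int 0x80", int_imm8(0x80)),
                   ("mov eax, 0x1", mov_imm32("eax", 0x1)),
                   ("push edi", push("edi"))]:
    print(f"{text:<14} -> {code.hex(' ')}")
```

Running it should print cd 80, b8 01 00 00 00 and 57, which match the opcodes given above.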
Why do we need to learn Assembly?
What is the use of learning Assembly? If you are going to learn reverse engineering (RE), you need a solid understanding of assembly language. In RE we don't have access to a program's source code, but we can use a disassembler to get the assembly instructions from the binary. If you know assembly, you can work out what those instructions do, and from there get an idea of the high-level code and its structure.
Assembly is also very helpful when we write shellcode. Shellcode is a small set of CPU instructions that is injected into a target and executed as the payload of an exploit. Shellcode runs directly on the CPU without any compiling or linking, so it has to be raw opcodes. Writing opcodes by hand is very hard, so instead we write the shellcode in assembly and convert it to opcodes.
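The last step, turning assembled opcodes into something you can paste into an exploit, often just means writing the bytes as a \x-escaped string. The sketch below does that for the mov eax, 0x1 and int 0x80 opcodes from earlier. It is only a formatting illustration, not a working payload, and the to_escaped name is hypothetical.

```python
# Opcodes from the examples above: mov eax, 0x1 (b8 01 00 00 00), int 0x80 (cd 80).
opcodes = "b8 01 00 00 00 cd 80"

def to_escaped(hex_text: str) -> str:
    """Render hex opcodes as a \\x-escaped string, a common way to write shellcode."""
    raw = bytes.fromhex(hex_text)             # fromhex skips the spaces
    return "".join(f"\\x{b:02x}" for b in raw)

print(to_escaped(opcodes))  # \xb8\x01\x00\x00\x00\xcd\x80
```

Notice the 00 bytes. If shellcode is delivered through C string functions, a null byte ends the string early. This is one reason shellcode authors choose their instructions carefully in assembly.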
Sometimes we need to write programs directly in assembly, for example for small embedded computers such as real-time monitoring devices. A big advantage of programs written in assembly is their performance and speed. This is because they are written for a specific device, with its hardware architecture in mind.
Structure of an Assembly program
By now you should have a clear idea of what assembly is and what we use it for. Now we can start our awesome journey into Assembly language. First, let's look at the structure of an assembly program.
.intel_syntax noprefix
.section .data
.section .text
.global _start
_start:
    mov eax, 0x1    # system call number 1: exit (32-bit Linux)
    mov ebx, 0x5    # exit status: 5
    int 0x80        # call the kernel
The first line, .intel_syntax noprefix, tells the assembler which syntax we are using. Here we use Intel syntax, and noprefix means register names are written without a % prefix. I use Intel syntax most of the time.
Here is the same program written in AT&T syntax.
.section .data
.section .text
.globl _start
_start:
    movl $1, %eax    # system call number 1: exit
    movl $5, %ebx    # exit status: 5
    int $0x80        # call the kernel
You can clearly see a few differences between the two syntaxes. AT&T adds a size suffix to the instruction name, so mov becomes movl (l for long, a 32-bit operand). It puts a percent sign in front of register names, such as %eax and %ebx, and a dollar sign in front of immediate values, such as $1. The operand order is also different. In Intel syntax we write mov eax, 0x1 to move 1 into eax: destination first, source second. In AT&T we write movl $1, %eax: source first, destination second.
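The differences listed above can be captured in a few lines. This Python sketch converts a two-operand Intel mov into AT&T form by adding the l suffix, swapping the operands and adding the % and $ prefixes. It is a toy that only understands 32-bit register and immediate operands like the ones in this article, and intel_to_att is a hypothetical name.

```python
import re

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}

def intel_to_att(line: str) -> str:
    """Convert 'mov dst, src' (Intel) to 'movl src, dst' (AT&T) for 32-bit registers/immediates."""
    match = re.fullmatch(r"\s*mov\s+(\w+)\s*,\s*(\w+)\s*", line)
    if match is None:
        raise ValueError(f"unsupported instruction: {line!r}")
    dst, src = match.groups()
    if dst not in REGISTERS:
        raise ValueError("destination must be a register")

    def operand(op: str) -> str:
        # registers get a % prefix, immediates get a $ prefix
        return f"%{op}" if op in REGISTERS else f"${op}"

    # AT&T: size suffix on the mnemonic, source operand before destination
    return f"movl {operand(src)}, {operand(dst)}"

print(intel_to_att("mov eax, 0x1"))  # movl $0x1, %eax
print(intel_to_att("mov ebx, eax"))  # movl %eax, %ebx
```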
Personally, I prefer Intel syntax because it looks cleaner.
Next, the program contains some sections. The first is the data section. We use it to store the data the program works with, such as variables, constants and strings. The program above is very small and simple, so it doesn't put anything in the data section. We will see how to use that section in later articles.
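Next comes the text section. This is where the program's instructions go, such as mov eax, 0x1. The .global _start line (.globl in the AT&T version) makes the _start label visible to the linker. By default the linker uses _start as the program's entry point, so execution begins at the first instruction after _start:.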
Assemble and run a program
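Now let's see how to turn the code above into a binary. This process is called assembling. We can use an assembler such as NASM or as for this purpose. Here I use as, the GNU assembler that comes with GNU Binutils. The program above uses as directives such as .section and .intel_syntax, so NASM would not accept it as written.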
Let's see how to assemble and link it.
We can just run the program by entering the following command in a terminal.
Here you can see what happens when we run it. This program does not print anything, so we use the echo $? command to see its exit status. The status should be 5, the value we placed in ebx.
Now, if we want to see the opcodes in the binary file, we can use the objdump tool (for example, objdump -d disassembles the code sections). This tool also comes with the GNU toolchain.
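To see roughly what a disassembler like objdump does with those bytes, here is a toy Python decoder. The input is the machine code of our _start routine: b8 01 00 00 00 (mov eax, 0x1), bb 05 00 00 00 (mov ebx, 0x5) and cd 80 (int 0x80). It is a sketch that understands only the mov-immediate, push and int forms from this article, and disassemble is a hypothetical name. objdump's real output layout differs.

```python
# Machine code of the _start routine from the example program.
TEXT = bytes.fromhex("b8 01 00 00 00 bb 05 00 00 00 cd 80")

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def disassemble(code: bytes) -> list:
    """Decode only b8+r imm32, 50+r and cd imm8; reject anything else."""
    lines, i = [], 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:        # mov r32, imm32: opcode + 4-byte value
            size = 5
        elif 0x50 <= op <= 0x57:      # push r32: a single byte
            size = 1
        elif op == 0xCD:              # int imm8: opcode + 1-byte vector
            size = 2
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {i:#x}")
        if i + size > len(code):
            raise ValueError(f"truncated instruction at offset {i:#x}")
        operand = code[i + 1:i + size]
        if op == 0xCD:
            text = f"int {operand[0]:#x}"
        elif op >= 0xB8:
            text = f"mov {REGS[op - 0xB8]}, {int.from_bytes(operand, 'little'):#x}"
        else:
            text = f"push {REGS[op - 0x50]}"
        # offset, raw bytes, decoded instruction
        lines.append(f"{i:2x}: {code[i:i + size].hex(' '):<16} {text}")
        i += size
    return lines

for line in disassemble(TEXT):
    print(line)
```

The decoder has to know each instruction's length before it can find the next one. This is why a disassembler needs the full encoding rules and not just an opcode table.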
So that's all for this article. In the next articles we are going to dive deeper into Assembly. I hope to write articles on functions, file handling, sockets and more.