GDB reverse engineering tutorial
Today, I selected an interesting topic to discuss. Here, we are going to disassemble a binary file and take a look at what it does. This process is called reverse engineering. Let's run the program and figure out its functionality.
user@protostar:~$ ./rev
HacksLand
user@protostar:~$
It just prints a string "HacksLand" and simply exits. Can you imagine what type of code this is? We can assume it might look like the following. We don't know for sure, but let's imagine:
#include <stdio.h>
int main(){
    printf("HacksLand\n");
    return 0;
}
Now let's start our actual reversing process. We can use GDB for this. If we are in a Windows environment, we can use IDA. I posted a tutorial on how to use GDB, so please take a look at it if you're not familiar with GDB.
user@protostar:~$ gdb -q ./rev
Reading symbols from /home/user/rev...(no debugging symbols found)...done.
(gdb) set disassembly-flavor intel
(gdb) disass main
Dump of assembler code for function main:
0x080483c4 : push ebp
0x080483c5 : mov ebp,esp
0x080483c7 : sub esp,0x10
0x080483ca : mov DWORD PTR [ebp-0xc],0x2
0x080483d1 : mov DWORD PTR [ebp-0x8],0x3
0x080483d8 : mov eax,DWORD PTR [ebp-0x8]
0x080483db : mov edx,DWORD PTR [ebp-0xc]
0x080483de : lea eax,[edx+eax*1]
0x080483e1 : mov DWORD PTR [ebp-0x4],eax
0x080483e4 : cmp DWORD PTR [ebp-0x4],0x7
0x080483e8 : jg 0x80483f8
0x080483ea : mov DWORD PTR [esp],0x80484d0
0x080483f1 : call 0x80482f8
0x080483f6 : jmp 0x8048404
0x080483f8 : mov DWORD PTR [esp],0x80484da
0x080483ff : call 0x80482f8
0x08048404 : mov eax,0x0
0x08048409 : leave
0x0804840a : ret
End of assembler dump.
First, I switched to Intel syntax and disassembled the main function. The first two instructions, push ebp and mov ebp,esp, set up the stack frame. If you missed our tutorial on Stack and Functions, please read it to get a clear idea about these instructions.
Next, there is a sub esp,0x10 instruction. It reserves 0x10 (16) bytes on the stack for local variables. Together, these three instructions form the function prologue, and you will see them at the start of most functions compiled without optimization.
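As a rough model of what these three instructions do to the registers, here is a minimal C sketch. The struct regs type and the prologue function are hypothetical names made up for illustration, and the memory write that push performs is not modeled. The entry value of esp used in the note below is inferred from the addresses GDB prints later in this tutorial (ebp-0xc is shown at 0xbffff7dc, so ebp is assumed to be 0xbffff7e8).

```c
#include <stdint.h>

/* Hypothetical model of the two registers the prologue changes. */
struct regs {
    uint32_t esp;
    uint32_t ebp;
};

/* Apply push ebp / mov ebp,esp / sub esp,locals_size to a copy of the registers.
   Only the register effects are modeled; the saved ebp is not written anywhere. */
struct regs prologue(struct regs r, uint32_t locals_size)
{
    r.esp -= 4;            /* push ebp: the stack grows down by one 32-bit word */
    r.ebp = r.esp;         /* mov ebp,esp: ebp now marks the base of the frame */
    r.esp -= locals_size;  /* sub esp,0x10: reserve room for local variables */
    return r;
}
```

Calling prologue with esp = 0xbffff7ec and locals_size = 0x10 gives ebp = 0xbffff7e8 and esp = 0xbffff7d8, so the local variables live in the 16 bytes between those two addresses.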
Next, there are two interesting assembly lines:
0x080483ca : mov DWORD PTR [ebp-0xc],0x2
0x080483d1 : mov DWORD PTR [ebp-0x8],0x3
What do they do? The first instruction writes the value 0x2 to the memory address ebp-0xc, so that stack slot now contains 2 (0x2 in hexadecimal is equal to 2 in decimal). The second instruction writes 0x3 to ebp-0x8 in the same way.
Do you know what happened here? In the stack tutorial, I explained this. The main function writes data into the space it reserved on the stack for its local variables, so there should be at least two integer variables in main. Here, I have set a breakpoint just before these two instructions execute, so we can examine the stack before and after they run.
(gdb) b *0x080483ca
Breakpoint 1 at 0x80483ca
(gdb) run
Starting program: /home/user/rev
Breakpoint 1, 0x080483ca in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc: 0xb7fd7ff4
(gdb) ni
0x080483d1 in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc: 0x00000002
(gdb) x/x $ebp-0x8
0xbffff7e0: 0x08048420
(gdb) ni
0x080483d8 in main ()
(gdb) x/x $ebp-0x8
0xbffff7e0: 0x00000003
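The same two stores can be mimicked in C by treating the 16-byte frame as four 32-bit words. This is only a sketch: FRAME_WORDS, slot_index, and store_locals are hypothetical names, and index 0 is assumed to be the word at ebp-0x10.

```c
#include <stdint.h>

#define FRAME_WORDS 4  /* 0x10 bytes reserved by sub esp,0x10 */

/* Map an offset below ebp (0x4, 0x8, 0xc or 0x10) to an index in the frame array.
   Index 0 is the word at ebp-0x10, index 3 is the word at ebp-0x4. */
int slot_index(uint32_t offset_below_ebp)
{
    return (int)((0x10 - offset_below_ebp) / 4);
}

/* Mimic: mov DWORD PTR [ebp-0xc],0x2 and mov DWORD PTR [ebp-0x8],0x3 */
void store_locals(uint32_t frame[FRAME_WORDS])
{
    frame[slot_index(0xc)] = 0x2;
    frame[slot_index(0x8)] = 0x3;
}
```

Before the call, the array holds whatever was left there, just as GDB showed 0xb7fd7ff4 at ebp-0xc before the first ni. Afterwards, the slots for ebp-0xc and ebp-0x8 hold 2 and 3.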
Now it is crystal clear that 2 and 3 were copied to the stack. Let's see what happens next. Here we have another couple of instructions:
0x080483d8 : mov eax,DWORD PTR [ebp-0x8]
0x080483db : mov edx,DWORD PTR [ebp-0xc]
These two instructions copy the values from the stack into registers: 3 into eax and 2 into edx, because [ebp-0x8] contains 3 and [ebp-0xc] contains 2. We can see this in GDB:
(gdb) i r eax edx
eax 0xbffff894 -1073743724
edx 0x1 1
(gdb) ni
0x080483db in main ()
(gdb) i r eax edx
eax 0x3 3
edx 0x1 1
(gdb) ni
0x080483de in main ()
(gdb) i r eax edx
eax 0x3 3
edx 0x2 2
Ok. At this moment, eax holds 3 and edx holds 2. After this, we see the assembly instruction:
0x080483de : lea eax,[edx+eax*1]
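lea stands for "load effective address." It evaluates the address expression inside the brackets, here edx+eax*1, and stores the result in the destination register without reading memory. Compilers often use it for plain arithmetic, so in this case it simply adds eax and edx and saves the result in eax. Let's see if this is true: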
(gdb) i r eax
eax 0x3 3
(gdb) ni
0x080483e1 in main ()
(gdb) i r eax
eax 0x5 5
3 + 2 = 5, so eax now holds 5. Did you notice something odd here? We copied 2 and 3 onto the stack, then copied them from the stack into registers. Why not load the two values directly into eax and edx? Because they are local variables, and this binary appears to have been compiled without optimization. In that mode, the compiler gives every local variable its own slot on the stack and reloads it from there each time it is used. An optimizing build would likely keep the values in registers or compute the sum at compile time.
Now we can imagine the source code for these steps will be something like the following:
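#include <stdio.h>
int main(){
    int x=2;      /* stored at [ebp-0xc] */
    int y=3;      /* stored at [ebp-0x8] */
    int z;        /* the remaining slot, [ebp-0x4] */
    z = x + y;    /* the lea instruction */
    return 0;
}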
Yes, at this moment, we skipped the string printing part. We only focused on the disassembly we understood so far.
0x080483e1 : mov DWORD PTR [ebp-0x4],eax
0x080483e4 : cmp DWORD PTR [ebp-0x4],0x7
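The first instruction saves the calculated value of 2 + 3 into ebp-0x4, the third local variable. After that, there is an interesting instruction called cmp. It compares that value with 0x7 by subtracting 7 from it internally and setting the CPU flags; the result of the subtraction is discarded. Here it is comparing 5 with 7, and we know 5 is less than 7.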
Finally, let's see the instructions:
0x080483e8 : jg 0x80483f8
0x080483ea : mov DWORD PTR [esp],0x80484d0
0x080483f1 : call 0x80482f8
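In the first instruction, jg means "jump if greater." It jumps only if the first operand of the preceding cmp is greater than the second, treating both as signed numbers. We know that 5 is less than 7, so it won't jump, and execution falls through to the next instruction.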
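The way cmp and jg cooperate through the flags can be sketched in C. This is an illustrative model rather than anything taken from the binary: struct flags, cmp32, and jg_taken are hypothetical names, and only the three flags that jg looks at (ZF, SF, OF) are computed.

```c
#include <stdint.h>

/* The three flags that jg inspects. */
struct flags {
    int zf;  /* zero flag: result was 0 */
    int sf;  /* sign flag: top bit of the result */
    int of;  /* overflow flag: signed subtraction overflowed */
};

/* Model cmp a,b: compute a - b in 32 bits, keep the flags, drop the result. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t r = ua - ub;  /* wraps modulo 2^32, like the CPU */
    struct flags f;

    f.zf = (r == 0);
    f.sf = (int)((r >> 31) & 1);
    /* Overflow happens when a and b differ in sign and the result's sign differs from a's. */
    f.of = (int)((((ua ^ ub) & (ua ^ r)) >> 31) & 1);
    return f;
}

/* jg is taken when ZF is clear and SF equals OF (signed "greater than"). */
int jg_taken(struct flags f)
{
    return f.zf == 0 && f.sf == f.of;
}
```

For the values in this program, jg_taken(cmp32(5, 7)) is 0, so the jump is not taken. It would also be 0 for 7 compared with 7, and 1 for 8 compared with 7.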
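The next two instructions make the function call. The mov writes the address 0x80484d0 to the top of the stack (the memory location esp points to, not esp itself), which is how the first argument is passed here. That address should hold our "HacksLand" string; you can confirm this in GDB with x/s 0x80484d0. The call then transfers control to 0x80482f8, the stub for the library function that prints the string. In the source this is printf, although GCC commonly replaces a printf of a constant string ending in a newline with puts. The two string addresses are 10 bytes apart, exactly the size of "HacksLand" plus its terminating null byte, which would fit that explanation. Had the sum been greater than 7, jg would have jumped to 0x80483f8, where the same function is called with a different string located at 0x80484da. Yes, finally, we reached our goal. A source program consistent with this disassembly is: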
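#include <stdio.h>
int main(){
    int x=2;
    int y=3;
    int z;
    z = x + y;
    if (z <= 7){    /* jg skips this branch only when z > 7; z < 8 would typically compile the same way */
        printf("HacksLand\n");
    } else {
        printf("...\n");    /* placeholder: the string at 0x80484da was not examined */
    }
    return 0;
}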
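To connect the reconstructed source back to the disassembly, the following sketch rewrites the same logic with goto so that each C statement lines up with one or two instructions. The function name trace_main and its return convention (0 when the first call is reached, 1 when the second is reached) are made up for illustration, and the two calls are replaced by return values because the second string is not shown in this tutorial.

```c
/* Goto-style rendering of main's control flow.
   Returns 0 if execution reaches the call at 0x080483f1,
   or 1 if it reaches the call at 0x080483ff. */
int trace_main(int x, int y)
{
    int z;
    int reached;

    z = x + y;       /* lea eax,[edx+eax*1] / mov DWORD PTR [ebp-0x4],eax */
    if (z > 7)       /* cmp DWORD PTR [ebp-0x4],0x7 / jg 0x80483f8 */
        goto greater;
    reached = 0;     /* fall through: first call, prints "HacksLand" */
    goto done;       /* jmp 0x8048404 */
greater:
    reached = 1;     /* jump target 0x80483f8: second call */
done:
    return reached;  /* the real main returns 0 here via mov eax,0x0 */
}
```

trace_main(2, 3) returns 0, which corresponds to the run at the top of this tutorial that printed HacksLand. Any pair of inputs whose sum exceeds 7 returns 1.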