Dec 26, 2024

GDB reverse engineering tutorial

Today's topic is reverse engineering: we are going to disassemble a binary file and work out what it does. Let's run the program first and see how it behaves.
user@protostar:~$ ./rev
HacksLand
user@protostar:~$
It just prints the string "HacksLand" and exits. Can you guess what kind of code this is? We can't know for sure yet, but we can imagine it looks something like the following.
#include<stdio.h>
int main(){
  printf("HacksLand\n");
  return 0;
}
Now let's start the actual reversing process. We can use GDB for this. In a Windows environment we could use IDA instead. I posted a tutorial on how to use GDB; please take a look at it if GDB is new to you.
user@protostar:~$ gdb -q ./rev
Reading symbols from /home/user/rev...(no debugging symbols found)...done.
(gdb) set disassembly-flavor intel
(gdb) 
(gdb) disass main
Dump of assembler code for function main:
0x080483c4 <main+0>:	push   ebp
0x080483c5 <main+1>:	mov    ebp,esp
0x080483c7 <main+3>:	sub    esp,0x10
0x080483ca <main+6>:	mov    DWORD PTR [ebp-0xc],0x2
0x080483d1 <main+13>:	mov    DWORD PTR [ebp-0x8],0x3
0x080483d8 <main+20>:	mov    eax,DWORD PTR [ebp-0x8]
0x080483db <main+23>:	mov    edx,DWORD PTR [ebp-0xc]
0x080483de <main+26>:	lea    eax,[edx+eax*1]
0x080483e1 <main+29>:	mov    DWORD PTR [ebp-0x4],eax
0x080483e4 <main+32>:	cmp    DWORD PTR [ebp-0x4],0x7
0x080483e8 <main+36>:	jg     0x80483f8 <main+52>
0x080483ea <main+38>:	mov    DWORD PTR [esp],0x80484d0
0x080483f1 <main+45>:	call   0x80482f8 <puts@plt>
0x080483f6 <main+50>:	jmp    0x8048404 <main+64>
0x080483f8 <main+52>:	mov    DWORD PTR [esp],0x80484da
0x080483ff <main+59>:	call   0x80482f8 <puts@plt>
0x08048404 <main+64>:	mov    eax,0x0
0x08048409 <main+69>:	leave  
0x0804840a <main+70>:	ret    
End of assembler dump.
(gdb)
First I switched to Intel syntax and disassembled the main function. The first two instructions set up the stack frame. If you missed our tutorial Stack and functions, please read it to get a clear idea of these instructions. Next there is a sub esp,0x10, which reserves 16 bytes of stack space for local variables. These few instructions are very common and you will see them at the start of most functions. After them come two interesting lines.
0x080483ca <main+6>:	mov    DWORD PTR [ebp-0xc],0x2
0x080483d1 <main+13>:	mov    DWORD PTR [ebp-0x8],0x3
What do they do? The first instruction stores 0x2 at the address ebp-0xc, so that stack slot now contains 2 (0x2 in hexadecimal equals 2 in decimal). The second stores 0x3 at ebp-0x8 in the same way. What happened here? As I explained in the stack tutorial, main is writing initial values into the stack space reserved for its local variables. So main must have at least two integer variables. I set a breakpoint just before these two instructions execute, so we can examine the stack before and after each one runs.
(gdb) b *0x080483ca
Breakpoint 1 at 0x80483ca
(gdb) run
Starting program: /home/user/rev 

Breakpoint 1, 0x080483ca in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc:	0xb7fd7ff4
(gdb) ni
0x080483d1 in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc:	0x00000002
(gdb) x/x $ebp-0x8
0xbffff7e0:	0x08048420
(gdb) ni
0x080483d8 in main ()
(gdb) x/x $ebp-0x8
0xbffff7e0:	0x00000003
(gdb)
Now it is crystal clear that 2 and 3 were copied to the stack. Let's see what comes next. Here is another pair of instructions.
0x080483d8 <main+20>:	mov    eax,DWORD PTR [ebp-0x8]
0x080483db <main+23>:	mov    edx,DWORD PTR [ebp-0xc]
These two instructions load the values back into registers: 3 into eax (from ebp-0x8) and 2 into edx (from ebp-0xc). Let's watch this happen in GDB.
(gdb) i r eax edx
eax            0xbffff894	-1073743724
edx            0x1	1

(gdb) ni
0x080483db in main ()

(gdb) i r eax edx
eax            0x3	3
edx            0x1	1

(gdb) ni
0x080483de in main ()

(gdb) i r eax edx
eax            0x3	3
edx            0x2	2
(gdb)
OK. At this moment eax holds 3 and edx holds 2. The next instruction is:
0x080483de <main+26>:	lea    eax,[edx+eax*1]
lea stands for load effective address. It evaluates the address expression inside the brackets without reading memory, so here it simply adds eax and edx and saves the result in eax. Let's check whether this is true.
(gdb) i r eax
eax            0x3	3
(gdb) ni
0x080483e1 in main ()
(gdb) i r eax
eax            0x5	5
(gdb)
3 + 2 = 5, so 5 was stored in eax. Did you notice something odd here? We copied 2 and 3 onto the stack, and then copied them from the stack into registers. Why didn't we load the two values into eax and edx directly? Because they are variables: each variable lives in the stack space reserved for it, and a compiler that is not optimizing stores the value there before using it. Now we can imagine the source code for these steps as something like the following.
#include<stdio.h>
int main(){
      int x=2;
      int y=3;
      int z;
      z = x + y;
      return 0;
}
Yes, for now we skipped the string printing and focused only on the disassembly we understand so far.
0x080483e1 <main+29>:	mov    DWORD PTR [ebp-0x4],eax
0x080483e4 <main+32>:	cmp    DWORD PTR [ebp-0x4],0x7
The first of these saves our calculated value, 2+3, at ebp-0x4. After that comes an interesting instruction called cmp. It compares that value with 0x7, so it is really comparing 5 and 7. Internally cmp subtracts the second operand from the first, throws the difference away, and records the outcome in the EFLAGS register. Next we have a group of instructions to look at.
0x080483e4 <main+32>:	cmp    DWORD PTR [ebp-0x4],0x7
0x080483e8 <main+36>:	jg     0x80483f8 <main+52>
0x080483ea <main+38>:	mov    DWORD PTR [esp],0x80484d0
0x080483f1 <main+45>:	call   0x80482f8 <puts@plt>
0x080483f6 <main+50>:	jmp    0x8048404 <main+64>
0x080483f8 <main+52>:	mov    DWORD PTR [esp],0x80484da
0x080483ff <main+59>:	call   0x80482f8 <puts@plt>
The jg instruction stands for "jump if greater" and is a conditional jump. It uses the flags set by the preceding cmp to decide what to do. If cmp's first operand is greater than the second (compared as signed numbers), it jumps to 0x80483f8. Jumping to that location means setting eip to that address, so the CPU continues with whatever instruction is found there. If the condition is not met, jg does nothing and execution falls through to the next instruction.
In our case 5 is not greater than 7, so the condition is not met and the jump is not taken. The CPU goes on to execute the next instructions. First it writes an address to the top of the stack ([esp]) and then calls the function puts. What does puts do? It prints a string to the screen, followed by a newline, and takes one argument: a pointer to the string. So here the address of a string is placed on the stack. This is what we learned in the stack tutorial: before we call a function, we put its arguments on the stack. Now we can use that address to find out which string the program is going to print. Let's do it.
(gdb) x/x 0x80484d0
0x80484d0:	0x6b636148

(gdb) x/15bx 0x80484d0
0x80484d0:	0x48	0x61	0x63	0x6b	0x73	0x4c	0x61	0x6e
0x80484d8:	0x64	0x00	0x53	0x72	0x69	0x4c	0x61
(gdb)
Let's look at the ASCII values of these raw bytes: 0x48 is H, 0x61 is a, 0x63 is c, 0x6b is k, 0x73 is s, and so on. You can guess what it says. But wait, there is an easier method: the x/s command examines memory in string format.
[Image: examine memory in GDB]
Next we want to see what this binary would do if the calculated sum of the two numbers were greater than 7. In that case it jumps to 0x080483f8, where the address of another string, 0x80484da, is placed on the stack. Let's see what that is. The tail of our earlier byte dump (0x53 0x72 0x69 0x4c 0x61, "SriLa") already shows how it begins. A lovely text, isn't it?
Now our task is over. Let's put all of this together and see what happens. First the program calculated the sum of 2 and 3. Then it compared that value with 8: if the calculated value is less than 8 it prints "HacksLand", and if not it prints "SriLanka". Now we can write a C program that does all of the above. We already have the code that calculates the sum, so let's extend it to print the strings.
#include<stdio.h>
int main(){
        int x=2;
        int y=3;
        int z;
        z = x + y;

        if(z < 8){
                printf("HacksLand\n");
        }else{
                printf("SriLanka\n");
        }

        return 0;
}
Now our reversing process is complete. Let's look at the actual source of this program, using the cat command.
[Image: original-source-code]
I hope you learned a lot from this tutorial. In the next tutorials we will go deeper into this topic. I'll make tutorials for the Windows environment too, so be ready to play with IDA. I also left one thing for you to think about: in the disassembly, the value is compared with 7, but in the C code we wrote the if statement with 8. What's the difference? Let's discuss this further in our forum; we have a reverse engineering section for things like this. Thank you for reading. :-)
