Today I picked an interesting topic to discuss. Here we are going to disassemble a binary file and take a look at what it does. This is called reverse engineering. Let's run the program first and see what it looks like from the outside.
user@protostar:~$ ./rev
HacksLand
user@protostar:~$
It just prints the string "HacksLand" and exits. Can you imagine what kind of code this is? We might guess that it looks like the following. We don't know that for sure yet; this is just a guess.
#include<stdio.h>
int main(){
  printf("HacksLand\n");
  return 0;
}
Now let's start the actual reversing process. We can use GDB for this. In a Windows environment we could use IDA instead. I posted a tutorial on how to use GDB; please take a look at it if you have never used GDB.
user@protostar:~$ gdb -q ./rev
Reading symbols from /home/user/rev...(no debugging symbols found)...done.
(gdb) set disassembly-flavor intel
(gdb) 
(gdb) disass main
Dump of assembler code for function main:
0x080483c4 <main+0>:	push   ebp
0x080483c5 <main+1>:	mov    ebp,esp
0x080483c7 <main+3>:	sub    esp,0x10
0x080483ca <main+6>:	mov    DWORD PTR [ebp-0xc],0x2
0x080483d1 <main+13>:	mov    DWORD PTR [ebp-0x8],0x3
0x080483d8 <main+20>:	mov    eax,DWORD PTR [ebp-0x8]
0x080483db <main+23>:	mov    edx,DWORD PTR [ebp-0xc]
0x080483de <main+26>:	lea    eax,[edx+eax*1]
0x080483e1 <main+29>:	mov    DWORD PTR [ebp-0x4],eax
0x080483e4 <main+32>:	cmp    DWORD PTR [ebp-0x4],0x7
0x080483e8 <main+36>:	jg     0x80483f8 <main+52>
0x080483ea <main+38>:	mov    DWORD PTR [esp],0x80484d0
0x080483f1 <main+45>:	call   0x80482f8 <puts@plt>
0x080483f6 <main+50>:	jmp    0x8048404 <main+64>
0x080483f8 <main+52>:	mov    DWORD PTR [esp],0x80484da
0x080483ff <main+59>:	call   0x80482f8 <puts@plt>
0x08048404 <main+64>:	mov    eax,0x0
0x08048409 <main+69>:	leave  
0x0804840a <main+70>:	ret    
End of assembler dump.
(gdb)
First I switched to Intel syntax and disassembled the main function. The first two assembly instructions are responsible for setting up the stack frame. If you missed our tutorial "Stack and functions", please read it to get a clear idea about these instructions. Next there is a sub esp,0x10, which allocates 16 bytes of space for local variables. These few instructions are very common and you will see them in almost every disassembly. After them there are two interesting assembly lines.
0x080483ca <main+6>:	mov    DWORD PTR [ebp-0xc],0x2
0x080483d1 <main+13>:	mov    DWORD PTR [ebp-0x8],0x3
What do they do? The first instruction copies 0x2 to the address ebp-0xc, so that stack location now contains 2 (0x2 in hexadecimal equals 2 in decimal). Likewise, the value 0x3 is copied to ebp-0x8. Do you see what happened here? I explained this in the stack tutorial: the main function copies data into the space allocated for local variables on the stack. So there must be at least two integer variables in main. Here I set a breakpoint just before these two instructions execute, so we can examine the stack before and after they run.
(gdb) b *0x080483ca
Breakpoint 1 at 0x80483ca
(gdb) run
Starting program: /home/user/rev 

Breakpoint 1, 0x080483ca in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc:	0xb7fd7ff4
(gdb) ni
0x080483d1 in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc:	0x00000002
(gdb) x/x $ebp-0x8
0xbffff7e0:	0x08048420
(gdb) ni
0x080483d8 in main ()
(gdb) x/x $ebp-0x8
0xbffff7e0:	0x00000003
(gdb)
Now it is crystal clear that 2 and 3 were copied onto the stack. Let's see what happens next. Here we have another pair of instructions.
0x080483d8 <main+20>:	mov    eax,DWORD PTR [ebp-0x8]
0x080483db <main+23>:	mov    edx,DWORD PTR [ebp-0xc]
These two copy the values from the stack into registers: 3 into eax and 2 into edx (ebp-0x8 contains 3 and ebp-0xc contains 2). We can watch this happen in GDB.
(gdb) i r eax edx
eax            0xbffff894	-1073743724
edx            0x1	1

(gdb) ni
0x080483db in main ()

(gdb) i r eax edx
eax            0x3	3
edx            0x1	1

(gdb) ni
0x080483de in main ()

(gdb) i r eax edx
eax            0x3	3
edx            0x2	2
(gdb)
OK. At this point eax holds 3 and edx holds 2. The next assembly instruction is this one.
0x080483de <main+26>:	lea    eax,[edx+eax*1]
lea stands for "load effective address". It computes the address expression edx+eax*1 without accessing memory, so here it simply adds eax and edx and saves the result in eax. Let's check whether this is true.
(gdb) i r eax
eax            0x3	3
(gdb) ni
0x080483e1 in main ()
(gdb) i r eax
eax            0x5	5
(gdb)
3 + 2 = 5, so 5 was stored in eax. Did you notice something odd here? We copied 2 and 3 onto the stack, and then copied them from the stack back into registers. Why didn't we load the two values into eax and edx directly? It is because they are local variables: this binary appears to be compiled without optimization, and in such code the compiler keeps each local variable in its reserved slot on the stack and reloads it from there when it is needed. Now we can imagine that the source code for these steps looks something like the following.
#include<stdio.h>
int main(){
      int x=2;
      int y=3;
      int z;
      z = x + y;
      return 0;
}
Yes, for now we have skipped the string printing and focused only on the disassembly we have understood so far.
0x080483e1 <main+29>:	mov    DWORD PTR [ebp-0x4],eax
0x080483e4 <main+32>:	cmp    DWORD PTR [ebp-0x4],0x7
The first of these saves our calculated value of 2+3 into ebp-0x4. After that there is an interesting instruction called cmp. It compares that value with 0x7, so it is really comparing 5 with 7. cmp subtracts the second operand from the first without storing the result, and records the outcome in the status flags of the EFLAGS register. Next we have a set of instructions to look at.
0x080483e4 <main+32>:	cmp    DWORD PTR [ebp-0x4],0x7
0x080483e8 <main+36>:	jg     0x80483f8 <main+52>
0x080483ea <main+38>:	mov    DWORD PTR [esp],0x80484d0
0x080483f1 <main+45>:	call   0x80482f8 <puts@plt>
0x080483f6 <main+50>:	jmp    0x8048404 <main+64>
0x080483f8 <main+52>:	mov    DWORD PTR [esp],0x80484da
0x080483ff <main+59>:	call   0x80482f8 <puts@plt>
The jg instruction stands for "jump if greater" and is a conditional jump. It uses the flags set by the previous cmp to decide what to do: if cmp's first operand is greater than the second (compared as signed values), it jumps to 0x80483f8. Jumping to that location means setting eip to that address, so the CPU continues with whatever instruction is found there. If the condition is not met, jg does nothing and execution falls through to the next instruction. In this case 5 is not greater than 7, so the condition is not met and the jump is not taken.
Now the CPU executes the following instructions. First it writes an address to the top of the stack ([esp]) and then calls the function puts. What does puts do? It prints a string to the screen and takes one argument, a pointer to the string. So here the address of a string is placed on the stack. This is what we learned in the stack tutorial: before calling a function we put its arguments on the stack. Now we can use that address to find out which string this program is going to print. Let's do it.
(gdb) x/x 0x80484d0
0x80484d0:	0x6b636148

(gdb) x/15bx 0x80484d0
0x80484d0:	0x48	0x61	0x63	0x6b	0x73	0x4c	0x61	0x6e
0x80484d8:	0x64	0x00	0x53	0x72	0x69	0x4c	0x61
(gdb)
Let's look at the ASCII values of these raw bytes: 0x48 is H, 0x61 is a, 0x63 is c, 0x6b is k, 0x73 is s, and so on. You can imagine what it says. But wait, there is an easier method: we can use the x/s command to examine memory in string format.
[Image: examine memory in GDB]
Next we want to see what this binary would do if the calculated sum of the two numbers were greater than 7. If that condition were true, it would jump to 0x080483f8, where the address of another string (0x80484da) is put on the stack. Let's see what that is. A lovely text, isn't it?
Now our task is over. Let's analyze all of this and see what is happening here. First the program calculates the sum of 2 and 3. Then it checks whether that value is less than 8. If it is, it prints "HacksLand"; if not, it prints "SriLanka". Now we can build a C program that does all of this. We already wrote a program to calculate the sum, so let's extend it to print the strings.
#include<stdio.h>
int main(){
        int x=2;
        int y=3;
        int z;
        z = x + y;

        if(z < 8){
                printf("HacksLand\n");
        }else{
                printf("SriLanka\n");
        }

        return 0;
}
Now our reversing process is complete. Let's look at the actual source of this program. We use the cat command for this.
[Image: original source code]
I hope you learned a lot from this tutorial. In upcoming tutorials we can go deeper into this topic. I'll make tutorials for the Windows environment too, so be ready to play with IDA. I also left one thing for you to think about: in the disassembly we saw the code compare the value with 7, but in the C code we wrote the if condition with 8. What's the difference? Let's discuss this further in our forum; we have a reverse engineering section for topics like this. Thank you for reading. :-)