# Reverse engineering example

## HacksLand | The computer science playground

Posted by Thilan Dissanayaka on Oct 17, 2019

So you want to learn reverse engineering? That's great. Reverse engineering (RE) is used in areas such as malware analysis, exploit development and software cracking. In this article we are going to walk through a reverse engineering example. First we write a simple program in C, then we disassemble it and try to understand it at the assembly level.

## What is reverse engineering and why do we use it?

Before we move on to the reversing part, let's clear up some basic ideas. Reverse engineering is the process of disassembling a binary and understanding the structure of that program. You can refer to the "Compiling C programs" article to see what happens when we compile a program. In short, the procedure is as follows.

First we write the code in a language like C or C++. Let's assume we write code to print something on the screen. We can use functions like printf() and putchar(). The C language and its standard library tell us how to use those functions and which data we should supply.

Next we use a compiler to build a binary from the source code. A binary is a collection of machine instructions. There are many machine instructions, such as MOV, SUB and ADD, and each of them performs a specific task. For example, we use the MOV instruction to move data from one place to another.

So how are these machine instructions identified? Every instruction is encoded with a unique number (or code) called an opcode. Take the instruction INT 0x80, which is commonly used on 32-bit Linux to hand control to the kernel (to make a system call). Its encoding is cd 80: cd is the opcode of the INT instruction and 80 is its argument (or operand). Now think about MOV ECX, ESP, which copies the value of the ESP register into the ECX register. Its encoding is 89 e1. Here 89 is the opcode for "move a 32-bit register into a register or memory location", and e1 is a so-called ModR/M byte that encodes both operands: ESP as the source and ECX as the destination. We'll talk more about opcodes in our shellcoding tutorials. Until then, a rough idea is enough.

But how does the compiler generate these machine instructions? (Keep in mind that a compiler is also a program written in some language.) The compiler knows how to translate each high-level construct into assembly instructions. For example, if a high-level program adds two numbers, the compiler builds a set of assembly instructions that do the same task: it copies the two numbers into two registers and adds them.

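In the following image you can see a memory layout. There are rows of bytes (one byte is 8 bits), and both CPU instructions and data are stored in memory. In the upper left you can see some stored CPU instructions. In the bottom right you can see a set of data bytes such as 00, 6f and 6c. Intel (x86) systems are little endian: a multi-byte value is stored with its least significant byte first. When memory is displayed as 4-byte words, the bytes of each word therefore appear in reverse order, which is why the string looks reversed even though its characters are stored one after another in memory.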

Now the compilation process is done. So what is reverse engineering? A compiled binary only contains machine instructions in the form of opcodes, so we can't get the source code back from it. However, a disassembler can read those opcodes from the binary and translate them into the corresponding assembly instructions. It is very hard to understand a program by looking at raw opcodes, but assembly instructions are a bit clearer and closer to human language: words like MOV and ADD are more readable than opcodes like 5f or 4c. So the disassembler generates a listing of assembly instructions, and we can read them and work out what the source code does. I hope you now have a basic idea about reverse engineering.

So what can we do with reverse engineering? In the malware analysis industry, anti-virus analysts use reverse engineering to understand the behavior of malware. They don't have access to the source code of the malware, so they disassemble the binary and look at what it does. Sometimes they can find vulnerabilities and weak points in the malware, and then they write a patch and release it. This is an interesting topic to discuss further; let's talk more about malware in future articles.

In exploit development, we reverse a program to find vulnerabilities. If we find one, we can write an exploit to take advantage of that vulnerability.

## A simple program in C

First we write a simple C program. I think you can read the code and determine what it does.

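``````
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char const *argv[])
{
    if (argc != 2)
    {
        /* The original error message was lost; this wording is a placeholder. */
        printf("Usage: ./if <number>\n");
        exit(1);
    }

    if (atoi(argv[1]) == 0)
    {
        printf("Input number is zero\n");
    }else{
        printf("Input number is non-zero\n");
    }

    return 0;
}
``````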

Here you can see that I used two if statements. First I check whether argc is 2. If argc is not equal to 2, we know that the user has not passed a number as a command line argument (argv[0] is the program name, so a single argument makes argc equal to 2). In that case we show an error message and exit the program. If the user has given a number as an argument, we continue.

After that check, we convert the input string to an integer using the atoi() function. As you may know, its name stands for "ASCII to integer" (this is why we included the stdlib.h header file). Then we check whether the number the user entered is equal to zero and use printf() to display the result. A very simple program. Now we compile it with gcc. In this example we are using a Linux distribution, but the concepts are the same on every OS; once we learn them here, we can easily understand assembly instructions on other platforms too.

``$ gcc if.c -o if -mpreferred-stack-boundary=2``

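I used an additional gcc option, -mpreferred-stack-boundary=2. It tells gcc to keep the stack aligned to 2^2 = 4 bytes instead of the default 2^4 = 16 bytes, so the compiler adds less alignment padding and the disassembly is easier to read. The listing in this article is 32-bit code; gcc accepts the value 2 only for 32-bit targets, so on a 64-bit system you would add -m32 to build a comparable binary.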

Let's run it and see what happens.

``````
$ ./if
$ ./if 2
Input number is non-zero
$ ./if 0
Input number is zero
``````

As expected, the program prints different output for different inputs.

## Disassembling the binary

Now we can use GDB to examine the inner workings of the program.

``````
$ gdb -q ./if
``````

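Here is the disassembly of the main function, as printed by GDB's disassemble main command. The listing uses Intel syntax, which you can select with set disassembly-flavor intel.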

``````
Dump of assembler code for function main:
0x08048424 <main+0>: push ebp
0x08048425 <main+1>: mov ebp,esp
0x08048427 <main+3>: sub esp,0x4
0x0804842a <main+6>: cmp DWORD PTR [ebp+0x8],0x2
0x0804842e <main+10>: je 0x8048448 <main+36>
0x08048430 <main+12>: mov DWORD PTR [esp],0x8048540
0x08048437 <main+19>: call 0x8048350 <puts@plt>
0x0804843c <main+24>: mov DWORD PTR [esp],0x1
0x08048443 <main+31>: call 0x8048360 <exit@plt>
0x08048448 <main+36>: mov eax,DWORD PTR [ebp+0xc]
0x0804844b <main+39>: add eax,0x4
0x0804844e <main+42>: mov eax,DWORD PTR [eax]
0x08048450 <main+44>: mov DWORD PTR [esp],eax
0x08048453 <main+47>: call 0x8048340 <atoi@plt>
0x08048458 <main+52>: test eax,eax
0x0804845a <main+54>: jne 0x804846a <main+70>
0x0804845c <main+56>: mov DWORD PTR [esp],0x8048556
0x08048463 <main+63>: call 0x8048350 <puts@plt>
0x08048468 <main+68>: jmp 0x8048476 <main+82>
0x0804846a <main+70>: mov DWORD PTR [esp],0x804856b
0x08048471 <main+77>: call 0x8048350 <puts@plt>
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret
End of assembler dump.
``````

If you compile and disassemble the binary on a different machine, you may not see exactly the same disassembly as above, because the generated code depends on the compiler version, its options and the target platform. But the main parts and the logic stay the same.

The push ebp, mov ebp,esp and sub esp,0x4 instructions are added by the compiler; together they form the function prologue. I won't explain them in depth here because I have posted separate tutorials on the function prologue, function epilogue and so on.

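You can see a sub esp,0x4 instruction above. What does it do? It reserves 4 bytes at the top of the stack. Our main function has no local variables; this slot is used to pass the single argument of puts, exit and atoi. Instead of pushing each argument, the code stores it directly with mov DWORD PTR [esp],... right before each call.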

## Understanding the logic of program

Let's focus on the following two assembly instructions.

``````
0x0804842a <main+6>: cmp DWORD PTR [ebp+0x8],0x2
0x0804842e <main+10>: je 0x8048448 <main+36>
``````

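First of all, let's clear up what DWORD PTR [ebp+0x8] means. The main function expects two arguments, argc and argv. At the assembly level we access them at fixed offsets from ebp: [ebp] holds the old ebp saved by the prologue and [ebp+0x4] holds the return address pushed by call, so ebp+0x8 is argc and ebp+0xc is argv. DWORD PTR means the operand is a 4-byte (32-bit) value in memory.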

Next, the cmp instruction is used with DWORD PTR [ebp+0x8] and 0x2 as its operands. cmp compares two values by subtracting the second from the first; it discards the result and only updates the status flags in the EFLAGS register. EFLAGS is a 32-bit register in which individual bits act as flags that can be set (1) or cleared (0). If the two operands of cmp are equal, the subtraction gives zero and the zero flag (ZF) is set; if they are not equal, ZF is cleared. There is no separate "not equal" flag: a cleared ZF is what means "not equal". The EFLAGS register is described in detail in Intel's Software Developer's Manual.

Now, what does je 0x8048448 do? je stands for "jump if equal", and it depends entirely on the previous comparison: it jumps to the given address if the two operands of cmp were equal. But how does je know the result of the previous instruction? It looks at ZF in the EFLAGS register. If ZF is set, the condition is met and execution jumps to the given address (so the next instruction is at 0x8048448). If the condition is not met, execution continues in the normal flow (the next instruction is at 0x08048430).

So here is what happens. If we don't supply an argument, argc is 1, so cmp finds that argc is not equal to 2 and clears ZF. The je instruction then checks ZF, decides that the condition is not met and does not jump. So the following four instructions are executed.

``````
0x08048430 <main+12>: mov DWORD PTR [esp],0x8048540
0x08048437 <main+19>: call 0x8048350 <puts@plt>
0x0804843c <main+24>: mov DWORD PTR [esp],0x1
0x08048443 <main+31>: call 0x8048360 <exit@plt>
``````

They simply exit the program with an error message. We can find the error message string by examining memory address 0x8048540.

``````
(gdb) x/s 0x8048540
``````

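We store this memory address at the top of the stack and call the puts function. Why? puts takes one argument (a pointer to a string), and arguments are passed on the stack, so the address is written to [esp] just before the call. You may wonder why we see puts instead of printf: when printf is called with a constant string that ends in a newline and contains no format specifiers, gcc typically replaces it with the simpler puts call. After that, we put 0x1 at [esp] (this is the exit status) and call the exit function.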

What if we supply a number as an argument to the program? Since cmp sets ZF in EFLAGS, je redirects execution to 0x8048448, and the following set of instructions is executed.

``````
0x08048448 <main+36>: mov eax,DWORD PTR [ebp+0xc]
0x0804844b <main+39>: add eax,0x4
0x0804844e <main+42>: mov eax,DWORD PTR [eax]
0x08048450 <main+44>: mov DWORD PTR [esp],eax
0x08048453 <main+47>: call 0x8048340 <atoi@plt>
0x08048458 <main+52>: test eax,eax
0x0804845a <main+54>: jne 0x804846a <main+70>
0x0804845c <main+56>: mov DWORD PTR [esp],0x8048556
0x08048463 <main+63>: call 0x8048350 <puts@plt>
0x08048468 <main+68>: jmp 0x8048476 <main+82>
0x0804846a <main+70>: mov DWORD PTR [esp],0x804856b
0x08048471 <main+77>: call 0x8048350 <puts@plt>
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret
``````

At this point our first if statement is done: it decided the flow of the program.

Now let's focus on the next if statement.

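The following assembly instructions convert our input to an integer. Remember from our C programming tutorial that argv holds the arguments as strings, which is why we used atoi (ASCII to integer) to convert the argument to an integer. First argv is loaded from [ebp+0xc]; adding 4 (the size of one pointer on a 32-bit system) moves to argv[1]; then the string pointer stored there is read and placed on the stack as the argument to atoi.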

``````
0x08048448 <main+36>: mov eax,DWORD PTR [ebp+0xc]
0x0804844b <main+39>: add eax,0x4
0x0804844e <main+42>: mov eax,DWORD PTR [eax]
0x08048450 <main+44>: mov DWORD PTR [esp],eax
0x08048453 <main+47>: call 0x8048340 <atoi@plt>
``````

Now eax holds our input in integer form (a function's return value is returned in eax). As the next step we check whether eax is zero. But how do we do that at the assembly level? Let's see.

``````
0x08048458 <main+52>: test eax,eax
0x0804845a <main+54>: jne 0x804846a <main+70>
``````

What do these two instructions do? test is another assembly instruction that takes two operands. It performs a bitwise AND of them, discards the result and sets the flags. Since eax AND eax equals eax, test eax,eax sets ZF (ZF becomes 1) if eax is zero, and clears ZF (it becomes 0) if eax is not zero.

The next instruction is jne. What does it do? jne stands for "jump if not equal". Remember that je jumps to the given address if ZF is set (ZF is 1). jne is the opposite of je: it jumps to the given location if ZF is not set.

Let's assume our input number is zero. Then eax holds zero, so test eax,eax sets ZF. jne checks ZF, finds that its value is 1, so it doesn't jump and execution continues in the normal flow. Next, the following instructions are executed. By now you can read them and understand what they do.

``````
0x0804845c <main+56>: mov DWORD PTR [esp],0x8048556
0x08048463 <main+63>: call 0x8048350 <puts@plt>
0x08048468 <main+68>: jmp 0x8048476 <main+82>
``````

Let's examine what is at 0x8048556.

``````
(gdb) x/s 0x8048556
0x8048556: "Input number is zero"
``````

Yes, it prints the string we hoped for and then jumps to 0x8048476. What's at 0x8048476?

``````
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret
``````

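Here the program exits normally: mov eax,0x0 puts main's return value 0 in eax, leave undoes the prologue (it is equivalent to mov esp,ebp followed by pop ebp), and ret returns to the caller.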

Now, what if our input is not zero? Then eax is not zero, so test eax,eax does not set ZF. Therefore jne jumps to the given memory address, and the following set of instructions is executed.

``````
0x0804846a <main+70>: mov DWORD PTR [esp],0x804856b
0x08048471 <main+77>: call 0x8048350 <puts@plt>
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret
``````

That path also prints a string. Let's examine it too.

``````
(gdb) x/s 0x804856b
0x804856b: "Input number is non-zero"
``````

Yes, everything happens as expected. After printing the string, the program exits normally.

OK guys, I think you have learned many things from this article. I'll write more interesting posts on these topics. Thanks for reading.

Hi, I'm Thilan, an engineering student from Sri Lanka. I love to code in Python, JavaScript, PHP and C.