Hi guys, today I'm going to walk you through another tutorial on reverse engineering. Here we try to understand what an if statement looks like at the assembly level. First we write a simple C program. I think you can read the code and work out what it does.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char const *argv[])
{
	/* argv[0] is the program name, so one user argument means argc == 2 */
	if (argc != 2)
	{
		printf("Please input a number\n");
		exit(1);
	}

	/* atoi() converts the argument string to an int */
	if (atoi(argv[1]) == 0)
	{
		printf("Input number is zero\n");
	}else{
		printf("Input number is non-zero\n");
	}

	return 0;
}

Here you can see I used two if statements. First I check whether argc is 2. If argc is not equal to 2, we know that the user has not passed a number as a command line argument. argv[0] normally holds the program name, so a single argument gives argc == 2. If no argument is provided, we show an error message and exit the program. If the user has given a number as an argument, we continue.

After that check we convert the input string to an integer using the atoi() function. As you may know, its name stands for ASCII to integer. It is declared in stdlib.h, which is why we included that header; exit() is declared there too. Then we check whether the number the user entered is equal to zero and use printf() to display the result. Very simple code. Now we compile it with gcc on a Linux machine. The details differ between operating systems, but the theory is the same everywhere. Once we learn it here, we can easily understand assembly instructions on other platforms too.

thilan@bt:~/programming/c$ gcc if.c -o if -mpreferred-stack-boundary=2

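I used an additional gcc option, -mpreferred-stack-boundary=2. It asks gcc to keep the stack aligned to 2^2 = 4 bytes instead of the default 16. The compiler then adds less alignment padding, and the disassembly is shorter and easier to read. The code in this tutorial is 32-bit x86. On a 64-bit Linux system you would also need -m32, because gcc rejects such a small stack boundary for 64-bit code.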

Let's run it and see what happens.

thilan@bt:~/programming/c$ ./if
Please input a number
thilan@bt:~/programming/c$ ./if 2
Input number is non-zero
thilan@bt:~/programming/c$ ./if 0
Input number is zero

It behaves differently when we supply different inputs. Now we can use GDB to examine the inner workings of the program.

thilan@bt:~/programming/c$ gdb -q ./if

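Here is the disassembly of the main function. It is in Intel syntax: in GDB you can select that with set disassembly-flavor intel and then print the listing with disassemble main.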

Dump of assembler code for function main:
0x08048424 <main+0>: push ebp
0x08048425 <main+1>: mov ebp,esp
0x08048427 <main+3>: sub esp,0x4
0x0804842a <main+6>: cmp DWORD PTR [ebp+0x8],0x2
0x0804842e <main+10>: je 0x8048448 <main+36>
0x08048430 <main+12>: mov DWORD PTR [esp],0x8048540
0x08048437 <main+19>: call 0x8048350 <puts@plt>
0x0804843c <main+24>: mov DWORD PTR [esp],0x1
0x08048443 <main+31>: call 0x8048360 <exit@plt>
0x08048448 <main+36>: mov eax,DWORD PTR [ebp+0xc]
0x0804844b <main+39>: add eax,0x4
0x0804844e <main+42>: mov eax,DWORD PTR [eax]
0x08048450 <main+44>: mov DWORD PTR [esp],eax
0x08048453 <main+47>: call 0x8048340 <atoi@plt>
0x08048458 <main+52>: test eax,eax
0x0804845a <main+54>: jne 0x804846a <main+70>
0x0804845c <main+56>: mov DWORD PTR [esp],0x8048556
0x08048463 <main+63>: call 0x8048350 <puts@plt>
0x08048468 <main+68>: jmp 0x8048476 <main+82>
0x0804846a <main+70>: mov DWORD PTR [esp],0x804856b
0x08048471 <main+77>: call 0x8048350 <puts@plt>
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret
End of assembler dump.

If you compile and disassemble the binary on a different machine, you may not see exactly the same disassembly as above. Different compiler versions and options generate different code, and the addresses depend on how the binary is linked. But the main parts and the logic stay the same.


The push ebp, mov ebp,esp and sub esp,0x4 instructions are added by the compiler; together they form the function prologue. I won't explain them in depth here because I have posted separate tutorials on the function prologue, function epilogue, etc.

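You can see a sub esp,0x4 instruction above. What does it do? Our main function has no local variables, so this is not space for a variable. Instead it reserves 4 bytes at the top of the stack for the single argument that main passes to puts, exit and atoi. Look at instructions like mov DWORD PTR [esp],0x8048540. The compiler stores each argument into that reserved slot with mov instead of using push.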

Let's focus on following couple of assembly instructions.

0x0804842a <main+6>: cmp DWORD PTR [ebp+0x8],0x2
0x0804842e <main+10>: je 0x8048448 <main+36>

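First of all, let's clear up what DWORD PTR [ebp+0x8] is. You know the main function expects two arguments, argc and argv. At the assembly level we can access them at fixed offsets from ebp. After the prologue, [ebp] holds the saved ebp and [ebp+0x4] holds the return address pushed by call. So the first argument sits just above them: ebp+0x8 is argc and ebp+0xc is argv.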

Next we use the cmp instruction with DWORD PTR [ebp+0x8] and 0x2 as operands. cmp subtracts the second operand from the first, throws the result away and records the outcome in the EFLAGS register. EFLAGS is 4 bytes (32 bits) long, and its individual bits act as flags (some bits are reserved).
Each of these flags can be set or cleared. If the two operands of cmp are equal, the subtraction gives zero, so the zero flag (ZF) in EFLAGS is set. If they are not equal, ZF is cleared. There is no separate "not equal" flag; "not equal" simply means ZF is 0. If you want to learn more about the EFLAGS register, read this document.


Now what does the je 0x8048448 instruction do? je stands for "jump if equal". It depends entirely on the previous comparison: it jumps to the given address if the two operands of cmp were equal. But how does je know the result of the previous instruction? It looks at the EFLAGS register and checks whether ZF is set. If the condition is met, execution jumps to the given memory address, so the next instruction is at 0x8048448. If the condition is not met, execution continues in the normal flow, and the next instruction is at 0x08048430.


So here is what happens.
If we don't supply an argument, argc is 1, so the cmp instruction finds that argc is not equal to 2 and clears ZF in the EFLAGS register. The je instruction then checks ZF, sees that the condition is not met, and does not jump to the given address. So the following four instructions are executed.

0x08048430 <main+12>: mov DWORD PTR [esp],0x8048540
0x08048437 <main+19>: call 0x8048350 <puts@plt>
0x0804843c <main+24>: mov DWORD PTR [esp],0x1
0x08048443 <main+31>: call 0x8048360 <exit@plt>

They simply exit the program with an error message. We can find the error message string by examining memory address 0x8048540.

(gdb) x/s 0x8048540
0x8048540: "Please input a number"

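We store this memory address at the top of the stack (the slot reserved by sub esp,0x4) and call puts. But why? puts takes one argument, a pointer to a string, and in the 32-bit cdecl calling convention used here arguments are passed on the stack. Notice also that the source calls printf(), but the binary calls puts(). The format string has no conversion specifiers and ends with a newline, so gcc replaced the call with the equivalent puts() and dropped the trailing newline from the stored string. After that we store 0x1 in the same stack slot (this is the exit status) and call the exit function.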

What if we supply a number as an argument to the program? Then argc is 2, so the cmp instruction sets ZF, and the je instruction redirects execution to 0x8048448. From there the following code runs. Only one of the two puts branches is taken, as we will see.

0x08048448 <main+36>: mov eax,DWORD PTR [ebp+0xc]
0x0804844b <main+39>: add eax,0x4
0x0804844e <main+42>: mov eax,DWORD PTR [eax]
0x08048450 <main+44>: mov DWORD PTR [esp],eax
0x08048453 <main+47>: call 0x8048340 <atoi@plt>
0x08048458 <main+52>: test eax,eax
0x0804845a <main+54>: jne 0x804846a <main+70>
0x0804845c <main+56>: mov DWORD PTR [esp],0x8048556
0x08048463 <main+63>: call 0x8048350 <puts@plt>
0x08048468 <main+68>: jmp 0x8048476 <main+82>
0x0804846a <main+70>: mov DWORD PTR [esp],0x804856b
0x08048471 <main+77>: call 0x8048350 <puts@plt>
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret


At this point our first if statement is over. It has decided the flow of the program.

Now let's focus on the next if statement.

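The following assembly instructions convert our input to an integer. Do you remember what we learned in our C programming tutorial? argv holds the arguments in string form, which is why we used the atoi (ASCII to integer) function to convert the input to an integer. The first three instructions load argv[1]. They read argv from ebp+0xc, add 4 (the size of one pointer on 32-bit x86) to reach the second element, and read the pointer stored there. That pointer is then stored in the reserved stack slot as atoi's argument.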

0x08048448 <main+36>: mov eax,DWORD PTR [ebp+0xc]
0x0804844b <main+39>: add eax,0x4
0x0804844e <main+42>: mov eax,DWORD PTR [eax]
0x08048450 <main+44>: mov DWORD PTR [esp],eax
0x08048453 <main+47>: call 0x8048340 <atoi@plt>

Now eax holds our input in integer form, because a function's return value comes back in eax. As the next step we check whether eax is zero. But how do we do that at the assembly level? Let's see.

0x08048458 <main+52>: test eax,eax
0x0804845a <main+54>: jne 0x804846a <main+70>

What do these two instructions do? test is another instruction that takes two operands. It computes the bitwise AND of them, throws the result away and sets the flags. Since eax AND eax is just eax, test eax,eax sets ZF (ZF becomes 1) if eax is zero, and clears ZF (ZF becomes 0) if eax is not zero.

The next instruction is jne. What does it do? jne stands for "jump if not equal". Remember that je jumps to the given address if ZF is set (ZF is 1). jne is the opposite of je: it jumps to the given location if ZF is not set.

Let's assume our input number is zero. Then eax holds zero, so test eax,eax sets ZF. jne checks ZF and finds that its value is 1, so it does not jump to the given address and execution continues in the normal flow with the following instructions. By now you can read them and understand what they do.

0x0804845c <main+56>: mov DWORD PTR [esp],0x8048556
0x08048463 <main+63>: call 0x8048350 <puts@plt>
0x08048468 <main+68>: jmp 0x8048476 <main+82>

Let's examine what is at 0x8048556.

(gdb) x/s 0x8048556
0x8048556: "Input number is zero"

Yes. It prints the string we expected and jumps to 0x8048476. What's at 0x8048476?

0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret


Here the program exits normally: mov eax,0x0 sets main's return value to 0, and leave and ret form the function epilogue.

Now what if our input is not zero? Then eax is not zero, so test eax,eax clears ZF, and the jne instruction jumps to the given memory address. So the following instructions are executed.

0x0804846a <main+70>: mov DWORD PTR [esp],0x804856b
0x08048471 <main+77>: call 0x8048350 <puts@plt>
0x08048476 <main+82>: mov eax,0x0
0x0804847b <main+87>: leave
0x0804847c <main+88>: ret

That path also prints a string. Let's examine it too.

(gdb) x/s 0x804856b
0x804856b: "Input number is non-zero"

Yes, everything happens as expected. After printing it, the program exits normally.

Ok guys, now I think you understand how an if statement looks at the assembly level. I'll write more interesting stuff on these topics. Thanks for reading.