May 16, 2020

Stack architecture theory tutorial

The stack is an important concept in computer science. If you are planning to learn reverse engineering, malware analysis, exploitation, etc., this concept is a must. After learning about the stack, we can dive into the world of stack buffer overflows. So let's see why we use the stack. In C and many other high-level programming languages, we use functions. A function takes some data, processes it, and returns something. So how is this possible? We use the stack to pass arguments to functions. In this document, we are going to learn all the required theory about the stack architecture. After reading this article, you may refer to the stack architecture demo tutorial to get a complete understanding of the stack and stack frames.

Some terminologies about the stack

On x86, the stack always begins at a high memory address and grows toward lower memory. There is a special pointer register called "ESP" that deals with the stack. This register always points to the top of the stack. ESP stands for "Extended Stack Pointer". Even its name indicates that ESP keeps track of the stack. Every function has its own stack frame. This is where a function keeps its local variables. When we talk about the stack and stack frames, EBP is another important register. EBP is used to identify the base of a stack frame. So what we call a function's stack frame is the memory area bounded by EBP and ESP. ESP points to the top of the stack, so when the stack grows, ESP decreases, because the stack always grows toward lower memory addresses. Likewise, whenever the stack shrinks, ESP increases.

PUSH and POP with the stack

Here we are going to talk about two basic operations we frequently perform on the stack: PUSH and POP. The PUSH instruction pushes something onto the top of the stack. Clearly, if we push something onto the stack, the stack has to grow. As you know, the stack grows toward lower memory addresses, so ESP (or RSP on 64-bit systems) decreases. Suppose we are pushing an integer onto the stack. An integer is four bytes long here, so ESP decreases by 4 bytes. Let's understand this situation with an image. Here we have a graphic layout of the stack before we push our integer value. Note that there is a stack frame already on the stack. ESP is pointing to the top of the stack.

stack-layout-before-push

Let's use the PUSH instruction to push the value of EAX (assume that the value of EAX at the moment is 0x5).

push eax

You can see the space allocated for the integer in a blue box.

pushed-a-value-to-the-stack

Now we are going to see the POP instruction. This instruction removes whatever is found at the top of the stack and places it in a register. After this, the stack is shorter, so ESP goes higher. Suppose we want to remove the value at the top of the stack and place it in EBP. We can just use the instruction pop ebp.

So, guys, we can summarize all of the above content into the following.

  • The stack starts from high memory and grows into low memory.
  • The stack pointer always points to the top of the stack (this can be ESP or RSP). If we push something (using the PUSH instruction), the stack grows further into lower addresses, so ESP is reduced.
  • We can pop off the stack and copy whatever is found on top of the stack into a register. After that, the stack is shorter and ESP goes higher.

Here is an example as well.

[push ebp]  :   esp ---> esp-4 : value of ebp pushed onto the stack.
[pop ebp]   :   esp ---> esp+4 : remove value of top stack and copy it to ebp.
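To play with these two operations without touching real registers, here is a minimal sketch in C that models them. It is only a simulation: the SimStack type, the 64-byte size, and the sim_init, sim_push and sim_pop names are made up for this illustration and are not how the CPU implements the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a tiny 32-bit stack living in a byte array. */
#define STACK_BYTES 64u

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* simulated stack memory, offsets 0..63 */
    uint32_t esp;              /* offset of the current top of the stack */
} SimStack;

/* Start with an empty stack: esp sits just past the highest offset. */
void sim_init(SimStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;
}

/* push: esp -> esp-4, then store the value at the new top. */
void sim_push(SimStack *s, uint32_t value) {
    assert(s->esp >= 4);                 /* no room left otherwise */
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* pop: read the value at the top, then esp -> esp+4. */
uint32_t sim_pop(SimStack *s) {
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);   /* nothing to pop otherwise */
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```

After sim_init, calling sim_push(&s, 0x5) moves esp from 64 to 60, and sim_pop(&s) hands back 0x5 and moves esp back to 64, which is the same esp-4 / esp+4 pattern shown above.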

Function prologue

The prologue is the process that builds a stack frame for a function. The compiler is responsible for the function prologue. It generates a set of instructions that allocate space on the stack and put some data on it. You know that every function has its own stack frame.

I'm going to write a small C program to demonstrate the function prologue.

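int function1(int a, int b){
  int x;             /* local variable, lives in function1's stack frame */
  x = a + b;
  return x;
}

int main(){
  function1(4, 5);   /* the two arguments are passed through the stack */
  return 0;
}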

Here I defined a function called function1 and called it from the main function. The main function is also just a function, so it has a stack frame too. That frame holds some data like argc, argv, etc.

function1() receives two arguments, a and b. It has a local variable called x. At the beginning, x is uninitialized. In function1(), a and b are added and the result is saved in the local variable x. Finally, function1() returns the value of x. Just a simple program. Now we can draw a layout of our stack as follows.

main function stack frame

For now, we won't examine what is inside the main function's stack frame. Let's see how the function prologue builds function1's stack frame.

The first thing that happens is pushing the arguments of function1() onto the stack. In the main function, we gave 4 and 5 as arguments, so we put them on top of the stack. At the assembly level, we use the PUSH instruction to do this. In the following image, you can see the stack layout after we put the arguments on the stack. Now ESP is not the same as before: it has been reduced by eight bytes. Why 8 bytes? Because a single integer is four bytes long. 0x5 and 0x4 are the hexadecimal representations of five and four.

pushed function arguments on to the stack

You may notice something special: when we push these arguments, the second argument (value 5) is pushed before the first argument (value 4). I'll explain why we do it that way in the next tutorial.
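The push order can be checked with a small C model. This is a rough sketch, not real calling-convention code: the WordStack type and the ws_init, ws_push and ws_push_args helpers are invented for this example, and each array slot stands for one 4-byte stack cell, with a lower index meaning a lower address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    uint32_t words[STACK_WORDS]; /* each slot models one 4-byte stack cell */
    size_t   top;                /* index of the top cell; lower index = lower address */
} WordStack;

/* Empty stack: top sits just past the last slot. */
void ws_init(WordStack *s) {
    s->top = STACK_WORDS;
}

/* Move the top one cell lower and store the value there. */
void ws_push(WordStack *s, uint32_t value) {
    assert(s->top > 0);
    s->words[--s->top] = value;
}

/* Push the arguments last-to-first, as in the text: 5 first, then 4. */
void ws_push_args(WordStack *s, const uint32_t *args, size_t count) {
    for (size_t i = count; i > 0; --i) {
        ws_push(s, args[i - 1]);
    }
}
```

With args = {4, 5}, the value 5 is pushed first and 4 second, so in this model the first argument ends up at the top of the stack and the second argument sits one cell above it.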

Let's see what happens next. We are calling function1() in the middle of the main function, so main still has other things to do after function1() completes. The structure of the CPU instructions is something like the following.

cpu instructions in multiple functions

In the above three sections, there is a list of CPU instructions, and the EIP register points to the instruction that should be executed next. The CPU looks at EIP to decide what to execute. So when we switch from main to function1(), we set EIP to the first instruction of function1(). What happens when the CPU completes the execution of function1()? There are still other instructions in main after function1().

So how do we set EIP to the next instruction in main()? The solution to this problem is saving the address of that instruction on the stack. Take a look at our C program. The statement after function1(4, 5) is return 0, so we save the address of the code for return 0 on the stack. We call it the return address. The return address holds the location to jump to after function1() completes. Here is the stack layout after we push EIP.

pushed-old-eip-value-on-the-stack
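The idea of saving the return address before jumping can be sketched in C as well. This is a simplified model under stated assumptions: the CallCpu type and the do_call helper are hypothetical, memory is a small array indexed by byte offset / 4, and the addresses passed in are made-up numbers, not real code addresses.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t mem[16]; /* simulated stack memory, slot index = byte offset / 4 */
    uint32_t esp;     /* byte offset of the top of the stack */
    uint32_t eip;     /* address of the next instruction to execute */
} CallCpu;

/* Models the switch to another function: save where to come back, then jump. */
void do_call(CallCpu *c, uint32_t next_instruction, uint32_t target) {
    assert(c->esp >= 4 && c->esp % 4 == 0 && c->esp / 4 <= 16);
    c->esp -= 4;
    c->mem[c->esp / 4] = next_instruction; /* the return address */
    c->eip = target;                       /* first instruction of the callee */
}
```

For example, with esp at 40, do_call(&c, 0x1234, 0x2000) leaves esp at 36, stores 0x1234 in the new top slot, and sets eip to 0x2000.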

Now we are going to see the next steps. You know that we identify a function's stack frame with EBP and ESP. EBP is the beginning of the stack frame, while ESP indicates the top of the stack frame. So the main() function is using EBP to identify its stack frame. But function1() also needs EBP to mark its own stack frame. So what we do here is save the current value of EBP on the stack. When we finish function1() and switch back to main(), we take this saved EBP value from the stack and put it back in the EBP register, so main() can use it again with no errors. Following is the stack layout after pushing EBP.

stack-tutorial-pushed-old-ebp-value

You have to understand that both EIP and EBP are 4 bytes in size. So when we compare the current ESP with the one we had at the beginning, we have subtracted 16 bytes from ESP: 8 bytes for the two arguments, 4 bytes for the return address, and 4 bytes for the saved EBP.

The previous instructions didn't actually create function1's stack frame. They did some preliminary work and prepared the stack. Now let's see what to do next. As the next step, we copy the value of ESP into EBP. Now both ESP and EBP point to the top of the stack.

stack-theory-tutorial-mov-ebp-esp

In the next steps, ESP will change as the stack grows, but EBP will keep pointing to the current location. What did we achieve with this instruction? We marked the beginning of our new stack frame. Now we are going to make some space on the stack for the local variable x. We can do this by subtracting 4 bytes from ESP, so the stack grows by 4 bytes.

Here is the stack layout with the newly created stack frame.

stack-theory-tutorial-sub-esp
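All the steps so far can be replayed in a small C simulation to check the arithmetic. Treat it as a sketch under assumptions: the Machine type and the push32 and call_function1_prologue names are hypothetical, memory is a 64-byte array addressed by byte offset, and the return address is whatever number the caller passes in.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 64u

typedef struct {
    uint32_t mem[MEM_BYTES / 4]; /* simulated stack memory, one slot per 4 bytes */
    uint32_t esp;                /* byte offset of the top of the stack */
    uint32_t ebp;                /* byte offset of the current frame base */
} Machine;

/* push: esp -> esp-4, then store the value at the new top. */
void push32(Machine *m, uint32_t value) {
    assert(m->esp >= 4 && m->esp % 4 == 0 && m->esp <= MEM_BYTES);
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

/* Replays the steps from the text for function1(4, 5). */
void call_function1_prologue(Machine *m, uint32_t return_address) {
    push32(m, 5);              /* second argument */
    push32(m, 4);              /* first argument */
    push32(m, return_address); /* saved EIP */
    push32(m, m->ebp);         /* saved EBP of main */
    m->ebp = m->esp;           /* copy ESP into EBP: base of the new frame */
    assert(m->esp >= 4);
    m->esp -= 4;               /* make room for the local variable x */
}
```

Starting from esp = 48 and ebp = 60, the model ends with ebp = 32 (16 bytes lower than the starting esp) and esp = 28 (4 more bytes for x), which matches the layout described above.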

Accessing function arguments

An important use of the stack is supplying arguments to functions. Above, we saw how to place the arguments for function1() on the stack. Here we are going to see how the function accesses its arguments.

This works by using EBP as a base and adding an offset to it. We know EBP is currently pointing to the base of function1's stack frame (see the image above). So what's inside EBP? It must be the address of the base of the stack frame. If we add 4 to that address, we get the address of RET (the saved return address). If we add 8 to the address inside EBP, we get the address of the first argument (0x4 in the above example). What about EBP's value + 12? Yes, it is the address of the second argument.

Let's focus on the following line of assembly code.

mov  edx, DWORD PTR [ebp+0x8]

This is Intel assembly syntax, so the first operand is the destination and the second is the source. Can you see what it does? DWORD PTR stands for double word pointer, meaning we read a 4-byte value. First, we get the value of EBP and add 8 to it. As we discussed, that should be the address of the first argument. Next, we get the value found at that address and copy it to the EDX register. You may refer to the Assembly moving data tutorial to learn more about this kind of assembly code.

We can access the next argument by using the following code.

mov  eax, DWORD PTR [ebp+0xc]

Then we do some calculations on these values and get the result. There is a local variable called "x" and we use it to store this calculated value.
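The two loads can be imitated in C to see the offsets at work. This is only an illustrative model: the Frame type and the load_dword and function1_body names are made up, and the frame contents are filled in by hand rather than by a real call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frame image: slot index = byte offset / 4. */
typedef struct {
    uint32_t mem[16];
    uint32_t ebp; /* byte offset of the frame base inside mem */
} Frame;

/* Rough equivalent of: mov reg, DWORD PTR [ebp+offset] */
uint32_t load_dword(const Frame *f, uint32_t offset) {
    uint32_t addr = f->ebp + offset;
    assert(addr % 4 == 0 && addr / 4 < 16);
    return f->mem[addr / 4];
}

/* function1's body expressed with the two loads from the text. */
uint32_t function1_body(const Frame *f) {
    uint32_t edx = load_dword(f, 0x8); /* first argument, a */
    uint32_t eax = load_dword(f, 0xc); /* second argument, b */
    return edx + eax;                  /* the value stored in x and returned */
}
```

If the frame base is at offset 32 and the slots at offsets 40 and 44 hold 4 and 5, then load_dword(&f, 0x8) gives 4, load_dword(&f, 0xc) gives 5, and function1_body(&f) gives 9.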

Function epilogue

The epilogue is the opposite of the prologue. It resets the stack and all necessary registers. The first thing that happens is copying the value of EBP to ESP with the following assembly code.

mov  esp, ebp

After the above instruction, the local variables of function1 are discarded. They are no longer part of the stack.

stack-theory-tutorial-mov-esp-ebp

Next, we want to restore the original value of EBP into the EBP register. Previously we saved it on the stack, and at the moment that value is on top of the stack. So we use the pop ebp instruction. It takes whatever is found at the top of the stack and copies it to EBP.

stack-theory-tutorial-pop-ebp

Now the main function can use its original EBP, and function1's stack frame is gone from the stack. But that data is still in memory. Great. What's next?

We use the ret instruction for the next step. It pops the previously saved return address off the stack and places it in the EIP register, so execution is transferred to the main function's next instructions. The following image shows the stack layout after the above ret instruction.

stack-tutorial-ret

You can see that the old value of EBP, the return address, function1's local variables, etc. are still located in memory, but they are no longer part of the stack. (The end of the stack is indicated by ESP.)
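The three epilogue steps can also be replayed in a small C model. As before, this is a sketch with invented names (Cpu, pop32, function1_epilogue) and hand-filled memory, not the code a compiler emits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t mem[16]; /* simulated stack memory, slot index = byte offset / 4 */
    uint32_t esp;     /* byte offset of the top of the stack */
    uint32_t ebp;     /* byte offset of the current frame base */
    uint32_t eip;     /* address of the next instruction to execute */
} Cpu;

/* pop: read the value at the top, then esp -> esp+4. */
uint32_t pop32(Cpu *c) {
    assert(c->esp % 4 == 0 && c->esp / 4 < 16);
    uint32_t value = c->mem[c->esp / 4];
    c->esp += 4; /* the slot is not erased, it is only left below the new top */
    return value;
}

void function1_epilogue(Cpu *c) {
    c->esp = c->ebp;   /* mov esp, ebp: drop the local variables */
    c->ebp = pop32(c); /* pop ebp: restore main's frame base */
    c->eip = pop32(c); /* ret: continue at the saved return address */
}
```

Starting from esp = 28 and ebp = 32, with the saved EBP (60) and the return address (0x1234) stored at offsets 32 and 36, the model ends with ebp = 60, eip = 0x1234 and esp = 40. The old values are still sitting in mem, they are just below the new top of the stack.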

Now I think you have a clear idea about the stack and stack frames. You may read the stack architecture demo tutorial to experiment further with the stack. In that tutorial, we write a program in C and use GDB to disassemble it and see how the stack works.
