vendredi 30 juillet 2010

tutorial exploitation format string


This article is a detailed tutorial about exploitation of the format string vulnerability. This vulnerability appears with a wrong implementation of the function printf() in language C.

This method can be tested on the french "Nuit du hack 2010" wargame level 8 test. A solution to this test will be proposed in a further article.

french version available

What is the format string vulnerability?

This vulnerability appears when it is possible to inject an arbitrary format string in function printf(). The format strings are identified by the notation %.

Here is an example of a correct implementation of printf(), where the format string is associated with a string of characters:
printf("%s", "hello");
Here is a wrong implementation. A bad guy could inject a format string in the following line:
printf(argv[1]);
Detail of format strings for printf can be found there: http://www.cplusplus.com/reference/clibrary/cstdio/printf/

We keep:
%x prints in hex. Take an non signed integer as argument (poped from the stack),
%s prints a string. Take a pointer to the string as argument (poped from stack),
%n does not print anything. Take an address as argument (poped from stack). Push on this address a signed integer. This integer is the number of characters printed by printf until now.

We'll see:
- %x allows us to read the stack,
- %s allows us to read anywhere in the memory,
- %n allows us to write anywhere in memory,
- $[number]%n ( associated with $[number]%x ) allows us to write anything, anywhere in memory, without poping from stack. We'll see this in deep later.

read the stack:

Before calling printf(), the arguments are pushed on the stack. For example, in the following line, 0x00000000 and 0xfffffff are pushed on the stack and poped to be printed:
printf(%x%x,0xffffffff,0x00000000)

The problem appears when we "forget" to provide the arguments to the function printf(). Then, printf() pop anyway the stack and print what it found. So, the following line will print the two words on the top of the stack (and pop them):
printf(%x%x)


read in memory:

The string given as argument to printf was pushed on the the stack from last to first character before printf() was called. Then each character is copied on the stack to be printed successively. For example:
main(){ printf("hello"); }
+---------+
 |   o....  |    <-- internal stack of printf():
+---------+          successively "hell" then "o\x00.."
 |    ....   |   
+---------+  
 |   hell  |    <- stack of main()  
+---------+  
 |   o....  |
+---------+

Tip: %[number]$s allows us to access the [number]th word of the stack without poping anything 


We could pop the stack (with %x) until we reach our string (the one pushed on the stack by main() ).

But there is a great tool provided by printf(): We can directly access the [n]th word on the stack with %[n]$s.
For example, the following line access directly the 8th word on the stack. And better: nothing is poped.

printf(%8$s)

That trick was designed to print several times the same argument. For example, you would like to print 8 times the same thing:
"oh la! la! la! la! la! la! la! la!"
You could write:
printf("oh %s %s %s %s %s %s %s %s; "la!","la!","la!","la!","la!","la!","la!","la!");
or write:
printf("oh %1$s %1$s %1$s %1$s %1$s %1$s %1$s %1$s; "la!");

This alternative notation is not used often. But it is really interesting for format string exploitation, because it allows us to read more than one time any word on the stack, without poping it!


Now, let's put an address at the beginning of our string and let's provide %[n]$s with it:
printf ( [address]%[n]$s )
Main() puts the string on the stack: first s, then $, then [n], then %, and finally on the top of the stack: [address].
Main() push then some words over that. We'll have to access our address over them.
Printf() is called and successively read the arguments of the string. Then it executes:
- [address] : prints [address],
- %[n]$s : prins the string pointed by the address placed on the [n]th word of the stack. We choosed [n] to point to  [address].

Then we could read the string pointed by [address].

write an integer in memory:

Now let's see what happen if we replace %s by %n. Then, an integer is written at the address provided. This integer is equal to the number of characters written by printf() until now. So, we'll have to print enough characters in order to have the integer we want.

Tip: %[n]$[k]x prints (printf output) [k] characters:


For example, if [k] = 20 and "word" is at the 7th place on the stack:
printf(%7$20x)
0000000000000000word

Let's use that tip to write an address in memory

write an address in memory

Our string will be:
[address+3][address+2][address+1][address]%1$[n1-16]x%[m+1]$n%1$[n2-n1]x%[m+2]$n%1$[n3-n2]x%[m+3]$n%1$[n4-n3]x%[m+4]$n[padding]
For example:
"\x43\xff\xff\xbf\x42\xff\xff\xbf\x41\xff\xff\xbf\x40\xff\xff\xbf%1$85x%8$n%1$127x%9$n%1$156x%10$n%1$408x%11$nAAA"

Remark: with notation %[nombre], nothing is poped from stack.

[m] is the number of words on the stack over [address+3]
[n1], [n2], [n3] et [n4] are decimal values of ASCII codes we want to write in each byte of memory.

Here are how the arguments are interpreted by printf(), and what happens in memory:

[address+3][address+2][address+1][address] : prints 16 caractères. memory and stack are not affected.

%1$[n1-16]x : prints [ n1 - 16 ] characters (spaces and the first word on the stack). n1 characters have been printed until now. The stack is not modified.

%[m+1]$n : does not print anything. Write the number of characters printed until now ( n1 ) at address found at the [m+1]th word of the stack ( [address+3] )


%1$[n2-n1]x : prints [ n2 - n1] characters. n2 characters have been printed until now.

%[m+2]$n : prints nothing. Write the number of characters printed until now ( n2 ) at the address found at the [m+2]nd word of the stack ( [address+2] )

%1$[n3-n2]x : prints [ n3 - n2 ] characters. n3 characters have been printed until now.
%[m+3]$n : prints nothing. Write the number of characters printed until now ( n3 ) at the address found at the [m+3]rd word of the stack ( [address+1] )
%1$[n4-n3]x : prints [ n4 - n3 ] characters. n4 characters have been printed until now.

%[m+4]$n prints nothing. Write the number of characters printed until now ( n4 ) at the address found at the [m+4]th word of the stack ( [address] )

Tip: add a padding to align the addresses on the stack


Then, we add the padding at the end of the string:
[padding]: adds 0, 1, 2 or 3 characters at the end of the string (so at the bottom of our string on the stack). This allows us to align the addresses on the stack. Indeed:

Here is the stack we want:
+----------------+
 |                 |
 | m words  |
 |                 |
+----------------+
 |address+3|
+----------------+
 |address+2|
+----------------+
 |address+1|
+----------------+
 | address    |
+----------------+
 |  %...         |
+----------------+


the format string %n takes an address from the stack as argument. But often, this address is not aligned on the stack.

For example, we have an address AAAA (\x41\x41\x41\x41). This address can be splitted in two words of the stack. 4 cases are possible:
-41000000-00414141- 
-41410000-00004141- 
-41414100-00000041- 
-41414141-     <-- we want that!

If we add the correct padding, we can align the addresses. And, we place it at the end of our string:
- the number of characters written using %n won't be affected because we place it after every %n,
- the alignement of our four addresses will be affected because we place it under them on the stack.

Tip: every arguments length is a multiple of 4 bytes


A good habit to have: every arguments length is a multiple of 4 bytes (= 4 characters, = 1 word of the stack, = 32 bits)

For example:
%1$[n1-16]x -> 8 bytes  -> %1$ and x equal 4 bytes, so  [n1-16] = XXXX (4 bytes) (4+4 = 8)   
%[m+1]$n -> 8 bytes -> %,$ and n equal 3 bytes, so [m+1] = XXXXX (5 bytes) (3+5 = 8)

Tip: add 0 before decimal integers to have an arguments length multiple of 4 bytes


We can add some 0 (zero) left to every integers in our string. For example [n1-16] = "0235" , [m+1] = "00099"
"00099" will be interpreted as "99", and we will have pushed 5 bytes on the stack.

why [address+3][address+2][address+1][address] ?

suppose :
- [source] is the address of our payload. For example: [source] = 0xaabbccdd,
- [address] is the address where we wante to write. For example [address] = 0x8049654.

aa  bb  cc  dd
                 ^address
           ^address1
      ^address2
^address3

[address+3]= 0x8049657 -> aabbccdd
[address+2]= 0x8049656  -> ??aabbcc
[address+1]= 0x8049655  -> ????aabb
[address]     = 0x8049654  -> ??????aa

Choose [n1], [n2], [n3], [n4]:

if [source] = 0xaabbccdd
aa, bb, cc and dd will be equal to numbers of characters written by printf until %n call.
So, we need aa > bb > cc > dd

Tip: add hundreds to each source address byte

For example:
[n1] = 0xdd = 221
[n2] = 0x1cc = 460
[n3] = 0x2bb = 699
[n4] = 0x3dd = 989

When the previous hex values will be successively written in memory, the result will be:
       0x000000dd
       0x0001ccdd
       0x02bbccdd (the 1 of 1cc is crushed)
0x3 0xaabbccdd (the 2 of 2bb is crushed, and the 3 of 3aa is written in the next word)

conclusion

You have seen that format strings allow bad guys to read the stack, read or write anywhere in memory.

references

tutoriel (fr) format strings : http://www.bases-hacking.org/format-strings.html
printf - http://www.cplusplus.com/reference/clibrary/cstdio/printf/

1 commentaire:

  1. in printf("oh %s %s %s %s %s %s %s %s; "la!","la!","la!","la!","la!","la!","la!","la!");
    does it push to the stacl "la!" 8 times, or only once ?

    RépondreSupprimer