Tuesday, February 28, 2012

Fun with shellcode further

I wanted to take a moment to circle back to the previous post and explain a few things. The original post was meant to power through and get people thinking about how to take code that you find on the internet and put it into a binary that you can test locally. Now I would like to explain some of my reasoning for the code that I used.

If you look back at the code that I used, you will notice that it is very clean and needs minimal interpretation. By interpretation I mean translation between AT&T and Intel assembler syntax. This is the explanation I promised when I mentioned that things looked odd and that I had flipped the operands. I am by no means an expert when it comes to explaining the differences between the two syntaxes, and I have worked with and know several people who could explain this better, but I will try my hand at it.

One of the quickest ways to tell whether code is AT&T or Intel is that in AT&T syntax registers are prefixed with '%' and immediate values with '$'. There are many, many other differences that you need to understand to translate between the two, and here are a few.


Source and destination for move and store:
    AT&T = src, dest  |  Intel = dest, src
Designation of code size:
    AT&T = .code16 or .code32  |  Intel (nasm) = BITS 16 or BITS 32
Comments:
    AT&T = /* */  |  Intel (nasm) = ;
Operand size appended to the instruction:
    AT&T = pushb  |  Intel = push byte

There are many other differences, such as directive syntax, far jumps, etc. Covering all of them is well beyond the scope of this blog, but there is extensive documentation available, and I have listed some of it in the references at the end of this post.
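
To make the operand order, prefix, and size-suffix rows concrete, here is a minimal sketch in shell. The function name att_to_intel_simple is made up for illustration, and it only handles the simplest two-operand form (a register or immediate source and a register destination, no memory operands), so treat it as a toy rather than a real converter.

```shell
# Hypothetical helper: convert simple two-operand AT&T lines to Intel order.
# Handles only "op src,%reg" where src is %reg or $imm; other lines pass through.
att_to_intel_simple() {
  sed -E \
    -e 's/^[[:space:]]*([a-z]+)[[:space:]]+[$%]([^,[:space:]]+),[[:space:]]*%([a-z]+)[[:space:]]*$/\1 \3,\2/' \
    -e 's/^(mov|xor|add)l /\1 /'
}

printf '%s\n' 'movl %esp,%ecx' 'xorl %ebx,%ebx' 'movl $0x66,%eax' | att_to_intel_simple
# prints:
#   mov ecx,esp
#   xor ebx,ebx
#   mov eax,0x66
```

The first expression swaps the operands and drops the '%' and '$' prefixes; the second strips the 'l' size suffix from a few mnemonics.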

These differences explain why I did the flip and also why I chose that code. If you look back, there is very little that you need to flip: only 13 operand swaps, which is pretty low.

In the previous post I mentioned that one of the reasons for the post was to show some of the issues that you may come across. Many, many tutorials on assembly mention the use of "nasm". As you can see, you will run into some issues if that is all that you are familiar with. Now let us take some additional steps, working with the same example: http://www.shell-storm.org/shellcode/files/shellcode-515.php . We now know that this is AT&T syntax, but what do we do with it? This is where the GNU Assembler (GAS) steps in. This assembler understands both Intel and AT&T syntax, so let's just assemble the code directly.

First we will remove the "<>" since these are not part of the assembly, and strip everything else that is not assembly. The '%' and '$' prefixes stay in place because GAS needs them in AT&T syntax:

# cat test.asm | cut -d '/' -f3 | sed 's/<//g' | sed 's/>//g' > 515.s
Now that we have the code in pure AT&T syntax, let's assemble it with "as", link it with "ld", and get it working:


# as 515.s -o 515.o
# ld -o 515 515.o
# ./515
tcp        0      0 0.0.0.0:64713
OK, great, now we know how to assemble AT&T syntax code and get a binary that we can execute. Many people will be saying, "but I really, really like Intel syntax, how can I see the Intel syntax?" With a compiled binary this is simple with gdb or objdump.

With gdb, load the binary into the debugger:
 # gdb -q ./515

Set the flavor that you want displayed:
(gdb) set disassembly-flavor intel

Run it, interrupt it with Ctrl-C, and then disassemble:

(gdb) run
Starting program: /root/blog/gas
^C
Program received signal SIGINT, Interrupt.
0x08048088 in _start ()
(gdb) disas


Dump of assembler code for function _start:
   0x08048054 <+0>:    push   0x66
   0x08048056 <+2>:    pop    eax
   0x08048057 <+3>:    xor    ebx,ebx
   0x08048059 <+5>:    push   ebx
   0x0804805a <+6>:    inc    ebx
   0x0804805b <+7>:    push   ebx
   0x0804805c <+8>:    push   0x2
   0x0804805e <+10>:    mov    ecx,esp
.
.

The second method, which is much easier, is to use the "-M intel" switch of "objdump":

# objdump -d -M intel ./gas


08048054 <_start>:
 8048054:    6a 66                    push   0x66
 8048056:    58                       pop    eax
 8048057:    31 db                    xor    ebx,ebx
 8048059:    53                       push   ebx
 804805a:    43                       inc    ebx
 804805b:    53                       push   ebx


Great, now we have two ways to display the Intel syntax of an AT&T compiled binary. That leaves the final piece that I wanted to talk about: code conversion without a compiled binary.

During my interwebz travels I came across an old sed script that does a minor translation from Intel to AT&T. This doesn't help us entirely, but it does bring what I like, which is visibility. A lot of admins understand sed but may not understand C, Python, or Perl, so I decided to do the translation in reverse with the same script.

You can find the original sed script that converts Intel to AT&T here: http://www.delorie.com/djgpp/mail-archives/djgpp/1995/06/06/05:48:34 . Below you will find a slimmed-down sed script that does the basics of what the original script does, just in reverse. This is not the most optimal method, but I wanted to do a 1-1 comparison just for visibility. You can also download it as att2intel.sed.


#
#       @(#)as386.sed    1.1 - 86/11/17\
#       ^------This is from original----^
#
#    rev: 03022012
#
#  A typical way to use this sed script is:\
#
# tr "[A-Z]" "[a-z]" <infile | sed -f this-script >outfile\
#
##################################################
# spread everything out
##################################################

:cmpress
h
# this removes any ; comments in the file
s/;.*$//
# this replaces trailing whitespace with a single tab
s/[ \t]*$/\t/
# this collapses every run of spaces and tabs into a single tab
s/[ \t][ \t]*/\t/g
# this separates brackets, operators, commas and colons with tabs
s/\([][)(,:*/+-]\)/\t\1\t/g
s/[\t]?[\t]/\t0\t/g
s/?/_/g
s/[\t][\t]*/\t/g

:equate
s/^\([^\t]*\)[\t]equ[\t]\(.*\)[\t]$/#define\t:\t\1\t\2\t/

####################################################
# This is where I remove % and $ from the symbols
####################################################

:registr
s/[ \t]%e\([abcd]\)x[ \t]/\te\1x\t/g
s/[ \t]%\([abcd]\)\([hlx]\)[ \t]/\t\1\2\t/g
s/[ \t]%e\([ds]\)i[ \t]/\te\1i\t/g
s/[ \t]%\([ds]\)i[ \t]/\t\1i\t/g
s/[ \t]%e\([bs]\)p[ \t]/\te\1p\t/g
s/[ \t]%\([bs]\)p[ \t]/\t\1p\t/g
s/[ \t]%\([cdefgs]\)s[ \t]/\t\1s\t/g
s/\$//g
s/%//g
s/<//g
s/>//g

####################################################
# word ptrs
####################################################

s/[ \t]\([abcd]\)\([hlx]\)[ \t],[ \t]1x\([^   ]*\)[ \t]([ \t]\([^   ]*\)[ \t])/\tbyte\t[\t\4\t+\t0x\3\t]\t,\t\1\2/g

s/[ \t]e\([abs]\)\([xip]\)[ \t],[ \t]0x\([^   ]*\)[ \t]([ \t]\([^   ]*\)[ \t])/\tdword\t[\t\4\t+\t0x\3\t]\t,\te\1\2/g

s/[ \t]0x\([^  ]*\)[\t]([ \t]e\([abs]\)\([xip]\)[ \t])[ \t]*,[ \t]\([^   ]*\)/\t\4,\t[\te\2\3\t+\t0x\1\t]/g

/-\t0x/ s/[\t]e\([abs]\)\([xip]\)[\t],[\t]-[\t]0x\([^   ]*\)[ \t]([ \t]\([^   ]*\)[ \t])/\tdword\t[\t\4\t-\t0x\3\t]\t,\te\1\2/g

####################################################
# collapsing
####################################################

s/[\t]\([)(,*/+-]\)/\1/g
s/\([)(,*/+-]\)[ \t]*/\1/g
s/\[[\t]/\[/g
s/[\t]\]/\]/g
s/[\t]:/:/g

####################################################
# This normalizes everything
####################################################
:normliz
/:\t/ !s/^\([^\t;#][^\t]*\)/\t\1/
/:\t/ !s/^[\t]\([^\t+,-]*\)\([+-]\)/\t\1\t\2/g
/:\t/ !s/^[\t]\([^\t+,]*\)+%/\t\1\t+/g
/:\t/ s/^\([^\t]*\)[\t]\([^\t+,-]*\)\([+-]\)/\1\t\2\t\3/g
/:\t/ s/^\([^\t]*\)[\t]\([^\t+,]*\)+%/\1\t\2\t+/g
s/+%\([^\t,]*\)/(\1)/g
s/[\t]+/\t/g
s/\([:,]\)+/\1/g

#####################################################
# changed below; commented out what I don't know
#####################################################
:operand
/[.;#^+]/  !s/^\t\([^\t]*\)[\t]\([^,]*\),\([^\t]*\)/\t\1\t\3,\2/
/[.;#]/ !s/^\([^\t][^\t]*\)[\t]\([^\t]*\)[\t]\([^,]*\),\([^\t]*\)/\1\t\2\t\4,\3/

######################################################
# I changed this to remove the size suffixes (l, b, w)
#####################################################
:opcode
s/movl/mov/g
s/movb/mov/g
s/leal/lea/g
s/xorl/xor/g
s/popl/pop/g
s/pushl/push/g
s/pushw/push/g
s/incl/inc/g
s/xchgl/xchg/g
s/addl/add/g
s/cmpb/cmp/g
# cltd is the AT&T name for the Intel mnemonic cdq
s/cltd/cdq/g

######################################################
# this shoves everything back
######################################################
:comment
s/[\t]*$//
# this shoves the whole thing back to the left
x
# this removed everything and left ;
/;/ !s/^.*$//
# this added another ;
s/\([\t]*\);/;\1;/
# this removed the ;
s/^[^;]*;//
# everything is back
x
# this added a space between the lines
G
# then puts it back together removing lines spaces
s/\n//
# changes the ; at the end with a /
s/;/\//
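
The :cmpress and :comment sections lean on sed's hold space: `h` saves the untouched line, `x` swaps the saved copy with the working copy, and `G` glues the two back together. Here is a stripped-down sketch of that same pattern, not the script itself; the function name show_hold_space, the sample input, and the ' -> ' separator are made up for illustration.

```shell
# h : copy the pattern space (the current line) into the hold space
# s : edit the pattern space only; the held copy is untouched
# x : swap pattern space and hold space
# G : append a newline plus the hold space to the pattern space
show_hold_space() {
  sed -e 'h' -e 's/l$//' -e 'x' -e 'G' -e 's/\n/ -> /'
}

printf 'movl\n' | show_hold_space
# prints: movl -> mov
```

The script above uses the same moves to keep the original line around while the converted line is built, and then reattaches any trailing comment at the end.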



Now that we can see that it works, let's give it a shot on another piece of shellcode. I will keep it simple since I am sure the script is missing a lot, so we will use the ping shellcode at http://www.shell-storm.org/shellcode/files/shellcode-632.php . We will run it through the sed script, compile it, and see what happens, but first we need to get the file into something workable. Like in the first post, we need to strip the other junk out of the code. First, copy just the assembly portion to a file named 632.asm, then strip it down to what we need:
# cat 632.asm | cut -d '/' -f3 > 632a.asm
This will leave us with a file named 632a.asm that looks like this:
 push   $0xb
 pop    %eax
 cltd
 push   %edx
 push   $0x20207473
 push   $0x6f686c61
 push   $0x636f6c20
 push   $0x676e6970
 mov    %esp,%esi
 push   %edx
 pushw  $0x632d
 mov    %esp,%ecx
 push   %edx
 push   $0x68732f2f
 push   $0x6e69622f
 mov    %esp,%ebx
 push   %edx
 push   %esi
 push   %ecx
 push   %ebx
 mov    %esp,%ecx
 int    $0x80
Now we have it in a working format that our script will be able to work with.
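
It helps to see what the push immediates in the listing above actually put on the stack. x86 is little-endian and the stack grows downward, so each dword lands low byte first and the last push ends up at the lowest address. The sketch below decodes the immediates in reverse push order; the dword_to_ascii helper is made up for illustration.

```shell
# Hypothetical helper: print the four bytes a 32-bit push immediate
# places in memory, lowest address (least significant byte) first.
dword_to_ascii() {
  hex=$(printf '%08x' "$1")                  # normalise to 8 hex digits
  for pos in 7-8 5-6 3-4 1-2; do             # walk the bytes low to high
    byte=$(printf '%s' "$hex" | cut -c"$pos")
    printf "$(printf '\\%03o' "0x$byte")"    # hex byte -> octal escape -> character
  done
}

# Reverse push order = ascending memory order.
for imm in 0x676e6970 0x636f6c20 0x6f686c61 0x20207473; do
  dword_to_ascii "$imm"
done
echo    # prints "ping localhost" followed by two padding spaces

for imm in 0x6e69622f 0x68732f2f; do
  dword_to_ascii "$imm"
done
echo    # prints "/bin//sh"
```

The `push %edx` before each group supplies the terminating zero bytes, since `cltd` has just sign-extended the small positive value in eax into edx, leaving edx at zero.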

# tr "[A-Z]" "[a-z]" < 632a.asm | sed -f att2intel.sed > 632i.asm
# nasm -f elf 632i.asm
# ld -o 632i 632i.o
# ./632i
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.195 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.034 ms
We have Intel code execution from converted AT&T syntax. I realize there are several limitations to the sed script, but I just wanted to do a little 1-1 comparison. I will update the script above as I make changes; for example, you will see that I used literal tabs in some places and [ \t] in others, and I will convert these to \t so the script survives copy and paste.


Now that we have the scripts to convert back and forth, we can really start to have fun. In our arsenal we now have the ability to:

  - Convert pure shellcode to a local binary for testing
  - Retrieve the shellcode from a binary
  - Convert AT&T code to Intel for nasm to get the shellcode
  - Compile AT&T syntax code
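
For the second item, one way to pull the opcode bytes out of a binary is to parse the output of objdump -d. This is a rough sketch and not necessarily the method from the previous post; the function name objdump_to_shellcode is made up, and it assumes the usual GNU objdump layout where the address, the hex bytes, and the instruction are separated by tabs.

```shell
# Hypothetical helper: turn `objdump -d` output on stdin into a \x.. string.
objdump_to_shellcode() {
  awk -F'\t' '
    # instruction lines look like " 8048054:<TAB>6a 66   <TAB>push   0x66"
    NF >= 2 && $1 ~ /^ *[0-9a-f]+:$/ {
      n = split($2, bytes, " ")
      for (i = 1; i <= n; i++) printf "\\x%s", bytes[i]
    }
    END { print "" }
  '
}

# Usage against a real binary would be:
#   objdump -d ./515 | objdump_to_shellcode

# Self-contained demo using two lines of sample output:
printf ' 8048054:\t6a 66                \tpush   0x66\n 8048056:\t58                   \tpop    eax\n' \
  | objdump_to_shellcode
# prints: \x6a\x66\x58
```

Header lines such as "08048054 <_start>:" contain no tab, so they are skipped.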

I believe this gives a great start to testing and using shellcode or assembly that you find on the interwebz when you are in a pinch, or if you find some really cool code that you want to use and that automated tools cannot produce shellcode for.


References and further reading:

http://wiki.osdev.org/Opcode_syntax
http://asm.sourceforge.net/articles/linasm.html (from Tyler's Comment)
http://www.ibm.com/developerworks/linux/library/l-gas-nasm/index.html
http://banisterfiend.wordpress.com/2008/08/17/att-vs-intel-assembly-syntax/
http://www.delorie.com/djgpp/mail-archives/djgpp/1995/06/06/05:48:34 (original sed)
http://asm.sourceforge.net/intro/hello.html


3 comments:

  1. Again a very good post! thank you very much...
    some questions...
    Why does the ping shellcode give a segmentation fault when I compile it on Linux? I also compiled it on Ubuntu, which is what the author of the shellcode tested it on. Shouldn't it at least work on Ubuntu?

    Are the opcodes the same between AT&T and Intel syntax?

    If we only have hex shellcode, how can we deal with it?

    Thanks and keep up the good work!

  2. Thanks Chris for the comment,

    I changed a few things in the post to help with making sure you are on the right track with getting this working regarding what the file should look like before you compile.

    Also the exact system that I am using is the 32bit version of backtrack5r1 if you want to replicate the same environment.

    As a final note, there is a difference between nasm versions that I should have explained in the previous post regarding "push immediate word/byte": a push 0x66 assembles to opcode 68 in version 2.07, while in version 2.09.04 it is 6a. So after you have compiled the program, make sure you check your work with objdump or a hex editor to see if the shellcode matches up. Good luck.

  3. I also removed all the clear tabs in the sed script just in case the copy and paste does not translate it and added a revision so you can see when I updated the script. I will make this available as a download also.
