September 5, 2015

Viral metamorphic engine design (code level)

In the constantly escalating fight between malware authors and anti-virus companies, the use of polymorphism/metamorphism has become commonplace. The design and features of the metamorphic engine dictate how hard or easy it is to detect the viral code via signature-based analysis and, to a lesser extent, heuristic analysis.

The challenge of polymorphic code lies in altering itself while still achieving as much as possible of the purpose the original, non-polymorphic code was written for.

Here is an example: say I wanted to encrypt a nasty piece of SQLi for later use. A short snippet is below (it’s broken for a reason…).

string payloadText = "AND '1=1';exec utl_cmd('/bin/sh \'rm -rf /*\'');"
  RijndaelManaged aes;
  aes.BlockSize = 256;     // Be sure to use full AES-256 with full 256bit IV :)
  aes.Mode = CipherMode.CBC;
  aes.Padding = PaddingMode.PKCS7;
  aes.Key = keyBytes;      // assume Key is stored somewhere else and used here
  aes.IV = ivBytes;        // assume IV is stored somewhere else and used here
  var encryptor = aes.CreateEncryptor(aes.key, aes.IV);
  var memEncrypt = new MemoryStream();
  using(var crypto = new CryptoStream(memEncrypt, encryptor, CryptoStreamMode.Write))
  using(var stream = new StreamWriter(crypto))
  {
    stream.Write(payloadText);
  }
  stream.Flush();
  streem.Close();

The latest and greatest polymorphic engines would change this code in at least three ways, and some in many more.

First, the code would be rearranged according to a set of predefined permutations of the various lines of code, for instance transforming a string to a byte array and then back to a string, or changing a using block into an explicit set of statements or a while loop. These changes can be applied randomly to the instructions and loops.

Second, there would be nonsense sprinkled throughout the code: dummy code that gets compiled but adds nothing to the actual execution, like simple math functions that calculate a number that never gets used. These nonsense instructions can be placed randomly around the code.

Third, once compiled, the bytes of the PE header and the actual code would be rearranged using predefined permutations of equivalent instructions, for example replacing a byte move with a byte increment and a byte decrement, or with a stack push, a pointer move and a stack pop to do the move… This swapping can be randomized.
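To see why even the second kind of change is enough to upset a byte-level signature, here is a minimal sketch in Python (chosen for brevity; the snippet above is C#). It is not an engine: both functions are written by hand, the names `checksum_plain`, `checksum_padded` and `code_fingerprint` are made up for this illustration, and hashing a function's compiled bytecode is only a stand-in for hashing a file.

```python
import hashlib


def checksum_plain(data: bytes) -> int:
    """Reference version: sum of the bytes modulo 256."""
    total = 0
    for b in data:
        total = (total + b) % 256
    return total


def checksum_padded(data: bytes) -> int:
    """Same behaviour, plus a hand-added computation whose result is never read."""
    total = 0
    for b in data:
        unused = (b * 7 + 3) % 11  # dead code: computed, never used
        total = (total + b) % 256
    return total


def code_fingerprint(func) -> str:
    """SHA-256 of the compiled bytecode (an assumed stand-in for a file signature)."""
    return hashlib.sha256(func.__code__.co_code).hexdigest()


sample = b"hello world"
same_output = checksum_plain(sample) == checksum_padded(sample)
same_fingerprint = code_fingerprint(checksum_plain) == code_fingerprint(checksum_padded)
print(same_output, same_fingerprint)  # True False
```

The two functions return the same value for every input, yet their fingerprints differ, which is exactly the gap a pure signature match falls into.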

These are just permutations; obfuscation through encryption routines, numeric oracles and magic numbers is not included in this. Nor is code re-sequencing, which just moves functions and variables around to change the signature/hash of the file. Just assume that a well-thought-out engine will use these other techniques as well to produce polymorphism. This also doesn’t include the various anti-debugging and virtualization-detection routines that are typically injected (and polymorphed) into the code before or after the original code is polymorphed.

Lastly, code is typically obfuscated using normal language-specific tools (Dotfuscator) and packed using typical PE packers like UPX/PEtite/etc. The use of these isn’t an obvious sign of malware, since many legitimate software vendors use them to protect their code from reverse engineering. A simple change at the code level already creates a completely different MD5/SHA signature after UPX packing; multiple levels of changes make one piece of code morphable into many thousands to many millions of variations (or beyond).
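The arithmetic behind those two claims can be sketched in a few lines of Python. This only hashes two short byte strings rather than packed executables, and the per-layer counts in `choices_per_layer` are invented numbers for illustration, not measurements of any real engine.

```python
import hashlib
import math

# Two inputs that differ in a single character.
original = b"total = total + 1"
modified = b"total = total + 2"

md5_a = hashlib.md5(original).hexdigest()
md5_b = hashlib.md5(modified).hexdigest()
sha_a = hashlib.sha256(original).hexdigest()
sha_b = hashlib.sha256(modified).hexdigest()

# A one-character change yields unrelated digests under both hash functions.
digests_differ = md5_a != md5_b and sha_a != sha_b

# Hypothetical number of independent choices available at each layer.
choices_per_layer = {
    "statement rewrites": 50,
    "junk placements": 200,
    "instruction swaps": 100,
}

# Independent layers multiply: 50 * 200 * 100 = 1,000,000 variants.
total_variants = math.prod(choices_per_layer.values())

print(digests_differ, total_variants)  # True 1000000
```

Because the layers multiply rather than add, a hash-per-variant blocklist grows far faster than the effort needed to produce the variants, which is why the heuristic analysis mentioned at the start matters.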