Friday, January 23, 2015

The PE File Header: Digging the Rich Signature and other things

We touched on Rich data only briefly before and I'm not going to dwell on what it means any more than I did.  Here's a link for a reminder of what is contained in it and why you only see it sometimes in PE files.  This chunk of information not only reminds us that the file in question was built with a Microsoft compiler but can be leveraged for detection.

A couple of neat bits of information.  PE Files are traditionally pretty hefty.  The PE header alone can take up to 512 bytes because of file alignment. That's due to the specification though it be ignored in some instances to compress this into smaller sections.  That's the entire goal of Tiny PE, for example. The information in the PE Header can be removed since really the only two things required in the header are e_magic (MZ) and e_lfanew (tells you the offset of the real PE Header).  That's why we leveraged looking for the DOS stub and other items in the previous article.  If they are missing and its an peexe then that's a useful indicator of obfuscation or compression via removal.

Previously we used information in the Rich Signature as static values.  It can be helpful to determine obfuscation pursuing that method.  It can also be useful in other ways.  First off, a couple of points.  The python library pefile reads this data in its latest release.  The area has two constants and some helpers.  Constants are awesome because they help you fight obfuscation.  The values are (in hex) 52 69 63 68 for "Rich" and 44 61 6E 53 for "DanS".   This last value is special.  Following the "Rich" value at the end of the signature is a dword value that you XOR the beginning of the signature to derive DanS.  Just past it should be a trio of identical checksum values.  Here's what I'm talking about:



You can see the checksums are identical and one after another.  The rest is an array of linker information values showing the build number, and product identifier.  The link given previously provides the code necessary to find this information and derive the values if desired.  You can also check out the powershell scripting from Exploit Monday to do it another way if desired.

I'm sure you've noted that I keep talking about obfuscation and XOR.  That's obviously not the only means, easily ROT,ROL, and Shift techniques could be employed.  People are lazy though and convenience can often drive a decision that security would have chosen otherwise.

Finding these kinds of things doesn't have to be done by hand or even by Yara.  I like tools as much as the other guy but I believe in understanding the reason and technique behind them before employing them.  In this case, Didier Steven's XORSearch is excellent tool to both find the key for XOR or any of the others mentioned and to unobfuscate the file afterward.



Tuesday, January 20, 2015

Some Side Points on rules and detections

I'm sandwiching this in-between talking about the PE File Header & Structure because it seems an appropriate item to cover since we talked about detection, classification and rules earlier.  How you employ rules and how many you deploy depend on the situation.  For example, in a monitoring situation you need rules to be small and fast.  Especially in heavy duty traffic situations or when you are building rules to detect inline network traffic.  Get into your lab or in a situation where this demanding you can deploy any number of rules to classify and detect.

So, just to summarize, know when you need a light touch or can be heavy handed.  That elaborate, 80-rule combination you use to iterate through the three main types of PlugX should probably be kept in your lab and not on virustotal or in your active monitoring suite.  The five rule set that picks up 90% without getting into typing versions is probably your best bet in a situation like that.

While I'm  on the topic, let's address rule length and condition complexity.  While it seems like the number of strings you may employ in a Yara rule seems endless, it does have a breaking point where it becomes either 1) too slow or 2) too unwieldy to understand.  Its better to break this logic up as discretely as possible in those instances, especially when you are matching on combinations of said strings (2 of this, 3 of that, 1 of those, etc.).  Here's an example (abbreviated for space) where we have 400 lines of strings and some complex condition logic.  Its a good candidate for breaking into smaller chunks.

rule TooBig {
strings:
//...assume 399 lines precede this one, in chunks of 50 runing z-h
$h50 = "I am number 400"
condition:
(6 of ($a*) and 9 of ($b*) and 15 of ($d*) and 4 of ($h*)) or
(all of ($c*) and all of ($e*) and not ($f15 or $f18 or $h48)) or
(all of ($b*) and all of ($d*) and all of ($g*)) }

Absolute nonsense, of course as a rule but it does okay for an example.  If you get into a situation like the above, you need to decomplicate the rule.  Try to break the rule into smaller chunks.  Don't forget that Yara supports private rules.  These rules match but don't report; however they can be used in the condition lines of other rules.  This lets you put somethings into one rule that by itself should never alert but in conjunction with others should.

Condition complexity is where you get into very difficult to understand or to implement actions.  Usually this is where employing a logic structure in Yara or breaking the logic into smaller rules assists.  Remember, Yara supports all the Boolean operators and, or and not and relational operators >=, <=, <, >, == and !=. Also, the arithmetic operators (+, -, *, \, %) and bitwise operators (&, |, <<, >>, ~, ^) can be used on numerical expressions.  Use them.  Don't forget about being able to count matches and the ability to look at specific offsets (very useful).

Saturday, January 17, 2015

The PE File Header: Talking about MZ and other things

I debated on several different ideas to make as the first one to the blog.  After some contemplation I decided to start with the PE File Header*.  I've read more than a few and frankly, yawned my way through or just up and left many of them.  So, in order to make this different I decided to make it about what I tend to focus on: what can be leveraged as a detection.  As a norm, I'll refer to Yara when I talk about detecting things unless I say otherwise.  Tends to make it clearer that way.  Let's knock out some basics, first.

If you have no idea what structure of a PE File looks like, you need to go this page and get some fundamentals under your belt first.

Take a second and look into the Rich Signature that might be present.  Our first example will not have this data but the second one will.  Look at the differences.

Unfamiliar with Yara?  Zip your way to the Yara docs to find out its uses.

Okay, with those two things out of the way, let's take a look at the beginning of a PE file in a hex editor.


Since we already provided some links to how this structure relates, I'm going to skip some of the blabber you would normally get here.  Let's talk about detections for a second.  Integral to a lot of yara rules is a check in the condition line to make sure its being compared against the right type of file.  The documentation will give an example to make this check like this:

rule IsPE {
condition:
uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550
}

Quite efficient, this checks for the "magic number" of MZ (the 4D 5A value you see in the hex) and for PE (hex 50 45).  Most people are much less efficient and just place a string equal to "MZ" before the condition and just check for that at offset 0.  Like this:

rule IsPE {
strings:
$mz = "MZ"
condition:
($mz at 0) and (uint32(uint32(0x3C)) == 0x00004550)
}

Frankly, you'll find that just checking for MZ at 0 will work well enough in the condition line.  You can even look for "This program cannot be run in DOS mode" if you wanted.  Any of the three or singly is usually more than sufficient.  A simple means to leverage this with detections is to contrast the magic number, in this case an exe, with its file extension.  While simplistic, it acts as a simple means of contrast of contents to purported type.  For example, a MZ magic number but a file ending of jpg is an easy give away.

Finding a PE File by its "magic number" or by just looking for MZ at 0 is all well and good when you have an modified file.  The hacks and artists that generate malware tend to employ a whole back of tricks to defeat such things.  For example, our detection of a valid PE file would fail if we changed the magic numbers to something else, say "XD".  Of course, this also means it won't run either since Windows would not identify it as valid anymore.  That doesn't invalidate it, though.  This tactic happens frequently since a dropper or injector file could easily change those numbers back and execute it.  In this instance, you might think of leveraging a rule like this:

rule PEWrongMagicNumber {
strings:
  $mz = "MZ"
  $msdos = "This program cannot be run in DOS mode"
condition:
($mz at 0) or $msdos or (uint32(uint32(0x3C)) == 0x00004550)
}

Saying that nothing else had been done beyond that, the above rule would still flag that it was a PE File.  Of course, it would be just one in a series of matches you would want to do before flagging it in any specific fashion.  A lot of permutations exist here.  Let's focus on a couple.  The previous rule checks for the existence of one of the three items common to all PE Files.  What about if one of them was missing?  The below rules looks at three possibilities where one but not the second or third exists.

rule MissingEssential{
strings:
  $mz = "MZ"
  $msdos = "This program cannot be run in DOS mode"
condition:
(($mz at 0) and not ($msdos or (uint32(uint32(0x3C)) == 0x00004550))) or
(($msdos and not (uint32(uint32(0x3C)) == 0x00004550) or ($mz at 0))) or
((uint32(uint32(0x3C)) == 0x00004550) and not (($mz at 0) or $msdos))
}

Before I move into a second example, I'd like to point out that you should confine the signatures above to just the first 300 of a file.  For example, adding "in 0..300".  You can make this smaller and in most cases its no issue to use just "0..100" as well.

Let's look at the beginning of a different PE File.  This one has the Rich Signature data.


Here, you'll see the Rich data between the "This program cannot be run in DOS mode." and "PE".
If you read the link provided earlier, you'll realize this means the PE  File was created by a Microsoft Compiler.  That can be handy, especially if you have some intel that indicates your focus might be with a Microsoft crafted PE File over, say, a Delphi one.

rule IsMicrosoftPE {
strings:
$mz = "MZ"
$rich = "Rich"
condition:
($mz at 0) and ($rich in (0..300)) and (uint32(uint32(0x3C)) == 0x00004550)
}

Having the product version of the Microsoft compiler might be useful, so this might be an interesting detection to look for and then extract the data to use as an indicator to find or type malware families or builders with.  Additionally, since the linker data in the Rich signature points to linker database, you can pivot from the data and find the list of linked libraries.  Also another good indicator for typing and grouping malware.

Before I end the introduction to this part of the PE File Header, let's tackle one more item.  We know a couple of things about this piece of the File Header.  First, we  have a couple of known values.  "MZ", "This program cannot be run in DOS mode" and "PE".  You could could count the "Rich" value as well if desired.  This gives us some typically unchanging data points to work with, especially when you are presented with something where a PE File is obfuscated, saying with XOR.  To use this information properly, you need to also know that the magic number (4D 5A) is followed by a 90 00 03 00 and that XOR has a flaw.  That flaw is that any byte you XOR with 0 stays the same.  Given that a PE File should start with 4D 5A 90 00 03 00, if you XOR'd the file with a value, every "0" takes on the value of the XOR key.  That becomes something we can detect with our handy Yara tool. Let's take a simple example of an XOR by 0x33 (the number "3").

*** run this only after you have previously identified the file as a possible PE File

rule PEIsXOR {
strings:
  $xormz = { 4D 69 90 33 03 33 }
  $mz = { 4D 5A 90 00 03 00 }
condition:
($xormz in (0..10)) and not ($mz in (0..10))
}

To make this work effectively you would need a larger yara rule that matches against the full ascii table for a single byte XOR.  Of course, more complex ones would easily defeat this kind of check. The downside to complexity is simplicity tends to fall by the side.  For example, a common PE File will have numerous "00" ascii values.

rule PENoZeros {
strings:
  $zero = { 00 }
condition:
   $zero in (0..100)
}

Again, not something to run singly since it will be slow, even if only looking the first 100.  It should be bound to other rules to provide good context, like checking for normal PE, Wrong Magic Number, Missing Essential, etc.

As a parting thought, remember my point about the known values?  You can leverage this information with a little work, even against more complex keys.  We'll cover this in more detail in future detail.

* I'm quite aware that the region I'm referencing is more accurately called the MS DOS header and the "This program cannot be run in DOS mode" is properly the MS-DOS stub.  I've always called the whole thing the PE Header and didn't see any reason to stop now.