Deobfuscating a Malicious Word Document

Visual Basic (VB) and its derivatives – VBA and VBScript – are common attack vectors when targeting Windows systems. These languages allow users and attackers alike to complete security-sensitive tasks such as accessing the internet, running command-line instructions and editing the Registry.

Last month, we came across two samples of weaponised VB code: one was embedded as a macro in a Word document, the other was a standalone VBScript file (.vbs). Both samples used basic obfuscation techniques to avoid detection, with varying degrees of success. In this article, we walk through the investigation in which we reverse-engineered the first of the two samples. Our aim is to understand its functionality and, where possible, to learn about the threat actors responsible for it.

Inspecting the sample

The first sample was collected as part of a phishing attempt. A malicious Word document was attached to an email in an encrypted archive. Interestingly, the sender address belonged to the domain of a legitimate cloud service provider used by our client, suggesting that the account or mail server had already been compromised by the threat actor.

After downloading the attachment, our first step was to list the internal objects of the Word document using the tool oledump. As expected, the document contained macros, which we extracted using oledump again. At first glance, the extracted files seem to contain nothing but random words (see Extracted VBA code below). This is where the de-obfuscation and reverse-engineering process starts.

Contents of the Word document as revealed by oledump. “M” or “m” indicate the presence of a macro

Extracted VBA code from object 11 “VBA/omvLL”. Notice the highlighted “AutoOpen” section.

No comments…

As we can see from the screen capture above, the authors of the code have added random English words as comments to confuse detection heuristics into classifying this file as text rather than code. While this attempt may trick basic systems, a context-aware approach should be able to recognise the scripting language and parse comments for what they are.
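The comment-stripping step can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling we used: the function names and the sample macro text (including its filler words) are made up for the example. It treats an apostrophe as the start of a comment only when it sits outside a string literal, and it recognises Rem only at the start of a line.

```python
def strip_vba_comment(line: str) -> str:
    """Return one line of VBA with any comment removed."""
    # A line that consists of a Rem statement is a comment in its entirety.
    if line.strip().lower() == "rem" or line.lstrip().lower().startswith("rem "):
        return ""
    in_string = False
    for index, char in enumerate(line):
        if char == '"':
            # A doubled quote ("") inside a literal toggles twice, so the
            # state is still correct afterwards.
            in_string = not in_string
        elif char == "'" and not in_string:
            return line[:index].rstrip()
    return line.rstrip()


def strip_vba_comments(source: str) -> str:
    """Strip comments from every line and drop lines left empty."""
    kept = (strip_vba_comment(line) for line in source.splitlines())
    return "\n".join(line for line in kept if line.strip())


# Hypothetical input: the filler words are invented for this example.
SAMPLE = '''Sub AutoOpen()
' orange violin harbour
    x = "it's fine" ' lantern meadow
Rem pencil window
End Sub'''

if __name__ == "__main__":
    print(strip_vba_comments(SAMPLE))
```

Tracking whether we are inside a string matters because an apostrophe in a literal such as "it's fine" is data, not the start of a comment.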

Collating all the objects containing macros and removing all comments allows us to get a closer look at the code (see sample below). While most instructions are still obfuscated, a file path is visible in clear text.

Sample VBA code showing the main AutoOpen section without comments

Naming obfuscation

You may recall coding courses or tutorials emphasising the importance of giving the variables in your code meaningful names. Not only is this good advice: deliberately doing the opposite is a common obfuscation technique known as naming obfuscation. This method is particularly useful for frustrating human efforts at inspecting code. Without readable names, we need to study each function’s definition to determine what the code actually does.

To break through naming obfuscation, we follow function calls until we find recognisable keywords and operators. Two functions in particular – gBxAG and ZSMFd – seem to be called often and lead to the generation of VBA-specific keywords.

The former, pictured above, iterates over the length of an input object (notice the For–Next structure). The ampersand, VBA’s string concatenation operator, indicates that the function is working with strings. This is confirmed by the fact that the SpWIG function simply acts as a wrapper for the built-in Mid function (see screen capture below). Following the loop, we understand that gBxAG reverses strings: for example, gBxAG(“drow”) would return “word”.

An example of naming obfuscation: the Mid function is hidden behind a meaningless name.
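To make the behaviour of gBxAG concrete, here is a rough Python equivalent. The names mid and reverse_string are our own, and the exact loop in the sample may differ (for instance, it may walk the string backwards and append rather than prepend). The sketch only shows that a For–Next loop, a one-character Mid call and string concatenation are enough to reverse a string.

```python
def mid(text: str, start: int, length: int) -> str:
    """Emulate VBA's Mid(text, start, length), which uses 1-based positions."""
    return text[start - 1:start - 1 + length]


def reverse_string(text: str) -> str:
    """Reverse a string one character at a time, as gBxAG appears to do."""
    result = ""
    for position in range(1, len(text) + 1):
        # Prepending each character (VBA: Mid(...) & result) reverses the order.
        result = mid(text, position, 1) + result
    return result


if __name__ == "__main__":
    print(reverse_string("drow"))
```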

Hiding suspicious instructions until run time

The ZSMFd function, pictured below, also seems to operate on strings as indicated by the call to the built-in Split function. Elsewhere in the code, ZSMFd is called with a single integer argument. This type of behaviour is reminiscent of arrays, but what data does it store?

Showcasing the array-like behaviour of “ZSMFd”

Upon closer inspection of ZSMFd, the only possible source of data is ActiveDocument.Shapes(1#).Title. Reading through Microsoft’s documentation, we understand that Word documents contain “Shapes” such as pictures or drawing layers. These “Shapes” are programmatically accessible using VBA and have properties; one of which is “Title”, a user-defined string.

We can now deduce that ZSMFd extracts the title of the Word document’s first shape, and splits that string into an array according to a delimiter defined in the VBA code – see pKcUT As String = “-----” above. Given an index i, the function returns the i-th string stored in the array. These extracted strings are used in the code to build longer strings, some of which are then executed using the built-in exec method.
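The lookup performed by ZSMFd can be imitated as follows. This is a sketch under assumptions: the shape title below is a harmless placeholder, not the string stored in the sample, and the function name lookup is ours. We also assume zero-based indices, which is what VBA’s Split returns by default.

```python
DELIMITER = "-----"  # the value assigned to pKcUT in the macro


def lookup(shape_title: str, index: int) -> str:
    """Split the shape title on the delimiter and return one element."""
    return shape_title.split(DELIMITER)[index]


# Placeholder title: the real document stores the attacker's strings here.
PLACEHOLDER_TITLE = "alpha-----beta-----gamma"

if __name__ == "__main__":
    for i in range(3):
        print(i, lookup(PLACEHOLDER_TITLE, i))
```

Because the data lives in a document property rather than in the macro source, a scanner that only reads the VBA stream sees the delimiter and the indices but none of the strings they resolve to.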

This extraction mechanism is in fact the main obfuscation technique used by the attacker: suspicious keywords and information linking back to the attacker are not included in the code. Instead, they are stored in the Word document and only revealed at run time. Doing so prevents a purely static analysis of the sample.

Furthering our investigation requires us to run code on the malicious document. Extracting and splitting the embedded string using the same method as the attacker reveals four items: a URL stored in reverse (shown below in its restored form), and the strings “23lldnur”, “wscript.shell” and “winhttp”. We now see the use for the reverse function above!

https://t497sword.com/analytics/m2G0QdFFlMNhK1x9mHmWCPgkGQEztbhWqSTyKNnzIZEmV35KJkTO2HdXZv_7BzD1KcziQgvY_y9yhZI/bzm12?DUHC=IlEAxkQxRpfRD&Lk_M=jJDaUNLTmiPHVoFXT&BuN_G=UytaScjGNxHYmqc_&QQvUO=GhOwZkkBdMBH&eBP=zrYRQpMykQBxbk_FS&dm=gqbPvgKocPBFWw

Putting it all together

Having found the core building blocks of the code, we can now reconstruct the obfuscated instructions and follow the code’s flow. When a user opens the malicious Word document and enables macros, the following actions are executed automatically:

Extract the 4 strings from the Word document’s “Shapes”. Reverse the URL and “23lldnur”. The latter becomes “rundll32”.
Using the extracted URL and “winhttp”, send an HTTP GET request to the command-and-control (C2) server.
Write the response from the server to a PDF file c:\programdata\AEXbT.pdf.
Using the extracted “wscript.shell” keyword, spawn a shell on the infected machine.
Finally, in the shell, use the extracted string “rundll32” to execute the command-line instruction rundll32 c:\programdata\AEXbT.pdf,ShowDialogA -r.
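The last step above can be illustrated without touching the network or a shell. The sketch below only rebuilds the command-line string from its parts, to show why the word “rundll32” never appears in the macro source. The function name and its parameters are ours; the path, export name and argument are the ones listed above.

```python
def build_command(reversed_tool: str, dropped_path: str,
                  export: str, arguments: str) -> str:
    """Rebuild the command line the macro assembles at run time."""
    tool = reversed_tool[::-1]  # "23lldnur" becomes "rundll32"
    # rundll32 expects: rundll32 <file>,<exported function> <arguments>
    return f"{tool} {dropped_path},{export} {arguments}"


if __name__ == "__main__":
    print(build_command("23lldnur", r"c:\programdata\AEXbT.pdf",
                        "ShowDialogA", "-r"))
```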

Sample of the fully de-obfuscated code: the communication module

Sample of the fully de-obfuscated code: the AutoOpen routine

This final step may seem confusing: how can we run a PDF file? Rundll32 is a built-in Windows tool that loads a .dll file and calls one of its exported functions. These files, known as dynamic-link libraries, contain code or data and are frequently used in support of other applications. However, rundll32 does not check file extensions before running a target file; it only checks that the file’s contents are correctly formatted. Provided the attacker’s server returns a valid DLL, the file will execute despite its .pdf extension; here, the export to be called is ShowDialogA. This is an example of Living-off-the-Land (LotL) tactics: an attacker makes use of the operating system’s own resources to compromise their target.
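An analyst can apply the same logic in reverse: ignore the extension and look at the file’s contents. The sketch below is illustrative only. It is not a description of how rundll32 validates its input, and it was not run against the real payload. It checks a byte string for the “MZ” and “PE” signatures of a Windows executable image and for the DLL flag in the file header. The helper synthetic_image builds a tiny fake header purely so the example is self-contained.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag set on DLL images


def pe_header_offset(data: bytes):
    """Return the offset of the PE signature, or None if this is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # The DOS header stores the offset of the PE signature at 0x3C.
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[offset:offset + 4] != b"PE\x00\x00":
        return None
    return offset


def looks_like_dll(data: bytes) -> bool:
    """True if the bytes carry PE signatures and the DLL characteristic."""
    offset = pe_header_offset(data)
    if offset is None or len(data) < offset + 24:
        return False
    # Characteristics is the last 2-byte field of the 20-byte COFF header,
    # which follows the 4-byte PE signature.
    (characteristics,) = struct.unpack_from("<H", data, offset + 22)
    return bool(characteristics & IMAGE_FILE_DLL)


def synthetic_image(characteristics: int) -> bytes:
    """Build a minimal fake header for demonstration (not a loadable file)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x8664, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\x00\x00" + coff


if __name__ == "__main__":
    print(looks_like_dll(synthetic_image(IMAGE_FILE_DLL)))
    print(looks_like_dll(b"%PDF-1.7\n"))
```

A genuine PDF begins with “%PDF”, so a file named .pdf that passes this check deserves immediate attention.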

And thus we conclude our analysis of the malicious code. Its function is to communicate with a C2 server and drop an executable payload. Malware of this type is known as a dropper.

Chasing the attacker

The domain pointing to the C2 server was registered on the day we received the malicious sample and showed no abuse reports. The corresponding IP address belongs to a Russian web hosting company and does not provide further information about the attacker. As we attempted to conduct a dynamic analysis of the malicious sample, it became apparent that the C2 server was offline. When it eventually came back online, it seemed to have been wiped.

The most likely scenario is that an opportunistic attacker compromised an email account at the cloud service provider, as well as a vulnerable server. The attacker then used the email account to launch a phishing campaign, sending a malicious Word document to compromise target devices.

Fool me once…

If there is anything to take away from this analysis, it is that no single layer of defence is infallible. In this example, email filtering was bypassed by taking control of a trusted sender and sending malware as an encrypted attachment. Antivirus solutions were temporarily neutralised by code obfuscation. As we saw, the attacker used random word insertion, naming obfuscation and an inventive hiding mechanism to craft malicious code from which most suspicious instructions are absent. While Windows Defender eventually flagged the file as malware, the file had been able to stay on our machine’s disk long enough for a user to fall for the trap.

Although both of these defences failed at first, our client was not compromised. The email’s recipient never opened the attachment and instead flagged it as malicious, even though its sender was a known partner. By showing this level of security awareness, the recipient became an additional layer of defence and played a crucial role in the company’s security as a whole.

info@threatspike.com
