Malware Analysis 1 - Creating a PE parser, Shannon Entropy and more (Golang)

Introduction

Hello hackers!

Today we’ll create a CLI tool that analyzes PE files and extracts as much information from them as possible, using the github.com/Binject/debug/pe package.

Explanation

Parsing PE files by hand in Golang is quite hard. That’s why we’ll use the github.com/Binject/debug/pe package: it provides plenty of functions, structs and more for working with PE files without too much headache.

But first of all, let’s discuss what a PE file is and what its different parts are. The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure which encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS (device driver), MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments. Common filename extensions are .acm, .ax, .cpl, .dll, .drv, .efi, .exe, .mui, .ocx, .scr, .sys and .tsp.

I won’t explain every part of the PE structure in depth, as that would make for a really long post, but you can get a quick overview from this general scheme

However, I encourage you to check out the posts listed under “References”; they can be really useful

These are the main goals of our PE parser/analyzer:

  • DOS, RICH, NT, File and Optional headers info
  • Get all data directories info
  • Section headers info like .text and .reloc
  • Import table attributes
  • Base relocations table
  • MD5, Sha-1 and Sha-256 file hashes
  • Calculate file Shannon entropy

Code

In this case we’ll create a CLI program with multiple arguments

We start by downloading the pe package

go get github.com/Binject/debug/pe

Then we import the packages

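package main

import (
  "os"
  "io"      // io.Copy, used when hashing the file
  "fmt"
  "log"
  "flag"
  "math"    // math.Log2, used for the entropy
  "time"
  "strings" // used to split the imported symbol names
  "crypto/md5"
  "crypto/sha1"
  "crypto/sha256"
  "encoding/hex"
  "encoding/binary"

  // Used to parse PE files
  "github.com/Binject/debug/pe"
)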

Before starting with the PE part we define the flags: one holding the path to the PE file and an extra “verbose” switch

...

func main(){
  var file string
  var verbose bool

  flag.StringVar(&file, "f", "", "path to PE file to parse and analyze")
  flag.BoolVar(&verbose, "v", false, "enable verbose")
  flag.Parse()

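  // Without a target file there is nothing to do: show the usage and exit
  if file == "" {
    Banner() // prints the tool banner; defined in the full source code
    fmt.Println("Usage: .\\main.exe -f malware.exe")
    flag.PrintDefaults()
    os.Exit(0)
  }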
}

Let’s start with the pe package

func main(){
  ...

  // Parse PE structure
  pe_file, err := pe.Open(file)
  if err != nil {
    log.Fatal(err)
  }
  // Close file
  defer pe_file.Close()

  if verbose {
    fmt.Println("[*] Determining file type...")
  }

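  // Check PE type (PE32 or PE32+)
  var opt_header32 *pe.OptionalHeader32
  var opt_header64 *pe.OptionalHeader64
  var opt32 bool

  // Sizes of the two optional header layouts. They are assumed to be
  // computed like this, mirroring what debug/pe does internally
  // (the package does not export these values)
  sizeofOptionalHeader32 := uint16(binary.Size(pe.OptionalHeader32{}))
  sizeofOptionalHeader64 := uint16(binary.Size(pe.OptionalHeader64{}))

  if pe_file.FileHeader.SizeOfOptionalHeader == sizeofOptionalHeader32 {
    fmt.Print("[+] PE type: PE32\n\n")
    // We'll use this later
    opt_header32, _ = pe_file.OptionalHeader.(*pe.OptionalHeader32)
    opt32 = true
  } else if pe_file.FileHeader.SizeOfOptionalHeader == sizeofOptionalHeader64 {
    fmt.Print("[+] PE type: PE32+\n\n")
    // We'll use this later
    opt_header64, _ = pe_file.OptionalHeader.(*pe.OptionalHeader64)
    opt32 = false
  } else {
    fmt.Println("[-] Error recognizing PE type!")
    os.Exit(1) // non-zero exit code, since this is an error
  }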

}

Like most antivirus products and malware analyzers, we also want the file hash in common formats such as MD5, SHA-1 and SHA-256, so let’s compute them:

func main(){
  ...

  // file handle
  f, err := os.Open(file)
  if err != nil {
    log.Fatal(err)
  }
  defer f.Close()

  // Feed the three hashers in a single pass over the file.
  // Calling io.Copy once per hasher would leave the read offset at EOF
  // after the first copy, so the later hashes would be of empty input.
  md5_h := md5.New()
  sha1_h := sha1.New()
  sha256_h := sha256.New()
  if _, err = io.Copy(io.MultiWriter(md5_h, sha1_h, sha256_h), f); err != nil {
    log.Fatal(err)
  }

  fmt.Println("[*] MD5 hash:", hex.EncodeToString(md5_h.Sum(nil)))
  fmt.Println("[*] Sha-1 hash:", hex.EncodeToString(sha1_h.Sum(nil)))
  fmt.Print("[*] Sha-256 hash: ", hex.EncodeToString(sha256_h.Sum(nil)), "\n\n")
}

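Another simple indicator of maliciousness is entropy. What is entropy? Here it means Shannon entropy, a concept from information theory that measures the amount of randomness in a message or data stream. For byte data it ranges from 0 (every byte is the same) to 8 bits per byte (every byte value is equally likely), and packed or encrypted files tend to sit close to 8. This is its mathematical formula: H = -Σ p(x) · log2 p(x), where p(x) is the relative frequency of the byte value x.

And we can calculate the PE entropy like this: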

func main(){
  ...

  // Read the whole file into memory (os.ReadFile opens and closes it for us)
  contents, err := os.ReadFile(file)
  if err != nil {
    log.Fatal(err)
  }

  freq := make(map[byte]int)
  for _, b := range contents {
    freq[b]++
  }

  totalBytes := len(contents)
  probs := make(map[byte]float64)
  for b, f := range freq {
    probs[b] = float64(f) / float64(totalBytes)
  }

  entropy := 0.0
  for _, p := range probs {
    if p > 0 {
      entropy -= p * math.Log2(p)
    }
  }

  fmt.Println("File entropy:", entropy)
}
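
To get a feel for the numbers, here is a minimal, self-contained sketch of the same calculation as a reusable function. The name shannonEntropy and the sample inputs are my own choices for illustration, not part of the tool.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of data in bits per byte (0 to 8).
func shannonEntropy(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	// Count how often each byte value occurs
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	total := float64(len(data))
	entropy := 0.0
	for _, c := range counts {
		if c == 0 {
			continue // a value that never occurs contributes nothing
		}
		p := float64(c) / total
		entropy -= p * math.Log2(p)
	}
	return entropy
}

func main() {
	constant := make([]byte, 1024) // 1024 identical (zero) bytes
	uniform := make([]byte, 256)   // every byte value exactly once
	for i := range uniform {
		uniform[i] = byte(i)
	}
	fmt.Printf("constant data:  %.2f\n", shannonEntropy(constant))
	fmt.Printf("two symbols:    %.2f\n", shannonEntropy([]byte("abababab")))
	fmt.Printf("uniform bytes:  %.2f\n", shannonEntropy(uniform))
}
```

The three inputs cover both ends of the scale and one value in between: constant data has no randomness at all, two equally frequent symbols need one bit each, and a perfectly uniform byte distribution reaches the maximum of 8.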

Before starting with the PE file format itself, I recommend installing CFF Explorer, a really great tool that can be extremely useful for analyzing PE files, processes and more. These are some of its features:

  • Process Viewer
  • Drivers Viewer
  • Windows Viewer
  • PE and Memory Dumper
  • Full support for PE32/64
  • Special fields description and modification (.NET supported)
  • PE Utilities
  • PE Rebuilder (with Realigner, IT Binder, Reloc Remover, Strong Name Signature Remover, Image Base Changer)
  • View and modification of .NET internal structures
  • Resource Editor (full support for Windows Vista icons)
  • Quick Disassembler (x86, x64, MSIL)
  • Support in the Resource Editor for .NET resources (dumpable as well)
  • File Scanner
  • Extension support

And much more

As you can see here it has a multi-option menu with a clean view of the information

Once we’ve opened and parsed our file, we iterate over the different properties we want to analyze. We start with the DOS header (also called the MS-DOS header). Its structure looks like this (code in C++):

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

So that can be translated into Golang code like this:

type DosHeader struct {
	MZSignature              uint16
	UsedBytesInTheLastPage   uint16
	FileSizeInPages          uint16
	NumberOfRelocationItems  uint16
	HeaderSizeInParagraphs   uint16
	MinimumExtraParagraphs   uint16
	MaximumExtraParagraphs   uint16
	InitialRelativeSS        uint16
	InitialSP                uint16
	CheckSum                 uint16
	InitialIP                uint16
	InitialRelativeCS        uint16
	AddressOfRelocationTable uint16
	OverlayNumber            uint16
	Reserved                 [4]uint16
	OEMid                    uint16
	OEMinfo                  uint16
	Reserved2                [10]uint16
	AddressOfNewExeHeader    uint32
}

The field names differ from the C definition because they follow the ones used by the github.com/Binject/debug/pe package, but the order of the fields is the same.
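
Because the field order and sizes match the on-disk layout, a struct like this can be filled straight from the first 64 bytes of a file. The following is only a sketch using the standard library's encoding/binary; parseDosHeader, hasPESignature and the hand-built sample buffer are hypothetical helpers of mine, not functions of the pe package.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// DosHeader mirrors the struct shown above (64 bytes on disk).
type DosHeader struct {
	MZSignature              uint16
	UsedBytesInTheLastPage   uint16
	FileSizeInPages          uint16
	NumberOfRelocationItems  uint16
	HeaderSizeInParagraphs   uint16
	MinimumExtraParagraphs   uint16
	MaximumExtraParagraphs   uint16
	InitialRelativeSS        uint16
	InitialSP                uint16
	CheckSum                 uint16
	InitialIP                uint16
	InitialRelativeCS        uint16
	AddressOfRelocationTable uint16
	OverlayNumber            uint16
	Reserved                 [4]uint16
	OEMid                    uint16
	OEMinfo                  uint16
	Reserved2                [10]uint16
	AddressOfNewExeHeader    uint32
}

// parseDosHeader decodes the start of a file and checks the MZ magic.
func parseDosHeader(data []byte) (DosHeader, error) {
	var h DosHeader
	if err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if h.MZSignature != 0x5A4D { // "MZ" read as a little-endian uint16
		return h, errors.New("missing MZ signature")
	}
	return h, nil
}

// hasPESignature reports whether "PE\0\0" sits at the offset stored in e_lfanew.
func hasPESignature(data []byte, h DosHeader) bool {
	off := int64(h.AddressOfNewExeHeader)
	if off+4 > int64(len(data)) {
		return false
	}
	return bytes.Equal(data[off:off+4], []byte("PE\x00\x00"))
}

func main() {
	// Hand-built sample: a DOS header whose e_lfanew (offset 0x3C) points to 0x40
	data := make([]byte, 0x44)
	copy(data, "MZ")
	binary.LittleEndian.PutUint32(data[0x3C:], 0x40)
	copy(data[0x40:], "PE\x00\x00")

	h, err := parseDosHeader(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("header size: %d bytes\n", binary.Size(h))
	fmt.Printf("e_lfanew: 0x%X, PE signature present: %v\n",
		h.AddressOfNewExeHeader, hasPESignature(data, h))
}
```

Checking both the MZ magic and the PE signature is a cheap way to reject files that are not PE at all before doing any deeper parsing.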

func main(){
  ...

  // DOS Header info
  dos_check := "No"
  if pe_file.DosExists {
    dos_check = "Yes"
  }

  fmt.Println("[+] DOS Header:")
  fmt.Println("  Is header present?:", dos_check)

  fmt.Printf("  Magic: 0x%X\n", pe_file.DosHeader.MZSignature)
  fmt.Printf("  New exe header addr: 0x%X\n", pe_file.DosHeader.AddressOfNewExeHeader)

  // If the verbose flag is specified,
  // more DOS header info is printed
  if verbose {
    fmt.Printf("  File size in pages: 0x%X\n", pe_file.DosHeader.FileSizeInPages)
    fmt.Printf("  Checksum: 0x%X\n", pe_file.DosHeader.CheckSum)
    fmt.Printf("  Overlay number: 0x%X\n", pe_file.DosHeader.OverlayNumber)
    fmt.Printf("  Relocation table addr: 0x%X\n", pe_file.DosHeader.AddressOfRelocationTable)
  }
}

As you can see, we check the verbose flag and print more info if the user asked for it

Now let’s look at the DOS stub. First of all, we should know that a PE file starts with the MS-DOS header, followed by a 16-bit MS-DOS executable (the stub program). When the PE format was introduced (1993, Windows NT 3.1), DOS was still very much around, and the risk that a Windows EXE would be run from DOS by mistake was very real. Windows EXEs therefore had to be superficially compatible with the DOS loader, so that in such a scenario the program would do something sensible (i.e. print a message and quit) instead of crashing randomly. That’s why the DOS stub exists.

func main(){
  ...

  // DOS stub bytes
  fmt.Println("[+] DOS Stub:")
  fmt.Print("  ")
  for i, b := range pe_file.DosStub {
    // Print it in columns
    if (i + 1) % 5 == 0 {
      fmt.Printf("0x%X\n  ", b)
    } else {
      fmt.Printf("0x%X, ", b)
    }
  }
}
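
Raw byte columns are hard to read, so it can help to also print a classic hex dump and pull out the readable text. This is a minimal sketch with the standard library only; printableRuns is a hypothetical helper, and the sample bytes are the tail of a typical stub typed in by hand rather than read from a real file.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// printableRuns returns every run of printable ASCII that is at least
// minLen bytes long, similar to what the `strings` utility does.
func printableRuns(data []byte, minLen int) []string {
	var runs []string
	start := -1 // index where the current run began, -1 if none
	for i := 0; i <= len(data); i++ {
		printable := i < len(data) && data[i] >= 0x20 && data[i] <= 0x7E
		if printable {
			if start < 0 {
				start = i
			}
			continue
		}
		// The run (if any) ends here; keep it when it is long enough
		if start >= 0 && i-start >= minLen {
			runs = append(runs, string(data[start:i]))
		}
		start = -1
	}
	return runs
}

func main() {
	// A few 16-bit instructions followed by the usual stub message
	stub := append(
		[]byte{0x0E, 0x1F, 0xBA, 0x0E, 0x00, 0xB4, 0x09, 0xCD, 0x21, 0xB8, 0x01, 0x4C, 0xCD, 0x21},
		[]byte("This program cannot be run in DOS mode.\r\r\n$")...,
	)

	fmt.Print(hex.Dump(stub))
	for _, s := range printableRuns(stub, 8) {
		fmt.Printf("text: %q\n", s)
	}
}
```

Note that the extracted text starts with a "!" because the byte right before the message (0x21, the operand of the last int instruction) happens to be printable. A stub whose text differs from the usual message can be a hint that the file was built with an unusual toolchain or modified afterwards.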

Let’s move to the Rich Header. It’s an undocumented header contained within PE files compiled and linked using the Microsoft toolchain. It contains information about the build environment that the PE file was created in.

As with the DOS stub, we’ll print it in columns

func main(){
  ...

  if verbose {
    fmt.Println("[*] Parsing Rich header...")
  }

  if len(pe_file.RichHeader) == 0 {
    fmt.Println("[+] Rich Header not found\n")
  } else {
    fmt.Println("[+] Rich Header:")
    // Rich header bytes
    fmt.Print("  ")
    for i, b := range pe_file.RichHeader {
      // Print it in columns
      if (i + 1) % 5 == 0 {
        fmt.Printf("0x%X\n  ", b)
      } else {
        fmt.Printf("0x%X, ", b)
      }
    }
  }
}
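
The raw Rich header bytes are XOR-encoded, so printing them as-is doesn't reveal much. The sketch below decodes them following the layout commonly described in public research (see the RichPE link in the references): an encoded "DanS" marker, three padding dwords, pairs of (comp.id, count) dwords, then a plain "Rich" marker followed by the XOR key. decodeRich and RichEntry are hypothetical names, and I'm assuming the input contains the whole region including "Rich" and the key, which may not be exactly what pe_file.RichHeader holds.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// RichEntry is one decoded record: which tool was used and how many times.
type RichEntry struct {
	ProductID uint16 // high 16 bits of comp.id
	Build     uint16 // low 16 bits of comp.id
	Count     uint32
}

const danS = 0x536E6144 // "DanS" read as a little-endian uint32

// decodeRich finds the "Rich" marker, reads the key after it and decodes
// the entries located between the "DanS" marker and "Rich".
func decodeRich(data []byte) ([]RichEntry, uint32, error) {
	end := bytes.Index(data, []byte("Rich"))
	if end < 0 || end+8 > len(data) {
		return nil, 0, errors.New("Rich marker or key not found")
	}
	key := binary.LittleEndian.Uint32(data[end+4:])

	// Walk backwards one dword at a time until a value decodes to "DanS"
	start := -1
	for off := end - 4; off >= 0; off -= 4 {
		if binary.LittleEndian.Uint32(data[off:])^key == danS {
			start = off
			break
		}
	}
	if start < 0 {
		return nil, 0, errors.New("DanS marker not found")
	}

	// Skip "DanS" plus three padding dwords, then read (comp.id, count) pairs
	var entries []RichEntry
	for off := start + 16; off+8 <= end; off += 8 {
		id := binary.LittleEndian.Uint32(data[off:]) ^ key
		count := binary.LittleEndian.Uint32(data[off+4:]) ^ key
		entries = append(entries, RichEntry{
			ProductID: uint16(id >> 16),
			Build:     uint16(id),
			Count:     count,
		})
	}
	return entries, key, nil
}

func main() {
	// Hand-built sample with a made-up key and a single made-up entry
	key := uint32(0xDEADBEEF)
	words := []uint32{danS ^ key, key, key, key, 0x01047809 ^ key, 3 ^ key}
	sample := make([]byte, len(words)*4+8)
	for i, w := range words {
		binary.LittleEndian.PutUint32(sample[i*4:], w)
	}
	copy(sample[len(words)*4:], "Rich")
	binary.LittleEndian.PutUint32(sample[len(words)*4+4:], key)

	entries, gotKey, err := decodeRich(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("key: 0x%X\n", gotKey)
	for _, e := range entries {
		fmt.Printf("product 0x%04X, build %d, used %d times\n", e.ProductID, e.Build, e.Count)
	}
}
```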

Let’s continue with the File Header (also called the PE header or COFF header). It is located through the e_lfanew field of the MS-DOS header, which holds the file offset of the PE signature; the File Header follows that signature immediately. Some of its most important fields are Machine, NumberOfSections and NumberOfSymbols.

This is its Golang structure:

type FileHeader struct {
	Machine              uint16
	NumberOfSections     uint16
	TimeDateStamp        uint32
	PointerToSymbolTable uint32
	NumberOfSymbols      uint32
	SizeOfOptionalHeader uint16
	Characteristics      uint16
}

Now we move all this info into code

func main(){
  ...

  if verbose {
    fmt.Println("[*] Parsing File header...")
  }

  // File header part
  fmt.Println("[+] File Header:")
  fmt.Printf("  Machine: 0x%X\n", pe_file.FileHeader.Machine)
  fmt.Printf("  Number of sections: 0x%X\n", pe_file.FileHeader.NumberOfSections)

  if verbose { // Print more info if the verbose flag is enabled
    fmt.Printf("  Timestamp: 0x%X\n", pe_file.FileHeader.TimeDateStamp)
    fmt.Printf("  Symbol table pointer: 0x%X\n", pe_file.FileHeader.PointerToSymbolTable)
    fmt.Printf("  Number of symbols: 0x%X\n", pe_file.FileHeader.NumberOfSymbols)
    fmt.Printf("  Characteristics: 0x%X\n", pe_file.FileHeader.Characteristics)
  }
}
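
The tool prints these fields as raw hex, which is faithful but not very readable. As a possible refinement, here is a sketch that turns Machine, Characteristics and TimeDateStamp into human-friendly values. The numeric constants come from the Microsoft PE format documentation; the function names and the sample values in main are hypothetical, and only a handful of the defined constants are covered.

```go
package main

import (
	"fmt"
	"time"
)

// machineName maps a few common Machine values to readable names.
func machineName(m uint16) string {
	switch m {
	case 0x014C:
		return "x86 (I386)"
	case 0x8664:
		return "x64 (AMD64)"
	case 0x01C4:
		return "ARM Thumb-2 (ARMNT)"
	case 0xAA64:
		return "ARM64"
	}
	return fmt.Sprintf("unknown (0x%X)", m)
}

// characteristicFlags lists the names of the flags set in Characteristics.
func characteristicFlags(c uint16) []string {
	known := []struct {
		mask uint16
		name string
	}{
		{0x0001, "RELOCS_STRIPPED"},
		{0x0002, "EXECUTABLE_IMAGE"},
		{0x0020, "LARGE_ADDRESS_AWARE"},
		{0x0100, "32BIT_MACHINE"},
		{0x1000, "SYSTEM"},
		{0x2000, "DLL"},
	}
	var names []string
	for _, k := range known {
		if c&k.mask != 0 {
			names = append(names, k.name)
		}
	}
	return names
}

// linkTime interprets TimeDateStamp as seconds since the Unix epoch (UTC).
func linkTime(ts uint32) time.Time {
	return time.Unix(int64(ts), 0).UTC()
}

func main() {
	// Made-up header values, for illustration only
	fmt.Println("Machine:", machineName(0x8664))
	fmt.Println("Characteristics:", characteristicFlags(0x0022))
	fmt.Println("Timestamp:", linkTime(1577836800).Format(time.RFC3339))
}
```

Keep in mind that the timestamp is just a number stored in the file: it can be forged, and some build setups write a non-date value there, so treat it as a hint rather than as proof.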

Let’s move on to the Optional Header. At this point we need the variables defined earlier, because we have to distinguish between OptionalHeader32 and OptionalHeader64. Both are almost the same (the 64-bit variant drops BaseOfData and widens ImageBase and the stack and heap size fields to 64 bits), so only the OptionalHeader32 structure is shown here in Golang:

type OptionalHeader32 struct {
	Magic                       uint16
	MajorLinkerVersion          uint8
	MinorLinkerVersion          uint8
	SizeOfCode                  uint32
	SizeOfInitializedData       uint32
	SizeOfUninitializedData     uint32
	AddressOfEntryPoint         uint32
	BaseOfCode                  uint32
	BaseOfData                  uint32
	ImageBase                   uint32
	SectionAlignment            uint32
	FileAlignment               uint32
	MajorOperatingSystemVersion uint16
	MinorOperatingSystemVersion uint16
	MajorImageVersion           uint16
	MinorImageVersion           uint16
	MajorSubsystemVersion       uint16
	MinorSubsystemVersion       uint16
	Win32VersionValue           uint32
	SizeOfImage                 uint32
	SizeOfHeaders               uint32
	CheckSum                    uint32
	Subsystem                   uint16
	DllCharacteristics          uint16
	SizeOfStackReserve          uint32
	SizeOfStackCommit           uint32
	SizeOfHeapReserve           uint32
	SizeOfHeapCommit            uint32
	LoaderFlags                 uint32
	NumberOfRvaAndSizes         uint32
	DataDirectory               [16]DataDirectory
}

So let’s print some important info of this header

func main(){
  ...

  if verbose {
    fmt.Println("[*] Parsing Optional header...")
  }

  fmt.Println("[+] Optional Header:")
  // Check if optional header is 32 or 64
  if opt32 == true {
    fmt.Printf("  Magic: 0x%X\n", opt_header32.Magic)
    fmt.Printf("  Code size: 0x%X\n", opt_header32.SizeOfCode)
    fmt.Printf("  Checksum: 0x%X\n", opt_header32.CheckSum)

    if verbose {
      fmt.Printf("  Initialized data size: 0x%X\n", opt_header32.SizeOfInitializedData)
      fmt.Printf("  Uninitialized data size: 0x%X\n", opt_header32.SizeOfUninitializedData)
      fmt.Printf("  Entry point addr: 0x%X\n", opt_header32.AddressOfEntryPoint)
      fmt.Printf("  Code base: 0x%X\n", opt_header32.BaseOfCode)
      fmt.Printf("  Image base: 0x%X\n", opt_header32.ImageBase)
      fmt.Printf("  File alignment: 0x%X\n", opt_header32.FileAlignment)
    }
  } else {
    fmt.Printf("  Magic: 0x%X\n", opt_header64.Magic)
    fmt.Printf("  Code size: 0x%X\n", opt_header64.SizeOfCode)
    fmt.Printf("  Checksum: 0x%X\n", opt_header64.CheckSum)

    if verbose {
      fmt.Printf("  Initialized data size: 0x%X\n", opt_header64.SizeOfInitializedData)
      fmt.Printf("  Uninitialized data size: 0x%X\n", opt_header64.SizeOfUninitializedData)
      fmt.Printf("  Entry point addr: 0x%X\n", opt_header64.AddressOfEntryPoint)
      fmt.Printf("  Code base: 0x%X\n", opt_header64.BaseOfCode)
      fmt.Printf("  Image base: 0x%X\n", opt_header64.ImageBase)
      fmt.Printf("  File alignment: 0x%X\n", opt_header64.FileAlignment)
    }
  }
}
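
Two fields of the Optional Header are especially handy during triage: Magic, which tells PE32 and PE32+ apart, and DllCharacteristics, which records the exploit mitigations the binary opts into. The following sketch decodes both. The constants are taken from the Microsoft PE format documentation, while peKind, mitigations and the sample values are hypothetical names and data of mine.

```go
package main

import "fmt"

// peKind names the image type encoded in the Optional Header Magic field.
func peKind(magic uint16) string {
	switch magic {
	case 0x010B:
		return "PE32"
	case 0x020B:
		return "PE32+"
	case 0x0107:
		return "ROM image"
	}
	return "unknown"
}

// mitigations lists the security-related flags set in DllCharacteristics.
func mitigations(dllChars uint16) []string {
	known := []struct {
		mask uint16
		name string
	}{
		{0x0020, "HIGH_ENTROPY_VA"},
		{0x0040, "DYNAMIC_BASE (ASLR)"},
		{0x0080, "FORCE_INTEGRITY"},
		{0x0100, "NX_COMPAT (DEP)"},
		{0x0400, "NO_SEH"},
		{0x4000, "GUARD_CF"},
	}
	var names []string
	for _, k := range known {
		if dllChars&k.mask != 0 {
			names = append(names, k.name)
		}
	}
	return names
}

func main() {
	// Made-up values resembling a typical 64-bit executable
	fmt.Println("Type:", peKind(0x020B))
	for _, m := range mitigations(0x8160) {
		fmt.Println("  mitigation:", m)
	}
}
```

A binary with none of these flags set is not malicious by itself, but a missing DYNAMIC_BASE or NX_COMPAT on a recent build is unusual enough to be worth noting in the report.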

One of the most important parts of malware analysis is finding out which DLLs and functions the PE file imports, so let’s look at them through the import tables: the Import Directory Table, the Import Lookup Table and the Import Address Table.

The import address table is the part of the Windows module (executable or dynamic link library) which records the addresses of functions imported from other DLLs. For example, if your program calls GetSystemInfo(), then the executable or DLL will have an entry in its import table that says, “I would like to be able to call the function GetSystemInfo() from kernel32.dll.” When the module is loaded, the system goes and finds that function, obtains its address, and stores it in a table known as the Import Address Table (IAT). When the module needs to call the GetSystemInfo() function, it does so by fetching the value from the Import Address Table and calling it.

func main(){
  ...

  fmt.Println("[+] Import Table:\n")
  iat, _, _, err := pe_file.ImportDirectoryTable()
  if err != nil {
    log.Fatal(err)
  }

  symbols, err := pe_file.ImportedSymbols()
  if err != nil {
    log.Fatal(err)
  }

  for _, imp := range iat {
    fmt.Println("  DLL:", imp.DllName)
    fmt.Printf("  ILT RVA: 0x%X\n", imp.OriginalFirstThunk)
    fmt.Printf("  IAT RVA: 0x%X\n", imp.FirstThunk)

    if verbose {
      fmt.Printf("  Name RVA: 0x%X\n", imp.NameRVA)
    }
    fmt.Println("  Entries:")

    for _, s := range symbols {
      if strings.Split(s, ":")[1] == imp.DllName {
        fmt.Println("    " + strings.Split(s, ":")[0])
      }
    }
    fmt.Println()
  }
}
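
The ILT, IAT and Name values printed above are RVAs, i.e. addresses relative to the image base once the file is loaded, not positions in the file on disk. To look at the bytes they point to in a hex editor, they have to be translated through the section table. Here is a minimal sketch of that translation; the Section struct is a simplified stand-in of mine (its field names resemble a section header, but it is not the pe package's type) and the section values are invented.

```go
package main

import "fmt"

// Section holds the section header fields needed for the translation.
type Section struct {
	Name           string
	VirtualAddress uint32 // RVA where the section starts in memory
	VirtualSize    uint32 // size of the section in memory
	Offset         uint32 // file offset of the section's raw data
	Size           uint32 // size of the raw data in the file
}

// rvaToOffset converts an RVA to a file offset. It returns false when the
// RVA is outside every section or has no backing bytes in the file.
func rvaToOffset(sections []Section, rva uint32) (uint32, bool) {
	for _, s := range sections {
		size := s.VirtualSize
		if size == 0 {
			size = s.Size // some linkers leave VirtualSize empty
		}
		if rva < s.VirtualAddress || rva-s.VirtualAddress >= size {
			continue
		}
		delta := rva - s.VirtualAddress
		if delta >= s.Size {
			return 0, false // inside the section, but past its raw data
		}
		return s.Offset + delta, true
	}
	return 0, false
}

func main() {
	// Invented section table, for illustration only
	sections := []Section{
		{Name: ".text", VirtualAddress: 0x1000, VirtualSize: 0x1F00, Offset: 0x400, Size: 0x2000},
		{Name: ".rdata", VirtualAddress: 0x3000, VirtualSize: 0x0E00, Offset: 0x2400, Size: 0x1000},
	}
	for _, rva := range []uint32{0x1010, 0x3010, 0x9000} {
		if off, ok := rvaToOffset(sections, rva); ok {
			fmt.Printf("RVA 0x%X -> file offset 0x%X\n", rva, off)
		} else {
			fmt.Printf("RVA 0x%X is not backed by file data\n", rva)
		}
	}
}
```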

And finally let’s parse the Relocation Table. The relocation table is a lookup table that lists all parts of the PE file that need patching when the file is loaded at a non-default base address.

func main(){
  ...

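  fmt.Println("[+] Relocation Table:")
  reloc_table := pe_file.BaseRelocationTable
  // Guard against a missing table before dereferencing the pointer
  if reloc_table == nil {
    fmt.Println("  No relocation table found")
  } else {
    fmt.Printf("  Number of entries: 0x%X\n", len(*reloc_table))
    fmt.Print("  Entries:\n\n")

    for _, s := range *reloc_table {
      // Brief pause between entries, which slows down long listings
      time.Sleep(50 * time.Millisecond)
      fmt.Printf("    Virtual Addr: 0x%X\n", s.RelocationBlock.VirtualAddress)
      fmt.Printf("    Size: 0x%X\n", s.RelocationBlock.SizeOfBlock)
      // A block may hold no items, so guard the index
      if len(s.BlockItems) > 0 {
        fmt.Printf("    Type: 0x%X\n", s.BlockItems[0].Type)
      }
      fmt.Println()
    }
  }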
}
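
To make the layout of a relocation block more concrete: according to the PE format documentation, each block starts with an 8-byte header (page RVA and block size) followed by 16-bit entries whose top 4 bits are the relocation type and whose low 12 bits are an offset within the page. The sketch below decodes one such block from raw bytes. parseRelocBlock, RelocItem and the sample bytes are hypothetical and independent of the pe package's own types.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// RelocItem is one decoded 16-bit entry of a relocation block.
type RelocItem struct {
	Type   uint8  // top 4 bits
	Offset uint16 // low 12 bits, relative to the block's page RVA
}

// parseRelocBlock decodes the header and entries of a single block.
func parseRelocBlock(data []byte) (uint32, []RelocItem, error) {
	if len(data) < 8 {
		return 0, nil, errors.New("block shorter than its header")
	}
	pageRVA := binary.LittleEndian.Uint32(data[0:])
	size := binary.LittleEndian.Uint32(data[4:])
	if size < 8 || uint64(size) > uint64(len(data)) {
		return 0, nil, errors.New("invalid block size")
	}
	var items []RelocItem
	for off := 8; off+2 <= int(size); off += 2 {
		raw := binary.LittleEndian.Uint16(data[off:])
		items = append(items, RelocItem{Type: uint8(raw >> 12), Offset: raw & 0x0FFF})
	}
	return pageRVA, items, nil
}

// relocTypeName names the relocation types seen most often on x86 and x64.
func relocTypeName(t uint8) string {
	switch t {
	case 0:
		return "ABSOLUTE (padding)"
	case 3:
		return "HIGHLOW"
	case 10:
		return "DIR64"
	}
	return fmt.Sprintf("type %d", t)
}

func main() {
	// Invented block: page 0x1000, size 12, one HIGHLOW entry and one padding entry
	block := []byte{
		0x00, 0x10, 0x00, 0x00, // page RVA
		0x0C, 0x00, 0x00, 0x00, // size of block, header included
		0x04, 0x30, // type 3, offset 0x004
		0x00, 0x00, // type 0, padding
	}
	page, items, err := parseRelocBlock(block)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		fmt.Printf("RVA 0x%X: %s\n", page+uint32(it.Offset), relocTypeName(it.Type))
	}
}
```

This also shows why printing only the first item's type, as the tool does, is a simplification: a block can mix real entries with ABSOLUTE padding entries used for alignment.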

That’s the end of the code. We could analyze a few more things, but this is enough for now, so we put all the code together and it should work. The full tool source code is here

Demo

To check that this works, we will use the .exe generated in the first post and look at the results.

Compile the code

And then we pass the .exe file to the program via the -f flag

It seems to work: we can see its hashes, entropy and more values. Let’s keep looking

Then we can see the Optional Header and all the PE sections

In this picture we see the imported DLL, its information and the functions imported from kernel32.dll. (Keep in mind that Golang binaries are a bit different from C++ ones, and C++ is the most common language in malware development, so the tool works better with C++ binaries.)

And finally the relocation table

Extra

As we know, one of the clearest signs that a PE file may be malware is that it uses some uncommon API calls like NtCreateThread or VirtualAllocEx, so let’s improve the program a little bit.

We start by defining a slice which holds some “malicious” API calls

// API names are stored in lowercase so they can be compared case-insensitively
var malicious_calls = []string{
  "ntcreatethread", "createthread", "virtualallocex",
  "writeprocessmemory", "createremotethread", "queueuserapc",
  "rtlmovememory", "convertthreadtofiber", "setthreadcontext",
  "ntqueryinformationprocess", "ntprotectvirtualmemory", "ntwritevirtualmemory",
  "ntallocatevirtualmemory", "ntcreatethreadex", "virtualalloc",
}

And the modified part would be something like this:

func main(){
  ...

  fmt.Println("[+] Import Table:\n")
  iat, _, _, err := pe_file.ImportDirectoryTable()
  if err != nil {
    log.Fatal(err)
  }

  symbols, err := pe_file.ImportedSymbols()
  if err != nil {
    log.Fatal(err)
  }

  for _, imp := range iat {
    fmt.Println("  DLL:", imp.DllName)
    fmt.Printf("  ILT RVA: 0x%X\n", imp.OriginalFirstThunk)
    fmt.Printf("  IAT RVA: 0x%X\n", imp.FirstThunk)

    if verbose {
      fmt.Printf("  Name RVA: 0x%X\n", imp.NameRVA)
    }
    fmt.Println("  Entries:")

    var m_check bool
    for _, s := range symbols {
      m_check = false
      if strings.Split(s, ":")[1] == imp.DllName {
        for _, call := range malicious_calls {
          if strings.ToLower(strings.Split(s, ":")[0]) == call {
            fmt.Println("    " + strings.Split(s, ":")[0] + " --> Malicious!")
            m_check = true
            break
          }
        }

        if m_check == false {
          fmt.Println("    " + strings.Split(s, ":")[0])
        }
      }
    }
    fmt.Println()
  }
}

And now, if we run this new program against a C++ malware sample (the tool works better with those) that uses some of the defined API calls, we should see the program flag them as malicious. This isn’t professional, just a simple implementation that could be greatly improved with many more functions, but I’ll leave that to you.
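
As one example of such an improvement, the nested loop over the list can be replaced by a map lookup, and each symbol can be split only once. This is just a sketch: it assumes, as the code above does, that the imported symbols come as "Function:DLL" strings, and the names suspiciousCalls, splitSymbol and isSuspicious are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// suspiciousCalls is a set of lowercase API names (a subset of the list above).
var suspiciousCalls = map[string]bool{
	"ntcreatethread":     true,
	"virtualallocex":     true,
	"writeprocessmemory": true,
	"createremotethread": true,
	"queueuserapc":       true,
	"setthreadcontext":   true,
}

// splitSymbol splits a "Function:DLL" string; the DLL part is empty
// when the input has no colon.
func splitSymbol(symbol string) (fn, dll string) {
	parts := strings.SplitN(symbol, ":", 2)
	if len(parts) != 2 {
		return symbol, ""
	}
	return parts[0], parts[1]
}

// isSuspicious reports whether the function part of symbol is in the set.
func isSuspicious(symbol string) bool {
	fn, _ := splitSymbol(symbol)
	return suspiciousCalls[strings.ToLower(fn)]
}

func main() {
	// Invented import list, for illustration only
	symbols := []string{
		"GetSystemInfo:KERNEL32.dll",
		"VirtualAllocEx:KERNEL32.dll",
		"WriteProcessMemory:KERNEL32.dll",
	}
	for _, s := range symbols {
		fn, dll := splitSymbol(s)
		mark := ""
		if isSuspicious(s) {
			mark = " --> Malicious!"
		}
		fmt.Printf("%s (%s)%s\n", fn, dll, mark)
	}
}
```

Unlike indexing the result of strings.Split directly, splitSymbol doesn't panic on a symbol without a colon, which makes the check a bit more robust against odd import names.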

References

https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
https://en.wikipedia.org/wiki/Portable_Executable
https://medium.com/ax1al/a-brief-introduction-to-pe-format-6052914cc8dd
https://tech-zealots.com/malware-analysis/pe-portable-executable-structure-malware-analysis-part-2/
https://malwology.com/2018/10/05/exploring-the-pe-file-format-via-imports/
https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/pe-file-header-parser-in-c++
https://github.com/RichHeaderResearch/RichPE

Conclusion

We’ve learned about the different parts of the PE file format and how to approach them to extract information from headers, sections, data directories and more. If I’ve made a mistake or you want to ask me anything, contact me via Discord; my user is d3ext

This tool isn’t professional, so you should use other tools like CFF Explorer, but I hope this post and tool have helped you understand how PE files work and how you can extract information from them to analyze potential malware.

Source code here

This post is licensed under CC BY 4.0 by the author.