The Windows Portable Executable (PE) File Format

17.Dec.2023, Yuriy Georgiev

Intro

The Portable Executable (PE) format, which is derived from the Common Object File Format (COFF), is the file format for executables, object code and libraries on 32-bit and 64-bit versions of Microsoft Windows.
 
Other executable file formats are:
  • ELF – used in many Unix-like operating systems (Linux, BeOS…)
  • Mach-O – from NeXTSTEP to today's macOS, iOS, watchOS, tvOS (Apple in general)
  • COM – used in MS-DOS, CP/M, FreeDOS and similar systems
 
PE stands for “Portable Executable”; the format was introduced with Windows NT 3.1 in 1993.
The dominant format before it was the MS-DOS MZ format, which has a special marker (or magic string) at the very beginning of the file (the first 2 bytes) to identify itself: the letters “MZ”. These are the initials of Mark Zbikowski, one of the MS-DOS developers.
If you open any MS DOS or Windows executable with a hex editor, you will still find those two bytes at the beginning of the file.
 
The PE file format plays a crucial role in the functioning of Windows-based applications and is an essential component of the Windows operating system.
 
The Windows Portable Executable (PE) file format is used for executables (.exe), DLLs (dynamic-link libraries), and kernel-mode drivers (.sys) in Microsoft Windows. The legacy VxDs of Windows 3.x and 9x (VxD stands for “Virtual xxx Driver”, where “xxx” is a class of device) are often mentioned alongside PE, but they use the older LE (Linear Executable) format.
 
It defines the structure of the binary executable file, including header information, sections, and resources.
 
The structure of the PE file format allows the operating system to load and execute the binary executable file by following the information provided in the header and sections of the file.
 
It is also the standard file format of the EFI binaries executed by your UEFI (the BIOS successor).
 
A colleague of mine (Dentcho Bankov) and I happened to code EFI binaries and a code injector at VMware, for a project he owned called OTAMax. Our goal was to inject a payload into the resource section (.rsrc) of the EFI binary and later export that payload and execute it. It was a pure low-level fun party for us. I initially coded it in C and, as he always does, he later improved it and recoded it in D. He has this bizarre fetish for D… inexplicable.

For all the screenshots I used PE-Bear, a PE reversing and analysis tool. 

The structure of PE 

- DOS Header
|	|
|	+--- MZ Header ("MZ" string)
|  	|
|	+--- DOS Stub (a minimal DOS program that says 
|           "This program cannot be run in DOS mode")
|
+ NT Header
|	|
|	+--- PE Header ("PE" string)
|  	|
|	+--- Image NT Header
|  	|
|	+--- Image File Header
|  	|
|	+--- Image Optional Header
|		|
|		+--- Image Directory Entry #1
|		+--- ...
|		+--- Image Directory Entry #N
|
+ Section Headers Array
|	|
|	+--- Image Section Header #1
|	|	|
|	|	+--- .text (code section header)
|	|  	|
|	|	+--- .data (data section header, contains several sub-data section headers)
|	|	|	|
|	|	|	+--- .data (header only, info about the section below)
|	|	|	+--- .rdata (header only, info about the section below)
|	|	|	+--- .bss (header only, info about the section below)
|	|	|
|	|	+--- .idata (header only, info about the section below)
|	|	+--- .edata (header only, info about the section below)
|	|	+--- .rsrc (header only, info about the section below)
|	|	+--- .reloc (header only, info about the section below)
|	|	+--- others...
|	|
|	+--- Image Section Header #2
|	+--- ...
|	+--- Image Section Header #N
|
- Sections
    |
    +--- .text (executable code)
    +--- .rdata (read-only initialized data)
    +--- .data (initialized, predefined data/variables)
    +--- .pdata (exception information)
    +--- .rsrc (resource section: icons, images, GUI strings, etc.)
    +--- .bss (uninitialized data/variables)
    +--- .edata (export API data - all public APIs implemented in your app)
    +--- .idata (import API data - all APIs called from your app)
    +--- ...

 
This is a rather rough outline of the structure.
 

Structure Dissection

I will go through some of the parts of the file and give you basic information and explanations for each of them.
It’s not possible to cover everything about the PE file format in one tutorial, but you will learn things that may guide you to further reading.

The best resource for learning about PE in more depth is none other than Microsoft’s official documentation.

Link: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format

 

 

MZ header

If you open any executable (.exe) file with a hex editor the first thing you will see is the MZ header. This is also used as a signature (or identifier) by the operating system, to verify that this is a valid executable file (along with the “PE” string at the PE header address a bit forward in the file).
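
The signature check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Windows loader actually runs. The function name `looks_like_pe` is made up for this example, and 0x3C is the position of the “File address of new exe header” field inside the 64-byte MZ header.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with "MZ" and its header points at "PE\\0\\0"."""
    # The MZ header is 64 (0x40) bytes long and starts with the letters "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 4-byte little-endian field at 0x3C holds the offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header starts with the 4-byte signature "PE" followed by two zeros.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

To try it on a real file, read the file in binary mode (for example `open(path, "rb").read()`) and pass the bytes to the function.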
 

 

But it’s not just the “MZ” signature. Right after it there is additional information such as:

If you take a closer look in a Hex Editor you will see that information is present in the beginning of the file in a hex format. Have a look at the last value “File address of new exe header” – 108 (HEX):

Note that the “Magic number” in the table above 0x5A4D is exactly the hex value of the string “MZ” (in reverse-byte order in the hex editor, e.g.: 0x5A 0x4D -> 0x4D 0x5A). Also 0x108 is set as an int value (4 bytes, 32 bits) in reverse order again: 0x08 0x01 0x00 0x00.

This is because the hex editor prints bytes from low address to high address, and little-endian machines (such as x86/IA-32) store the least significant byte of a multi-byte value at the lowest address. You can search for “little-endian” and “big-endian” to learn more.
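
The byte order can be checked with Python's `struct` module. The snippet below is a small sketch that uses the two byte sequences from the text above (4D 5A and 08 01 00 00); the variable names are chosen only for this example.

```python
import struct

# The first two bytes of the file, as a hex editor shows them: 4D 5A ("MZ").
magic_bytes = bytes([0x4D, 0x5A])
# "<H" means: little-endian, unsigned 16-bit integer.
(magic,) = struct.unpack("<H", magic_bytes)

# The four bytes of the "File address of new exe header" field: 08 01 00 00.
lfanew_bytes = bytes([0x08, 0x01, 0x00, 0x00])
# "<I" means: little-endian, unsigned 32-bit integer.
(e_lfanew,) = struct.unpack("<I", lfanew_bytes)

# For contrast: the same four bytes read as big-endian give a different number.
(big_endian_value,) = struct.unpack(">I", lfanew_bytes)

print(hex(magic), hex(e_lfanew), hex(big_endian_value))
# prints: 0x5a4d 0x108 0x8010000
```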

DOS Stub

Remember the COM format mentioned at the beginning of this article? The DOS Stub is a similarly tiny 16-bit DOS program, although it is wrapped in the MZ format rather than being a raw COM image.

This is a small DOS program that has only one purpose: to print the text “This program cannot be run in DOS mode” and exit, in case the program is run under DOS.

The DOS Stub is right after the MZ Header.
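
As a rough sketch, the stub can be located by looking at the bytes between the end of the 64-byte MZ header and the start of the PE header. The function name `find_stub_message` is hypothetical, and the code assumes the usual message text; some linkers emit a different message, and Microsoft-linked binaries may place additional undocumented data in the same region.

```python
import struct

DOS_HEADER_SIZE = 0x40  # the MZ header is 64 bytes long
STUB_MESSAGE = b"This program cannot be run in DOS mode"

def find_stub_message(data: bytes) -> int:
    """Return the file offset of the stub message, or -1 if it is not found."""
    if len(data) < DOS_HEADER_SIZE or data[:2] != b"MZ":
        raise ValueError("not an MZ file")
    # Offset of the PE header, stored as a little-endian 32-bit value at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Search only the region between the MZ header and the PE header.
    return data.find(STUB_MESSAGE, DOS_HEADER_SIZE, pe_offset)
```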

PE Header (or NT Headers)

The PE header contains some valuable information about the application, and more specifically, information that the Windows loader will use to load and execute the program.
It starts with the 4-byte signature “PE\0\0” (the letters “PE” followed by two zero bytes), which identifies the file as a Portable Executable.

If we break it down we will get to the sub-sections of it which are the Image File Header and the Image Optional Header.

 

Image File Header

Here is an overview of the Image File Header:

As you can see it contains some information about the target machine the executable is compiled for, sections count, time date stamp, etc.
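
The same fields can be read programmatically. Below is a minimal sketch that unpacks the 20-byte Image File Header, which sits right after the 4-byte PE signature. The names `FileHeader`, `MACHINE_NAMES` and `read_file_header` are invented for this example; the field order and sizes follow the Microsoft PE documentation, and the machine table lists only three common values.

```python
import struct
from collections import namedtuple

FileHeader = namedtuple(
    "FileHeader",
    "machine number_of_sections time_date_stamp pointer_to_symbol_table "
    "number_of_symbols size_of_optional_header characteristics",
)

# A few well-known Machine values (not a complete list).
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}

def read_file_header(data: bytes) -> FileHeader:
    """Locate the PE signature and unpack the Image File Header after it."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Two 16-bit fields, three 32-bit fields, two 16-bit fields: 20 bytes.
    fields = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return FileHeader._make(fields)
```

The `time_date_stamp` field is a count of seconds since 1 January 1970 UTC, so `datetime.datetime.fromtimestamp(value, datetime.timezone.utc)` turns it into a readable date.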

 

Image Optional Header

After that comes the Image Optional Header. It contains far more information than the Image File Header. I simply cannot comment on each of its entries, but you can find detailed information about them in the Microsoft documentation.

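However, note the underlined line that says “Entry Point”. This one is important. This is the entry point of the program’s executable code, the address where execution begins once the Windows loader has finished loading the image. You can think of it as the “main()” function of the application, although in practice it usually points at runtime startup code that eventually calls main().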
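
Here is a hedged sketch of how the Entry Point and Image Base could be read from the Optional Header. The function name is hypothetical. The offsets used (16 for AddressOfEntryPoint, 28 or 24 for ImageBase) come from the Microsoft PE documentation and differ between 32-bit (PE32) and 64-bit (PE32+) images, which is why the Magic field is checked first.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit image
PE32_PLUS_MAGIC = 0x20B  # 64-bit image (PE32+)

def read_entry_point_and_base(data: bytes):
    """Return (AddressOfEntryPoint, ImageBase) from the Optional Header."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Skip the 4-byte signature and the 20-byte Image File Header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    # AddressOfEntryPoint is a 32-bit RVA at offset 16 in both variants.
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == PE32_MAGIC:
        # PE32: 32-bit ImageBase at offset 28 (after BaseOfData).
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == PE32_PLUS_MAGIC:
        # PE32+: no BaseOfData field, 64-bit ImageBase at offset 24.
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unexpected Optional Header magic {magic:#x}")
    return entry_rva, image_base
```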

Note that this value, 0x957C8, is an RVA (relative virtual address): an offset from the address at which the image is loaded in memory, not an offset in the file on disk. Windows converts it to a VA (Virtual Address) by adding the Image Base value (see the screenshot above) to it, which gives the address of this very same code in virtual memory after it is loaded and mapped by the Windows loader.
The address is called a “VA” because Windows creates a distinct VA space for each process, independent of physical memory (RAM).
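
The address arithmetic can be sketched as follows. An RVA becomes a VA by adding the image base, and it can be translated to a position in the file on disk by finding the section that contains it. Both function names are made up for this example, and the image base (0x400000) and section values in the usage lines are hypothetical, chosen only so the numbers are easy to follow.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Virtual address = load address of the image + relative virtual address."""
    return image_base + rva

def rva_to_file_offset(rva: int, sections):
    """Translate an RVA to a file offset, or return None if it has none.

    sections is an iterable of tuples taken from the section headers:
    (virtual_address, virtual_size, pointer_to_raw_data, size_of_raw_data).
    """
    for virtual_address, virtual_size, raw_pointer, raw_size in sections:
        span = max(virtual_size, raw_size)
        if virtual_address <= rva < virtual_address + span:
            delta = rva - virtual_address
            if delta >= raw_size:
                return None  # in the section, but not backed by file data
            return raw_pointer + delta
    return None  # the RVA does not fall inside any section

# Hypothetical layout: one code section mapped at RVA 0x1000, stored at 0x400.
example_sections = [(0x1000, 0x95000, 0x400, 0x95000)]
print(hex(rva_to_va(0x400000, 0x957C8)))                    # 0x4957c8
print(hex(rva_to_file_offset(0x957C8, example_sections)))   # 0x94bc8
```

If the loader places the image at a different address than the preferred Image Base (for example because of address space layout randomization), the actual load address takes the place of `image_base` in the first function.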

In other words, when loading the application, Windows allocates a process address space for it, then loads and expands the file there, mapping each section to its virtual address. Then it starts executing the code from memory at the Entry Point address.

You can read more about the Virtual Memory Management in my other article here.

 

Section Headers (or Section Table)

The Section Headers (or Section Table) contains information about the sections in the file. They may and most probably will vary from file to file based on various factors.

It mostly contains the addresses, sizes and characteristics of each section of the file.

If someday you want to perform binary code injection, you will need to alter the values here, since you will either add a new section or alter an existing one and change its size. It’s what my colleague and I did.
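
A minimal sketch of reading the section table is shown below. It assumes the layout from the Microsoft PE documentation: the table starts right after the Optional Header, and each entry is 40 bytes long. The names `Section` and `read_section_table` are invented for this example, and only the most commonly used fields are kept.

```python
import struct
from collections import namedtuple

Section = namedtuple(
    "Section",
    "name virtual_size virtual_address size_of_raw_data "
    "pointer_to_raw_data characteristics",
)

def read_section_table(data: bytes):
    """Return a list of Section tuples, one per section header."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # NumberOfSections and SizeOfOptionalHeader live in the Image File Header.
    (number_of_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (size_of_optional_header,) = struct.unpack_from("<H", data, pe_offset + 20)
    # Signature (4) + File Header (20) + Optional Header = start of the table.
    table = pe_offset + 24 + size_of_optional_header
    sections = []
    for index in range(number_of_sections):
        (name, virtual_size, virtual_address, raw_size, raw_pointer,
         _relocs, _linenos, _n_relocs, _n_linenos, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + index * 40)
        # Names are 8 bytes, padded with zero bytes.
        text = name.rstrip(b"\x00").decode("utf-8", "replace")
        sections.append(Section(text, virtual_size, virtual_address,
                                raw_size, raw_pointer, flags))
    return sections
```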

Rebuilding it is not an easy task. Whichever section you alter, you need to update its header and also shift the following sections based on the changes you made. Precise address calculations are required, otherwise the PE will be corrupted and won’t even run.
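
To give a feel for the calculations involved, here is a simplified sketch of placing a new section after the last existing one. It is not the code from the project mentioned above; the function names and the example numbers are hypothetical. It relies on two Optional Header fields, SectionAlignment (alignment in memory) and FileAlignment (alignment on disk), and it ignores other things a real tool must handle, such as free space for the extra section header, data appended after the last section, and the checksum.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def place_new_section(last_va, last_virtual_size, last_raw_pointer,
                      last_raw_size, payload_size,
                      section_alignment, file_alignment):
    """Compute header values for a section appended after the last one."""
    # In memory, the new section starts at the next SectionAlignment boundary.
    virtual_address = align_up(last_va + last_virtual_size, section_alignment)
    # On disk, it starts at the next FileAlignment boundary.
    raw_pointer = align_up(last_raw_pointer + last_raw_size, file_alignment)
    # The raw size on disk must itself be a multiple of FileAlignment.
    raw_size = align_up(payload_size, file_alignment)
    # SizeOfImage in the Optional Header must cover the new section as well.
    size_of_image = align_up(virtual_address + payload_size, section_alignment)
    return {
        "VirtualAddress": virtual_address,
        "VirtualSize": payload_size,
        "PointerToRawData": raw_pointer,
        "SizeOfRawData": raw_size,
        "SizeOfImage": size_of_image,
    }

# Hypothetical last section: RVA 0x5000, 0x1234 bytes in memory,
# stored at file offset 0x3000 with 0x1400 bytes of raw data.
layout = place_new_section(0x5000, 0x1234, 0x3000, 0x1400,
                           payload_size=0x2A0,
                           section_alignment=0x1000, file_alignment=0x200)
print({key: hex(value) for key, value in layout.items()})
```

In addition to writing these values into a new section header, NumberOfSections in the Image File Header has to be increased by one.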

Sections

The actual sections contain the data or code that matches their purpose.
Here is a quick overview of some of the most important sections:
  • .text – contains executable code
  • .data – contains initialized data and variables
  • .bss – uninitialized data
  • .idata – import data — all the APIs you call from your application
  • .edata – export data — all public APIs you coded in your application
  • .rsrc – the resource section. This is the most complicated one. It contains virtual directories and subdirectories which contain localizations and data such as images, icons, etc. It is a file structure on its own to some degree.
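
What makes one section “code” and another “data” is recorded in the Characteristics field of its section header. The sketch below decodes a handful of those flag bits; the bit values are taken from the Microsoft PE documentation, while the function name and the shortened flag names are chosen for this example. The sample values are typical for such sections, not read from a specific file.

```python
# A subset of the section Characteristics flags (IMAGE_SCN_* in the docs).
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",                # contains executable code
    0x00000040: "CNT_INITIALIZED_DATA",    # contains initialized data
    0x00000080: "CNT_UNINITIALIZED_DATA",  # contains uninitialized data
    0x20000000: "MEM_EXECUTE",             # can be executed
    0x40000000: "MEM_READ",                # can be read
    0x80000000: "MEM_WRITE",               # can be written to
}

def describe_section_flags(characteristics: int):
    """Return the names of the known flags set in a Characteristics value."""
    return [name for bit, name in SECTION_FLAGS.items()
            if characteristics & bit]

print(describe_section_flags(0x60000020))  # typical for .text
# ['CNT_CODE', 'MEM_EXECUTE', 'MEM_READ']
print(describe_section_flags(0xC0000040))  # typical for .data
# ['CNT_INITIALIZED_DATA', 'MEM_READ', 'MEM_WRITE']
```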

Conclusion

The PE file format is both straightforward and a bit complicated at the same time.
It is very old but still serves its purpose.

If you are interested in operating systems, reverse-engineering, malware analysis, code injection, or any other kind of binary manipulation and analysis, knowledge of the PE structure is a must-have in your toolbox.

A good complementary read after this article is my “Reverse-Engineering: 101” tutorial.

Best of luck.