The Windows Portable Executable (PE) File Format
17.Dec.2023, Yuriy Georgiev
Intro
- ELF – used in many Unix-like operating systems (Linux, BeOS…)
- Mach-O – from NeXTSTEP to today's macOS, iOS, watchOS, tvOS (Apple platforms in general)
- COM – used in MS-DOS, CP/M, FreeDOS and similar systems
The dominant format then was the MS-DOS MZ format, which identifies itself with a special marker (or magic string) in the first 2 bytes of the file: the letters “MZ”. These are the initials of Mark Zbikowski, one of the MS-DOS developers.
The structure of PE
- DOS Header
| |
| +--- MZ Header ("MZ" string)
| |
| +--- DOS Stub (a minimal DOS program that says
| "This program cannot be run in DOS mode")
|
+ NT Header
| |
| +--- PE Header ("PE" string)
| |
| +--- Image NT Header
| |
| +--- Image File Header
| |
| +--- Image Optional Header
| |
| +--- Image Directory Entry #1
| +--- ...
| +--- Image Directory Entry #N
|
+ Section Headers Array
| |
| +--- Image Section Header #1
| | |
| | +--- .text (code section header)
| | |
| | +--- .data (data section header, contains several sub-data section headers)
| | | |
| | | +--- .data (header only, info about the section below)
| | | +--- .rdata (header only, info about the section below)
| | | +--- .bss (header only, info about the section below)
| | |
| | +--- .idata (header only, info about the section below)
| | +--- .edata (header only, info about the section below)
| | +--- .rsrc (header only, info about the section below)
| | +--- .reloc (header only, info about the section below)
| | +--- others...
| |
| +--- Image Section Header #2
| +--- ...
| +--- Image Section Header #N
|
- Sections
|
+--- .text (executable code)
+--- .rdata (read-only initialized data)
+--- .data (initialized, predefined data/variables)
+--- .pdata (exception information)
+--- .rsrc (resource section: icons, images, GUI strings, etc.)
+--- .bss (uninitialized data/variables)
+--- .edata (export API data - all public APIs implemented in your app)
+--- .idata (import API data - all APIs called from your app)
+--- ...
Structure Dissection
I will go through the main parts of the structure and give some basic information and explanations for each.
It’s not possible to cover everything about the PE file format in one tutorial, but what you learn here should guide you to further reading.
The best in-depth resource on the PE format is Microsoft’s official documentation.
Link: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
MZ header
The MZ header is more than just the “MZ” signature. Right after it comes additional information such as:
If you take a closer look in a hex editor, you will see this information at the beginning of the file in hex form. Have a look at the last value, “File address of new exe header”: 108 (hex).
Note that the “Magic number” in the table above, 0x5A4D, is exactly the hex value of the string “MZ” (“M” is 0x4D and “Z” is 0x5A), with the bytes appearing in reverse order in the hex editor: 0x4D 0x5A. Likewise, 0x108 is stored as an int value (4 bytes, 32 bits), again in reverse byte order: 0x08 0x01 0x00 0x00.
This is because the hex editor prints bytes from low address to high address, and little-endian machines (such as x86/IA-32) store the least significant byte of a multi-byte value at the lowest address. You can search online to learn more about little-endian and big-endian byte order.
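To make the byte order concrete, here is a minimal Python sketch that reads these two fields with the standard `struct` module. The 64-byte buffer is synthetic (built by hand for illustration, not taken from a real file), and `read_dos_header` is a hypothetical helper name. The field offsets (0 for the magic number, 0x3C for the new exe header address) follow the documented DOS header layout.

```python
import struct
from typing import Tuple


def read_dos_header(data: bytes) -> Tuple[int, int]:
    """Return (magic, new_exe_header_offset) from the 64-byte DOS header."""
    if len(data) < 64:
        raise ValueError("too short to contain a DOS header")
    # "<H": little-endian unsigned 16-bit value at offset 0 (the magic number).
    (magic,) = struct.unpack_from("<H", data, 0)
    # "<I": little-endian 32-bit value at offset 0x3C (read as unsigned here).
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if magic != 0x5A4D:
        raise ValueError("missing MZ signature")
    return magic, pe_offset


# Synthetic header: "MZ" at the start, 0x108 stored as 08 01 00 00 at 0x3C.
header = bytearray(64)
header[0:2] = b"MZ"
header[0x3C:0x40] = bytes([0x08, 0x01, 0x00, 0x00])

magic, pe_offset = read_dos_header(bytes(header))
print(hex(magic), hex(pe_offset))  # prints: 0x5a4d 0x108
```

The `<` prefix in the format strings is what tells `struct` to interpret the bytes as little-endian, so the bytes 0x4D 0x5A on disk come back as the number 0x5A4D.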
DOS Stub
This is a small DOS program that has only one purpose: to print the text “This program cannot be run in DOS mode” and exit, in case the program is run under DOS.
The DOS Stub is right after the MZ Header.
PE Header (or NT Headers)
It starts with the signature “PE” followed by two null bytes (the 4 bytes “PE\0\0”), which marks this as a Portable Executable file.
Breaking it down, we get its two sub-structures: the Image File Header and the Image Optional Header.
Image File Header
Here is an overview of the Image File Header:
As you can see, it contains information about the target machine the executable is compiled for, the number of sections, the time/date stamp, etc.
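As a rough illustration, the 20-byte Image File Header can be unpacked right after the 4-byte PE signature. This is a sketch, not a full parser: `FileHeader` and `read_file_header` are hypothetical names, and the sample bytes (machine 0x8664, 6 sections, the timestamp, and so on) are made-up values packed by hand rather than read from a real executable.

```python
import struct
from typing import NamedTuple


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


def read_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Check the PE signature at pe_offset and unpack the header after it."""
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Two 16-bit fields, three 32-bit fields, two 16-bit fields = 20 bytes.
    return FileHeader(*struct.unpack_from("<HHIIIHH", data, pe_offset + 4))


# Made-up sample: signature followed by a hand-packed file header.
sample = b"PE\0\0" + struct.pack(
    "<HHIIIHH", 0x8664, 6, 0x657EC000, 0, 0, 0xF0, 0x0022
)

fh = read_file_header(sample, 0)
print(hex(fh.machine), fh.number_of_sections)  # prints: 0x8664 6
```

In a real file, `pe_offset` would be the “File address of new exe header” value read from the MZ header.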
Image Optional Header
After that comes the Image Optional Header. It contains far more information than the Image File Header. I simply cannot comment on each of its entries, but you can find detailed information about them in the Microsoft documentation.
However, note the underlined line that says “Entry Point”. This one is important. This is the entry point of the program’s executable code, where the Windows loader begins execution. It plays much the same role as the “main()” function of the application, although in compiled programs it usually points to runtime startup code that eventually calls “main()”.
Note that this value, 0x957C8, is an RVA (relative virtual address): an offset from the address at which the image is loaded in memory, not an offset into the file on disk. Windows converts it to a VA (virtual address) by adding the Image Base value (see the screenshot above), or the actual load address if the image was relocated. The result is the address of this same code in virtual memory after the Windows loader has loaded and mapped the file.
The address is called a “VA” because Windows creates a distinct VA space for each process, independent of physical memory (RAM).
In other words, when loading the application, Windows allocates address space for the process, then loads and expands the file there. It then starts executing the code from memory, beginning at this Entry Point address.
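The arithmetic can be sketched in a few lines of Python. Everything except the entry point RVA 0x957C8 is an assumption made for illustration: the image base 0x400000 and the `.text` section values are hypothetical, and `rva_to_va` and `rva_to_file_offset` are made-up helper names. The second function shows the separate calculation needed to find the same bytes in the file on disk, which goes through the section that contains the RVA.

```python
from typing import List, Optional, Tuple

# (virtual_address, virtual_size, raw_pointer, raw_size) for one section.
Section = Tuple[int, int, int, int]


def rva_to_va(rva: int, image_base: int) -> int:
    """Address in the process's virtual memory once the image is mapped."""
    return image_base + rva


def rva_to_file_offset(rva: int, sections: List[Section]) -> Optional[int]:
    """Offset in the file on disk, or None if no section backs this RVA."""
    for virtual_address, virtual_size, raw_pointer, raw_size in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            delta = rva - virtual_address
            if delta >= raw_size:
                return None  # inside the section in memory, but not on disk
            return raw_pointer + delta
    return None


ENTRY_POINT_RVA = 0x957C8          # value from the article
ASSUMED_IMAGE_BASE = 0x400000      # assumption, not from the article
ASSUMED_SECTIONS = [(0x1000, 0xA0000, 0x400, 0xA0000)]  # hypothetical .text

va = rva_to_va(ENTRY_POINT_RVA, ASSUMED_IMAGE_BASE)
offset = rva_to_file_offset(ENTRY_POINT_RVA, ASSUMED_SECTIONS)
print(hex(va))      # prints: 0x4957c8
print(hex(offset))  # prints: 0x94bc8
```

With these assumed values the RVA, the VA and the file offset are three different numbers, which is why the conversions matter when you patch a file on disk.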
You can read more about the Virtual Memory Management in my other article here.
Section Headers (or Section Table)
It mostly contains the addresses, sizes and characteristics of each section of the file.
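Each entry in the section table is a fixed 40-byte record, so reading it is mostly a matter of unpacking the same layout repeatedly. The sketch below assumes the documented section header layout. `read_section_headers` is a hypothetical helper, and the two sample entries are hand-packed with invented sizes and addresses.

```python
import struct
from typing import Dict, List, Union

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics = 40 bytes.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def read_section_headers(
    data: bytes, table_offset: int, count: int
) -> List[Dict[str, Union[str, int]]]:
    """Unpack `count` consecutive section headers starting at table_offset."""
    sections = []
    for i in range(count):
        fields = SECTION_HEADER.unpack_from(
            data, table_offset + i * SECTION_HEADER.size
        )
        sections.append({
            # Names are padded with null bytes up to 8 characters.
            "name": fields[0].rstrip(b"\0").decode("ascii", errors="replace"),
            "virtual_size": fields[1],
            "virtual_address": fields[2],
            "raw_size": fields[3],
            "raw_pointer": fields[4],
            "characteristics": fields[9],
        })
    return sections


# Hand-packed sample table with invented values.
table = (
    SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400,
                        0, 0, 0, 0, 0x60000020)
    + SECTION_HEADER.pack(b".data", 0x200, 0x3000, 0x200, 0x1800,
                          0, 0, 0, 0, 0xC0000040)
)

for section in read_section_headers(table, 0, 2):
    print(section["name"], hex(section["virtual_address"]))
# prints:
# .text 0x1000
# .data 0x3000
```

In a real file, the table starts right after the Image Optional Header, and `count` comes from the sections count in the Image File Header.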
If someday you want to perform binary code injection, you will need to alter the values here, since you will either add a new section or alter an existing one, changing its size. It’s what my colleague and I did.
Rebuilding it is not an easy task. Whichever section you alter, you need to update its header and also shift the following sections according to your changes. Precise address calculations are required; otherwise the PE will be corrupted and won’t even run.
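To give a feel for the kind of calculation involved, here is a minimal sketch of how the header values for a section appended at the end might be derived. It is not the method used in the project mentioned above, just the general alignment arithmetic: addresses in memory are rounded up to the Section Alignment value and offsets in the file to the File Alignment value, both found in the Image Optional Header. `align_up` and `plan_new_section` are hypothetical names and all numbers are invented.

```python
from typing import Dict


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def plan_new_section(
    last: Dict[str, int],
    payload_size: int,
    section_alignment: int,
    file_alignment: int,
) -> Dict[str, int]:
    """Compute header values for a section placed after the last one."""
    new_va = align_up(
        last["virtual_address"] + last["virtual_size"], section_alignment
    )
    new_raw_pointer = align_up(
        last["raw_pointer"] + last["raw_size"], file_alignment
    )
    return {
        "virtual_address": new_va,
        "virtual_size": payload_size,
        "raw_pointer": new_raw_pointer,
        "raw_size": align_up(payload_size, file_alignment),
        # The total image size in memory must also grow to cover the section.
        "size_of_image": align_up(new_va + payload_size, section_alignment),
    }


# Invented values for the current last section and the alignments.
last_section = {
    "virtual_address": 0x3000, "virtual_size": 0x1234,
    "raw_pointer": 0x2400, "raw_size": 0x1400,
}
plan = plan_new_section(last_section, 0x321, 0x1000, 0x200)
print({key: hex(value) for key, value in plan.items()})
```

With these inputs the new section would land at RVA 0x5000 and file offset 0x3800, occupy 0x400 bytes on disk, and raise the image size to 0x6000. A real rebuild also has to increment the sections count and make sure there is room for one more 40-byte entry in the section table.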
Sections
- .text – contains executable code
- .data – contains initialized data and variables
- .bss – uninitialized data
- .idata – import data — all the APIs you call from your application
- .edata – export data — all public APIs you coded in your application
- .rsrc – the resource section. This is the most complicated one. It contains virtual directories and subdirectories which contain localizations and data such as images, icons, etc. It is a file structure on its own to some degree.
Conclusion
The PE file format is both straightforward and a bit complicated at the same time.
It is very old but still serves its purpose.
If you are interested in operating systems, reverse-engineering, malware analysis, code injection, or any other kind of binary manipulation and analysis, the PE structure is must-have knowledge for your toolbox.
A good complementary read after this article is my “Reverse-Engineering: 101” tutorial.
Best of luck.