Windows Internals: The Storage Stack

Understanding the Windows Storage Stack

Prerequisites

Before diving into the Windows Storage Stack, you should have a basic understanding of the following concepts:

  • Operating Systems: General knowledge of how operating systems work.
  • File Systems: How data is organized and stored on a disk (e.g., files, directories).
  • I/O Operations: The process of reading and writing data (Input/Output).
  • Drivers: Software that allows the OS to communicate with hardware.
  • Function Calls: Basic understanding of how software components interact.
  • Device Objects and Device Extensions: How drivers represent hardware in software.

Introduction

The Windows Storage Stack is a layered architecture that manages how your applications access data stored on storage devices (like hard drives, SSDs, USB drives, etc.). Each layer has one specific job and hands the request on to the layer below it, which keeps the path from application to hardware modular and reliable.

The Storage Stack: A Layered Approach

Application
I/O Subsystem
File System
Volume Snapshot
Volume Manager
Partition Manager
Class
Port
Miniport
Disk Hardware
Layer              Description
Application        The program requesting to read or write data.
I/O Subsystem      Sends the I/O request to the File System.
File System        Translates file-level requests (like "open file") into requests for specific locations on the volume.
Volume Snapshot    Manages software snapshots (for backups and system restore).
Volume Manager     Presents logical volumes (like C: or D: drives) to the user. Manages basic and dynamic disks.
Partition Manager  Manages partitions on the disk.
Class Driver       Handles device-type specific operations (e.g., disk, tape).
Port Driver        Manages the specific transport protocol (like SATA, SAS, NVMe).
Miniport Driver    Vendor-supplied driver that interacts directly with the hardware.
Disk Hardware      The physical storage device.
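
To make the layering concrete, here is a minimal Python sketch that models the stack as an ordered list and traces a request down to the hardware and its completion back up. The function names (trace_request, layers_below) are invented for this illustration; real requests travel as I/O request packets between kernel drivers, not as strings.

```python
# Layer names taken from the list above, ordered top (application) to bottom (hardware).
STORAGE_STACK = [
    "Application",
    "I/O Subsystem",
    "File System",
    "Volume Snapshot",
    "Volume Manager",
    "Partition Manager",
    "Class Driver",
    "Port Driver",
    "Miniport Driver",
    "Disk Hardware",
]


def trace_request(operation, stack=STORAGE_STACK):
    """Return the order in which layers see a request and its completion.

    The request travels down the stack; the completion travels back up
    through every layer above the hardware.
    """
    down = [f"{operation} -> {layer}" for layer in stack]
    up = [f"complete <- {layer}" for layer in reversed(stack[:-1])]
    return down + up


def layers_below(layer, stack=STORAGE_STACK):
    """List the layers a request still has to pass through after `layer`."""
    return stack[stack.index(layer) + 1:]


if __name__ == "__main__":
    for step in trace_request("read"):
        print(step)
```

Calling layers_below("Partition Manager") returns the class, port, and miniport drivers followed by the disk hardware, which is the part of the stack covered by the class/port/miniport model.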

Key Concepts

Let's explain some terms you will often come across.

Device Objects and Device Extensions

In Windows, drivers use data structures called device objects to represent hardware or software devices. A device object holds general information about the device. Each class of drivers (like storage class drivers) can add its own specific data to a device object using a device extension. This is like having a general form (the device object) and adding extra details (the device extension) specific to a particular type of item.

The DeviceExtension field within the device object holds this class-specific data. Different classes use different data structures within the DeviceExtension.

Here is the layout of the _DEVICE_OBJECT structure on 64-bit Windows, as displayed by the WinDbg dt command:


nt!_DEVICE_OBJECT
+0x000 Type               : Int2B
+0x002 Size               : Uint2B
+0x004 ReferenceCount     : Int4B
+0x008 DriverObject       : Ptr64 _DRIVER_OBJECT
+0x010 NextDevice         : Ptr64 _DEVICE_OBJECT
+0x018 AttachedDevice     : Ptr64 _DEVICE_OBJECT
+0x020 CurrentIrp         : Ptr64 _IRP
+0x028 Timer              : Ptr64 _IO_TIMER
+0x030 Flags              : Uint4B
+0x034 Characteristics    : Uint4B
+0x038 Vpb                : Ptr64 _VPB
+0x040 DeviceExtension    : Ptr64 Void   <-- Class-specific data here
+0x048 DeviceType         : Uint4B
+0x04c StackSize          : Char
+0x050 Queue              : <unnamed-tag>
+0x098 AlignmentRequirement : Uint4B
+0x0a0 DeviceQueue        : _KDEVICE_QUEUE
+0x0c8 Dpc                : _KDPC
+0x108 ActiveThreadCount  : Uint4B
+0x110 SecurityDescriptor : Ptr64 Void
+0x118 DeviceLock         : _KEVENT
+0x130 SectorSize         : Uint2B
+0x132 Spare1             : Uint2B
+0x138 DeviceObjectExtension : Ptr64 _DEVOBJ_EXTENSION
+0x140 Reserved           : Ptr64 Void
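
Listings like the one above are easier to work with once they are turned into data. The following is a minimal sketch, not a debugger feature: it parses dt-style text into a dictionary so a field's offset can be looked up by name. The regular expression and the function name parse_dt_output are assumptions made for this illustration, and the sample is a shortened copy of the listing.

```python
import re

# Matches lines such as "+0x040 DeviceExtension    : Ptr64 Void".
FIELD_RE = re.compile(r"^\s*\+0x([0-9a-fA-F]+)\s+(\w+)\s*:\s*(.*?)\s*$")


def parse_dt_output(text):
    """Map field name -> (offset, type string) for each field line in `text`."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            offset_hex, name, type_text = match.groups()
            # Members of a union share an offset; each is still stored under its own name.
            fields[name] = (int(offset_hex, 16), type_text)
    return fields


SAMPLE = """\
nt!_DEVICE_OBJECT
+0x000 Type               : Int2B
+0x002 Size               : Uint2B
+0x038 Vpb                : Ptr64 _VPB
+0x040 DeviceExtension    : Ptr64 Void
+0x048 DeviceType         : Uint4B
"""

fields = parse_dt_output(SAMPLE)
offset, type_text = fields["DeviceExtension"]
print(f"DeviceExtension at +0x{offset:03x} ({type_text})")
# prints: DeviceExtension at +0x040 (Ptr64 Void)
```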
                  

Drilling Down: The Disk.sys Class Driver

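disk.sys is the class driver for hard disk devices. It handles operations common to all disks. For a disk's functional device object, DeviceExtension points to a disk!_FUNCTIONAL_DEVICE_EXTENSION, which begins with an embedded disk!_COMMON_DEVICE_EXTENSION (the CommonExtension member at offset 0). The common part's PartitionZeroExtension field in turn points to the functional device extension of the whole disk. These structures contain important information about the disk's state and location.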

Here is an example of disk!_COMMON_DEVICE_EXTENSION:


dt disk!_COMMON_DEVICE_EXTENSION
+0x000 Version          : Uint4B
+0x008 DeviceObject     : Ptr64 _DEVICE_OBJECT
+0x010 LowerDeviceObject : Ptr64 _DEVICE_OBJECT
+0x018 PartitionZeroExtension : Ptr64 _FUNCTIONAL_DEVICE_EXTENSION
+0x020 DriverExtension  : Ptr64 _CLASS_DRIVER_EXTENSION
+0x028 RemoveLock       : Int4B
+0x030 RemoveEvent      : _KEVENT
+0x048 RemoveTrackingSpinlock : Uint8B
+0x050 RemoveTrackingList : Ptr64 Void
+0x058 RemoveTrackingUntrackedCount : Int4B
+0x060 DriverData       : Ptr64 Void
+0x068 IsFdo            : Pos 0, 1 Bit
+0x068 IsInitialized    : Pos 1, 1 Bit
+0x068 IsSrbLookasideListInitialized : Pos 2, 1 Bit
+0x069 PreviousState    : UChar
+0x06a CurrentState     : UChar
+0x06c IsRemoved        : Uint4B
+0x070 DeviceName       : _UNICODE_STRING
+0x080 ChildList        : Ptr64 _PHYSICAL_DEVICE_EXTENSION
+0x088 PartitionNumber  : Uint4B
+0x090 PartitionLength  : _LARGE_INTEGER
+0x098 StartingOffset   : _LARGE_INTEGER
+0x0a0 DevInfo          : Ptr64 _CLASS_DEV_INFO
+0x0a8 PagingPathCount  : Uint4B
+0x0ac DumpPathCount    : Uint4B
+0x0b0 HibernationPathCount : Uint4B
+0x0b8 PathCountEvent   : _KEVENT
+0x100 SrbLookasideList : _NPAGED_LOOKASIDE_LIST
+0x180 MountedDeviceInterfaceName : _UNICODE_STRING
+0x190 GuidCount        : Uint4B
+0x198 GuidRegInfo      : Ptr64 GUIDREGINFO
+0x1a0 FileObjectDictionary : _DICTIONARY
+0x1b8 PrivateCommonData : Ptr64 _CLASS_PRIVATE_COMMON_DATA
+0x1c0 DispatchTable    : Ptr64 Ptr64 long
+0x1c8 Reserved3        : Uint8B
+0x1d0 Reserved4        : Uint8B
                    

And here is disk!_FUNCTIONAL_DEVICE_EXTENSION:


disk!_FUNCTIONAL_DEVICE_EXTENSION
+0x000 Version           : Uint4B
+0x008 DeviceObject      : Ptr64 _DEVICE_OBJECT
+0x000 CommonExtension   : _COMMON_DEVICE_EXTENSION
+0x200 LowerPdo          : Ptr64 _DEVICE_OBJECT
+0x208 DeviceDescriptor  : Ptr64 _STORAGE_DEVICE_DESCRIPTOR
+0x210 AdapterDescriptor : Ptr64 _STORAGE_ADAPTER_DESCRIPTOR
+0x218 DevicePowerState : _DEVICE_POWER_STATE
+0x21c DMByteSkew        : Uint4B
+0x220 DMSkew            : Uint4B
+0x224 DMActive          : UChar
+0x225 SenseDataLength   : UChar
+0x226 Reserved0         : [2] UChar
+0x228 DiskGeometry      : _DISK_GEOMETRY
+0x240 SenseData         : Ptr64 _SENSE_DATA
+0x248 TimeOutValue      : Uint4B
+0x24c DeviceNumber      : Uint4B
+0x250 SrbFlags          : Uint4B
+0x254 ErrorCount        : Uint4B
+0x258 LockCount         : Int4B
+0x25c ProtectedLockCount : Int4B
+0x260 InternalLockCount : Int4B
+0x268 EjectSynchronizationEvent : _KEVENT
+0x280 DeviceFlags       : Uint2B
+0x282 SectorShift       : UChar
+0x283 CdbForceUnitAccess : UChar
+0x288 MediaChangeDetectionInfo : Ptr64 _MEDIA_CHANGE_DETECTION_INFO
+0x290 Unused1           : Ptr64 _KEVENT
+0x298 Unused2           : Ptr64 Void
+0x2a0 KernelModeMcnContext : _FILE_OBJECT_EXTENSION
+0x2b8 MediaChangeCount  : Uint4B
+0x2c0 DeviceDirectory   : Ptr64 Void
+0x2c8 ReleaseQueueSpinLock : Uint8B
+0x2d0 ReleaseQueueIrp   : Ptr64 _IRP
+0x2d8 ReleaseQueueSrb   : _SCSI_REQUEST_BLOCK
+0x330 ReleaseQueueNeeded : UChar
+0x331 ReleaseQueueInProgress : UChar
+0x332 ReleaseQueueIrpFromPool : UChar
+0x333 FailurePredicted  : UChar
+0x334 FailureReason     : Uint4B
+0x338 FailurePredictionInfo : Ptr64 _FAILURE_PREDICTION_INFO
+0x340 PowerDownInProgress : UChar
+0x344 EnumerationInterlock : Uint4B
+0x348 ChildLock         : _KEVENT
+0x360 ChildLockOwner    : Ptr64 _KTHREAD
+0x368 ChildLockAcquisitionCount : Uint4B
+0x36c ScanForSpecialFlags : Uint4B
+0x370 PowerRetryDpc     : _KDPC
+0x3b0 PowerRetryTimer   : _KTIMER
+0x3f0 PowerContext      : _CLASS_POWER_CONTEXT
+0x478 PrivateFdoData    : Ptr64 _CLASS_PRIVATE_FDO_DATA
+0x480 FunctionSupportInfo : Ptr64 _CLASS_FUNCTION_SUPPORT_INFO
+0x488 MiniportDescriptor : Ptr64 _STORAGE_MINIPORT_DESCRIPTOR
+0x490 AdditionalFdoData : Ptr64 _ADDITIONAL_FDO_DATA
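
As a hedged illustration of how these offsets are used, the sketch below reads a few fields out of a raw byte buffer, for example bytes copied from a memory dump. The offsets are copied from the two listings above and differ between Windows builds, so treat them as examples only. The helper names and the synthetic buffer are invented here, and the sector size calculation assumes SectorShift holds the base-2 logarithm of the sector size.

```python
import struct

# (offset, struct format) pairs copied from the listings above; little-endian x64.
OFFSETS = {
    "PartitionNumber": (0x088, "<I"),  # Uint4B
    "PartitionLength": (0x090, "<q"),  # _LARGE_INTEGER (signed 64-bit)
    "StartingOffset": (0x098, "<q"),   # _LARGE_INTEGER
    "DeviceNumber": (0x24C, "<I"),     # Uint4B
    "SectorShift": (0x282, "<B"),      # UChar
}

# The last member in the listing is an 8-byte pointer at +0x490.
EXTENSION_SIZE = 0x498


def read_field(raw, name):
    """Decode one named field from the raw extension bytes."""
    offset, fmt = OFFSETS[name]
    return struct.unpack_from(fmt, raw, offset)[0]


def describe_extension(raw):
    """Decode every known field and derive the sector size in bytes."""
    info = {name: read_field(raw, name) for name in OFFSETS}
    # Assumption: SectorShift is log2 of the sector size.
    info["SectorSize"] = 1 << info["SectorShift"]
    return info


# Synthetic buffer standing in for bytes read from a dump (hypothetical values).
raw = bytearray(EXTENSION_SIZE)
struct.pack_into("<q", raw, 0x090, 500 * 1024**3)  # 500 GiB disk
struct.pack_into("<I", raw, 0x24C, 1)              # disk number 1
struct.pack_into("<B", raw, 0x282, 9)              # 2**9 = 512-byte sectors

print(describe_extension(bytes(raw)))
```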

                    

The Class/Port/Miniport Model

The core of the storage stack uses a three-layer driver model:

  • Class Drivers: Provide functionality common to a *class* of devices (e.g., all disks, all tapes). Microsoft provides these.
  • Port Drivers: Implement communication over a specific type of *bus* (like SATA or NVMe). Microsoft provides these.
  • Miniport Drivers: Handle the specifics of a particular hardware *controller*. Storage vendors provide these. They "plug in" to the port driver.

This model allows for flexibility and easier driver development. Vendors only need to write a miniport driver for their specific hardware, relying on the class and port drivers for common functionality.
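
The division of labour can be sketched in a few lines of Python. This is a conceptual model only: the class names, the command string, and the "example-vendor" miniport are hypothetical, and real drivers exchange request structures through kernel interfaces rather than method calls.

```python
class Miniport:
    """Vendor-specific part: knows how to program one particular controller."""

    name = "generic"

    def submit(self, command):
        raise NotImplementedError


class ExampleVendorMiniport(Miniport):
    """Hypothetical vendor driver used only for this illustration."""

    name = "example-vendor"

    def __init__(self):
        self.sent = []

    def submit(self, command):
        # A real miniport would program controller registers here.
        self.sent.append(command)
        return f"{self.name}: done {command}"


class PortDriver:
    """Bus-level logic shared by every miniport that plugs into it."""

    def __init__(self, miniport):
        self.miniport = miniport

    def send(self, command):
        # Common work such as queuing and timeouts would live here.
        return self.miniport.submit(command)


class DiskClassDriver:
    """Device-class logic: turns a disk read into a transport command."""

    def __init__(self, port):
        self.port = port

    def read(self, lba, count):
        return self.port.send(f"READ lba={lba} count={count}")


if __name__ == "__main__":
    disk = DiskClassDriver(PortDriver(ExampleVendorMiniport()))
    print(disk.read(lba=0, count=8))
```

Supporting a different controller in this model means writing another Miniport subclass; PortDriver and DiskClassDriver stay unchanged, which mirrors the benefit described above.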

Partition Manager (partmgr.sys)

The Partition Manager is responsible for discovering, creating, deleting, and managing partitions on a disk. It works closely with the disk class driver (disk.sys) to read partition information from the disk (using the MBR or GPT) and create device objects to represent those partitions.

Key Function:

IoReadPartitionTableEx: This function is used by the Partition Manager to read the partition table from the disk.
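
To show what reading a partition table involves, here is a minimal user-mode sketch that decodes the four primary entries of a classic MBR sector. It illustrates the on-disk MBR format only and is not how IoReadPartitionTableEx is implemented. GPT disks and extended partitions are ignored, and the function name parse_mbr is an assumption of this example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # the partition table starts here in the MBR sector
ENTRY_SIZE = 16               # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR sector


def parse_mbr(sector):
    """Return the non-empty primary partition entries of an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # Layout: status, CHS start (3 bytes, skipped), type,
        # CHS end (3 bytes, skipped), starting LBA, sector count.
        status, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:
            continue  # unused slot
        partitions.append({
            "number": index + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "start_lba": lba_start,
            "sectors": sectors,
        })
    return partitions


# Synthetic sector with one bootable NTFS-type (0x07) partition of 100 MiB.
sector = bytearray(SECTOR_SIZE)
sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(
    "<B3xB3xII", 0x80, 0x07, 2048, 204800)
sector[510:512] = BOOT_SIGNATURE

print(parse_mbr(bytes(sector)))
```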

Volume Manager (volmgr.sys)

The Volume Manager sits above the Partition Manager. It presents the enumerated partitions to higher layers of the operating system as logical volumes (e.g., \Device\HarddiskVolume1, \Device\HarddiskVolume2, etc.). It also works with the Plug and Play (PnP) Manager to handle changes to disk configurations.

volmgr.sys names these volume device objects \Device\HarddiskVolumeX, where X is a number that identifies the logical volume.

Virtual Hard Disk (VHD) Support (vhdmp.sys)

Windows has built-in support for Virtual Hard Disk (VHD and VHDX) files. The vhdmp.sys driver acts as a parser for the VHD format and provides the necessary interfaces for the OS to treat a VHD file like a physical disk. It sits on top of the file system in the storage stack, because the contents of the virtual disk are stored in a regular file.
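
As a rough idea of what parsing the VHD format means, the sketch below decodes two fields from the 512-byte footer that ends a .vhd file, following the layout in the published VHD image format specification. It is an illustration in user-mode Python, not the logic of vhdmp.sys, and it does not cover the different VHDX format. The function names are assumptions of this example.

```python
import os
import struct

FOOTER_SIZE = 512
VHD_COOKIE = b"conectix"  # identifies a VHD footer
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def parse_vhd_footer(footer):
    """Decode the virtual disk size and disk type from a VHD footer."""
    if len(footer) != FOOTER_SIZE or footer[:8] != VHD_COOKIE:
        raise ValueError("not a VHD footer")
    # Footer fields are big-endian and sit at fixed offsets.
    (current_size,) = struct.unpack_from(">Q", footer, 48)
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    return {
        "current_size": current_size,
        "disk_type": DISK_TYPES.get(disk_type, "unknown"),
    }


def read_vhd_footer(path):
    """Read and decode the footer stored in the last 512 bytes of a .vhd file."""
    with open(path, "rb") as f:
        f.seek(-FOOTER_SIZE, os.SEEK_END)
        return parse_vhd_footer(f.read(FOOTER_SIZE))
```

For a dynamically expanding 10 GiB disk, parse_vhd_footer would report a current_size of 10 * 1024**3 bytes and a disk_type of "dynamic", even though the file on the host volume may be much smaller.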

Storage Spaces Direct

Storage Spaces Direct (S2D) is a feature in Windows Server that allows you to cluster servers with local storage into a highly available and scalable software-defined storage solution. Here's a simplified breakdown of the key components.

Key Components

  • Spaceport.sys: This is the core driver for Storage Spaces Direct. It manages the storage pool, which is a collection of physical disks from multiple servers.
  • ClusPort.sys: Handles cluster communication and coordination between the servers.
  • SMB (Server Message Block): Used for communication between nodes in the S2D cluster.
  • Clusbflt.sys: The cluster block filter driver.
  • Firmware: The firmware on your storage devices (SSDs and HDDs) plays a role, especially in terms of performance and compatibility.

How it Works (Simplified)

  1. Pooling: Physical disks from multiple servers are combined into a single storage pool.
  2. Virtual Disks (Spaces): From the pool, you create virtual disks, also known as "spaces". These virtual disks can have different resiliency settings (mirroring, parity) to protect against disk failures.
  3. Volumes: The virtual disks appear to the Windows operating system as regular disks. You can format them with a file system (like ReFS or NTFS) and create volumes.
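
The three steps above can be sketched as simple capacity arithmetic. This is a deliberately simplified model: it covers mirroring only and ignores parity layouts, reserve capacity, and metadata overhead. The node names, disk sizes, and function names are hypothetical.

```python
def pool_capacity(servers):
    """Total raw capacity (in GB) of all disks across all servers."""
    return sum(sum(disks) for disks in servers.values())


def mirror_footprint(virtual_disk_gb, copies):
    """Raw pool space consumed by a mirrored space that keeps `copies` data copies."""
    return virtual_disk_gb * copies


def create_space(pool_free_gb, virtual_disk_gb, copies=2):
    """Return the free pool space left after carving out a mirrored space."""
    needed = mirror_footprint(virtual_disk_gb, copies)
    if needed > pool_free_gb:
        raise ValueError("not enough free space in the pool")
    return pool_free_gb - needed


# Hypothetical two-node cluster with two 4000 GB disks per node.
servers = {"node1": [4000, 4000], "node2": [4000, 4000]}
pool = pool_capacity(servers)                 # step 1: pooling
remaining = create_space(pool, 2000, copies=3)  # step 2: a three-way mirrored space
print(pool, remaining)
```

A three-way mirror stores every block three times, so the 2000 GB space in this example consumes 6000 GB of the pool.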

Author: Asher Le