Windows Server Failover Clustering (WSFC): A Software Architecture Deep Dive

Executive Summary

Windows Server Failover Clustering (WSFC) provides the foundational high-availability platform for mission-critical workloads on the Windows Server operating system. This report details the software architecture of WSFC, focusing on Windows Server 2016 and later versions, intended for system architects and senior engineers. It dissects the core user-mode components like the Cluster Service (ClusSvc.exe) and Resource Host Subsystem (RHS.exe), kernel-mode drivers such as ClusDisk.sys and NetFT.sys, and the intricate integration with the operating system, networking, storage, and security subsystems. Key architectural elements examined include the cluster database replication mechanism using the Global Update Manager (GUM), the network heartbeat and fault tolerance provided by NetFT.sys, storage arbitration via SCSI-3 Persistent Reservations, the distributed file system capabilities of Cluster Shared Volumes (CSV), and the software-defined storage architecture of Storage Spaces Direct (S2D). The report further analyzes the security model involving Active Directory objects (CNO, VCOs) and authentication protocols, the operational mechanics governing quorum and failover processes, and the built-in telemetry and monitoring infrastructure. Understanding these internal workings is crucial for designing, deploying, and troubleshooting robust and resilient high-availability solutions based on WSFC.

1. Core WSFC Software Components

This section details the fundamental user-mode software components that constitute the WSFC engine, managing cluster state, resources, and node interactions. These components work cooperatively across all nodes to present a unified, highly available system.

1.1. Cluster Service (ClusSvc.exe) Architecture

The Cluster Service (ClusSvc.exe) is the central coordinating engine of WSFC, running as a vital Windows service on every node within the cluster.1 Its primary mandate is to maintain the cluster's operational state, manage the lifecycle of clustered resources, coordinate activities across all participating nodes, and ultimately ensure the high availability of the clustered roles (applications and services).2 Since Windows Server 2008, the Cluster Service runs in a special security context with specific permissions necessary for its operation, possessing fewer privileges than the Local System account, thereby adhering to the principle of least privilege.3

The responsibilities of ClusSvc.exe are multifaceted:

  • Node Management: It meticulously tracks the membership status of all nodes in the cluster, continuously monitoring their health primarily through the heartbeat mechanism facilitated by the NetFT.sys driver.2 ClusSvc.exe manages the complex processes involved when nodes join or leave the cluster, handles administrative actions like pausing and resuming nodes, and orchestrates the state transitions (e.g., Up, Down, Paused) that define a node's participation level.2
  • Configuration Database Management: ClusSvc.exe is the custodian of the distributed cluster configuration database (ClusDB). It ensures that the configuration state is consistent across all active nodes by participating in robust replication mechanisms. This involves leveraging the Global Update Manager (GUM) to propagate and commit configuration changes cluster-wide.12
  • Resource Control: The service acts as the orchestrator for all cluster resources. It directs the state transitions of resources (e.g., bringing them online, taking them offline, handling failures) based on defined policies, inter-resource dependencies, and health monitoring information received indirectly from the Resource Host Subsystem (RHS). It communicates instructions to RHS.exe processes to load, unload, and interact with the specific Resource DLLs (ResDLLs) that manage individual resources.23
  • Eventing and Notifications: ClusSvc.exe generates critical cluster events that are logged into the Windows Event Log system, providing a vital audit trail and diagnostic information. It also notifies relevant components, including cluster-aware applications, about significant state changes, such as resource failovers, enabling them to react appropriately.2

Conceptually, ClusSvc.exe contains several internal managers dedicated to specific functions (these managers are discussed in more detail in subsequent sections):

  • Node State Manager
  • Membership Manager (MM)
  • Database Manager (DM), interacting with ClusDB and GUM
  • Resource Control Manager (RCM), orchestrating resource states via RHS23
  • Global Update Manager (GUM)12
  • Eventing Subsystem
  • Failover Manager (FM), responsible for failover decisions4
  • Topology Manager (TM), for network discovery
  • Quorum Manager (QM)9
  • Security Manager (SM)
  • Host Manager (HM), for initial connections15

ClusSvc.exe interacts extensively with other system components. It directs RHS.exe for resource management23, exposes management capabilities through ClusAPI.dll25, relies on NetFT.sys for fault-tolerant networking and heartbeats26, utilizes ClusDisk.sys for shared disk arbitration and access, and integrates with core OS services including the Service Control Manager (SCM) for service start/stop, the Event Log for diagnostics27, and the Local Security Authority Subsystem Service (LSASS) for authentication related to its own operations and the management of cluster network identities in Active Directory.3
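
As a quick orientation, the service and its view of the cluster can be inspected from PowerShell on any node. A minimal sketch using the built-in FailoverClusters module (output fields shown are illustrative):

    # Verify the Cluster Service is running on the local node
    Get-Service -Name ClusSvc | Select-Object Name, Status, StartType

    # Query the cluster this node belongs to and the membership state of its nodes
    Get-Cluster | Select-Object Name, Domain
    Get-ClusterNode | Format-Table Name, State, NodeWeight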

1.2. Resource Hosting Subsystem (RHS.exe)

The Resource Hosting Subsystem (RHS.exe) is a fundamental architectural component of WSFC, designed as a dedicated surrogate process that hosts Resource DLLs (ResDLLs). This separation from the core Cluster Service (ClusSvc.exe) is crucial for cluster stability and resilience: by isolating ResDLL execution, WSFC prevents a failure within a single resource's code from crashing the main cluster service and taking down the entire node's cluster participation.

RHS Architecture and Function

  • Isolation Surrogate: RHS.exe acts as a container for ResDLLs. The core Cluster Service (ClusSvc.exe) communicates with RHS.exe to manage the lifecycle and health of the resources hosted within it.
  • Fault Containment: If a ResDLL hosted within an RHS.exe instance encounters a critical error (e.g., an access violation, unhandled exception) or becomes unresponsive (deadlocks), only that specific RHS.exe process is terminated. The Cluster Service detects this termination and initiates recovery actions according to the resource's configured policies, which typically involves attempting to restart the resource, potentially in a new RHS.exe instance. This prevents the failure from impacting the core ClusSvc.exe or other resources running in different RHS processes.
  • Communication Protocol: The interaction between ClusSvc.exe (specifically the Resource Control Manager - RCM) and RHS.exe utilizes Remote Procedure Calls (RPC). RCM uses RPC to instruct RHS to load specific ResDLLs, bring resources online/offline, and perform health checks. RHS uses RPC to report status changes, health check results, and resource-specific events back to RCM.
  • Multiple Instances: A cluster node can, and typically does, run multiple instances of RHS.exe simultaneously. By default, several resources may share a single RHS process for efficiency.
  • Dedicated Monitor Option: For enhanced isolation, particularly for critical or potentially less stable resources, a resource can be configured to run in its own dedicated RHS.exe process by setting its SeparateMonitor common property to 1. If that resource's DLL then fails, it cannot affect resources hosted in other RHS processes. This is a common configuration for workloads such as SQL Server Availability Groups (see the sketch after this list).
  • 32-bit Support: On 64-bit operating systems, a separate 32-bit version, often seen as rhs.exe *32 in Task Manager, is used to host legacy 32-bit Resource DLLs, ensuring compatibility.
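
As referenced above, moving a resource into its own monitor is a property change followed by a resource restart. A hedged sketch (the resource name is a placeholder):

    # 0 = shares an RHS.exe process (default), 1 = dedicated RHS.exe process
    (Get-ClusterResource -Name "SQL Server (MSSQLSERVER)").SeparateMonitor

    # Run the resource in its own RHS.exe instance
    $res = Get-ClusterResource -Name "SQL Server (MSSQLSERVER)"
    $res.SeparateMonitor = 1

    # The change takes effect the next time the resource is brought online
    Stop-ClusterResource -Name $res.Name
    Start-ClusterResource -Name $res.Name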

Resource Health Monitoring (IsAlive/LooksAlive)

RHS plays a vital role in cluster health monitoring by periodically executing health checks defined within the hosted ResDLLs. These checks determine if a resource is functioning correctly. There are two primary types of health checks:

  • LooksAlive: A basic, quick check (typically defaults to every 5 seconds) to determine if the resource appears responsive at a fundamental level. It's designed to be lightweight and detect obvious failures quickly. A failure here usually indicates a significant problem.
  • IsAlive: A more thorough, potentially resource-intensive check (typically defaults to every 60 seconds) that verifies the actual functionality of the resource. This check confirms the resource can perform its designated task.

The frequency of these checks is configurable via the resource's properties (LooksAlivePollInterval, IsAlivePollInterval). RHS executes these checks based on the configured intervals and reports the status (Online, Failed) back to the RCM.
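
Both intervals are exposed as common properties on the resource (and as type-level defaults on the resource type); a brief sketch, assuming a placeholder resource name and values in milliseconds:

    # Per-resource settings; if unset, the resource type default applies
    Get-ClusterResource -Name "Cluster IP Address" |
        Format-List Name, LooksAlivePollInterval, IsAlivePollInterval

    # Resource type defaults
    Get-ClusterResourceType -Name "IP Address" |
        Format-List Name, LooksAlivePollInterval, IsAlivePollInterval

    # Example: relax the IsAlive interval for this resource to 120 seconds
    (Get-ClusterResource -Name "Cluster IP Address").IsAlivePollInterval = 120000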

Failure Scenarios and Deadlocks

Resources hosted within RHS can fail in several ways relevant to the subsystem itself:

  • ResDLL Crash: The resource DLL encounters an unhandled exception (e.g., access violation), causing the hosting RHS.exe process to terminate unexpectedly. Event ID 1146 is typically logged in the System Event Log.
  • Resource Deadlock: A call made by RHS into the ResDLL (e.g., Online, Offline, IsAlive, LooksAlive) does not return within the configured PendingTimeout period (default varies by resource, often several minutes). RHS detects this timeout, declares the resource deadlocked, logs Event ID 1230 in the System Event Log, and terminates the hosting RHS.exe process to break the deadlock.

In both scenarios, the RCM within ClusSvc.exe is notified of the failure (either via the deadlock notification from RHS or the unexpected termination of the RHS process). RCM then initiates the resource's failure policy, which might involve restarting the resource on the same node (the cluster may automatically set SeparateMonitor on a resource that has caused an RHS crash, so the restart can occur in a dedicated monitor process) or failing the entire resource group over to another node.

Common RHS-Related Failures and Indicators
| Failure Type | Description | Primary Indicator(s) | Cluster Action |
| --- | --- | --- | --- |
| ResDLL Crash | Code within the Resource DLL causes an unhandled exception. | Hosting RHS.exe process terminates unexpectedly; System Event ID 1146 logged. | RCM detects the termination, marks the resource as failed, and initiates the resource failure policy. |
| Resource Deadlock | A call into the ResDLL (e.g., LooksAlive, IsAlive, Online) does not complete within its PendingTimeout. | System Event ID 1230 logged indicating a timeout; hosting RHS.exe process is terminated by the cluster service. | RCM detects the deadlock via RHS notification, marks the resource as failed, and initiates the resource failure policy. |

Troubleshooting RHS Issues

When troubleshooting failures involving RHS (indicated by Event IDs 1146 or 1230), the primary goal is to identify the specific resource and its associated DLL that caused the crash or deadlock. Analysis typically involves:

  • Reviewing the System Event Log for the specific RHS termination or deadlock events.
  • Examining the Cluster Log (Get-ClusterLog) for detailed RCM and RHS interactions leading up to the failure.
  • Identifying the resource(s) hosted by the specific RHS.exe instance that terminated (Process ID might be logged).
  • Isolating potentially problematic resources by configuring them to run in a separate monitor (SeparateMonitor=1).
  • Engaging the vendor of the resource DLL if it's a third-party component.
  • Analyzing memory dumps (e.g., using WinDbg) of the crashed RHS.exe process if available and configured.
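
A hedged starting point for this analysis, assuming default log locations (the event IDs are those cited above; the destination folder is a placeholder):

    # Recent RHS-related events in the System log (1146 = RHS terminated, 1230 = resource deadlock)
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1146, 1230 } -MaxEvents 50 |
        Format-Table TimeCreated, Id, Message -Wrap

    # Generate the cluster log for the last 30 minutes from every node
    Get-ClusterLog -Destination C:\Temp -TimeSpan 30 -UseLocalTime

    # Search the generated logs for RHS and deadlock entries
    Select-String -Path C:\Temp\*.log -Pattern 'rhs', 'deadlock' | Select-Object -First 40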

Resource Health Checks and Failure Conditions

The Resource Host Subsystem (RHS) executes health checks (LooksAlive and IsAlive) defined by the Resource DLLs to monitor resource health. A failure reported by these checks triggers the cluster's recovery mechanisms. Understanding the specific actions performed during these checks and what constitutes a failure is crucial for troubleshooting.

Detailed IsAlive/LooksAlive Failure Conditions for Common Resource Types
Each resource type below is presented with its LooksAlive check details first, followed by its IsAlive check details (failure condition, key APIs/mechanism, and manual test commands for each).
IP Address / IPv6 Address

Failure Condition: A query to the TCP/IP stack indicates the virtual IP address is not present or not bound to any local network interface.

Key API(s) / Mechanism:

  • GetIpAddrTable (for IPv4): Retrieves interface-to-IPv4 address mapping.
  • GetAdaptersAddresses (for IPv4 and IPv6): Retrieves addresses associated with local adapters.
  • Queries NetBT driver if NetBIOS is enabled for the resource.
  • Listens for network interface failure notifications from the Cluster service.

Key Manual Test Command(s):

  • CMD (IPv4): ipconfig /all
  • CMD (IPv6): netsh interface ipv6 show address
  • PowerShell (IPv4/IPv6): Get-NetIPAddress -IPAddress <VirtualIPAddress>
  • WMI Query (PowerShell): Get-WmiObject -Class Win32_NetworkAdapterConfiguration -Filter "IPEnabled='TRUE'" | ForEach-Object { $_.IPAddress } | FindStr "<VirtualIPAddress>"

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Same as LooksAlive: GetIpAddrTable or GetAdaptersAddresses.
  • NetBT checks if applicable.

Key Manual Test Command(s):

  • Same as LooksAlive: ipconfig /all, netsh interface ipv6 show address, Get-NetIPAddress -IPAddress <VirtualIPAddress>.
Network Name

Failure Condition: (1) Local NetBT query shows NetBIOS name not registered on any local interface, OR (2) If dynamic DNS is enabled, repeated attempts to verify/re-register DNS A/AAAA records fail.

Key API(s) / Mechanism:

  • NetBIOS Check (if enabled): Netbios() function with NCBASTAT command.
  • DNS Check (if dynamic DNS enabled):
    • DnsQuery_W: To verify A/AAAA records.
    • DnsModifyRecordsInSet_W (or related): Status of DNS registration attempts is checked.

Key Manual Test Command(s):

  • NetBIOS Check:
    • CMD: nbtstat -n
    • CMD: nbtstat -A <AssociatedClusterIPAddress>
    • PowerShell: Resolve-DnsName -Name <NetworkName> -LlmnrNetbiosOnly
  • DNS Check:
    • CMD: nslookup <NetworkNameFQDN>
    • PowerShell: Resolve-DnsName -Name <NetworkNameFQDN> -Type A (and -Type AAAA)
    • PowerShell: Get-DnsClientCache | Where-Object {$_.Entry -like "*<NetworkNameFQDN>*"}

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Re-executes NetBIOS queries (via Netbios() with NCBASTAT).
  • Re-executes DNS checks (via DnsQuery_W and status of registration attempts).

Key Manual Test Command(s):

  • Same as LooksAlive: nbtstat -n, nslookup <NetworkNameFQDN>, Resolve-DnsName -Name <NetworkNameFQDN>.
Physical Disk

Failure Condition: SCSI Persistent Reservation (PR) check indicates the current node no longer holds the reservation, or communication with the disk via the storage stack fails.

Key API(s) / Mechanism:

  • DeviceIoControl with IOCTL_STORAGE_PERSISTENT_RESERVE_IN.
  • Alternatively, DeviceIoControl with IOCTL_SCSI_PASS_THROUGH or IOCTL_SCSI_PASS_THROUGH_DIRECT.
  • May check a status flag or state maintained by ClusDisk.sys.

Key Manual Test Command(s):

  • Specialized vendor storage utilities.
  • Third-party tools (e.g., ScsiCmdTool.exe).
  • Linux equivalent concept: sg_persist --in --read-reservations /dev/sdX.
  • Cluster Log Analysis: Get-ClusterLog and search for PR messages.
  • Legacy CMD: cluster res "<DiskResourceName>" /priv.
  • PowerShell (Indirect): Get-ClusterResource -Name "<DiskResourceName>" | Get-ClusterParameter.

Failure Condition: Basic file system operation (e.g., accessing root directory metadata) returns an error.

Key API(s) / Mechanism:

  • CreateFileW: To open the root directory of the volume.
  • GetVolumeInformationW (or GetVolumeInformationByHandleW): To retrieve volume information.
  • FindFirstFileW / FindNextFileW / FindClose: To list contents of the root directory.

Key Manual Test Command(s):

  • CMD: dir <DriveLetterOfClusteredDisk>:\
  • PowerShell: Get-ChildItem -Path <DriveLetterOfClusteredDisk>:\
  • PowerShell: Test-Path -Path <DriveLetterOfClusteredDisk>:\
File Share Witness (FSW)

Failure Condition: Attempt to perform a basic file system operation on the configured UNC path (e.g., check witness.log existence/access, list directory) results in a network, authentication, or file system error.

Key API(s) / Mechanism:

  • CreateFileW: To open/check existence of witness.log or cluster folder on share.
  • GetFileAttributesW or FindFirstFileW: To check existence or list directory.
  • Relies on underlying Windows networking stack (SMB client, DNS, authentication like Kerberos).
  • WNetAddConnection2A/W (generally not needed by FSW resource DLL if system can resolve/connect using CNO).

Key Manual Test Command(s):

  • CMD: dir \\FileServerName\WitnessShare\ClusterGUID\
  • CMD: type \\FileServerName\WitnessShare\ClusterGUID\witness.log
  • PowerShell: Test-Path -Path \\FileServerName\WitnessShare\ClusterGUID\witness.log
  • PowerShell: Get-ChildItem -Path \\FileServerName\WitnessShare\ClusterGUID\

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Generally same as LooksAlive: CreateFileW, GetFileAttributesW, FindFirstFileW.
  • Possibly more involved: e.g., ReadFile from witness.log, or attempt to create/delete a temporary file (using CreateFileW with CREATE_NEW, WriteFile, DeleteFile).

Key Manual Test Command(s):

  • Same as LooksAlive.
  • PowerShell for temp file test (conceptual):
    $witnessDir = "\\FileServerName\WitnessShare\ClusterGUID\"
    $tempFile = Join-Path -Path $witnessDir -ChildPath "test_alive.tmp"
    try {
        if (-not (Test-Path -Path $witnessDir -PathType Container)) { throw "Dir not found" }
        New-Item -Path $tempFile -ItemType File -ErrorAction Stop | Out-Null
        Remove-Item -Path $tempFile -Force -ErrorAction Stop
        # Success
    } catch {
        # Failed
    }
Generic Service
(Applies also to DHCP, DTC, iSCSI Target, iSNS, MSMQ, WINS, VMMS, etc.)

Failure Condition: Query to Service Control Manager (SCM) reports the service state is not 'Running' or 'Start Pending'.

Key API(s) / Mechanism:

  • OpenSCManagerW: Connects to SCM.
  • OpenServiceW: Obtains handle to the specific service.
  • QueryServiceStatus or QueryServiceStatusEx: Retrieves current service status.
  • Examines dwCurrentState for SERVICE_RUNNING or SERVICE_START_PENDING.
  • CloseServiceHandle.

Key Manual Test Command(s):

  • CMD: sc query <ServiceName>
  • PowerShell: Get-Service -Name <ServiceName>

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Re-employs the exact same sequence of SCM API calls: OpenSCManagerW, OpenServiceW, QueryServiceStatus or QueryServiceStatusEx.

Key Manual Test Command(s):

  • Same as LooksAlive: sc query <ServiceName>, Get-Service -Name <ServiceName>.
Generic Application

Failure Condition: The process handle, obtained when the application was started by RHS and subsequently monitored, becomes signaled, indicating the application process has terminated.

Key API(s) / Mechanism:

  • Event-driven, not periodic polling.
  • RHS monitors process handle (obtained during Online) using WaitForSingleObject.
  • If WaitForSingleObject returns WAIT_OBJECT_0, process has terminated.
  • GetExitCodeProcess may be called for diagnostic logging.

Key Manual Test Command(s):

  • CMD: tasklist /FI "IMAGENAME eq <ApplicationExecutableName.exe>"
  • PowerShell: Get-Process -Name <ApplicationNameWithoutExtension>
  • PowerShell (robust check): if (Get-Process -Name "<AppName>" -ErrorAction SilentlyContinue) { # Running } else { # NOT running }

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Based on the signaled state of its process handle; effectively continuous monitoring.

Key Manual Test Command(s):

  • Same as LooksAlive: tasklist /FI "IMAGENAME eq <ApplicationExecutableName.exe>", Get-Process -Name <ApplicationNameWithoutExtension>.
Virtual Machine

Failure Condition: (1) The associated Hyper-V Worker Process (vmwp.exe) for the VM is not running, OR (2) If guest heartbeating (via Hyper-V Integration Services) is enabled, the heartbeat signal from the guest OS is not received by the host.

Key API(s) / Mechanism (vmwp.exe check):

  • OpenProcess: To obtain handle to specific vmwp.exe instance (matching VM GUID).
  • WaitForSingleObject on vmwp.exe process handle.
  • GetExitCodeProcess if termination detected.

Key API(s) / Mechanism (Guest Heartbeat check - if enabled):

  • WMI Query to root\virtualization\v2 namespace for Msvm_HeartbeatComponent.
  • Identified by VM's GUID (DeviceID property).
  • Inspects OperationalStatus property. Values like OK (2) indicate health; No Contact (12), Lost Communication (13), etc., indicate failure.
  • Involves COM interfaces: IWbemServices::ExecQuery, IWbemClassObject::Get.

Key Manual Test Command(s):

  • vmwp.exe check (Host):
    • PowerShell: Get-VM -Name "MyVMName" | Select-Object Name, State, Status, Uptime
    • PowerShell (Advanced): $vm = Get-VM -Name "MyVMName"; if ($vm) { Get-CimInstance Win32_Process -Filter "Name='vmwp.exe'" | Where-Object {$_.CommandLine -like "*$($vm.VMId.Guid)*"} }
  • Guest Heartbeat check (Host):
    • PowerShell: Get-VM -Name "MyVMName" | Select-Object Name, Heartbeat
    • PowerShell (Direct WMI): $vmGuid = (Get-VM -Name "MyVMName").Id.Guid; Get-WmiObject -Namespace root\virtualization\v2 -Class Msvm_HeartbeatComponent | Where-Object {$_.DeviceID -like "*$vmGuid*"} | Select-Object PSComputerName, ElementName, OperationalStatus, StatusDescriptions

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Re-evaluates vmwp.exe process status (WaitForSingleObject).
  • Re-evaluates guest heartbeat via WMI queries for Msvm_HeartbeatComponent.

Key Manual Test Command(s):

  • Same as LooksAlive: Get-VM -Name "MyVMName" | Select-Object Name, State, Status, Heartbeat, Uptime, WMI query for Msvm_HeartbeatComponent.OperationalStatus.
Virtual Machine Configuration

Failure Condition: The Hyper-V Virtual Machine Management Service (VMMS) reports an inability to find, load, or validate the VM's configuration files.

Key API(s) / Mechanism:

  • WMI Query to root\virtualization\v2 namespace for Msvm_VirtualSystemSettingData associated with VM's GUID.
  • Checks existence and accessibility of this WMI object.
  • OperationalStatus property of associated Msvm_ComputerSystem WMI object might also reflect configuration errors.
  • VMMS is responsible for parsing/validating; WMI reflects VMMS's state.

Key Manual Test Command(s):

  • PowerShell (High-Level): Get-VM -Name "MyVMName"
  • PowerShell (WMI Query): $vm = Get-VM -Name "MyVMName"; if ($vm) { $vmGuid = $vm.Id.Guid; Get-WmiObject -Namespace root\virtualization\v2 -Class Msvm_VirtualSystemSettingData | Where-Object {$_.InstanceID -like "*$vmGuid*"} }
  • Hyper-V Event Log Check: Get-WinEvent -LogName Microsoft-Windows-Hyper-V-VMMS-Admin | Where-Object {$_.Message -like "*MyVMName*" -and ($_.LevelDisplayName -eq "Error" -or $_.LevelDisplayName -eq "Critical")}

Failure Condition: Same as LooksAlive.

Key API(s) / Mechanism:

  • Re-attempts WMI query for Msvm_VirtualSystemSettingData object or checks related status properties on Msvm_ComputerSystem as reported by VMMS.

Key Manual Test Command(s):

  • Same as LooksAlive: Get-VM -Name "MyVMName", WMI query for Msvm_VirtualSystemSettingData, Hyper-V VMMS event logs.
Scale-Out File Server (SOFS)

Failure Condition: No specific direct check by the SOFS resource itself; health is derived from the status of its dependencies, primarily the Distributed Network Name (DNN) for client access and the underlying Cluster Shared Volumes (CSVs) for data storage.

Key API(s) / Mechanism (Dependencies):

  • DNN Health: (See Network Name APIs) Netbios() with NCBASTAT, DnsQuery_W, status of DnsModifyRecordsInSet_W.
  • CSV Health: WMI Queries to root\MSCluster for MSCluster_ClusterSharedVolume or root\Microsoft\Windows\Cluster for MSFT_ClusterSharedVolume. Key properties: State, FaultState. Includes underlying Physical Disk health checks.

Key Manual Test Command(s) (Dependencies):

  • DNN Health: (See Network Name commands) nbtstat -n, Resolve-DnsName <SOFS_DNN_Name>, nslookup <SOFS_DNN_Name>.
  • CSV Health:
    • PowerShell: Get-ClusterSharedVolume -Name "MySOFS_CSV_Name1" | Select-Object Name, State, OwnerNode, FaultState
    • PowerShell: Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State, FaultState
    • Check underlying disk resources (e.g., Get-ClusterResource | Where-Object {$_.ResourceType -eq "Physical Disk" -and $_.OwnerGroup -like "*Cluster Group*"}).
    • Legacy CMD: cluster res

Failure Condition: The primary SOFS resource monitors health messages from its clone instances. It fails if a critical number of these clone instances report failures (e.g., due to CSV access problems or service failures on clone nodes).

Key API(s) / Mechanism:

  • Internal cluster service (ClusSvc.exe) and File Server resource DLL (FsClusRes.dll) messaging.
  • Clone instances check locally:
    • CSV Accessibility: WMI queries to MSFT_ClusterSharedVolume (State, FaultState).
    • Local Service Health: SCM API calls (OpenSCManagerW, OpenServiceW, QueryServiceStatus/Ex) for services like LanmanServer.
    • Network Interface Health: GetAdaptersAddresses.

Key Manual Test Command(s) (Checking Individual Clones/CSVs influencing clone health):

  • Check CSV Health/Accessibility (on each SOFS node):
    • PowerShell: Get-ClusterSharedVolume | Select Name, OwnerNode, State, FaultState
    • PowerShell: Test-Path -Path "C:\ClusterStorage\MySOFS_CSV_Name1\testfile.txt"
  • SMB Witness/Multichannel: Get-SmbWitnessClient -ComputerName <NodeName>, Get-SmbMultichannelConnection -ComputerName <NodeName>.
  • Event Logs (on each node): Get-WinEvent for CSV, disk, or SMB errors.
  • Server service (LanmanServer) on each SOFS node: Get-Service -Name LanmanServer -ComputerName <NodeName>.
  • Overall SOFS Resource State: Get-ClusterResource -Name "<SOFS_ResourceName>" | Select-Object Name, State, OwnerNode.

It's important to note that a resource failing its LooksAlive or IsAlive check initiates the resource's configured failure policy, managed by the RCM. This might involve restarts on the current node, running in a separate monitor, or ultimately failing over the entire resource group to another node. Understanding the specific failure condition helps pinpoint whether the issue lies with the resource itself, its dependencies (like storage or network), or the underlying node infrastructure.
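
The policy settings that govern this recovery behavior can be reviewed directly; a sketch with placeholder resource and role names (RestartAction, RestartThreshold, RestartPeriod, RetryPeriodOnFailure, and SeparateMonitor are resource-level common properties, while FailoverThreshold and FailoverPeriod belong to the owning group):

    # Per-resource restart policy
    Get-ClusterResource -Name "MyAppResource" |
        Format-List Name, RestartAction, RestartThreshold, RestartPeriod, RetryPeriodOnFailure, SeparateMonitor

    # Group-level failover policy for the role that owns the resource
    Get-ClusterGroup -Name "MyAppRole" |
        Format-List Name, FailoverThreshold, FailoverPeriod, AutoFailbackType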

1.3. Resource DLLs (ResDLLs) Function and API

Resource DLLs (ResDLLs) are dynamic-link libraries that encapsulate the specific logic required to manage a particular type of cluster resource.38 They act as the bridge between the generic resource management framework provided by the Cluster Service and the unique requirements of diverse resources such as physical disks, IP addresses, network names, file shares, print spoolers, or complex applications like Microsoft SQL Server.24 The core function of a ResDLL is to implement the standard set of operations that the Cluster Service needs to perform on a resource: bringing it online, taking it offline gracefully, terminating it abruptly, and performing periodic health checks.37

To ensure interoperability, ResDLLs must adhere to a standard interface defined by the Failover Cluster API, exposing a specific set of entry-point functions.38 While the full list is extensive, the most critical entry points include:

  • Open: Called by RHS when the resource is being prepared to come online. Allows the ResDLL to initialize context for the resource instance.
  • Close: Called by RHS when the resource is being taken offline. Allows the ResDLL to clean up context.
  • Online: Contains the resource-specific code to start the resource and make it operational on the current node. For an IP Address resource, this involves configuring the IP on the network adapter; for a SQL Server resource, it involves starting the SQL Server service.
  • Offline: Contains the resource-specific code to gracefully stop the resource and release any held system resources.
  • Terminate: Called for an immediate, potentially non-graceful shutdown of the resource, often used when Offline fails or times out.
  • LooksAlive: Performs a quick, lightweight check to determine if the resource appears to be running and responsive. This check is polled frequently by RHS (e.g., every 5 seconds is a common default42) and should be designed to have minimal impact on the resource's performance. A failure might trigger an IsAlive check.
  • IsAlive: Performs a more thorough verification to confirm that the resource is fully functional and operational. This check is polled less frequently (e.g., every 60 seconds is a common default42) or may be triggered if LooksAlive returns failure. The logic is resource-specific (e.g., connecting to a database, checking service status). A definitive failure reported by IsAlive typically initiates the resource's configured restart or failover policy within the cluster.37

In addition to these entry points, ResDLLs can utilize callback functions provided by the Resource Monitor (RHS).38 These callbacks allow the ResDLL to proactively report status changes, events, or properties back to the Cluster Service without waiting for a poll.

Microsoft provides built-in ResDLLs (often consolidated within clusres.dll) for common resource types like Physical Disk, IP Address, Network Name, and File Share.39 More complex, cluster-aware applications provide their own custom ResDLLs. For instance, SQL Server uses sqsrvres.dll for Failover Cluster Instances (FCIs)13 and hadrres.dll for managing Always On Availability Group resources.24 This extensible architecture allows third parties to integrate custom applications and devices into the WSFC high-availability framework.
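
The registered resource types and the ResDLLs that implement them can be enumerated on any node; a brief sketch (DllName is the property exposed by the FailoverClusters module):

    # List resource types and their implementing Resource DLLs
    Get-ClusterResourceType | Sort-Object Name | Format-Table Name, DisplayName, DllName -AutoSize

    # Show which DLL backs a specific built-in type
    Get-ClusterResourceType -Name "Network Name" | Format-List Name, DllName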

1.4. Cluster API (ClusAPI.dll) Interface

The Cluster API, implemented in ClusAPI.dll, serves as the primary programmatic interface for external entities to manage and interact with a Windows Server Failover Cluster.25 It exposes a comprehensive set of functions that allow management tools and cluster-aware applications to query the cluster's state, manipulate cluster objects (nodes, resources, groups, networks), control resource states, initiate administrative actions like failover, and manage cluster configuration settings such as quorum.41

This API is the foundation upon which standard Microsoft management tools are built, including the graphical Failover Cluster Manager snap-in and the FailoverClusters PowerShell module.25 Cluster-aware applications can also leverage ClusAPI.dll to integrate more deeply with the cluster, for example, to monitor the state of their own resources or to influence failover decisions. WMI (Windows Management Instrumentation) providers for failover clustering also likely utilize the Cluster API underneath or provide an alternative management interface.25

ClusAPI.dll contains a large number of exported functions (over 200 reported for some versions43). Core functionalities include:

  • Establishing a connection to a cluster using OpenCluster or OpenClusterEx, specifying the cluster name or NULL for the local cluster. This returns a cluster handle (HCLUSTER).25
  • Closing the connection and releasing resources using CloseCluster.25
  • Enumerating cluster objects like nodes (OpenClusterNode, GetClusterNodeState), resources (OpenClusterResource, GetClusterResourceState), groups (OpenClusterGroup, GetClusterGroupState), and networks (OpenClusterNetwork, GetClusterNetworkState).
  • Reading and modifying properties of cluster objects (e.g., GetClusterResourceNetworkName, SetClusterResourceName).
  • Controlling resource states (e.g., OnlineClusterResource, OfflineClusterResource).
  • Initiating group movement (failover) (MoveClusterGroup).
  • Managing quorum configuration (SetClusterQuorumResource).

Interaction with the API typically begins by obtaining a cluster handle (HCLUSTER) via OpenCluster.25 Subsequent operations often use this primary handle or specific handles derived from it for nodes (HNODE), resources (HRESOURCE), groups (HGROUP), etc. The API distinguishes between connections made via RPC (for remote or local connections specifying a name) and LPC (for local connections using NULL).25 Using handles obtained from different connection types (RPC vs. LPC) or contexts together is not recommended and can lead to unpredictable behavior.25 Proper error handling involves checking function return values and using GetLastError for detailed error codes.25
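
Because the FailoverClusters PowerShell module is built on this API, most of the calls above have direct cmdlet counterparts; a hedged mapping sketch with placeholder object names:

    # OpenCluster / CloseCluster: handles are implicit; cmdlets accept -Cluster <name>
    Get-Cluster -Name "MyCluster"

    # GetClusterNodeState / GetClusterResourceState / GetClusterGroupState
    Get-ClusterNode | Format-Table Name, State
    Get-ClusterResource | Format-Table Name, State, OwnerGroup
    Get-ClusterGroup | Format-Table Name, State, OwnerNode

    # OnlineClusterResource / OfflineClusterResource
    Start-ClusterResource -Name "MyAppResource"
    Stop-ClusterResource -Name "MyAppResource"

    # MoveClusterGroup (failover) and SetClusterQuorumResource
    Move-ClusterGroup -Name "MyAppRole" -Node "Node2"
    Set-ClusterQuorum -DiskWitness "Cluster Disk 1"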

1.5. Cluster Database Architecture and Replication

The Cluster Database (ClusDB) is the central repository holding the authoritative configuration state for the entire failover cluster.2 It stores critical information including node membership details, resource definitions and their properties, group (role) configurations, resource dependencies, network configurations and roles, and quorum settings.2 Maintaining the consistency and availability of this database across all nodes is paramount for cluster operation.

Architecturally, the ClusDB exists in multiple forms and locations. The active, authoritative copy resides in the memory of the Cluster Service (ClusSvc.exe) process on each active node. This in-memory copy is loaded at service startup from a checkpoint file, typically named CLUSDB, located in the %windir%\Cluster directory.11 For integration with registry-based tools and potentially for certain configuration lookups, the Cluster Service also loads the database content into a dedicated registry hive located at HKEY_LOCAL_MACHINE\Cluster while the service is running.11 This registry hive is volatile and disappears when the Cluster Service stops.11 If a disk witness is configured and owned by a particular node, that node might also load a secondary hive, HKEY_LOCAL_MACHINE\0.Cluster, reflecting the database copy stored on the witness disk.11

Ensuring that the ClusDB remains identical and up-to-date across all participating nodes is achieved through a sophisticated replication mechanism.2 WSFC employs a distributed metadata and notification system where changes made on one node are propagated to all other active nodes. The Global Update Manager (GUM) component within the Cluster Service is responsible for coordinating these cluster-wide state updates.12 GUM ensures that updates are applied atomically and consistently across the cluster. To track the version and ensure convergence on the correct configuration, especially during node joins or recovery, WSFC utilizes a mechanism involving the PaxosTag value, a REG_DWORD found under HKLM\Cluster.11 This tag is incremented with each configuration change and replicated along with the change data, allowing nodes to identify and adopt the latest configuration state.11

The GUM can operate in different modes, particularly from Windows Server 2012 R2 onwards, which affects the replication and commit semantics.12 The default mode for most workloads, "All (write) / Local (read)", prioritizes strong consistency by requiring acknowledgment from all active nodes before an update is considered committed. Reads are then performed from the local node's guaranteed-consistent copy. While ensuring consistency, this mode can be sensitive to latency from any single node. For Hyper-V clusters, the default is "Majority (read and write)". This mode only requires acknowledgment from a majority of active nodes to commit a write, potentially improving performance in high-latency networks. However, because not all nodes might have processed the latest update immediately after commit, reads become more complex. When reading configuration data, the cluster must query a majority of nodes, compare timestamps associated with the data, and use the data with the latest timestamp to ensure consistency.12 A third mode, "Majority (write) / Local (read)", also commits on majority acknowledgment but performs reads locally, offering write performance but risking reads of potentially stale data if the local node wasn't part of the last majority commit. This highlights an architectural flexibility allowing administrators to tune the trade-off between consistency and performance based on workload requirements and network characteristics.
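
The active GUM mode is surfaced as the cluster common property DatabaseReadWriteMode (0 = All write / Local read, 1 = Majority read and write, 2 = Majority write / Local read); a brief sketch, assuming Windows Server 2012 R2 or later:

    # Inspect the current Global Update Manager mode
    (Get-Cluster).DatabaseReadWriteMode

    # Example: switch to majority read/write semantics (the Hyper-V default)
    (Get-Cluster).DatabaseReadWriteMode = 1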

Persistence is achieved through checkpointing the in-memory database to the CLUSDB file. Transactional logging (implied by the presence of .blf and .container files alongside CLUSDB11) is likely used to ensure durability of changes before they are fully checkpointed, protecting against data loss in case of unexpected service termination. Direct modification of the HKLM\Cluster registry hive is strongly discouraged, as it bypasses the GUM replication and PaxosTag mechanisms, leading to cluster inconsistencies and potential instability.11

1.6. Node Management and State Transitions

Node management encompasses the processes by which servers join, participate in, and leave the failover cluster, along with the various operational states a node can occupy. Nodes typically join the cluster during the initial setup phase using the Create Cluster wizard or New-Cluster PowerShell cmdlet, or they can be added later to an existing cluster via the Add Node wizard or Add-ClusterNode cmdlet.32 Leaving the cluster can occur through administrative eviction (Remove-ClusterNode) or due to a node failure. The joining process involves authenticating the node to the cluster, synchronizing the cluster configuration database (ClusDB), and updating the membership state across all existing nodes. Eviction requires not only removing the node from the cluster configuration but also performing cleanup operations on the evicted node itself to remove residual cluster settings.43

Cluster nodes transition through several distinct states, reflecting their operational status and participation level within the cluster. These states are visible through management tools like Failover Cluster Manager and PowerShell cmdlets such as Get-ClusterNode.4

Table 1: WSFC Node States
| State Name | Description | Participates in Quorum? | Accepts Roles? | Failover Target? | Typical Trigger/Transition |
| --- | --- | --- | --- | --- | --- |
| Joining | Node is actively in the process of joining the cluster and synchronizing its configuration database with the existing members.6 | No (initially) | No | No | Add Node operation initiated. Transitions to Up upon successful completion. |
| Up | Node is fully operational, healthy, communicating with other nodes, and actively participating in the cluster.4 | Yes (typically) | Yes | Yes | Successful join/resume or service start. Transitions to Down/Paused/Quarantined on event. |
| Down | Node is unreachable over the network (missed heartbeats) or its Cluster Service (ClusSvc.exe) is stopped or has failed.6 | No | No | No | Heartbeat timeout, service stop/crash, or network failure. Transitions to Up on recovery. |
| Paused | Node is administratively taken out of active participation, often for maintenance.4 | Yes | No | No | Drain Roles/Pause Node operation.5 Transitions to Up upon Resume Node operation.5 |
| Quarantined | Node automatically isolated by the cluster after repeatedly failing and rejoining ("flapping") within a short period.48 | No | No | No | Repeated node failures/rejoins. Transitions to Down/Up after timeout or manual clear.48 |
Note: Quorum participation for 'Up' nodes can be dynamically adjusted by Dynamic Quorum.49

The Paused state is crucial for planned maintenance. When a node is paused, its roles are typically "drained" (live-migrated for VMs, or failed over to other active nodes) before the node enters the paused state.5 A paused node remains a member of the cluster and participates in quorum voting but cannot host any clustered roles or be selected as a failover target.4 Resuming the node brings it back to the 'Up' state, making it eligible to host roles again.4

The Quarantined state is an automatic stability mechanism introduced to handle "flapping" nodes.48 If a node repeatedly fails (goes Down) and rejoins (goes Up) in quick succession, the cluster can automatically place it in quarantine for a configurable duration (default is 2 hours). While quarantined, the node is effectively offline from the cluster's perspective – it cannot host roles or rejoin active membership. This prevents the instability caused by the flapping node from affecting the rest of the cluster. The node can be manually forced out of quarantine using Start-ClusterNode -ClearQuarantine or will automatically attempt to rejoin after the quarantine period expires.48
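
The state transitions described above map onto a small set of cmdlets; a sketch using a placeholder node name:

    # Current node states, including vote assignment adjusted by Dynamic Quorum
    Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight

    # Planned maintenance: drain roles, then return the node to service
    Suspend-ClusterNode -Name "Node2" -Drain
    Resume-ClusterNode -Name "Node2" -Failback Immediate

    # Clear a quarantined node ahead of the automatic timeout
    Start-ClusterNode -Name "Node2" -ClearQuarantine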

2. Operating System Integration

Windows Server Failover Clustering is not an isolated application but a deeply integrated feature of the Windows Server operating system. It relies on specialized kernel-mode drivers for critical low-level functions related to storage and networking, interacts extensively with core user-mode OS services, and utilizes the Windows Registry for configuration storage. This tight integration is fundamental to its ability to manage hardware resources and provide high availability.

2.1. Kernel-Mode Components (Drivers)

WSFC leverages several key kernel-mode drivers to interface directly with hardware and provide essential functionalities that cannot be efficiently or reliably handled in user mode.

Table 2: Key WSFC Kernel Drivers
| Driver File | Primary Function | Key Responsibilities | Interaction Points (OS/Hardware) |
| --- | --- | --- | --- |
| ClusDisk.sys | Cluster Disk Management | Manages shared disk access, performs disk arbitration using SCSI Persistent Reservations (PR), marks LUNs offline on passive nodes, presents disks to the file system on the owning node.50 | Storage stack (Disk.sys, Storport.sys, HBA drivers), SCSI/SAS/FC/iSCSI storage devices, PnP Manager, NTFS/ReFS |
| NetFT.sys | Network Fault Tolerance & Heartbeat | Creates the virtual adapter, manages heartbeat transmission/reception (UDP 3343), detects node failures via missed heartbeats, routes internal cluster traffic, handles network path failover.51 | Network stack (TCP/IP), physical NIC drivers, Cluster Service (ClusSvc) |
| CsvFs.sys / CsvFlt.sys | Cluster Shared Volumes File System / Filter Driver | Implements CSVFS, manages simultaneous multi-node access, coordinates metadata updates via the Coordinator Node, handles Direct vs. Redirected I/O (via SMB), presents the \ClusterStorage namespace.53 | File system stack (NTFS/ReFS), storage stack (ClusDisk.sys/Disk.sys), network stack (SMB/TCP/IP), Cluster Service (ClusSvc) |
  • ClusDisk.sys (Cluster Disk Driver): This driver is essential for managing traditional shared storage (like Fibre Channel or iSCSI LUNs) presented to multiple nodes. Its most critical role is implementing storage fencing using SCSI-3 Persistent Reservations (PRs).50 When a node takes ownership of a Physical Disk resource, ClusDisk.sys places a persistent reservation on the corresponding LUN, preventing any other node from writing to it, thus avoiding data corruption.50 It continuously renews this reservation (e.g., every 3 seconds50). During failover or split-brain scenarios, nodes use ClusDisk.sys to arbitrate for ownership by attempting to break existing reservations (via SCSI RESET or PR PREEMPT commands) and establish their own.50 ClusDisk.sys also interacts with the OS storage stack to mark shared disks as 'offline' or 'reserved' on non-owning nodes, preventing accidental access, while making the disk available to the file system (NTFS/ReFS) on the owning node.50 It identifies cluster-managed disks based on signatures stored in the registry.50
  • NetFT.sys (Network Fault-Tolerant Driver): This driver provides the resilient communication backbone for the cluster.52 It creates a hidden virtual network adapter ("Microsoft Failover Cluster Virtual Adapter") which abstracts the physical network interfaces.51 NetFT.sys is responsible for sending and receiving UDP heartbeat packets (default port 3343) over all cluster-enabled networks to monitor node health.26 It determines node failure based on missed heartbeats according to configurable delay and threshold settings.37 Beyond heartbeats, NetFT.sys routes all internal cluster communication (like GUM updates and RPC calls between ClusSvc instances).51 It discovers multiple network paths between nodes and uses internal metrics (prioritizing dedicated cluster networks) to select the optimal path.51 If the primary path fails, NetFT.sys automatically and transparently reroutes traffic over an alternative path, providing network fault tolerance to the higher-level Cluster Service.51 The virtual adapter uses non-routable APIPA and IPv6 link-local addresses for this internal communication.51
  • CsvFs.sys / CsvFlt.sys (CSV File System / Filter Drivers): These drivers implement the Cluster Shared Volumes feature.53 CsvFs.sys acts as a pseudo-file system, while the newer CsvFlt.sys operates as a file system mini-filter driver.53 Together, they enable simultaneous read/write access to a single NTFS or ReFS volume from all nodes by managing I/O flow and metadata consistency. They present the shared volumes under the consistent C:\ClusterStorage\ namespace on all nodes.54 These drivers determine whether I/O should be sent directly to the storage (Direct I/O) or redirected over the network (via SMB) to the designated Coordinator Node for that volume (Redirected I/O).53 They coordinate with the Coordinator Node for all metadata updates to ensure file system integrity across all accessing nodes.53

The interplay between these drivers forms a critical abstraction layer. NetFT.sys shields the Cluster Service from the complexities of network path failures, while ClusDisk.sys and CsvFs.sys/CsvFlt.sys abstract the intricacies of shared storage access, arbitration, and distributed file system management. This allows the core Cluster Service (ClusSvc.exe) to operate at a higher level, managing logical resources and nodes without needing deep knowledge of underlying network routing or storage protocols like SCSI-3 PR.
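
Although these components run in kernel mode, their registration and load state can be verified from user mode. A hedged sketch using the Win32_SystemDriver class (the driver service names are assumed to match the file names, which may vary by OS version):

    # Confirm the cluster kernel drivers are installed and running
    Get-CimInstance -ClassName Win32_SystemDriver |
        Where-Object { $_.Name -in 'ClusDisk', 'NetFT', 'CsvFs', 'CsvFlt' } |
        Format-Table Name, State, StartMode, PathName -AutoSize

    # The NetFT virtual adapter is hidden but can be listed explicitly
    Get-NetAdapter -IncludeHidden |
        Where-Object { $_.InterfaceDescription -like '*Failover Cluster Virtual Adapter*' }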

2.2. User-Mode / OS Services Interaction

WSFC integrates with several standard user-mode Windows services to perform its functions and report its status:

  • Event Log: WSFC is a heavy user of the Windows Event Log service. It logs a wide range of events, including node status changes, resource state transitions, failures, warnings, and informational messages, providing essential diagnostic and auditing capabilities. Key logs include the standard System log and dedicated logs under Applications and Services Logs > Microsoft > Windows > FailoverClustering (such as Operational and Diagnostic).26 Specific Event IDs (e.g., 1135 for node removal, 1069 for resource failure, 1177 for quorum loss) are crucial for troubleshooting. Cluster-wide event channel logging can be configured for enhanced diagnostics.62
  • Performance Monitor: WSFC registers a rich set of performance counter objects and instances with the Performance Monitor service.63 These allow real-time and historical monitoring of cluster health and performance metrics. Important objects include Cluster Resource, Cluster Network, Cluster Shared Volumes (multiple counters for different aspects like IO, redirection, state), Physical Disk (especially relevant for CSV), and SMB-related counters (SMB Client Shares, SMB Server Shares) used for monitoring redirected CSV traffic.63
  • Service Control Manager (SCM): The Cluster Service (ClusSvc.exe) itself runs as a Windows service managed by the SCM (services.exe).65 The SCM is responsible for starting and stopping the Cluster Service on each node during boot-up or administrative actions.68 Furthermore, many clustered applications (like SQL Server, Exchange services) are also implemented as Windows services. WSFC interacts with the SCM, likely via the Cluster API or internal mechanisms, to start and stop these dependent services as part of bringing cluster resources online or taking them offline.
  • Security Subsystem (LSASS): Authentication is critical for cluster operations involving Active Directory. WSFC interacts with the Local Security Authority Subsystem Service (lsass.exe) for these functions. The Cluster Service account (represented by the CNO in AD) must authenticate to the domain.3 Network Name resources (CAPs) rely on associated Virtual Computer Objects (VCOs) in AD for their identity, and client connections to these CAPs involve Kerberos or NTLM authentication, mediated by LSASS.3 LSASS manages the underlying credential handling, ticket granting (for Kerberos), and protocol negotiation.31
  • Plug and Play (PnP) Manager: The PnP Manager is responsible for detecting hardware additions and removals and configuring devices.74 When shared storage LUNs are presented to cluster nodes, the PnP manager detects them. However, unlike typical PnP devices, WSFC exerts explicit control over these shared resources. The ClusDisk.sys driver likely interacts with the PnP manager and the storage stack to identify and claim LUNs based on the cluster configuration (specifically, the disk signatures stored in the registry50) rather than allowing automatic PnP assignment. PnP events related to the arrival or removal of storage devices are relevant inputs to the cluster's storage management logic, ensuring the cluster is aware of the available physical resources.

This deep integration means WSFC's health is intrinsically linked to the health of the underlying OS services. A failure or misconfiguration in critical areas like Active Directory/DNS (impacting LSASS and authentication), the network stack (impacting NetFT and heartbeats), the storage stack (impacting ClusDisk and PnP), or even the Event Log service can manifest as cluster instability or failure. Therefore, maintaining overall OS health is a prerequisite for a stable and reliable failover cluster.
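
A short monitoring sketch tying these services together (the event IDs are those cited above; log and counter set names are assumed to be the defaults on Windows Server 2016 and later):

    # Key cluster events in the System log: 1135 node removed, 1069 resource failed, 1177 quorum lost
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1135, 1069, 1177 } -MaxEvents 25

    # Dedicated FailoverClustering operational channel
    Get-WinEvent -LogName 'Microsoft-Windows-FailoverClustering/Operational' -MaxEvents 25

    # Discover the cluster-related performance counter sets registered on this node
    Get-Counter -ListSet 'Cluster*' | Select-Object CounterSetName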

2.3. Windows Registry Integration (HKLM\Cluster)

The Windows Registry serves as the storage location for the active, loaded cluster configuration database on each node while the Cluster Service is running.11

The primary location for this loaded configuration is the HKEY_LOCAL_MACHINE\Cluster hive.11 This hive is dynamically loaded into the registry by ClusSvc.exe when the service starts and is unloaded when the service stops.11 It mirrors the structure and content of the persistent CLUSDB file found in %windir%\Cluster.

Key subkeys within HKLM\Cluster typically include:

  • Nodes: Contains subkeys for each server participating in the cluster, holding node-specific properties and state information.
  • Resources: Contains subkeys for every cluster resource, uniquely identified by a Globally Unique Identifier (GUID).40 Each resource's subkey contains standard values like Name (REG_SZ, the friendly name displayed in management tools)40 and potentially resource type information. Crucially, under each resource GUID key, a Parameters subkey exists.40 This Parameters subkey stores resource-specific configuration data needed by the corresponding ResDLL to manage the resource. For example, a SQL Server FCI resource would store its InstanceName and VirtualServerName here.40
  • Groups: Contains subkeys for each cluster group (often representing a clustered role or application), defining the resources that belong to the group, dependencies within the group, and group-level properties (like preferred owners).
  • Networks: Contains subkeys for each discovered cluster network, storing properties like the network role, metric, and associated IP subnets.
  • Quorum: Stores details about the configured quorum model and witness (if any).
  • PaxosTag: A REG_DWORD value directly under HKLM\Cluster that serves as a version counter for the database, incremented with each configuration change and used by the GUM replication mechanism to ensure consistency.11

In configurations using a disk witness, the node currently owning the witness resource might also load a second hive, HKEY_LOCAL_MACHINE\0.Cluster.11 This hive reflects the copy of the cluster database stored on the witness disk itself, used as part of the quorum and consistency mechanism.11

It is critical to understand that the HKLM\Cluster hive represents a loaded, in-memory view of the configuration. Direct modification of keys within this hive using tools like Regedit is strongly discouraged.11 Such changes do not update the persistent CLUSDB file correctly, nor do they trigger the necessary GUM replication process or update the PaxosTag. This will inevitably lead to inconsistencies between nodes and likely cause cluster instability or failure.11 All configuration changes must be made through supported interfaces like Failover Cluster Manager, PowerShell cmdlets, or the Cluster API, which ensure changes are properly replicated and persisted.
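
Read-only inspection of the loaded hive is safe and occasionally useful when correlating resource GUIDs with friendly names; a sketch (never write through these paths, for the reasons given above):

    # The database version counter replicated by GUM
    Get-ItemProperty -Path 'HKLM:\Cluster' -Name PaxosTag

    # Map resource GUID subkeys to their friendly names
    Get-ChildItem -Path 'HKLM:\Cluster\Resources' | ForEach-Object {
        [pscustomobject]@{
            Guid = $_.PSChildName
            Name = (Get-ItemProperty -Path $_.PSPath -Name Name).Name
        }
    }

    # Resource-specific settings live under the Parameters subkey (placeholder GUID)
    Get-ItemProperty -Path 'HKLM:\Cluster\Resources\<ResourceGUID>\Parameters'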

3. Networking Architecture

The networking architecture of WSFC is fundamental to its operation, providing the communication pathways for node health monitoring (heartbeats), cluster state synchronization, client access, and potentially storage traffic in certain configurations like CSV or S2D. It features specialized components like the NetFT driver and virtual adapter to ensure resilience against network failures.

3.1. Cluster Networks Roles and Metrics

Upon formation or when network interfaces are added, WSFC automatically discovers the network interfaces and logical subnets connecting the cluster nodes. Each discovered network can be configured with a specific role, dictating how the cluster utilizes it51:

  • Role 0: Disabled for Cluster Communication: The cluster will not use this network for any internal communication, including heartbeats or CSV traffic. It might be used solely for other purposes like dedicated storage replication or backup traffic, isolated from cluster operations.
  • Role 1: Enabled for Cluster Communication Only: This network is designated for internal cluster traffic. This includes node-to-node heartbeats, cluster database synchronization (GUM updates), internal RPC calls, and potentially Cluster Shared Volume (CSV) traffic if configured. Client connections are not permitted over networks with this role. These are often referred to as "private" or "internal" cluster networks.
  • Role 3: Enabled for Client and Cluster Communication: This network serves a dual purpose. It allows clients to connect to clustered roles (via CAPs) hosted on the cluster nodes. It also serves as a backup path for internal cluster communication (heartbeats, GUM, CSV) if all Role 1 networks become unavailable. These are typically the "public" or "management" networks.

To manage and prioritize the paths for internal cluster communication, the Network Fault Tolerant driver (NetFT.sys) assigns a metric value to each cluster-enabled network (Roles 1 and 3).51 Unlike standard TCP/IP metrics, these are internal cluster metrics where a lower value indicates a higher priority. NetFT.sys automatically calculates these metrics based primarily on the assigned role and secondarily on the physical characteristics of the network adapter, such as its speed, RDMA capability (iWARP/RoCE), and Receive Side Scaling (RSS) support.

Table 3: Cluster Network Roles and Default Metrics
| Role ID | Role Name | Allowed Traffic | Default Base Metric Range (Approx.) | Primary Use Case |
| --- | --- | --- | --- | --- |
| 0 | Disabled | None (by cluster) | N/A | Dedicated non-cluster traffic (e.g., storage replication, backup) |
| 1 | Cluster Communication Only | Heartbeats, GUM, internal RPC, CSV | 3xxxx - 7xxxx51 | Dedicated internal cluster network(s) for heartbeats and potentially CSV traffic |
| 3 | Cluster and Client Communication | Client access; heartbeats, GUM, internal RPC, and CSV as backup | 7xxxx - Bxxxx51 | Public network for client connections and management |
Note: Exact metric values vary based on adapter speed and features; lower values indicate higher priority.

Role 1 networks receive significantly lower base metrics (higher priority) than Role 3 networks, ensuring that critical internal cluster communication preferentially uses the dedicated internal networks when available. Faster adapters with features like RDMA will have their metrics further reduced, increasing their priority. The Get-ClusterNetwork | ft Name, Metric PowerShell command displays the calculated metrics for each network.51 If multiple networks have metrics within a close range (specifically, less than 16 apart), NetFT.sys can potentially leverage SMB Multichannel to distribute certain types of cluster traffic (like CSV or S2D storage traffic) across them.51
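
Both the role and the NetFT-assigned metric are visible through the cluster network objects, and the role can be changed administratively; a sketch with a placeholder network name (AutoMetric indicates whether NetFT manages the metric automatically):

    # Current roles and metrics (lower metric = higher priority for internal traffic)
    Get-ClusterNetwork | Format-Table Name, Role, Metric, AutoMetric, Address

    # Dedicate a network to internal cluster communication only (Role 1)
    (Get-ClusterNetwork -Name "Cluster Network 2").Role = 1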

3.2. Heartbeat Mechanism

The heartbeat mechanism is the primary method used by WSFC to monitor the health and reachability of cluster nodes and maintain an accurate view of cluster membership.2

Heartbeats are implemented as UDP unicast packets exchanged periodically between every pair of active nodes over all networks enabled for cluster communication (Roles 1 and 3).51 The default destination port for these UDP packets is 3343.51 While UDP is used for the periodic keepalives, TCP port 3343 is also utilized for certain cluster operations, notably during the node join process and potentially for reliable sequenced communication streams between nodes.51

The frequency of heartbeat transmission and the tolerance for missed heartbeats are governed by four key cluster-wide properties, configurable via PowerShell37:

  • SameSubnetDelay: The interval (in milliseconds) between sending heartbeats to nodes on the same subnet.
  • SameSubnetThreshold: The number of consecutive heartbeats that can be missed from a node on the same subnet before that node is considered 'Down'.
  • CrossSubnetDelay: The interval (in milliseconds) between sending heartbeats to nodes on different subnets.
  • CrossSubnetThreshold: The number of consecutive heartbeats that can be missed from a node on a different subnet before that node is considered 'Down'.
Table 4: Default Heartbeat Settings (Windows Server 2016 and later)
Setting Name | Default Value | Description
SameSubnetDelay | 1000 ms | Interval between heartbeats to nodes on the same IP subnet.
SameSubnetThreshold | 10 | Number of missed heartbeats before declaring a same-subnet node down.
CrossSubnetDelay | 1000 ms | Interval between heartbeats to nodes on different IP subnets.
CrossSubnetThreshold | 20 | Number of missed heartbeats before declaring a cross-subnet node down.
RouteHistoryLength | 40 (Windows Server 2012+) | Number of heartbeat route events logged (recommended: 2x the threshold).
References: 37

Failure detection occurs when a node fails to receive the expected number of heartbeats (defined by the threshold) within the expected timeframe (determined by the delay) from another specific node over any available network path.60 If all paths to a node appear unresponsive based on missed heartbeats, the detecting node declares the remote node 'Down'. This triggers a cluster membership recalculation (potentially involving quorum changes) and initiates failover procedures for any roles hosted on the failed node. Event ID 1135 is typically logged in the System Event Log when a node is removed from active membership due to heartbeat failure.8

Heartbeat traffic itself is generally lightweight, but the mechanism is highly sensitive to network latency and packet loss.51 Excessive latency can delay heartbeat acknowledgments, potentially leading to false positives where nodes are incorrectly declared down. The default thresholds in Windows Server 2016 and later (10 seconds for same subnet, 20 seconds for cross-subnet) are more relaxed compared to earlier versions (which defaulted to 5 seconds), providing better tolerance for transient network issues often seen in virtualized or cloud environments.61 However, tuning these thresholds involves a direct trade-off: increasing the thresholds makes the cluster more resilient to temporary network glitches but slows down the detection of genuine hard failures, potentially increasing application downtime before failover completes.60 It's generally recommended not to exceed thresholds that would cause client timeouts, often around 20 seconds.61
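The current heartbeat settings can be reviewed and adjusted with PowerShell. The values below are purely illustrative (for example, for a stretched cluster experiencing occasional WAN jitter), not recommendations:

  # Display the current delay/threshold settings
  Get-Cluster | Format-List *SubnetDelay, *SubnetThreshold, RouteHistoryLength

  # Example adjustment of the cross-subnet tolerance (illustrative values)
  (Get-Cluster).CrossSubnetDelay = 2000
  (Get-Cluster).CrossSubnetThreshold = 20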

The NetFT.sys driver is responsible for the low-level management of sending heartbeats across all configured network paths, receiving and acknowledging heartbeats from peers, tracking missed packets, and notifying the Cluster Service when thresholds are exceeded.51

3.3. Cluster IP Address Resource Management

A Cluster IP Address resource provides a highly available virtual IP address (VIP) that serves as a stable network endpoint for clients connecting to a clustered role (application or service). This abstracts the physical IP addresses of the individual nodes hosting the role.

The IP Address resource is managed as a standard cluster resource type, with its online and offline logic implemented within a built-in ResDLL (typically part of clusres.dll). When the Cluster Service brings an IP Address resource online on a specific node, the ResDLL interacts with the underlying Windows TCP/IP stack to add the configured virtual IP address to the appropriate network interface (selected based on the cluster network specified in the resource's properties).

A critical part of the online process, especially during failover, is updating the network's address resolution caches.84 Upon successfully adding the IP address, the owning node's TCP/IP stack (directed by the cluster software/ResDLL) broadcasts specific network packets:

  • For IPv4 addresses, it sends gratuitous ARP (Address Resolution Protocol) requests.84
  • For IPv6 addresses, it sends unsolicited Neighbor Advertisement messages (part of Neighbor Discovery protocol).

These packets announce the association between the virtual IP address and the MAC (Media Access Control) address of the network interface on the currently active node.84 Devices on the same Layer 2 network segment (including client machines and routers/default gateways) receive these broadcasts and are mandated by the respective protocols (ARP RFC 826, ND RFC 4861) to update their local ARP or Neighbor caches.84 This cache update effectively redirects network traffic destined for the virtual IP address to the physical port of the node that now owns the resource.85 This ARP/ND update mechanism is fundamental for enabling seamless client redirection during failover without requiring manual reconfiguration or long DNS propagation delays.

During the online process, the cluster also uses ARP requests to perform duplicate address detection, ensuring the configured IP address is not already in use on the network before bringing the resource fully online.84 Cluster IP Address resources typically serve as dependencies for Cluster Network Name resources.21
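For reference, the private properties that back an IP Address resource (Address, SubnetMask, Network, EnableDhcp, and so on) can be examined with Get-ClusterParameter. This is a sketch; the resource name "Cluster IP Address" is a placeholder for whatever the resource is called in a given cluster:

  # List the private properties of an IP Address resource
  Get-ClusterResource -Name "Cluster IP Address" | Get-ClusterParameter

  # Show which node currently owns the resource and its state
  Get-ClusterResource -Name "Cluster IP Address" | Format-Table Name, State, OwnerNode, OwnerGroup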

3.4. Cluster Network Name Resources (Client Access Points - CAPS)

A Cluster Network Name resource provides a highly available network name (both NetBIOS and DNS) that clients use to connect to a clustered role. It works in conjunction with one or more Cluster IP Address resources to form a Client Access Point (CAP), presenting a stable name and address regardless of which node is currently hosting the role.

The management of Network Name resources is deeply integrated with Active Directory Domain Services (AD DS).3 When a Network Name resource is configured for a CAP, the cluster typically attempts to automatically create a corresponding computer object in AD DS.3 This object is known as a Virtual Computer Object (VCO).70 By default, the VCO is created in the same Organizational Unit (OU) where the Cluster Name Object (CNO) resides.3 For this automatic creation to succeed, the CNO computer account must possess the 'Create Computer objects' permission within that OU.3

Alternatively, administrators can pre-stage the VCO computer object in AD DS.88 In this case, the CNO account must be granted 'Full Control' permissions over the pre-staged VCO object to manage its properties (like password and SPNs).88 The VCO serves as the security principal for the clustered role itself, allowing it to authenticate within the domain if needed.

When the Network Name resource comes online on a node, it performs dynamic DNS registration.19 It registers its name (e.g., CAPName.domain.com) against the IP address(es) provided by its dependent IP Address resource(s). The computer account associated with the resource (either the CNO or the VCO) needs the necessary permissions to update its corresponding DNS host (A/AAAA) records. In environments with secure dynamic updates, this might require the 'Validated Write to DNS Host Name' permission on the computer object.71 The HostRecordTTL property of the Network Name resource controls the Time-To-Live value for the registered DNS record, influencing how long clients cache the name-to-IP mapping.51 The RegisterAllProvidersIP property determines whether the Network Name registers all of its dependent IP addresses (relevant for multi-subnet clusters) or only the one currently active on the owning node (0 = active only, the default; 1 = all).86
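Both properties are commonly tuned for multi-subnet deployments. A minimal sketch follows; the resource name is a placeholder, and the Network Name resource must be cycled offline/online for the new values to apply:

  # Lower the DNS TTL and register only the active IP address (illustrative values)
  Get-ClusterResource -Name "SQL Network Name (SQLFCI1)" | Set-ClusterParameter -Name HostRecordTTL -Value 300
  Get-ClusterResource -Name "SQL Network Name (SQLFCI1)" | Set-ClusterParameter -Name RegisterAllProvidersIP -Value 0

  # Restart the Network Name resource so the changes take effect (dependent resources are also affected)
  Stop-ClusterResource -Name "SQL Network Name (SQLFCI1)"
  Start-ClusterResource -Name "SQL Network Name (SQLFCI1)"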

Authentication to the CAP by clients utilizes standard Windows protocols: Kerberos or NTLM.71 For Kerberos authentication to succeed, several prerequisites must be met:

  1. The corresponding VCO must exist and be enabled in Active Directory.71
  2. Appropriate Service Principal Names (SPNs) must be registered against the VCO account.71 SPNs link the service type (e.g., MSSQLSvc for SQL Server, HOST for file services) and the Network Name (e.g., CAPName or CAPName.domain.com) to the VCO.
  3. The Network Name resource in the cluster configuration must have the 'Enable Kerberos Authentication' option checked.71

If the CNO manages the VCO, it needs 'Validated Write to Service Principal Name' permission on the VCO to automatically manage SPNs.71 If any of these Kerberos requirements are not met, authentication attempts will typically fall back to the less secure NTLM protocol, or may fail entirely depending on client and server policy.72 The reliability of CAPs is therefore heavily dependent not only on the cluster itself but also on the health and correct configuration of the supporting Active Directory and DNS infrastructure. Issues like CNO/VCO permission problems, AD replication latency, or DNS update failures can prevent CAPs from coming online or cause client connection failures.47
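When Kerberos problems are suspected, the VCO and its SPNs can be checked with built-in tooling. A sketch, assuming a file server CAP named FILESRV1 in the CONTOSO domain (both placeholders):

  # List the SPNs registered on the VCO computer account (setspn.exe ships with Windows)
  setspn -L CONTOSO\FILESRV1$

  # Search for duplicate SPNs, a common cause of Kerberos failures
  setspn -X

  # Confirm the VCO exists and is enabled (requires the ActiveDirectory PowerShell module)
  Get-ADComputer -Identity FILESRV1 -Properties servicePrincipalName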

3.5. Network Fault Tolerance (NetFT.sys) Deep Dive

The Network Fault Tolerant driver (NetFT.sys) is more than just the heartbeat manager; it provides the comprehensive, resilient, and abstracted network communication fabric for all internal cluster operations.51

Its functionality extends beyond sending and receiving heartbeats to include the routing of general intra-cluster Remote Procedure Call (RPC) traffic.51 This includes critical communications such as Global Update Manager (GUM) updates for database synchronization, cluster state change notifications, and management API calls directed between nodes.

Central to NetFT's operation is the Microsoft Failover Cluster Virtual Adapter.51 This virtual adapter is created automatically when the Failover Clustering feature is installed or when a cluster is formed.51 It is typically hidden from view in Device Manager unless "Show hidden devices" is selected.52 This adapter possesses its own unique MAC address, algorithmically derived from the MAC address of one of the physical network adapters on the node.51 It utilizes non-routable IP addresses for communication: an Automatic Private IP Addressing (APIPA) address in the 169.254.x.x range for IPv4, and a link-local address (fe80::) for IPv6.51 The Cluster Service on each node binds to this virtual adapter's IP address for its internal communication endpoints (using port 3343).51 Actual network transmission occurs by NetFT.sys encapsulating the cluster communication (e.g., TCP traffic destined for another node's NetFT IP) within UDP packets (also using source/destination port 3343) and sending these UDP packets out over one or more of the physical network adapters enabled for cluster use.51 The receiving node's NetFT.sys receives the UDP packet, decapsulates the original traffic, and delivers it to the local Cluster Service via the NetFT virtual adapter.51
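The hidden virtual adapter and its link-local addresses can be observed with the standard networking cmdlets; a sketch (output will vary by system):

  # Reveal the hidden Microsoft Failover Cluster Virtual Adapter
  $netft = Get-NetAdapter -IncludeHidden |
      Where-Object { $_.InterfaceDescription -like "*Failover Cluster Virtual Adapter*" }
  $netft | Format-Table Name, InterfaceDescription, MacAddress, Status

  # Show the APIPA (169.254.x.x) and link-local (fe80::) addresses NetFT binds to
  Get-NetIPAddress -InterfaceIndex $netft.InterfaceIndex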

NetFT.sys actively discovers all possible network paths between cluster nodes using the cluster-enabled physical networks.51 It builds an internal routing table and uses the cluster network metrics (described in Section 3.1) to determine the lowest-cost (highest priority) path for non-heartbeat communication.51

A key feature is dynamic rerouting.51 NetFT.sys continuously monitors the health of all paths, primarily using the heartbeat mechanism. If the currently selected primary path for communication between two nodes experiences a failure (e.g., excessive packet loss or latency leading to missed heartbeats), NetFT.sys automatically and transparently switches subsequent internal cluster traffic to the next-best available path based on its internal metrics.51 This failover at the network transport layer provides resilience to the higher-level Cluster Service, which continues communicating via the stable NetFT virtual adapter addresses, unaware of the underlying physical path change. It's important to note that while general cluster communication fails over to a single best path, heartbeat packets continue to be sent and monitored over all available cluster-enabled paths simultaneously to ensure rapid detection of any path failure.51

For security, intra-cluster communication traversing the NetFT adapter is cryptographically signed by default (SecurityLevel=1) to ensure integrity and prevent tampering.51 Administrators can configure this via the (Get-Cluster).SecurityLevel PowerShell property to either disable security (SecurityLevel=0, clear text) or enable full encryption (SecurityLevel=2).51 Encryption provides confidentiality but introduces a slight performance overhead.51
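A quick sketch of inspecting and changing this setting (the change applies cluster-wide):

  # 0 = clear text, 1 = signed (default), 2 = encrypted
  (Get-Cluster).SecurityLevel

  # Enable encryption of intra-cluster traffic, accepting the overhead noted above
  (Get-Cluster).SecurityLevel = 2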

3.6. Global Update Manager (GUM) Architecture

The Global Update Manager (GUM) is a critical sub-component within the Cluster Service (ClusSvc.exe) responsible for orchestrating and ensuring the consistency of cluster-wide configuration database updates.12 Whenever a change occurs that modifies the cluster's state – such as a resource changing state (online/offline), a node joining or leaving, or an administrator modifying a cluster property – the GUM manages the process of propagating this change to all active nodes and committing it durably to the cluster database (ClusDB).

GUM updates rely on the underlying fault-tolerant network infrastructure provided by NetFT.sys to transmit update messages between cluster nodes.12 The reliability and latency of the cluster networks directly impact the performance and success of GUM updates.

Starting with Windows Server 2012 R2, GUM supports different operational modes that define the acknowledgment and commit semantics for updates, allowing for tuning based on consistency requirements and network topology12:

  • All (write) / Local (read): This is the most strongly consistent mode and the default for most clustered roles. Before a configuration change is considered committed, GUM requires an acknowledgment that the update has been received and processed successfully from every active node in the cluster. Once committed, any subsequent read of the configuration data can be safely performed from the local node's copy of the ClusDB, as it is guaranteed to be up-to-date. This mode ensures maximum consistency but can be bottlenecked by the slowest or least responsive node in the cluster, potentially impacting performance in high-latency environments like multi-site clusters.12
  • Majority (read and write): This mode, the default for Hyper-V clusters in Windows Server 2012 R2 and later, relaxes the acknowledgment requirement. GUM only needs to receive successful acknowledgments from a majority (more than half) of the currently active nodes before committing the change. This can significantly improve the performance of configuration updates in clusters with high inter-node latency, as the update doesn't have to wait for the slowest nodes. However, because not all nodes might have processed the latest update immediately after commit, reads become more complex. When reading configuration data, the cluster must query a majority of nodes, compare timestamps associated with the data, and use the data with the latest timestamp to ensure consistency.12
  • Majority (write) / Local (read): Similar to the previous mode, this requires only a majority acknowledgment for writes to be committed. However, reads are performed directly from the local node's ClusDB copy without consulting other nodes or comparing timestamps. While this offers the write performance benefits of the majority commit, it introduces the risk of reading stale data if the local node happened to be one of the nodes that had not yet processed the latest update when the majority commit occurred. This mode is generally not recommended for workloads requiring strong read consistency.
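These modes correspond to the DatabaseReadWriteMode cluster common property (0 = All write / Local read, 1 = Majority read and write, 2 = Majority write / Local read). A minimal sketch of inspecting and overriding it:

  # Inspect the current GUM mode
  (Get-Cluster).DatabaseReadWriteMode

  # Force the strongly consistent mode, e.g., on a Hyper-V cluster that should not use the Majority default
  (Get-Cluster).DatabaseReadWriteMode = 0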

The GUM architecture, with its different modes and reliance on the PaxosTag versioning mechanism11, ensures that despite the distributed nature of the cluster, all nodes eventually converge on a single, consistent view of the cluster configuration, which is essential for coordinated high-availability operations.

4. Storage Architecture Integration

Failover clusters often rely on shared storage systems that must be accessible by multiple nodes. WSFC integrates deeply with various storage architectures, providing mechanisms for managing access, arbitrating ownership, and offering specialized file systems like Cluster Shared Volumes (CSV) designed for simultaneous access scenarios.

4.1. Clustered Storage Concepts

The fundamental requirement for many clustered roles, such as traditional SQL Server Failover Cluster Instances (FCIs), Scale-Out File Servers (SoFS), or Hyper-V virtual machines that need to migrate between hosts without storage migration, is access to shared storage.2 This storage must be concurrently accessible (though typically write access is arbitrated) by all nodes that are potential owners of the clustered role. WSFC supports several types of shared storage architectures:

  • Traditional Shared Storage (Block Access): These provide block-level access to Logical Unit Numbers (LUNs) over a Storage Area Network (SAN) or direct connection.
    • Fibre Channel (FC) / Fibre Channel over Ethernet (FCoE): High-speed SAN protocols offering block access. Commonly used for enterprise workloads requiring high performance and low latency. WSFC supports FC/FCoE LUNs as Physical Disk resources.81
    • iSCSI: An IP-based protocol providing block access over standard Ethernet networks. Offers a lower-cost alternative to FC. WSFC supports iSCSI LUNs as Physical Disk resources.81
    • Shared Serial Attached SCSI (SAS): SAS enclosures connected directly to multiple cluster nodes via SAS Host Bus Adapters (HBAs). Provides block access without requiring a SAN fabric for smaller clusters. WSFC supports shared SAS LUNs as Physical Disk resources.
  • Software-Defined Storage (SDS) / Hyperconverged Infrastructure (HCI): These approaches virtualize storage using software and often leverage local storage within the cluster nodes.
    • Storage Spaces Direct (S2D): A core feature of Windows Server (especially in Azure Stack HCI) that aggregates locally attached drives (SATA, SAS, NVMe) across multiple cluster nodes into a single, resilient virtual storage pool.82 It uses the Software Storage Bus and SMB3 for internal communication, eliminating the need for traditional shared storage hardware.91 Storage is exposed to applications typically via CSVFS volumes created on top of Storage Spaces (virtual disks) carved from the pool.
    • Cluster Shared Volumes (CSV): A specialized clustered file system (CSVFS) designed to be layered on top of NTFS or ReFS formatted LUNs (either from traditional shared storage or S2D virtual disks).53 CSV allows simultaneous read and write access to the same volume from all nodes in the cluster. This is achieved by coordinating metadata access through a designated Coordinator Node while allowing data I/O to occur directly between each node and the storage (Direct I/O) or via redirection through the Coordinator Node (Redirected I/O).53 CSV presents a consistent namespace across all nodes, typically under C:\ClusterStorage\.54 CSV is particularly crucial for Hyper-V clusters, enabling virtual machine live migration without needing to move the associated VHDX files, as the destination node already has access to the volume containing them.
Table 5: Clustered Storage Technologies Overview
Technology | Access Type | Arbitration Mechanism | Typical Use Case(s) | Key WSFC Components Involved
Shared SAS LUN | Block | SCSI-3 PR | General Purpose, SQL FCI | ClusDisk.sys
FC SAN LUN | Block | SCSI-3 PR | General Purpose, SQL FCI | ClusDisk.sys
iSCSI SAN LUN | Block | SCSI-3 PR | General Purpose, SQL FCI | ClusDisk.sys
Cluster Shared Vol (CSV) | File (on Block) | Software (Coordinator Node) | Hyper-V VMs, Scale-Out FS | CsvFs.sys/CsvFlt.sys, SMB3 (for redirection), ClusDisk.sys (underlying LUN)
Storage Spaces Direct (S2D) | Block (Virtual) / File (via CSV/SoFS) | Software (Distributed/Cluster) | HCI, Scale-Out FS, Hyper-V | SSB, Storage Spaces, ReFS/NTFS, CSVFS, SMB3, Health Service
References: 50

The choice of storage architecture impacts how WSFC manages access and performs failover. Traditional block storage relies heavily on SCSI Persistent Reservations managed by ClusDisk.sys, while CSV and S2D shift much of the access control and consistency management into software layers within the cluster itself.

4.2. SCSI-3 Persistent Reservations (PR) in Disk Arbitration

For failover clusters utilizing traditional shared block storage (Fibre Channel, iSCSI, Shared SAS), SCSI-3 Persistent Reservations (PRs) are the cornerstone mechanism for disk arbitration and I/O fencing.50 Their primary function is to prevent a catastrophic "split-brain" scenario at the storage level, where multiple nodes might erroneously believe they own a shared LUN and attempt simultaneous writes, leading to data corruption.

Persistent Reservations are an evolution of the older SCSI-2 Reserve/Release mechanism.55 Unlike SCSI-2 reserves, which could be broken by a simple SCSI bus reset, SCSI-3 PRs are designed to persist across such events, providing more robust fencing.55 The PR mechanism involves two main concepts: registration and reservation.55

  1. Registration: Initiators (cluster nodes) register a unique key with the target LUN. This identifies the nodes that are part of the cluster and allowed to participate in the reservation process.
  2. Reservation: A registered initiator can then establish a persistent reservation on the LUN. Several types of reservations exist, but WSFC typically employs a type like "Write Exclusive, Registrants Only" (Type 5) or "Exclusive Access, Registrants Only" (Type 6). This allows all registered nodes (the cluster members) to potentially read from the LUN, but strictly permits only the single node holding the reservation (the "owner" node) to perform write operations.55

WSFC leverages PRs through the Cluster Disk Driver (ClusDisk.sys).50 When a node successfully brings a Physical Disk resource online, ClusDisk.sys executes the necessary SCSI-3 PR commands (specifically PERSISTENT RESERVE OUT with appropriate parameters) to establish the reservation on the corresponding LUN, claiming ownership for that node.50 To maintain ownership and prevent the reservation from timing out (if configured with a timeout on the storage array), the owning node's ClusDisk.sys periodically renews the reservation, often by re-issuing PR commands or specific check-in commands.50 When a resource is taken offline, it issues commands to release the reservation.50 During failover or in response to a node failure (fencing), the arbitration process relies heavily on PRs.50 A node attempting to take ownership of the LUN (a "challenging" node) must first break the reservation held by the previous owner (the "defending" node). This can be achieved using SCSI bus resets (though less common with PRs than SCSI-2) or, more typically, by issuing specific PR commands like PERSISTENT RESERVE OUT with a PREEMPT or CLEAR service action.50 Only after successfully breaking the old reservation can the challenging node establish its own reservation and safely bring the disk resource online.

Correct implementation and configuration of SCSI-3 PRs on the storage array are critical for WSFC stability. Storage arrays must properly support the specific PR types and commands used by Windows. Failure to support PRs, or misconfiguration (e.g., incorrect key registration, reservation conflicts), can lead to cluster validation failures95, inability to bring disk resources online, or failure during failover attempts.
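Cluster validation and, where necessary, reservation cleanup can be driven from PowerShell. A sketch; the node names and disk number are placeholders, and clearing reservations should only be done when the disk is not actively in use by the cluster:

  # Run only the storage validation tests (these include the SCSI-3 Persistent Reservation checks)
  Test-Cluster -Node "Node1", "Node2" -Include "Storage"

  # Forcibly clear stale persistent reservations on disk number 3
  Clear-ClusterDiskReservation -Node "Node1" -Disk 3 -Force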

4.3. Cluster Disk Driver (ClusDisk.sys) Functionality

The Cluster Disk Driver (ClusDisk.sys) is a kernel-mode driver specifically designed to manage access to shared block storage devices (LUNs) that are configured as Physical Disk resources within a failover cluster.50 It sits within the Windows storage stack and intercepts I/O requests targeted at these clustered disks.

Its key functions include:

  • Identifying Cluster Disks: On system startup, ClusDisk.sys reads disk signatures from a specific registry key (HKLM\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures).50 These signatures uniquely identify the disks that are under cluster management. The driver then scans the storage buses to locate the physical disks matching these signatures.
  • Managing Disk Ownership via PRs: As detailed previously, ClusDisk.sys is the component responsible for interacting with the storage hardware (via the storage stack and HBA drivers) to implement SCSI-3 Persistent Reservations.50 When a node brings a Physical Disk resource online, ClusDisk.sys issues the appropriate PERSISTENT RESERVE OUT commands to claim exclusive write access for that node.50 It handles the periodic renewal of these reservations to maintain ownership.50 When a resource is taken offline, it issues commands to release the reservation.50 During failover arbitration, it executes the logic to break existing reservations and acquire new ones.50
  • Enforcing Access Control: To prevent data corruption from simultaneous writes, ClusDisk.sys ensures that only the node currently holding the persistent reservation (the owner of the Physical Disk resource) has write access to the LUN. On all other nodes in the cluster (non-owning nodes), ClusDisk.sys interacts with the operating system's volume manager and file system drivers to keep the corresponding volume marked as 'offline' or 'reserved' and inaccessible for read or write operations at the file system level.50
  • Presenting Volumes to the OS: Once a node has successfully acquired the persistent reservation and brought the Physical Disk resource online, ClusDisk.sys allows the volume manager and file system drivers (NTFS, ReFS) on that node to mount the volume(s) on the LUN, making them accessible for I/O operations by applications running on that node.50

In essence, ClusDisk.sys acts as the gatekeeper for traditional shared storage in a WSFC environment, using SCSI-3 PRs as its primary tool to enforce single-owner write access and manage the complex process of ownership transfer during failovers.
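The signatures recorded under the ClusDisk Parameters key mentioned above can be inspected directly. This is a sketch; the exact layout (values versus subkeys) varies between operating system versions:

  # Dump whatever ClusDisk has recorded under its Parameters key
  Get-ChildItem -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters' -Recurse
  Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters'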

4.4. Cluster Shared Volumes (CSV) Architecture

Cluster Shared Volumes (CSV), introduced in Windows Server 2008 R2 and significantly enhanced in subsequent versions, represent a distributed file-access solution built upon NTFS or ReFS. CSV allows multiple nodes in a Windows Server Failover Cluster to have simultaneous read and write access to the same shared storage volume. This overcomes the traditional limitation of clustered disks where only one node could own and access a disk at a time, making CSV essential for workloads requiring high availability and mobility, most notably Hyper-V virtual machines and Scale-Out File Server (SOFS) shares for application data (like SQL Server database files).

CSV Architecture and Core Concepts

CSV operates as a layer above the base file system (NTFS/ReFS) on shared storage LUNs or Spaces. While all nodes can access the data concurrently, the architecture employs specific mechanisms to manage consistency and I/O flow:

  • Coordinator Node: For each CSV volume, one node in the cluster is designated as the Coordinator Node (also referred to as the Owner Node in Failover Cluster Manager). This role is distributed across cluster nodes (typically balanced based on the number of CSVs each node coordinates) and is dynamically reassigned during events like node failures, shutdowns, or joins to maintain availability and load distribution. The Coordinator Node is responsible for orchestrating and synchronizing all file system metadata operations (e.g., file creation, deletion, extension, renaming, attribute changes) for its assigned CSV volume, ensuring a consistent view across all nodes.
  • CSV Namespace (C:\ClusterStorage\): CSV volumes are not assigned drive letters by default. Instead, they are mounted as directories under a consistent, cluster-wide namespace located at C:\ClusterStorage\ on the system drive of every cluster node. Each CSV volume appears as a subdirectory (e.g., C:\ClusterStorage\Volume1). While the root C:\ClusterStorage path is static, the individual volume mount point names (e.g., "Volume1") can be renamed by administrators for better identification, ideally before placing application data on them.
  • Simultaneous Access: All nodes in the cluster can perform I/O operations to the same CSV volume concurrently. This allows, for example, multiple Hyper-V hosts to run virtual machines whose VHDX files reside on the same LUN, facilitating seamless live migration without needing to transfer storage ownership.
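The Coordinator (Owner) node for each CSV can be viewed and, if desired, moved with PowerShell; the volume and node names below are placeholders:

  # Show each CSV, its current Coordinator (Owner) node, and state
  Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State

  # Move coordination of a specific CSV to another node
  Move-ClusterSharedVolume -Name "Cluster Disk 2" -Node "Node2"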

CSV Components and Drivers

The functionality of CSV is delivered through a set of specialized file system filter drivers and cluster components:

  • CSV File System (CsvFs.sys) and CSV Filter (CsvFlt.sys): CsvFs.sys is the kernel-mode pseudo-file system that intercepts file I/O requests directed to the CSV namespace on every node; it determines the appropriate I/O path (Direct or Redirected) and coordinates metadata operations with the Coordinator Node. CsvFlt.sys is a companion file system mini-filter (altitude 404800) that attaches to the underlying NTFS/ReFS instance *only* on the Coordinator Node, where it plays a role in enforcing permissions via the SharedVolumeSecurityDescriptor cluster property and managing native NTFS features like advisory file locking.
  • CSV Volume Manager (CsvVbus.sys): This virtual bus driver is responsible for presenting the CSV pseudo-volumes to the system and managing block-level I/O redirection.
  • CSV Namespace Filter (CsvNsFlt.sys): Another mini-filter driver (altitude 404900) that protects the C:\ClusterStorage\ root directory from unauthorized modifications and helps dispatch block-level redirected I/O directly to the physical disk on the Coordinator Node.
  • Disk Control Manager (DCM): A core cluster component (part of ClusSvc.exe) that manages the CSV namespace, implements a global distributed lock for CSV volumes, handles notifications of volume changes, manages pseudo-volume lifetimes, coordinates snapshots, and links CSV volumes to their underlying physical disk resources.
  • Server Message Block (SMB): CSV leverages SMB (specifically SMB 3.x in modern versions) for all inter-node communication related to redirected I/O and metadata synchronization. This requires "Client for Microsoft Networks" and "File and Printer Sharing for Microsoft Networks" to be enabled on cluster networks used for CSV traffic.

CSV I/O Handling and Redirection

CSV dynamically determines the optimal path for I/O operations:

  • Direct I/O: The preferred mode. The node initiating the I/O communicates directly with the shared storage through its local storage stack (e.g., MPIO, Storport). This offers the highest performance as data does not traverse the cluster network. Metadata operations are still sent to the Coordinator Node via SMB.
  • File System Redirected I/O: Used if a node loses direct storage connectivity. The initiating node's CSVFS sends the I/O request via SMB to the Coordinator Node. The Coordinator Node's CSVFS receives the request, performs the I/O against the underlying file system (NTFS/ReFS) on the storage, and returns the result via SMB. This ensures availability but incurs network latency and processing overhead on the Coordinator.
  • Block Level Redirected I/O: An optimized redirection mode. Used when Direct I/O fails or for certain configurations like Storage Spaces mirrored volumes. The initiating node's CsvVbus.sys sends the block-level I/O request via SMB to the Coordinator Node. The Coordinator Node's CsvNsFlt.sys receives the request and dispatches it directly to the disk stack (Disk.sys), bypassing the file system layer on the Coordinator. This reduces overhead compared to File System redirection but is still slower than Direct I/O.

The cluster automatically attempts Direct I/O first. If that fails, it determines the appropriate redirected mode based on the failure type and configuration.
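The per-node I/O mode of each CSV, along with the reason for any redirection, is exposed by the Get-ClusterSharedVolumeState cmdlet; a sketch (property names as commonly documented):

  # StateInfo shows Direct, FileSystemRedirected, or BlockRedirected for each node and volume
  Get-ClusterSharedVolumeState |
      Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason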

CSV Resiliency and Availability

CSV enhances cluster resiliency by handling various failure scenarios transparently:

  • Node Failure: If the Coordinator Node for a CSV volume fails, another node automatically takes over the Coordinator role. Nodes that were performing Direct I/O continue to do so. Nodes that were in Redirected I/O mode will re-establish SMB connections to the new Coordinator. Virtual machines generally continue running without interruption.
  • Storage Path Failure: If a node loses its direct connection to the storage for a CSV volume, it automatically switches to Redirected I/O (File System or Block Level) through the Coordinator Node, maintaining access to the volume.
  • Network Failure: CSV leverages SMB Multichannel and cluster network fault tolerance (NetFT). If one network path used for redirected I/O fails, traffic automatically fails over to another available cluster network.
  • NTFS/ReFS Health Integration: CSV integrates with the underlying file system's health model. If corruption is detected, NTFS/ReFS can perform online self-healing or log issues for offline repair (chkdsk /spotfix). During spotfix operations, CSV briefly pauses I/O to the affected volume region, allowing for rapid repair with minimal disruption.

CSV Performance Considerations

While Direct I/O is the goal, CSV includes features to optimize performance even during redirection:

  • SMB 3.x Features: Leverages SMB Multichannel to aggregate bandwidth and provide resiliency across multiple network adapters for redirected traffic. Utilizes SMB Direct (RDMA) if RDMA-capable network adapters are available and configured, significantly reducing latency and CPU overhead for redirected I/O.
  • CSV Block Cache: An optional read-only, in-memory cache for unbuffered I/O, primarily benefiting Hyper-V read-intensive workloads (like parent VHD reads). It uses system RAM allocated cluster-wide, configured via the BlockCacheSize cluster common property (default 0 in Windows Server 2016). In Windows Server 2012 the property was named SharedVolumeBlockCacheSizeInMB and the cache also had to be enabled per disk via the CsvEnableBlockCache private property; from Windows Server 2012 R2 onward the per-disk EnableBlockCache property defaults to enabled. A brief configuration sketch follows this list.
  • Metadata Optimization: Metadata operations, while always going through the Coordinator, are optimized for efficiency.
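A minimal configuration sketch for the CSV Block Cache, assuming Windows Server 2012 R2 or later where the cluster-wide property is BlockCacheSize (the 512 MB value is illustrative):

  # Check and size the CSV in-memory read cache (value in MB)
  (Get-Cluster).BlockCacheSize
  (Get-Cluster).BlockCacheSize = 512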

CSV Integration and Use Cases

  • Hyper-V: The primary use case. Allows VHD(X) files for multiple VMs to reside on a single LUN/Space, accessible by all hosts, enabling features like Live Migration without storage migration.
  • Scale-Out File Server (SOFS) for Application Data: SOFS relies exclusively on CSV to provide continuously available file shares for applications like SQL Server and Hyper-V (storing VM files on SMB shares).
  • Storage Spaces: CSV can be layered on top of clustered Storage Spaces, including Storage Spaces Direct (S2D), providing the distributed access namespace. Note that Mirrored Spaces often trigger Block Level Redirected I/O.
  • Backup and VSS: CSV integrates with the Volume Shadow Copy Service (VSS). It includes a dedicated CSV VSS Writer and Helper Provider to coordinate consistent, cluster-wide snapshots for backup applications, supporting both hardware and software VSS providers.

In summary, CSV is a critical technology for modern Windows Server Failover Clusters, particularly those hosting virtualized workloads or scale-out file services. It provides a scalable, highly available, and performant shared storage access model by abstracting the complexities of storage ownership and enabling concurrent access across all cluster nodes, albeit with different I/O paths depending on connectivity and configuration.

4.5. Storage Spaces Direct (S2D) Architecture

Storage Spaces Direct (S2D) is Microsoft's software-defined storage (SDS) solution integrated within Windows Server Failover Clustering. It enables building highly available and scalable storage using industry-standard servers with locally attached drives (NVMe, SSD, HDD), eliminating the need for traditional shared SAS fabric or SAN/NAS hardware. S2D typically operates in clusters of 2 to 16 servers, although the underlying Windows Server Failover Cluster supports up to 64 nodes.

S2D fundamentally shifts storage intelligence from dedicated hardware controllers into the software stack running on clustered servers. It virtualizes the local physical storage across all nodes into a cluster-wide storage pool, providing resilient virtual disks for workloads like Hyper-V or Scale-Out File Servers.
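At a high level, enabling S2D and carving out a volume is a two-step operation in PowerShell. This is a sketch: Enable-ClusterStorageSpacesDirect claims the eligible local drives into an automatically created pool (whose friendly name typically begins with "S2D"), and the volume name and size below are placeholders:

  # Enable S2D on an existing failover cluster
  Enable-ClusterStorageSpacesDirect

  # Create a resilient volume from the auto-created pool and surface it as a CSV formatted with ReFS
  New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -Size 1TB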

Core Architecture and Concepts

S2D relies heavily on Windows Server Failover Clustering (WSFC) for node coordination, state management, and resource health monitoring. The architecture involves several layers and key software components:

Hardware Layer

  • Servers: Standard x64 servers certified for the target Windows Server version.
  • Drives: Locally attached NVMe, Persistent Memory (PMem/SCM), SAS/SATA SSDs, or SAS/SATA HDDs. Drives can be internal or in directly attached JBODs (Just a Bunch Of Disks) enclosures. No RAID controllers should be used; drives should be presented directly via Host Bus Adapters (HBAs) in pass-through mode.
  • Networking: Minimum 10 Gbps Ethernet is required for inter-node storage traffic. Redundant adapters are strongly recommended. Remote Direct Memory Access (RDMA) NICs (supporting RoCE v2 or iWARP) are highly recommended for optimal performance and lower CPU overhead via SMB Direct.

Networking Fabric

Networking is critical for S2D's performance and stability as it forms the backbone of the Software Storage Bus.

  • High Bandwidth: All inter-node communication, including storage replication and data access for non-local reads/writes, traverses the network. Insufficient bandwidth becomes a major bottleneck (e.g., a single 10GbE link saturates at ~1.25 GB/s).
  • Low Latency: Network latency directly impacts storage I/O latency, especially writes. Target latency between nodes should ideally be very low (e.g., ~50 microseconds in Microsoft internal tests). This generally precludes stretching S2D clusters across geographically distant sites.
  • RDMA (Recommended): Technologies like RoCE (RDMA over Converged Ethernet) or iWARP allow network data transfers (SMB Direct) to bypass the kernel stack and access memory directly, significantly reducing latency and CPU utilization for storage traffic. Correct configuration (e.g., PFC and ETS for RoCE) is essential.
  • Switch Embedded Teaming (SET): The recommended way to team NICs within the Hyper-V Virtual Switch, providing load balancing and failover without the limitations of older LBFO teaming.

Windows Storage Stack with S2D

S2D integrates deeply into the Windows storage stack. An I/O request travels through these layers (simplified, bottom-up):

  1. Miniport Drivers: Hardware-specific drivers (e.g., stornvme.sys for NVMe, vendor drivers) interfacing with the physical storage devices via the storport.sys port driver.
  2. Partition Manager (Physical): partmgr.sys manages physical disk partition tables (GPT or MBR).
  3. Class Driver (Physical): disk.sys provides common functionality for physical disk devices.
  4. Storage Bus Layer (SBL): A key S2D component replacing the physical storage fabric. It has two main parts:
    • Software Storage Bus (SSB): Creates a virtual storage fabric over the cluster's Ethernet network using SMB3. It allows every server to see and access all eligible local disks in all other servers in the cluster. This involves:
      • clusport.sys: Acts as the "Cluster Port" or SBL Initiator, initiating connections to other nodes.
      • clusbflt.sys: Acts as the "Cluster Block Target" or SBL Target, receiving connections and exposing local disks over the network. It uses a hidden administrative share named \\ServerName\BlockStorage$ where each disk appears as a target file (e.g., \\ServerName\BlockStorage$\Device\1).
      • All inter-node storage I/O flows over SMB3, utilizing SMB Multichannel for aggregation/redundancy and SMB Direct if RDMA is available.
    • Storage Bus Cache (SBC): A persistent, real-time, server-side read/write cache (in hybrid deployments) or write-only cache (typically in all-flash).
      • Automatically identifies the fastest drives (NVMe, PMem/SCM, or SSDs if HDDs are present) and designates them as cache devices.
      • Cache drives *do not* contribute to the usable storage capacity; they only accelerate I/O to capacity drives.
      • Bindings: S2D automatically binds faster cache drives to slower capacity drives *within each server*. These bindings are local (e.g., NVMe cache serves HDDs on the *same* server). The ratio of cache to capacity drives per server impacts performance. Bindings are determined automatically (round-robin) and can be dynamically re-assigned if drives are added/removed or fail.
      • Behavior:
        • Hybrid (SSD/NVMe cache for HDD capacity): Cache serves both reads and writes. Hot read data is cached, and all writes land in the cache first and are later de-staged to HDDs.
        • All-Flash (NVMe cache for SSD capacity): Cache is typically Write-Only by default, as reads from SSDs are already fast. Writes land in the NVMe cache to absorb bursts and reduce wear on capacity SSDs.
      • Configuration: Cache behavior can be influenced via PowerShell (Set-ClusterS2D) using parameters like -CacheState (Enabled/Disabled), -CacheModeHDD/-CacheModeSSD (ReadWrite/WriteOnly), and -CachePageSizeKBytes (granularity, default 16KiB). Manual configuration is generally not recommended without specific needs.
  5. Spaceport (spaceport.sys): The core Storage Spaces engine, acting as the Storage Spaces Controller.
    • Claims physical disks presented by SBL and adds them to the cluster-wide Storage Pool (automatically created when S2D is enabled, consuming disks from the "Primordial Pool").
    • Creates hidden Storage Spaces partitions on each physical disk in the pool for metadata and data extents (slabs).
    • Manages the creation and layout of Virtual Disks (also called Spaces or volumes) from the pool.
    • Implements data Resiliency (Mirroring, Parity, Mirror-Accelerated Parity) by distributing data extents across different physical disks and, critically, across different servers (Fault Domains, typically each server/StorageScaleUnit).
    • Handles I/O distribution, data repairs, and rebalancing.
  6. Disk (Virtual Disk): disk.sys again, but this time representing the Storage Spaces virtual disks created by Spaceport to the upper layers of the OS.
  7. Partition Manager (Virtual Disk): partmgr.sys again, managing partitions created *on* the Storage Spaces virtual disks (e.g., where you create your file system volume). It masks the underlying physical disk and Spaceport partitions.
  8. Volume Manager (volmgr.sys): Manages the logical volumes created on the virtual disk partitions (e.g., assigning drive letters or mount points).
  9. Volume Snapshot Service (VSS) Integration (volsnap.sys): Allows for application-consistent snapshots.
  10. Bitlocker Drive Encryption (fvevol.sys): An optional filter driver for encrypting volumes at rest.
  11. File System Drivers (e.g., refs.sys, ntfs.sys): Format the volumes. ReFS (Resilient File System) is highly recommended for S2D, especially for Hyper-V workloads, due to performance optimizations (block cloning for VHDX operations, sparse VDL) and built-in integrity features (checksums, scrubbing).
  12. Filter Drivers (Various): Other drivers can intercept I/O at the file system level, such as:
    • CSV Filter Driver (csvflt.sys): Part of Cluster Shared Volumes.
    • Storage QoS Filter (storqosflt.sys): Enforces Quality of Service policies on virtual disks.
    • Data Deduplication Filter: Performs post-process deduplication on ReFS or NTFS volumes.
    • Third-party filters (e.g., Antivirus).
  13. Cluster Shared Volumes File System (CSVFS - csvfs.sys): A clustered file system proxy layered above NTFS or ReFS. It presents storage volumes as accessible simultaneously from all nodes in the cluster (e.g., under C:\ClusterStorage\Volume1). CSVFS coordinates metadata access and handles I/O redirection if needed (though S2D aims for local I/O). It also provides an optional in-memory read cache (CSV Cache) to further accelerate reads for frequently accessed data.
  14. I/O Subsystem: Manages I/O requests initiated by applications.
  15. Applications/Workloads: The consumers of the storage, such as Hyper-V Virtual Machines, Scale-Out File Server shares, or SQL Server databases.

Failover Clustering (WSFC) Integration

  • Foundation: S2D requires and runs on WSFC. The cluster manages node membership, quorum, and overall state.
  • Resources: The Storage Pool, the Virtual Disks created from it, and the CSVs are all managed as cluster resources. WSFC monitors their health and handles failover/failback.
  • Health Service: Introduced with S2D, this service provides enhanced, real-time monitoring and operational management specific to S2D, including detailed physical disk health, proactive drive failure detection, and automated actions like retiring and replacing failed disks.

Resiliency Mechanisms

Storage Spaces provides data protection against drive and server failures:

  • Two-Way Mirror: Data is written in two copies on different disks in different servers. Tolerates 1 drive or 1 server failure. Requires 2x capacity. Suitable for 2-3 node clusters.
  • Three-Way Mirror: Data is written in three copies on different disks in different servers. Tolerates 2 drive failures or 2 server failures simultaneously. Requires 3x capacity. Recommended for 4+ nodes for high availability.
  • Parity (Single/Dual): Uses erasure coding (like RAID 5/6) distributing data and parity information across disks and servers. More space-efficient than mirroring but typically has lower write performance (especially random writes). Dual Parity tolerates 2 failures.
  • Mirror-Accelerated Parity (MAP): A hybrid approach (available on ReFS) where writes first go to a mirrored portion for speed, then are rotated to the parity portion for space efficiency. Offers a balance between performance and capacity usage.
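Resiliency is selected when each volume is created. The sketch below assumes the auto-created "S2D*" pool; the volume names and sizes are placeholders, and the tier friendly names used for mirror-accelerated parity ("Performance" and "Capacity") vary by deployment:

  # Mirror volume (three-way mirror is the default mirror behavior in pools with enough fault domains)
  New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Mirror01" -FileSystem CSVFS_ReFS -Size 2TB -ResiliencySettingName Mirror

  # Mirror-accelerated parity volume built from storage tiers
  New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "MAP01" -FileSystem CSVFS_ReFS -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes 200GB, 800GB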

Deployment Models

  • Hyperconverged Infrastructure (HCI): Compute (Hyper-V) and storage (S2D) run on the same cluster of servers. This is the most common deployment, simplifying management and reducing hardware footprint. Virtual machines run locally, accessing storage provided by the S2D stack on the same servers.
  • Converged (or Disaggregated): The S2D cluster acts as a dedicated storage backend. A Scale-Out File Server (SOFS) role is deployed on top of the S2D cluster, exposing the CSV volumes as continuously available SMB3 file shares. Separate compute clusters (e.g., Hyper-V or SQL Server) connect to these SOFS shares over the network to access storage. This allows independent scaling of compute and storage resources.

I/O Path Summary

  • Write Path: An application write (e.g., from a VM) goes to the CSV coordinator node -> File System (ReFS) -> Spaceport (Virtual Disk) -> Spaceport determines resiliency placement (e.g., 3 copies for 3-way mirror) -> One write stays local, others are sent via SBL/SMB3 to peer nodes -> On each node, the write (or portion of it due to striping) hits the Storage Bus Cache (SBC) -> Once all cache writes complete and are acknowledged, the application write is acknowledged -> Data is later de-staged from cache to capacity drives asynchronously.
  • Read Path: An application read goes to CSV -> File System (ReFS) -> Spaceport (Virtual Disk) -> Spaceport checks if the requested data copy is available locally -> If yes, reads occur from local drives (potentially hitting the SBC or CSV Cache first) -> If data is not local (less common with proper CSV ownership), Spaceport directs the read via SBL/SMB3 to a node holding a copy -> Data is returned to the requesting application. If data is read from capacity drives and misses the SBC, it may be loaded into the SBC for future reads.

Key Considerations

  • Network Design: Absolutely critical. Use RDMA, ensure sufficient bandwidth and low latency, configure properly (DCB for RoCE), and ensure redundancy.
  • Hardware Selection: Use hardware validated for S2D from the Azure Stack HCI Catalog (even for Windows Server deployments). Pay attention to drive types, endurance ratings (especially for cache), and compatibility.
  • Updates: Keep the entire stack updated – OS patches, drivers, and firmware (especially for NICs, HBAs, and drives) – using validated update packages where possible.
  • Monitoring: Utilize Windows Admin Center, System Center Operations Manager (SCOM) with the S2D Management Pack, and PowerShell (e.g., Get-StorageSubSystem, Get-PhysicalDisk, Get-StorageJob) to monitor health and performance.
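A few representative health queries; Get-StorageHealthReport and Debug-StorageSubSystem are additional Health Service cmdlets not listed above (sketch; output shape varies by version):

  # Cluster-wide health summary and metrics from the Health Service
  Get-StorageSubSystem Cluster* | Get-StorageHealthReport

  # Outstanding faults, unhealthy drives, and running repair/rebalance jobs
  Get-StorageSubSystem Cluster* | Debug-StorageSubSystem
  Get-PhysicalDisk | Where-Object HealthStatus -ne "Healthy"
  Get-StorageJob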

5. Security Components & Considerations

Securing a Windows Server Failover Cluster involves managing specific Active Directory objects, understanding authentication flows, configuring network security correctly, and applying general hardening principles to the cluster nodes.

5.1. Cluster Name Object (CNO) and Active Directory Integration

The Cluster Name Object (CNO) is a standard computer account created in Active Directory Domain Services (AD DS) that represents the failover cluster itself. It serves as the primary security principal for the cluster service within the domain.

Typically, the CNO is created automatically during the cluster creation process using the Create Cluster wizard or New-Cluster PowerShell cmdlet.32 It is usually placed in the default 'Computers' container or within the same Organizational Unit (OU) where the computer accounts for the cluster nodes reside. For automatic creation to succeed, the user account performing the cluster creation must have specific permissions in AD, namely the 'Create Computer objects' and 'Read All Properties' permissions within the target container or OU.3 The installing user also requires local administrative privileges on the servers that will become cluster nodes.3

Alternatively, administrators can choose to pre-stage the CNO in AD DS before creating the cluster. This involves manually creating the computer account in the desired OU, disabling it initially (to prevent conflicts and allow the cluster creation process to verify it's unused), and granting the user account that will create the cluster 'Full Control' permission over this pre-staged CNO object.3 Pre-staging provides more administrative control over the object's placement and initial properties.
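A minimal pre-staging sketch using the ActiveDirectory module; the computer name and OU path are placeholders:

  # Pre-stage a disabled CNO in a dedicated OU
  New-ADComputer -Name "CLUS01" -Path "OU=Clusters,DC=contoso,DC=com" -Enabled $false -Description "Pre-staged CNO for failover cluster CLUS01"

  # The account that will later run New-Cluster must then be granted Full Control on this object
  # (for example, via the Security tab in Active Directory Users and Computers).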

The CNO's role is critical. It authenticates the cluster service to the domain and, importantly, it is granted the necessary permissions to subsequently create and manage other related computer objects (Virtual Computer Objects - VCOs) required for clustered roles that need a network presence.3 If the CNO account is accidentally deleted, disabled (after initial creation), or has its necessary permissions revoked, the cluster may lose the ability to manage existing VCOs or create new ones, severely impacting the availability of services that rely on Client Access Points.3 Protecting the CNO object in AD (e.g., enabling "Protect object from accidental deletion"88) is a recommended best practice.

5.2. Virtual Computer Objects (VCOs) and Network Names

Virtual Computer Objects (VCOs) are AD DS computer accounts created specifically to represent Cluster Network Name resources that function as Client Access Points (CAPs) for clustered roles.3 When a client connects to a clustered service using the Network Name (e.g., a SQL Server FCI VNN or a File Server name), the VCO associated with that Network Name resource acts as the security principal for authentication purposes.

Similar to the CNO, VCOs are typically created automatically when a Network Name resource is configured as part of a clustered role. The creation is performed by the Cluster Service acting under the security context of the CNO.3 For this automatic creation to work, the CNO computer account requires the 'Create Computer objects' permission in the OU where the VCO will be created (usually the same OU as the CNO itself).3 By default, a CNO in the 'Computers' container can create up to 10 VCOs; explicit permission is needed for more or if the CNO resides in an OU.88

Administrators can also pre-stage VCOs in AD DS.88 This involves manually creating the computer account for the Network Name resource before configuring it in the cluster. If a VCO is pre-staged, the CNO account must then be granted 'Full Control' permission on that specific VCO object in AD DS.88 This allows the cluster (via the CNO) to manage the VCO's properties, such as its password and Service Principal Names (SPNs), which are essential for Kerberos authentication.

Each Network Name resource configured as a CAP within the cluster is directly associated with its corresponding VCO in Active Directory.3 The VCO enables the clustered role to authenticate on the network as if it were a distinct computer, allowing it to access other domain resources or enabling clients to authenticate to it using domain credentials, particularly via Kerberos.3

The CNO serves as the central administrative identity for the cluster itself, possessing the authority to manage the identities (VCOs) of the services running within it. Proper management and security of both CNO and VCO objects within Active Directory are therefore essential for the secure and functional operation of clustered roles requiring network names.

5.3. Authentication Mechanisms (Kerberos, NTLM, SPNs)

Authentication within a WSFC environment occurs at multiple levels, involving different principals and protocols:

  • Node-to-Node Communication: While much of the high-frequency internal communication (heartbeats, GUM updates) occurs over the abstracted NetFT.sys layer using its internal addressing and security settings (signed or encrypted)51, certain operations require nodes to interact as authenticated domain members. Actions involving querying or modifying Active Directory objects (like CNO/VCO management during resource online/offline) necessitate that the Cluster Service on the acting node authenticates to the domain controller using standard Windows authentication protocols (Kerberos being the preferred method in a domain). The nodes themselves, being domain members, authenticate to the domain upon startup.
  • Client-to-CAP Authentication: When clients connect to a clustered service via its Network Name resource (CAP), the authentication process uses standard Windows protocols, namely Kerberos or NTLM.71
    • Kerberos: This is the preferred, more secure protocol. For Kerberos authentication to the CAP to succeed, the associated Virtual Computer Object (VCO) must be correctly configured in Active Directory.71 This includes the VCO existing and being enabled, and having the correct Service Principal Names (SPNs) registered against its account.71 SPNs are crucial as they map the service name (e.g., MSSQLSvc/SQLFCI1.contoso.com:1433) to the VCO account (SQLFCI1$). When a client requests a Kerberos ticket for the service, AD uses the SPN to find the correct account (the VCO) to encrypt the ticket. The cluster resource setting 'Enable Kerberos Authentication' must also be checked for the Network Name resource.71 If the VCO was pre-staged, the CNO needs the 'Validated Write to Service Principal Name' permission on the VCO object to manage these SPNs dynamically.71
    • NTLM: If Kerberos authentication cannot be completed (due to missing SPNs, VCO issues, network blocks on Kerberos ports, or the 'Enable Kerberos Authentication' flag being unchecked), the authentication process may fall back to the older, challenge-response NTLM protocol.72 While functional, NTLM is generally considered less secure than Kerberos.
  • Cluster Service Account (CNO) Authentication: The CNO, being an AD computer account, authenticates to domain controllers using its own credentials (typically a complex, automatically managed password). This authentication, handled by LSASS on the node currently acting on behalf of the CNO, is necessary for operations like creating or modifying VCOs in AD.3

Understanding these different authentication contexts is crucial for troubleshooting. An issue preventing a CAP from coming online might stem from Kerberos/SPN misconfiguration related to the VCO, while an issue creating a new CAP might relate to the CNO's permissions or its ability to authenticate to AD.

5.4. Authorization Model (Cluster Object ACLs, API Security)

Authorization within WSFC determines who can perform administrative actions and how access to underlying resources managed by the cluster is controlled.

  • Cluster Management Authorization: Performing administrative tasks on the cluster, such as creating or deleting the cluster, adding or removing nodes, configuring resources and groups, or initiating failovers, typically requires specific privileges. The user account performing these actions generally needs to be a member of the local Administrators group on each cluster node.3 Additionally, certain actions interacting with Active Directory, like creating the cluster (which creates the CNO) or potentially managing VCOs directly, require specific AD permissions (e.g., 'Create Computer objects' or permissions on pre-staged objects). WSFC itself does not maintain a separate, granular internal authorization database; rather, it relies on standard Windows user rights and group memberships combined with AD permissions for administrative access control.
  • Access Control Lists (ACLs) on Cluster Objects: WSFC manages logical resources (like 'Physical Disk', 'Network Name', 'File Share'), but the actual underlying objects these represent are secured using standard Windows mechanisms, primarily Access Control Lists (ACLs).101
    • AD Objects (CNO/VCO): Permissions on the CNO and VCO computer accounts in Active Directory are controlled by AD ACLs. Administrators grant permissions (like 'Full Control', 'Create Computer objects', 'Validated Write to SPN') to users or groups (including the CNO itself) using standard AD tools.3
    • File System Objects: For resources like File Shares or files stored on clustered disks (including CSVs), access is governed by NTFS permissions (DACLs for access control, SACLs for auditing) and Share permissions.103 While WSFC manages the availability of the File Share resource, it doesn't override the underlying file system security. The File Share resource properties might store the security descriptor defining the share permissions105, which the cluster applies when bringing the share online.
    • Registry Keys: Clustered registry keys are protected by standard registry ACLs.
    • Other Objects: Resources representing services (like SQL Server) rely on the service's own security context and potentially application-specific authorization.
  • Cluster API Security: Interactions performed via the Cluster API (ClusAPI.dll)25 are subject to the security context of the calling process or user. The API call will typically only succeed if the caller possesses the necessary privileges on the target cluster node(s) and potentially within AD (if the operation involves AD objects). For remote API calls using RPC, standard RPC authentication and authorization mechanisms apply, requiring the caller to authenticate and have permissions to connect to the cluster's RPC endpoint.25 The API likely performs operations under the security context of the authenticated caller or uses impersonation where appropriate.

In essence, WSFC integrates with and relies upon the standard Windows security model (user rights, group memberships, ACLs on objects in AD, NTFS, Registry) rather than implementing a completely separate authorization framework. Securing a cluster involves securing both the administrative accounts and the underlying resources managed by the cluster using these standard mechanisms.
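
Because authorization ultimately resolves to standard ACLs, inspecting them uses ordinary Windows tooling. The sketch below reads the AD ACL on a hypothetical VCO and the NTFS ACL on a CSV path; the distinguished name and volume path are illustrative assumptions:

```powershell
# Review who has rights on a (hypothetical) VCO object via the AD: drive
# exposed by the ActiveDirectory module.
Import-Module ActiveDirectory
$vcoDn = 'CN=SQLFCI1,OU=Clusters,DC=contoso,DC=com'   # illustrative DN
(Get-Acl -Path ("AD:\" + $vcoDn)).Access |
    Select-Object IdentityReference, ActiveDirectoryRights, AccessControlType |
    Format-Table -AutoSize

# Review NTFS permissions on a clustered volume path (a CSV in this example).
(Get-Acl -Path 'C:\ClusterStorage\Volume1').Access |
    Select-Object IdentityReference, FileSystemRights, AccessControlType |
    Format-Table -AutoSize
```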

5.5. Network Security (Firewall Rules, Segmentation)

Securing the network communication paths used by WSFC is crucial for both operational stability and preventing unauthorized access. This involves implementing appropriate firewall rules and considering network segmentation.

Required Firewall Rules: Several ports and protocols must be allowed through firewalls between cluster nodes, and often between management stations or clients and the cluster nodes79:

Table 6: Required Firewall Ports for WSFC Operation
Port(s) | Protocol | Direction | Source Scope | Destination Scope | Purpose
3343 | UDP | Node-to-Node | Cluster Nodes | Cluster Nodes | Cluster Heartbeat51
3343 | TCP | Node-to-Node | Cluster Nodes | Cluster Nodes | Cluster Service Join/RPC79
135 | TCP | Node-to-Node | Cluster Nodes | Cluster Nodes | RPC Endpoint Mapper
135 | TCP | Mgmt/Client -> Node | Mgmt Stations/Clients | Cluster Nodes | RPC Endpoint Mapper79
Dynamic High Ports* | TCP | Node-to-Node | Cluster Nodes | Cluster Nodes | Dynamic RPC for cluster communication
Dynamic High Ports* | TCP | Mgmt/Client -> Node | Mgmt Stations/Clients | Cluster Nodes | Dynamic RPC for management/API calls79
445 | TCP | Node-to-Node | Cluster Nodes | Cluster Nodes | SMB (CSV Redirection, File Share Witness, Validation)79
445 | TCP | Mgmt/Client -> Node | Mgmt Stations/Clients | Cluster Nodes | SMB (File Share Access, Validation, Management)79
5985 (HTTP) / 5986 (HTTPS) | TCP | Mgmt -> Node | Mgmt Stations | Cluster Nodes | WinRM (PowerShell Remoting, WAC)79
137, 138 | UDP | Node-to-Node | Cluster Nodes | Cluster Nodes | NetBIOS Name/Datagram (If used)79
139 | TCP | Node-to-Node | Cluster Nodes | Cluster Nodes | NetBIOS Session (If used)80
ICMP | ICMP | Bi-directional | Cluster Nodes/Mgmt | Cluster Nodes | Cluster Validation, Ping Diagnostics79
53 | UDP/TCP | Node -> DNS | Cluster Nodes | DNS Servers | DNS Queries (Name Resolution)
88 | UDP/TCP | Node -> DC | Cluster Nodes | Domain Controllers | Kerberos Authentication
389 | UDP/TCP | Node -> DC | Cluster Nodes | Domain Controllers | LDAP (AD Queries)
*Dynamic RPC ports default to 49152-65535, but a smaller, specific range (e.g., 100+ ports above 5000) is often configured and recommended for firewall rules.79
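
To make the table actionable, the sketch below creates host-firewall rules for a few of the node-to-node ports using the built-in NetSecurity cmdlets, scoped to a hypothetical cluster subnet (10.0.1.0/24); the rule names and address range are placeholders to adapt to the environment:

```powershell
# Allow the UDP 3343 cluster heartbeat only from the cluster nodes' subnet
# (the subnet and rule names are illustrative placeholders).
New-NetFirewallRule -DisplayName 'WSFC Heartbeat (UDP 3343)' -Direction Inbound `
    -Protocol UDP -LocalPort 3343 -RemoteAddress 10.0.1.0/24 -Action Allow

# Allow cluster service join/RPC traffic on TCP 3343 from the same scope.
New-NetFirewallRule -DisplayName 'WSFC Cluster Service (TCP 3343)' -Direction Inbound `
    -Protocol TCP -LocalPort 3343 -RemoteAddress 10.0.1.0/24 -Action Allow

# Allow SMB (CSV redirection, file share witness, validation) between nodes only.
New-NetFirewallRule -DisplayName 'WSFC SMB (TCP 445)' -Direction Inbound `
    -Protocol TCP -LocalPort 445 -RemoteAddress 10.0.1.0/24 -Action Allow
```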

Network Security Recommendations:

  • Network Segmentation: Where feasible, physically or logically (VLANs) segment cluster network traffic. Dedicate separate network interfaces and subnets for internal cluster communication (heartbeats, GUM, CSV redirection - Role 1) and isolate this from client-facing traffic (Role 3).59 This prevents client traffic storms from impacting critical heartbeat communication and enhances security by limiting exposure of internal cluster protocols. Using multiple redundant networks for internal communication is highly recommended to avoid single points of failure.59
  • Strict Firewall Policies: Implement host-based (Windows Firewall) and network firewalls with rules that allow only the necessary ports and protocols listed above, restricted to the specific source and destination IP addresses or subnets involved (e.g., only allow UDP 3343 between cluster node IPs).79 Deny all other traffic by default.
  • Internal Network Security: For dedicated internal cluster networks (Role 1):
    • Disable unnecessary protocols like NetBIOS over TCP/IP.59
    • Consider using the built-in cluster communication security setting ((Get-Cluster).SecurityLevel = 2) to encrypt all NetFT traffic between nodes (a short sketch follows this list).51 Alternatively, IPsec could be implemented at the network layer, though this adds configuration complexity.
  • Avoid Network Teaming for Heartbeats: Microsoft generally does not recommend using NIC Teaming (LBFO) for dedicated heartbeat network adapters, as the teaming mechanisms can sometimes interfere with the latency-sensitive nature of heartbeats. Instead, use multiple independent network adapters configured for cluster communication, allowing NetFT.sys to manage the redundancy.59
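
As a companion to the segmentation and SecurityLevel items above, this sketch shows how the cluster's network roles and NetFT security setting can be reviewed and tightened; it assumes the FailoverClusters module is available and that the CPU overhead of encryption is acceptable in the environment:

```powershell
Import-Module FailoverClusters

# How each cluster network is currently classified:
# Role 0 = not used by the cluster, 1 = cluster communication only, 3 = cluster and client.
Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask -AutoSize

# NetFT security level: 0 = clear text, 1 = signed (default), 2 = encrypted.
(Get-Cluster).SecurityLevel

# Raise to encrypted if required by policy.
(Get-Cluster).SecurityLevel = 2
```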

5.6. Cluster Node Hardening Principles

Securing the individual server nodes that make up the failover cluster is as important as securing the cluster configuration and network. A compromised node can potentially disrupt the entire cluster. Hardening should follow standard OS security best practices, augmented with cluster-specific considerations.106

General OS Hardening Practices:

  • Minimize Attack Surface: Install only the necessary Windows Server roles and features required for the cluster node's function (e.g., Failover Clustering, Hyper-V, File Server). Remove or disable any unused software, services, or protocols.106 Use Server Core installation where possible to further reduce the surface area.
  • Patch Management: Maintain a rigorous patch management process to promptly apply security updates for the Windows Server OS, drivers, firmware, and any cluster-aware applications (like SQL Server) running on the nodes.107
  • Least Privilege: Configure services to run under accounts with the minimum necessary privileges. Limit membership in highly privileged local groups (Administrators) and domain groups (Domain Admins, Enterprise Admins).107 Use Role-Based Access Control (RBAC) principles for administrative tasks. Implement secure administrative practices, such as using dedicated administrative workstations (Secure Admin Workstations - SAWs) or Privileged Access Workstations (PAWs) for cluster management.107
  • Configuration Management: Define and enforce security configuration baselines for cluster nodes using tools like Group Policy, Desired State Configuration (DSC), or Microsoft security baselines (e.g., from the Security Compliance Toolkit).107 Regularly audit configurations for compliance and drift.
  • Network Security: Implement host-based firewalls on each node with strict rules allowing only necessary cluster and application traffic (as detailed in Section 5.5).107
  • Auditing and Monitoring: Configure detailed security auditing on cluster nodes to log critical events such as successful/failed logons, account management changes, policy changes, object access (if needed), and process creation.29 Forward logs to a central Security Information and Event Management (SIEM) system for analysis and alerting.
  • Antivirus/Antimalware: Deploy and maintain up-to-date endpoint protection software. Configure exclusions carefully to avoid interfering with cluster operations (e.g., exclude cluster storage directories like C:\ClusterStorage, quorum disk paths, and core cluster processes like ClusSvc.exe, RHS.exe).8
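
The antivirus guidance above can be expressed with Microsoft Defender's cmdlets; this is a minimal sketch assuming Defender is the endpoint product in use (other products have their own exclusion mechanisms), and the paths shown mirror the examples in the bullet rather than an exhaustive, vendor-validated list:

```powershell
# Exclude the CSV namespace and the local cluster folder from real-time scanning
# (confirm against current vendor and Microsoft guidance before applying in production).
Add-MpPreference -ExclusionPath 'C:\ClusterStorage'
Add-MpPreference -ExclusionPath 'C:\Windows\Cluster'

# Exclude the core cluster processes.
Add-MpPreference -ExclusionProcess 'clussvc.exe'
Add-MpPreference -ExclusionProcess 'rhs.exe'

# Review the resulting exclusion lists.
Get-MpPreference | Select-Object ExclusionPath, ExclusionProcess
```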

Cluster-Specific Hardening:

  • Secure AD Objects: Protect the CNO and any VCOs in Active Directory from accidental deletion (a brief sketch follows this list).88 Apply the principle of least privilege when granting permissions on these objects: only the CNO and necessary administrative accounts/groups should have modification rights.3 Regularly audit permissions on these critical objects.
  • Secure Quorum Witness:
    • File Share Witness: Ensure the file share used as a witness has appropriate permissions (typically granting the CNO Read/Write access) and that the server hosting the share is itself hardened and highly available.19 Avoid using shares on cluster nodes themselves.
    • Disk Witness: Ensure only cluster nodes have access to the witness LUN at the storage level.
    • Cloud Witness: Secure the Azure storage account access key used for the cloud witness. Use mechanisms like Azure Private Link if enhanced network security is required for accessing the Azure Blob storage endpoint.
  • Physical Security: Secure physical access to the servers acting as cluster nodes, as well as the associated network switches and storage infrastructure.107
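
Following the AD-object item at the top of this list, the sketch below marks a hypothetical CNO (CLUS01) and VCO (SQLFCI1) as protected from accidental deletion and then verifies the flag; the account names are placeholders:

```powershell
Import-Module ActiveDirectory

# Protect the CNO and a VCO (names are illustrative) from accidental deletion.
'CLUS01', 'SQLFCI1' | ForEach-Object {
    Get-ADComputer -Identity $_ | Set-ADObject -ProtectedFromAccidentalDeletion $true
}

# Verify the flag afterwards.
'CLUS01', 'SQLFCI1' | ForEach-Object {
    Get-ADObject -Identity (Get-ADComputer -Identity $_).DistinguishedName `
        -Properties ProtectedFromAccidentalDeletion |
        Select-Object Name, ProtectedFromAccidentalDeletion
}
```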

The inherent design of WSFC involves significant implicit trust between nodes communicating over cluster networks.51 While mechanisms like NetFT signing/encryption51 and RHS isolation provide some protection, a comprehensive hardening strategy applied consistently across all nodes is essential to maintain the integrity and availability of the entire distributed system.

6. Operational Mechanics ("How it Works")

This section explains the core runtime algorithms and processes that govern how WSFC maintains availability, including quorum management, the sequence of events during a failover, resource state control, and health monitoring.

6.1. Quorum: Concept, Models, Witness, and Dynamic Quorum

Quorum is the fundamental consensus mechanism in WSFC designed to ensure cluster consistency and, critically, to prevent "split-brain" scenarios.49 A split-brain occurs if network partitions cause different subsets of nodes to lose communication with each other, leading each subset to potentially believe it is the sole active part of the cluster. Without quorum, both subsets might try to bring the same resources online and write to the same shared storage, inevitably causing data corruption.108

The quorum mechanism prevents this by requiring that a cluster maintain a "majority" of active, communicating "voting elements" to remain operational.49 If a node or group of nodes loses communication with enough other voting elements such that they no longer constitute a majority, their Cluster Service will stop, taking their resources offline and preventing them from causing conflicts.49 Only the partition that retains the majority of votes (maintains quorum) continues to run the clustered services.

Voting elements typically include the cluster nodes themselves (each usually gets one vote) and potentially a single external "witness" resource (which also gets one vote).49 The specific combination of nodes and witness used to calculate the majority defines the Quorum Model (also called Quorum Configuration or Type):

Table 7: Quorum Models and Witness Types
Model Name | Voting Elements | Witness Storage | Key Requirement/Consideration | Split-Brain Prevention Mechanism
Node Majority | Nodes Only | None | Best for odd number of nodes.49 | Majority vote of active nodes.
Node and Disk Witness | Nodes + Disk Witness | Dedicated Shared Disk (LUN) | Recommended for even node count; requires SAN.49 Not supported with S2D. Holds DB copy.11 | Majority vote including disk lock/state.
Node and File Share Witness | Nodes + File Share Witness | SMB File Share | Recommended for even node count or no SAN.49 Requires reliable file server.19 Stores lock info.108 | Majority vote including file share lock/state.
Node and Cloud Witness | Nodes + Cloud Witness | Azure Blob Storage | Good for multi-site or no shared infrastructure.49 Requires internet access. Stores lock info.108 | Majority vote including cloud blob lock/state.
No Majority (Disk Witness Only) | Disk Witness Only | Dedicated Shared Disk (LUN) | Deprecated/Rare. Creates a single point of failure.49 | Disk lock/state only.
References: 11, 19, 49, 108

The role of the witness is primarily to act as a tiebreaker, ensuring that there is an odd total number of voting elements in the cluster.49 In an even-node cluster, this allows up to half of the nodes to fail while the cluster still maintains a majority (the surviving half of the nodes plus the witness), increasing the overall fault tolerance of the cluster.49
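
Reviewing and changing the witness configuration is typically done with the quorum cmdlets; the sketch below lists the current configuration and shows (commented out) the usual forms for each witness type, with all names, shares, and keys as placeholder values:

```powershell
Import-Module FailoverClusters

# Show the current quorum configuration and witness resource (if any).
Get-ClusterQuorum

# Typical witness configurations (values are placeholders; run the one that applies):
# Set-ClusterQuorum -DiskWitness 'Cluster Disk 1'
# Set-ClusterQuorum -FileShareWitness '\\witness01.contoso.com\ClusterWitness'
# Set-ClusterQuorum -CloudWitness -AccountName 'mystorageacct' -AccessKey '<storage-account-key>'
```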

Modern WSFC versions (Windows Server 2012 and later) implement Dynamic Quorum.49 Unlike static quorum where votes are fixed, dynamic quorum allows the cluster to automatically adjust the vote assignment of nodes based on their state. If a node is gracefully shut down, its vote is typically removed immediately. If a node fails unexpectedly, its vote may be removed once the cluster confirms it is definitively down. The cluster then recalculates the required majority based on the current number of active voting members.108 This adaptability allows a cluster to potentially survive sequential node failures that would have caused quorum loss in a static model, potentially allowing operation down to the "last man standing" (a single node) in some scenarios.108 The current vote status of a node can be checked via the DynamicWeight property in PowerShell (Get-ClusterNode).49

Complementing dynamic quorum, Dynamic Witness (Windows Server 2012 R2 and later) automatically adjusts whether the configured witness actually casts its vote.108 The goal is always to maintain an odd total number of votes in the cluster. If the number of active voting nodes is currently odd, the witness vote is dynamically disabled. If the number of active voting nodes becomes even (due to a node failure or join), the witness vote is dynamically enabled.108 This prevents the witness itself from becoming a deciding factor unnecessarily and reduces the risk of the cluster failing solely due to a witness failure when a node majority could otherwise be maintained.108
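
The effect of dynamic quorum and dynamic witness can be observed directly, as sketched below; NodeWeight is the configured vote, while DynamicWeight reflects the vote currently in effect:

```powershell
Import-Module FailoverClusters

# Per-node configured vote (NodeWeight) versus the vote currently in effect (DynamicWeight).
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize

# Cluster-wide settings: DynamicQuorum (feature on/off) and
# WitnessDynamicWeight (whether the witness vote currently counts).
Get-Cluster | Format-List Name, DynamicQuorum, WitnessDynamicWeight
```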

While dynamic quorum and witness significantly enhance resilience to sequential or graceful node departures, they don't necessarily allow the cluster to survive the simultaneous failure of a majority of voting members.49 The dynamic adjustments take time, and a sudden, large-scale failure might cause quorum loss before the cluster can adapt the vote count and majority requirement. This makes predicting behavior under large, simultaneous failures more complex than with a static model.

6.2. Failover Process Breakdown

Failover is the core process by which WSFC provides high availability. When a failure occurs that affects a clustered role (either a failure of the node hosting it or a failure of a critical resource within the role), the cluster automatically attempts to restart the role on another available node. This process involves several distinct stages13:

  1. Failure Detection: The process begins with the detection of a failure. This can occur in two main ways:
    • Node Failure: Detected when other nodes stop receiving heartbeat packets from the failed node for longer than the configured threshold (SameSubnetThreshold or CrossSubnetThreshold). This detection is managed by the NetFT.sys driver on the surviving nodes.37 The cluster removes the failed node from active membership (Event 1135 logged).8
    • Resource Failure: Detected when a resource fails its health checks. The Resource Host Subsystem (RHS.exe) periodically invokes the LooksAlive and IsAlive functions implemented within the resource's specific ResDLL.13 If IsAlive returns a failure status (indicating the resource is not operational), RHS reports the failure to the Cluster Service's Resource Control Manager (RCM), which applies the resource's restart policy; once restart attempts on the current node are exhausted, the failure escalates to group failover.13 For complex resources like SQL Server, IsAlive might rely on internal diagnostics like sp_server_diagnostics results or lease status checks.13 Event 1069 is commonly logged for resource failures.17
  2. Resource Offline Process: Once a critical failure is confirmed (either node down or unrecoverable resource failure), the Cluster Service initiates the process of taking the affected clustered role (group) offline on the failed node (if possible) or logically marks it as offline in the cluster state.13 This involves instructing RHS to call the Offline or Terminate entry points in the ResDLLs for all resources within the group, respecting dependencies (dependent resources are taken offline first).
  3. Arbitration (Ownership Transfer): The cluster must decide which healthy node should take ownership of the failed role.
    • Node Selection: The Cluster Service selects a target node from the list of "Possible Owners" configured for the group, often prioritizing based on the "Preferred Owners" list or potentially using load balancing heuristics. The target node must be 'Up' and part of the current active cluster membership (maintaining quorum).
    • Storage Arbitration: If the role includes shared storage resources (e.g., Physical Disks using traditional SAN LUNs), a critical arbitration step occurs. The ClusDisk.sys driver on the target node must acquire ownership of the LUN(s). This involves using SCSI-3 Persistent Reservation commands to potentially break (preempt or clear) any reservation held by the failed node and then establish a new reservation for itself.50 Failure at this stage (e.g., due to storage misconfiguration or connectivity issues) will prevent the failover.95 For CSV or S2D, the arbitration is software-based (CSV coordinator ownership transfer or S2D internal state).
  4. Resource Online Process on Another Node: Once arbitration succeeds and ownership is transferred, the Cluster Service on the target node begins bringing the clustered role online.13 It instructs RHS to execute the Online entry point for each resource within the group, strictly following the defined dependency order (dependencies must be online first).21
    • For an IP Address resource, Online involves adding the VIP to the local network interface and broadcasting gratuitous ARP / unsolicited Neighbor Advertisements.84
    • For a Network Name resource, Online involves registering the name in DNS against the now-active IP address.70
    • For application resources (like SQL Server), Online involves starting the associated service.90 If any critical resource fails to come online during this process, the entire group may enter a 'Failed' state on the target node.17
  5. Client Reconnection: Clients that were connected to the role on the failed node will experience a connection interruption. Most modern client drivers and applications have built-in connection retry logic. When they attempt to reconnect using the clustered role's virtual network name (CAP name), the following should happen:
    • DNS resolution for the CAP name now points to the virtual IP address active on the new node (due to the DNS update in step 4).
    • Network traffic sent to the virtual IP address is now routed to the new node's MAC address (due to the ARP/ND updates in step 4).
    • Assuming the service (e.g., SQL Server) is fully online on the new node, the client's reconnection attempt succeeds.90 The duration of the outage experienced by the client depends on the time taken for failure detection, arbitration, resource online process, and the client's own retry timers.

This multi-stage process highlights that a successful failover depends on the correct functioning of numerous components across multiple nodes, including network health detection (NetFT), resource-specific logic (ResDLL/RHS), storage arbitration mechanisms (ClusDisk/PRs or software equivalents), AD/DNS updates for network identity, and appropriate client behavior. A failure in any of these intermediate steps can prevent the service from being restored automatically.
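
Since failure-detection timing in step 1 is governed by the cluster's heartbeat properties, it can be useful to review (and, cautiously, tune) them; the sketch below reads the current values and shows an illustrative threshold change, not a recommended setting:

```powershell
Import-Module FailoverClusters

# Heartbeat intervals (Delay, in milliseconds) and missed-heartbeat limits (Threshold)
# that drive node-failure detection within and across subnets.
Get-Cluster | Format-List Name, SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold

# Illustrative change: a larger threshold tolerates more transient packet loss
# at the cost of slower detection of a genuinely failed node.
(Get-Cluster).SameSubnetThreshold = 20
```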

6.3. Resource Management

WSFC manages applications and services as logical "resources" grouped into "roles" (or "groups"). The Cluster Service (specifically, the Resource Control Manager or RCM) is responsible for managing the state of these resources, enforcing dependencies between them, and applying policies for handling failures.

Resource States: Cluster resources transition through several well-defined states during their lifecycle, which can be observed using Failover Cluster Manager or PowerShell (Get-ClusterResource)14:

Table 8: WSFC Resource States
State Name | Description | Typical Triggering Event(s) | Next Possible State(s)
Offline | The resource is not running and is inactive. | Administrative Offline command, successful Offline completion, initial state, group offline. | Online Pending
Online Pending | The resource is starting; its Online entry point is being executed by RHS. | Administrative Online command, dependency met during group online. | Online, Failed
Online | The resource has successfully started, passed initial health checks, and is considered operational. | Successful Online completion. | Offline Pending, Failed
Offline Pending | The resource is stopping gracefully; its Offline entry point is being executed by RHS. | Administrative Offline command, resource failure triggering offline, group offline request. | Offline
Failed | The resource failed to complete its Online operation, or it failed its IsAlive health check and exhausted its restart attempts. | Online failure, IsAlive failure exceeding restart policy, critical dependency failure. | Offline, Online Pending (if retried)
(Implicit) Waiting | The resource is waiting for a resource it depends on to reach the Online state before starting its own Online process. | Group online process initiated, but dependency is not yet Online. | Online Pending
References: 14

Dependencies: Dependencies define the required startup order for resources within the same cluster group (role). A resource cannot begin its Online process until all the resources it depends on have successfully reached the Online state.14 Conversely, when taking a group offline, resources are stopped in the reverse dependency order. Dependencies create a directed acyclic graph (DAG) within the group, ensuring, for example, that an IP address is online before the network name that uses it, and the network name and disks are online before the application service (like SQL Server) that requires them.21 Dependency relationships are critical for ensuring services start correctly and can be visualized using dependency reports.21 An incorrectly configured dependency can prevent a group from coming online.
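
Dependencies can be inspected and adjusted with the dependency cmdlets; the sketch below uses a hypothetical SQL Server role, with all resource and group names as placeholders:

```powershell
Import-Module FailoverClusters

# Show the dependency expression for a (hypothetical) SQL Server resource.
Get-ClusterResourceDependency -Resource 'SQL Server (MSSQLSERVER)'

# Make the service depend on both its network name and its data disk.
Set-ClusterResourceDependency -Resource 'SQL Server (MSSQLSERVER)' `
    -Dependency '[SQL Network Name (SQLFCI1)] and [Cluster Disk 2]'

# Generate the HTML dependency report mentioned above for the whole role.
Get-ClusterGroup 'SQL Server (MSSQLSERVER)' | Get-ClusterResourceDependencyReport
```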

Ownership and Placement Policies:

  • Possible Owners: Each resource group has a configured list of nodes on which it is allowed to run.7 A failover can only target a node listed as a possible owner.
  • Preferred Owners: Administrators can define an ordered list of preferred nodes for a group.7 During automatic failover, the cluster attempts to move the group to the highest-listed preferred owner that is currently available and a possible owner. If no preferred owners are available, it may move to any available possible owner.
  • AntiAffinityClassNames: A group property providing soft anti-affinity: groups that share the same class-name string are kept off the same node where possible, although the cluster may still co-locate them if no alternative placement exists.
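
These placement settings map directly to cmdlets and group properties, as sketched below with placeholder node, resource, and group names:

```powershell
Import-Module FailoverClusters

# Possible owners are configured per resource; preferred owners (ordered) per group.
Set-ClusterOwnerNode -Resource 'SQL Server (MSSQLSERVER)' -Owners 'NODE1', 'NODE2'
Set-ClusterOwnerNode -Group 'SQL Server (MSSQLSERVER)' -Owners 'NODE1', 'NODE2'
Get-ClusterOwnerNode -Group 'SQL Server (MSSQLSERVER)'

# Soft anti-affinity: assign the same class name to groups that should avoid sharing a node.
$aac = New-Object System.Collections.Specialized.StringCollection
[void]$aac.Add('SQLInstances')
(Get-ClusterGroup 'SQL Server (MSSQLSERVER)').AntiAffinityClassNames = $aac
```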

Failure and Failover Policies: These policies, configurable per resource or per group, dictate how the cluster responds to failures22:

  • Restart Policy (Resource Level): Determines whether the cluster should attempt to restart a resource on the same node if it enters the Failed state. Typical options are: do not restart, attempt restart on the current node without affecting the group, or attempt restart and fail over the whole group if the restarts are unsuccessful. A restart threshold and period limit the number of restart attempts within a specified time window.
  • Failover Policy (Group Level): Defines the behavior if a resource failure cannot be resolved by restarting on the current node, or if the node itself fails. Key settings include:
    • Maximum Failures in the Specified Period: Limits how many times the group can fail over within a defined time window (e.g., N-1 failures in a 6-hour period, where N is the number of nodes, is a common default for critical roles such as SQL Server availability groups109) before the group is left in a Failed state, requiring manual intervention.
    • Failback Policy: Determines if and when the group should automatically move back to its preferred owner node after that node becomes available following a failure. Options include preventing failback, failing back immediately, or failing back within a specified time window.22
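
These policies surface as properties on the resource and group objects; the sketch below reads and (illustratively) adjusts them for a hypothetical SQL Server role, with the values shown being examples rather than recommendations:

```powershell
Import-Module FailoverClusters

# Resource-level restart policy (RestartAction: 0 = do not restart, 1 = restart without
# affecting the group, 2 = restart and, if unsuccessful, fail over the group).
Get-ClusterResource 'SQL Server (MSSQLSERVER)' |
    Format-List Name, RestartAction, RestartThreshold, RestartPeriod, RestartDelay

# Group-level failover/failback policy: at most FailoverThreshold failovers within
# FailoverPeriod hours; AutoFailbackType 1 enables failback to the preferred owner.
$g = Get-ClusterGroup 'SQL Server (MSSQLSERVER)'
$g | Format-List Name, FailoverThreshold, FailoverPeriod, AutoFailbackType,
    FailbackWindowStart, FailbackWindowEnd

# Illustrative change: allow two failovers within a six-hour window.
$g.FailoverThreshold = 2
$g.FailoverPeriod = 6
```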

Works Cited

  1. Remove clussvc.exe Virus - Malware Search Engine. https://file-intelligence.comodo.com/windows-process-virus-malware/exe/clussvc
  2. Windows Server Failover Cluster with SQL Server - SQL Server Always On | Microsoft Learn. https://learn.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/windows-server-failover-clustering-wsfc-with-sql-server?view=sql-server-ver16
  3. Configuring cluster accounts in Active Directory - Learn Microsoft. https://learn.microsoft.com/en-us/windows-server/failover-clustering/configure-ad-accounts
  4. Cluster Service Stops Responding - Windows Server - Learn Microsoft. https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/cluster-service-stops-responding-a-cluster-node
  5. Failover cluster maintenance procedures for Azure Stack HCI and Windows Server. https://docs.azure.cn/en-us/azure-local/manage/maintain-servers
  6. sys.dm_os_cluster_nodes (Transact-SQL) - SQL Server | Microsoft Learn. https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-cluster-nodes-transact-sql?view=sql-server-ver16
  7. Query on Failover - Microsoft Q&A. https://learn.microsoft.com/en-us/answers/questions/1338637/query-on-failover
  8. Windows 2016 Failover Cluster Node Down - Microsoft Q&A. https://learn.microsoft.com/en-us/answers/questions/1289407/windows-2016-failover-cluster-node-down
  9. The Cluster service is shutting down because quorum was lost - Learn Microsoft. https://learn.microsoft.com/en-us/answers/questions/799622/the-cluster-service-is-shutting-down-because-quoru
  10. Windows Server Failover Clustering (ClusDb) Backup and Recovery. https://www.altaro.com/backup-dr/clusdb-backup-recovery/
  11. The Cluster and 0.Cluster Registry Hives - Working Hard In IT. https://blog.workinghardinit.work/2016/03/29/the-cluster-and-0-cluster-registry-hives/
  12. What's New in Failover Clustering in Windows Server | Microsoft Learn. https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/dn265972(v=ws.11)
  13. Failover policy for failover cluster instances - SQL Server Always On ... https://learn.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/failover-policy-for-failover-cluster-instances?view=sql-server-ver16
  14. Cluster group that has dependent resources does not fail over on a Windows Server-based computer - Microsoft Support. https://support.microsoft.com/en-us/topic/cluster-group-that-has-dependent-resources-does-not-fail-over-on-a-windows-server-based-computer-62bbed0f-e7e4-fa6b-d717-60b2d396f64e
  15. WSFC cluster service is offline - SQL Server Always On - Learn Microsoft. https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/wsfc-cluster-service-is-offline?view=sql-server-ver16
  16. Always On Availability group is offline - SQL Server - Learn Microsoft. https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-group-is-offline?view=sql-server-ver16
  17. Can't bring a clustered resource online troubleshooting guidance - Windows Server. https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/troubleshoot-cannot-bring-resource-online-guidance
  18. Can't bring an IP address online in a failover cluster - Learn Microsoft. https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/troubleshoot-cannot-bring-ip-address-online
  19. WSFC Fixing Errors for AlwaysOn Availability Group or AlwaysOn Failover Cluster Instances (SQL Clustering) - Virtual-DBA. https://virtual-dba.com/blog/wsfc-fixing-errors-for-alwayson-availability-group-or-failover-cluster-instances/
  20. Cannot add SQL Server Always On Availability Group Listener to second subnet - Server Fault. https://serverfault.com/questions/965897/cannot-add-sql-server-always-on-availability-group-listener-to-second-subnet
  21. Exploring The Windows Server Failover Cluster Dependency Report. https://learnsqlserverhadr.com/exploring-the-cluster-dependency-report/
  22. Working with Roles in Failover Cluster Manager - Altaro. https://www.altaro.com/hyper-v/failover-cluster-manager/roles/
  23. Failover Clustering system log events - Windows Server - Learn Microsoft. https://learn.microsoft.com/en-us/previous-versions/troubleshoot/windows-server/failover-clustering-system-log-events
  24. Generate & analyze CLUSTER.LOG for availability groups - SQL... https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/cluster-log-always-on-availability-groups?view=sql-server-ver16
  25. OpenCluster function (clusapi.h) - Win32 apps | Microsoft Learn. https://learn.microsoft.com/en-us/windows/win32/api/clusapi/nf-clusapi-opencluster
  26. Windows Server Failover Clustering | NXLog Documentation. https://docs.nxlog.co/integrate/windows-server-failover-clustering.html
  27. Windows Event Logs Ingestion Overview - LogicMonitor. https://www.logicmonitor.com/support/lm-logs/windows-event-logs-ingestion-overview
  28. Decoding Windows event logs: A definitive guide for incident responders - HackTheBox. https://www.hackthebox.com/blog/decoding-windows-event-logs-a-definitive-guide-for-incident-responders
  29. Windows Security Event Log Best Practices - Graylog. https://graylog.org/post/windows-security-event-log-best-practices/
  30. Configure added LSA protection | Microsoft Learn. https://learn.microsoft.com/en-us/windows-server/security/credentials-protection-and-management/configuring-additional-lsa-protection
  31. How to troubleshoot high Lsass.exe CPU utilization on Active Directory Domain Controllers. https://learn.microsoft.com/en-us/troubleshoot/windows-server/active-directory/troubleshoot-high-lsass.exe-cpu-utilization
  32. Create a failover cluster | Microsoft Learn. https://learn.microsoft.com/en-us/windows-server/failover-clustering/create-failover-cluster
  33. Credential Dumping: Windows Authentication and Credential Management - ReliaQuest. https://reliaquest.com/blog/credential-dumping-part-1-a-closer-look-at-vulnerabilities-with-windows-authentication-and-credential-management/
  34. Service overview and network port requirements - Windows Server | Microsoft Learn. https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/service-overview-and-network-port-requirements
  35. The Resource Hosting Subsystem (Rhs.exe) process stops unexpectedly when you start a cluster resource in Windows Server 2008 R2 - Microsoft Support. https://support.microsoft.com/en-us/topic/the-resource-hosting-subsystem-rhs-exe-process-stops-unexpectedly-when-you-start-a-cluster-resource-in-windows-server-2008-r2-01e87c84-8cdd-d725-d8be-0c7cd6fe63d2
  36. Microsoft Windows Server 2012 - 2016 Failover Cluster - SolarWinds Documentation. https://documentation.solarwinds.com/en/success_center/sam/content/sam-microsoft-windows-server-2008-r2-2012-r2-failover-cluster-sw5637.htm
  37. Availability group lease health check timeout - SQL Server Always ... https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-group-lease-healthcheck-timeout?view=sql-server-ver16
  38. Resource DLL Functions | Microsoft Learn. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/mscs/resource-dll-functions
  39. Resource DLLs | Microsoft Learn. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/mscs/resource-dlls
  40. Manually re-create registry keys for cluster resources - SQL Server ... https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/failover-clusters/manually-re-create-resource-specific-registry-keys
  41. Failover Cluster Functions | Microsoft Learn. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/mscs/failover-cluster-functions
  42. looksAlive and isAlive polling on MSCS - IBM. https://www.ibm.com/docs/en/ibm-mq/9.2?topic=mscs-looksalive-isalive-polling
  43. Windows 10 DLL File Information - clusapi.dll - NirSoft. https://windows10dll.nirsoft.net/clusapi_dll.html
  44. clusapi.dll free download - DLL-files.com. https://www.dll-files.com/clusapi.dll.html
  45. Win Server 2022 Failover Cluster - Dead, can I remove all traces - Microsoft Community. https://answers.microsoft.com/en-us/windowserver/forum/all/win-server-2022-failover-cluster-dead-can-i-remove/4a30f0ed-4ead-4801-a71f-b07e09cbc85b
  46. sql cluster not failing over. - Microsoft Q&A. https://learn.microsoft.com/en-us/answers/questions/530162/sql-cluster-not-failing-over
  47. AG failure - Microsoft Tech Community. https://techcommunity.microsoft.com/blog/sqlserversupport/what-is-causing-the-always-on-ag-issue-is-it-cluster-ad-dns-or-sql/3781656
  48. Exchange 2019 Cluster node is in Quarantine Status - Microsoft Q&A. https://learn.microsoft.com/en-us/answers/questions/603422/exchange-2019-cluster-node-is-in-quarantine-status
  49. What is a failover cluster quorum witness in Windows Server ... https://learn.microsoft.com/en-us/windows-server/failover-clustering/what-is-quorum-witness?source=recommendations
  50. Cluster service reserves and brings online disks - Windows Server ... https://learn.microsoft.com/en-us/previous-versions/troubleshoot/windows-server/cluster-service-reserves-brings-disk
  51. Failover Clustering Networking Basics and Fundamentals. https://techcommunity.microsoft.com/blog/itopstalkblog/failover-clustering-networking-basics-and-fundamentals/1472460
  52. What is a Microsoft Failover Cluster Virtual Adapter anyway? - Ask the Core Team. https://timon47.rssing.com/chan-5644360/all_p1.html
  53. Understanding the state of your Cluster Shared Volumes | Microsoft ... https://techcommunity.microsoft.com/blog/failoverclustering/understanding-the-state-of-your-cluster-shared-volumes/371889
  54. Use Cluster Shared Volumes in a Failover Cluster - Learn Microsoft. https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/jj612868(v=ws.11)
  55. What are SCSI Reservations and SCSI Persistent Reservations? - NetApp Knowledge Base. https://kb.netapp.com/on-prem/ontap/da/SAN/SAN-KBs/What_are_SCSI_Reservations_and_SCSI_Persistent_Reservations
  56. NetFT Virtual Adapter Performance Filter | Microsoft Community Hub. https://techcommunity.microsoft.com/blog/failoverclustering/netft-virtual-adapter-performance-filter/372090
  57. Intermittently failover of my SQL Server resources on Windows Server 2016 - Reddit. https://www.reddit.com/r/SQLServer/comments/qievvk/intermittently_failover_of_my_sql_server/
  58. Failover Clustering Networking Basics and Fundamentals | Microsoft Community Hub. https://techcommunity.microsoft.com/blog/failoverclustering/failover-clustering-networking-basics-and-fundamentals/1706005
  59. Recommended private heartbeat configuration on a cluster server - Learn Microsoft. https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/private-heartbeat-configuration-on-cluster-server
  60. Tuning failover cluster network thresholds - Windows Server - Learn Microsoft. https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/iaas-sql-failover-cluster-network-thresholds
  61. Tuning Failover Cluster Network Thresholds - Microsoft Community Hub. https://techcommunity.microsoft.com/blog/failoverclustering/tuning-failover-cluster-network-thresholds/371834
  62. Troubleshooting a Failover Cluster using Windows Error Reporting | Microsoft Learn. https://learn.microsoft.com/en-us/windows-server/failover-clustering/troubleshooting-using-wer-reports
  63. Cluster Shared Volume Performance Counters | Microsoft ... https://techcommunity.microsoft.com/blog/failoverclustering/cluster-shared-volume-performance-counters/371980
  64. Hosting Windows Server Failover Cluster (WSFC) with shared disks on VMware vSphere: Doing it right!. https://blogs.vmware.com/apps/2019/05/wsfc-on-vsphere.html
  65. About Services - Win32 apps | Microsoft Learn. https://learn.microsoft.com/en-us/windows/win32/services/about-services
  66. Service control manager - Win32 apps | Microsoft Learn. https://learn.microsoft.com/en-us/windows/win32/services/service-control-manager
  67. Microsoft Service Control Manager - OpenText. https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.4/smartconnector-for-windows-event-log-native/Content/ms-security-control-manager/ms-secufity-control-manager.htm?TocPath=Configuring%20Log%20sources%7C__________12
  68. Service Control Manager - Wikipedia. https://en.wikipedia.org/wiki/Service_Control_Manager
  69. [MS-SCMR]: Overview - Learn Microsoft. https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-scmr/d5bd5712-fa64-44bf-9433-3651f6a5ce97
  70. Can't bring a network name online in a cluster - Windows Server ... https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/troubleshoot-cannot-bring-network-name-online
  71. Troubleshoot Cluster service account - Windows Server | Microsoft ... https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/troubleshoot-cluster-service-account
  72. Use Kerberos authentication with Service Principal Name (SPN ... https://docs.azure.cn/en-us/azure-local/manage/kerberos-with-spn
  73. NTLM and Kerberos Authentication - .NET Framework - Learn Microsoft. https://learn.microsoft.com/en-us/dotnet/framework/network-programming/ntlm-and-kerberos-authentication
  74. Introduction to Plug and Play - Windows drivers | Microsoft Learn. https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/introduction-to-plug-and-play
  75. Plug and Play Manager - Windows drivers | Microsoft Learn. https://learn.microsoft.com/en-us/windows-hardware/drivers/install/pnp-manager
  76. Windows Kernel-Mode Plug and Play Manager - Learn Microsoft. https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/windows-kernel-mode-plug-and-play-manager
  77. Understanding the Windows I/O System | Microsoft Press Store. https://www.microsoftpressstore.com/articles/article.aspx?p=2201309&seqNum=6
  78. Hyper-V Step by Step: Windows Server Failover Clustering with SAN Environment - TechNet Articles - GitHub Pages. https://technet2.github.io/Wiki/articles/39987.hyper-v-step-by-step-windows-server-failover-clustering-with-san-environment.html
  79. Firewall requirements for Azure Local - Azure Local | Microsoft Learn. https://learn.microsoft.com/en-us/azure/azure-local/concepts/firewall-requirements?view=azloc-2503
  80. What are the port requirements for Distributed Availability Group (DAG)? - Learn Microsoft. https://learn.microsoft.com/en-us/answers/questions/2084386/what-are-the-port-requirements-for-distributed-ava
  81. Microsoft Windows Server Failover Clustering (WSFC) with shared disks on VMware vSphere 7.x: Guidelines for supported configurations. https://knowledge.broadcom.com/external/article?legacyId=79616
  82. Microsoft Windows Server Failover Clustering (WSFC) with shared disks on VMware vSphere 7.x: Guidelines for supported configurations. https://knowledge.broadcom.com/external/article/313230
  83. best practice on Cluster network - Microsoft Q&A. https://learn.microsoft.com/en-us/answers/questions/544299/best-practice-on-cluster-network
  84. Clustering information on IP address failover - Windows Server ... https://learn.microsoft.com/en-us/previous-versions/troubleshoot/windows-server/cluster-information-ip-address-failover
  85. Running Windows Server Failover Clustering | Compute Engine Documentation. https://cloud.google.com/compute/docs/tutorials/running-windows-server-failover-clustering
  86. Why We Need To Understand How Active Directory Affects SQL Server High Availability. https://www.edwinmsarmiento.com/why-we-need-to-understand-how-active-directory-affects-sql-server-high-availability/
  87. Configure availability group listener - SQL Server Always On | Microsoft Learn. https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/create-or-configure-an-availability-group-listener-sql-server?view=sql-server-ver16
  88. Prestage cluster computer objects in Active Directory Domain Services - Learn Microsoft. https://learn.microsoft.com/en-us/windows-server/failover-clustering/prestage-cluster-adds
  89. Use Cluster Shared Volumes in a failover cluster | Microsoft Learn. https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs
  90. Always On failover cluster instances (SQL Server) - Learn Microsoft. https://learn.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/always-on-failover-cluster-instances-sql-server?view=sql-server-ver16
  91. Storage Spaces Direct overview | Microsoft Learn. https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-overview
  92. Failover Clustering in Windows Server and Azure Local - Learn Microsoft. https://learn.microsoft.com/nl-be/windows-server/failover-clustering/failover-clustering-overview
  93. Failover Clustering - Learn Microsoft. https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-clustering-overview
  94. Troubleshooting Windows Cluster - High Availability (Clustering) forum. https://clustering72.rssing.com/chan-15188065/all_p192.html
  95. Persistent reservation, windows 2003 sp2 | DELL Technologies. https://www.dell.com/community/en/conversations/microsoft/persistent-reservation-windows-2003-sp2/647f280ef4ccf8a8dec9b23a?commentId=647f62ddf4ccf8a8deeebd5c
  96. Microsoft Windows Server Failover Cluster Validation Fails: Disk 0 Does Not Support SCSI-3 Persistent Reservations for Storage Spaces Subsystem. https://knowledge.broadcom.com/external/article/387269/microsoft-windows-server-failover-cluste.html
  97. Understanding the storage pool cache in Azure Stack HCI and Windows Server clusters. https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/cache
  98. Deploy Storage Spaces Direct on Windows Server - Learn Microsoft. https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/deploy-storage-spaces-direct
  99. Monitor clusters with the Health Service - Azure Local | Microsoft Learn. https://learn.microsoft.com/en-us/azure/azure-local/manage/health-service-overview?view=azloc-24113
  100. Modify Health Service settings - Azure Local | Microsoft Learn. https://learn.microsoft.com/en-us/azure/azure-local/manage/health-service-settings?view=azloc-24113
  101. Set-Acl (Microsoft.PowerShell.Security). https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.security/set-acl?view=powershell-7.5
  102. Get-Acl (Microsoft.PowerShell.Security). https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.security/get-acl?view=powershell-7.5
  103. Access control lists - Win32 apps - Learn Microsoft. https://learn.microsoft.com/en-us/windows/win32/secauthz/access-control-lists
  104. Automate CNOs and VCOs for SQL Server AAG - dbi services. https://www.dbi-services.com/blog/automate-cnos-and-vcos-for-sql-server-aag/
  105. Security Descriptor | Microsoft Learn. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/mscs/file-shares-security-descriptor
  106. Recommendations for hardening resources - Microsoft Azure Well ... https://learn.microsoft.com/en-us/azure/well-architected/security/harden-resources
  107. Best Practices for Securing Active Directory | Microsoft Learn. https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/plan/security-best-practices/best-practices-for-securing-active-directory
  108. Understand cluster and pool quorum on Azure Stack HCI and ... https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/quorum
  109. Configure a flexible automatic failover policy for an availability group ... https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-flexible-automatic-failover-policy?view=sql-server-ver16
  110. Windows Server Failover Clustering for SQL Server Availability Groups - StarWind Software. https://www.starwindsoftware.com/resource-library/windows-server-failover-clustering-for-sql-server-availability-groups/