Monitor Windows Servers with Prometheus and Grafana

Introduction

Monitoring Windows servers in production requires a rigorous and methodical approach. Prometheus and Grafana are powerful tools, but they deliver long-term reliability only when every layer is configured precisely.

In this article, we describe how to set up a complete monitoring stack for Windows Server with windows_exporter, Prometheus, Grafana, and Alertmanager, validating each layer before moving on to the next.

Production architecture for monitoring Windows servers

Understanding component topology

Before you start, it helps to understand how the components fit together. A typical configuration includes the following elements:

windows_exporter: Exposes Windows server metrics over HTTP on the /metrics endpoint.
Prometheus: Scrapes and stores those metrics and evaluates alerting rules.
Grafana: Queries Prometheus and displays the metrics in dashboards.
Alertmanager: Receives alerts from Prometheus, then groups, deduplicates, and routes them to notification channels.

[IMAGE:index:images/windows-monitoring-architecture.svg:Monitoring architecture for Windows]

Good to know

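Validate each layer before building the next one. Confirm that the exporter responds before you configure Prometheus, that Prometheus scrapes every target before you build dashboards, and that dashboards show data before you write alert rules. This keeps extraction, storage, visualization, and alerting consistent.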
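The connections between these components are declared in prometheus.yml. The excerpt below is a minimal sketch of that wiring. It loads the rule file used later in this guide and points Prometheus at Alertmanager. The host name alertmanager.contoso.local is hypothetical, and 9093 is Alertmanager's default listening port.

📄YAML

```yaml
# prometheus.yml (excerpt): wiring between Prometheus, its rules, and Alertmanager.
# Assumption: Alertmanager runs on the hypothetical host alertmanager.contoso.local.
rule_files:
  # Alerting rules evaluated by Prometheus
  - C:\Prometheus\rules\windows-server.rules.yml

alerting:
  alertmanagers:
    # Prometheus sends firing alerts to this Alertmanager instance
    - static_configs:
        - targets:
            - alertmanager.contoso.local:9093
```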

Prerequisites

To follow this guide, you will need:

Windows Server 2019 or later on the monitored machines.
A working Prometheus deployment that can reach the Windows servers.
A Grafana instance on which you have administrative or provisioning rights.
Network connectivity from the Prometheus server to TCP port 9182 on each Windows server.
Alertmanager, if you want alert notifications.

Install windows_exporter on Windows servers

Initial configuration

windows_exporter collects Windows system data, such as performance counters, and exposes it as metrics that Prometheus can scrape. Start by installing its MSI package with settings adapted to your environment.

Installation command

Run the following script from an elevated PowerShell session to install the exporter with a minimal set of collectors:

⚡PowerShell

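$msi = "C:\Temp\windows_exporter.msi"
$collectors = "cpu,memory,logical_disk,net,os,physical_disk,service,system"

# Silent install; -PassThru returns the msiexec process so its exit code can be checked
$proc = Start-Process msiexec.exe -Wait -PassThru -ArgumentList @(
  "/i", $msi,
  "/qn",
  "ENABLED_COLLECTORS=$collectors",
  "LISTEN_PORT=9182",
  "ADDLOCAL=FirewallException"
)

# 0 = success, 3010 = success but a reboot is required
if ($proc.ExitCode -notin @(0, 3010)) {
  throw "msiexec failed with exit code $($proc.ExitCode)"
}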

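This installs windows_exporter as a Windows service that exposes only the listed collectors on port 9182. ADDLOCAL=FirewallException also creates a matching Windows Firewall exception. If a server hosts specific roles, such as IIS, append the corresponding collectors (for example, iis) to ENABLED_COLLECTORS.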
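On servers with several roles, a long ENABLED_COLLECTORS string becomes hard to review. As an alternative, windows_exporter can read its settings from a YAML file passed with the --config.file flag, and recent MSI packages expose this as a CONFIG_FILE property. The file below is a sketch for a hypothetical IIS server. It assumes the key layout of recent releases, where flag names such as collectors.enabled map to nested keys, so check it against the README of the exporter version you pin.

📄YAML

```yaml
# config.yml for windows_exporter (sketch; key names mirror the exporter's
# command-line flags and may differ between releases).
collectors:
  # Baseline collectors from the install script, plus "iis" for a web server role
  enabled: cpu,memory,logical_disk,net,os,physical_disk,service,system,iis
log:
  level: warn
```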

Warning

Never expose port 9182 to public networks. Restrict access to the port through firewall rules.

Secure access to the metrics endpoint

Add firewall exceptions

If you already know the IP addresses of your Prometheus servers, you can restrict the firewall exception to them at install time. Use REMOTE_ADDR, which accepts a comma-separated list of addresses:

⚡PowerShell

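# REMOTE_ADDR limits the firewall exception to the listed Prometheus server(s)
$proc = Start-Process msiexec.exe -Wait -PassThru -ArgumentList @(
  "/i", "C:\Temp\windows_exporter.msi",
  "/qn",
  "ENABLED_COLLECTORS=cpu,memory,logical_disk,net,os,physical_disk,service,system",
  "LISTEN_PORT=9182",
  "REMOTE_ADDR=10.20.0.15",
  "ADDLOCAL=FirewallException"
)
if ($proc.ExitCode -notin @(0, 3010)) { throw "msiexec failed with exit code $($proc.ExitCode)" }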

Then, from the Prometheus server, check that the metrics endpoint responds:

⚡PowerShell

Test-NetConnection -ComputerName win01.contoso.local -Port 9182
curl.exe -s http://win01.contoso.local:9182/metrics | Select-String "windows_exporter_build_info"

Together, these commands confirm that:

TCP port 9182 is reachable (TcpTestSucceeded is True).
The response comes from windows_exporter (the windows_exporter_build_info metric is present).

Configure Prometheus to scrape Windows targets

Create a stable configuration

The scrape job defines which targets Prometheus collects metrics from and which labels it attaches to them. Here is an example:

📄YAML

global:
  scrape_interval: 30s

scrape_configs:
  - job_name: windows-server
    scrape_timeout: 10s
    static_configs:
      - targets:
          - win01.contoso.local:9182
          - win02.contoso.local:9182
        labels:
          environment: prod
          role: app
          site: denver
14          site: denver

Validate the configuration file before reloading or restarting Prometheus:

⚡PowerShell

promtool check config C:\Prometheus\prometheus.yml

Tip

Target labels such as environment, role, and site let a single dashboard or alert rule cover the whole fleet through label filters. You do not need to duplicate it for each group of servers.
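Static target lists get harder to maintain as the fleet grows. One option is Prometheus file-based service discovery. The job reads target groups, with their labels, from YAML files that Prometheus re-reads when they change, so adding a server does not require a restart. The sketch below shows two files separated by `---`: the job excerpt and one targets file. The C:\Prometheus\targets directory and the file name are hypothetical.

📄YAML

```yaml
# prometheus.yml (excerpt): the windows-server job with file-based discovery
# instead of static_configs. The targets directory is a hypothetical location.
scrape_configs:
  - job_name: windows-server
    scrape_timeout: 10s
    file_sd_configs:
      - files:
          - C:\Prometheus\targets\windows-*.yml
---
# C:\Prometheus\targets\windows-prod-app.yml (hypothetical file name).
# Each entry is a target group; its labels apply to every target in the group.
- targets:
    - win01.contoso.local:9182
    - win02.contoso.local:9182
  labels:
    environment: prod
    role: app
    site: denver
```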

Connect Grafana and build dashboards

Provision the data source

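You can provision the Prometheus data source automatically with a YAML file placed in Grafana's provisioning directory (conf/provisioning/datasources by default):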

📄YAML

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.contoso.local:9090
    isDefault: true
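Two optional settings can make this data source easier to reuse. The sketch below is the same provisioning file with a fixed uid, which dashboards stored as JSON use to reference the data source. It also sets jsonData.timeInterval so that panel query steps do not drop below the 30-second scrape interval configured in Prometheus. The uid value is a hypothetical choice.

📄YAML

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus-main          # hypothetical stable identifier referenced by dashboard JSON
    access: proxy
    url: http://prometheus.contoso.local:9090
    isDefault: true
    jsonData:
      timeInterval: 30s           # lowest query step; matches the Prometheus scrape_interval
```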

Create custom dashboards

To start, focus on these three main views:

Server health: CPU, memory, disk, and network.
Service status: Monitor only critical services.
Fleet overview: Instance labels and overall status.
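For the server health view, recording rules can precompute the main expressions so that panels and alerts share the same definitions. The group below is a sketch. The metric names come from the cpu, logical_disk, and net collectors enabled earlier, while the file name and record names are hypothetical. Aggregating with without() rather than by() keeps the environment, role, and site labels available for dashboard filters.

📄YAML

```yaml
# windows-server.recording.yml (hypothetical file name): add it to rule_files.
groups:
  - name: windows-server.recording
    rules:
      # CPU busy percentage per server: 100 minus the average idle share across cores
      - record: instance:windows_cpu_busy:percent
        expr: 100 - (avg without (core, mode) (rate(windows_cpu_time_total{job="windows-server", mode="idle"}[5m])) * 100)
      # Free space per volume, as a percentage of the volume size
      - record: instance_volume:windows_logical_disk_free:percent
        expr: 100 * windows_logical_disk_free_bytes{job="windows-server"} / windows_logical_disk_size_bytes{job="windows-server"}
      # Network throughput per server (received + sent, all NICs), in bytes per second
      - record: instance:windows_net_bytes:rate5m
        expr: sum without (nic) (rate(windows_net_bytes_total{job="windows-server"}[5m]))
```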

Export dashboards as JSON and keep them in version control.
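Grafana can also load those JSON files at startup through a dashboard provider, so the version-controlled copy remains the source of truth. The sketch below would go in conf/provisioning/dashboards. It assumes the repository is checked out to a hypothetical C:\Grafana\dashboards\windows directory.

📄YAML

```yaml
apiVersion: 1

providers:
  - name: windows-dashboards
    type: file
    folder: Windows                          # Grafana folder that receives the dashboards
    disableDeletion: true                    # provisioned dashboards cannot be deleted from the UI
    allowUiUpdates: false                    # UI edits cannot be saved; change the JSON instead
    options:
      path: C:\Grafana\dashboards\windows    # hypothetical checkout directory
```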

Configure alert rules in production

Example rules

The following rule detects the most critical anomaly: an exporter that Prometheus can no longer scrape.

📄YAML

groups:
  - name: windows-server.rules
    rules:
      - alert: WindowsExporterDown
        expr: up{job="windows-server"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "windows_exporter is down on {{ $labels.instance }}"
          description: "Prometheus has failed to scrape {{ $labels.instance }} for five minutes."
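The exporter-down rule only detects a missing target. The group below sketches two capacity rules built on the logical_disk and cpu collectors enabled earlier. The 10% and 90% thresholds are illustrative starting points, not recommendations. Because these rules carry severity: warning, the Alertmanager route shown next sends them to the default operations receiver rather than the pager. Unlettered volumes appear as HarddiskVolumeN, so they are excluded from the disk rule.

📄YAML

```yaml
groups:
  - name: windows-server.capacity
    rules:
      - alert: WindowsDiskSpaceLow
        # Less than 10% free on a lettered volume (HarddiskVolume* entries are unlettered partitions)
        expr: |
          100 * windows_logical_disk_free_bytes{job="windows-server", volume!~"HarddiskVolume.*"}
            / windows_logical_disk_size_bytes{job="windows-server", volume!~"HarddiskVolume.*"} < 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Volume {{ $labels.volume }} on {{ $labels.instance }} has less than 10% free space"
      - alert: WindowsCpuSaturated
        # Average busy time across all cores above 90%, sustained for 15 minutes
        expr: |
          100 - (avg without (core, mode) (rate(windows_cpu_time_total{job="windows-server", mode="idle"}[5m])) * 100) > 90
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CPU on {{ $labels.instance }} has been above 90% for 15 minutes"
```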

Routing in Alertmanager

Configure Alertmanager to send critical alerts to the on-call pager and all other alerts to the operations team by email:

📄YAML

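global:
  # SMTP relay used by email receivers (hypothetical host and sender)
  smtp_smarthost: smtp.contoso.local:587
  smtp_from: alertmanager@contoso.local

route:
  receiver: operations
  group_by: ["alertname", "instance", "job"]
  routes:
    - matchers:
        - severity="critical"
      receiver: pager
receivers:
  - name: operations
    email_configs:
      - to: operations-team@contoso.local   # placeholder: replace with your team's address
  - name: pager
    pagerduty_configs:
      - routing_key: REDACTED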
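Alertmanager notifies for every matching alert, even when a critical alert already explains the problem. The sketch below shows the same route with explicit grouping timers. It adds an inhibition rule that mutes warning alerts for an instance while a critical alert is firing for that instance. The timer values are illustrative. The source_matchers and target_matchers fields require a recent Alertmanager release.

📄YAML

```yaml
# alertmanager.yml (excerpt): the route above with explicit timers, plus inhibition.
route:
  receiver: operations
  group_by: ["alertname", "instance", "job"]
  group_wait: 30s        # delay before the first notification for a new group
  group_interval: 5m     # delay before notifying about alerts added to an existing group
  repeat_interval: 4h    # resend a still-firing group at most this often
  routes:
    - matchers:
        - severity="critical"
      receiver: pager

inhibit_rules:
  # While a critical alert fires for an instance, mute its warning alerts
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ["instance"]
```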

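Validate the alerting rule file with promtool before deploying any change to it. Alertmanager's own configuration can be checked with amtool check-config.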

⚡PowerShell

promtool check rules C:\Prometheus\rules\windows-server.rules.yml

Troubleshooting common issues

Use the table below to identify and fix the most frequent failures:

Symptom	Probable cause	Solution
Target unreachable	Port 9182 blocked, incorrect hostname, or windows_exporter service stopped	Fix the firewall rule or DNS record, and check the service state
Empty dashboard	Wrong data source or label filter	Check the panel's data source and label selectors
Missing metrics	Collector not enabled	Reinstall with the required collectors in ENABLED_COLLECTORS
Noisy alerts	Thresholds too tight or missing grouping	Tune thresholds and for durations, and add grouping keys in Alertmanager
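A failing collector is easy to miss because the rest of the exporter keeps responding. windows_exporter publishes a windows_exporter_collector_success series for each enabled collector, so a rule like the sketch below can catch it. The group name, alert name, and 15-minute duration are illustrative. This rule covers only enabled collectors: a collector missing from ENABLED_COLLECTORS produces no series at all.

📄YAML

```yaml
groups:
  - name: windows-exporter.health
    rules:
      - alert: WindowsCollectorFailed
        # windows_exporter reports 0 for a collector that failed during the scrape
        expr: windows_exporter_collector_success{job="windows-server"} == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Collector {{ $labels.collector }} is failing on {{ $labels.instance }}"
```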

Production operations checklist

To ensure the continued reliability of your monitoring stack, follow these steps:

Pin the windows_exporter version so that upgrades happen only when you plan them.
Review the enabled collectors whenever a server's role changes.
Standardize label names and values across Prometheus jobs so that queries and dashboards stay consistent.
Monitor metric retention and cardinality in Prometheus.
Review alert rules monthly, and tune or remove those that fire without requiring action.
Keep dashboard JSON and the Prometheus and Alertmanager YAML configurations in version control.
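For the cardinality item, Prometheus records scrape_samples_scraped for every target on every scrape. This makes a growing number of series per server visible before it affects storage. In many windows_exporter versions, the service collector is a common source of growth because it exports one series per service and state. The rule below is a sketch. The 5000 threshold is a hypothetical value; replace it with a margin above your own baseline.

📄YAML

```yaml
groups:
  - name: prometheus-cardinality
    rules:
      - alert: WindowsTargetSampleCountHigh
        # scrape_samples_scraped is recorded by Prometheus for each target on each scrape
        expr: scrape_samples_scraped{job="windows-server"} > 5000
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} exposes {{ $value }} samples per scrape"
```

Prometheus also supports a per-job sample_limit setting, which fails the scrape outright when a target exceeds it. Use it as a hard cap rather than a warning.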

Important

The reliability of your monitoring stack depends on how quickly you can detect and fix failures in the stack itself. Consistent labels, versioned configuration, and validated changes make that possible.

Conclusion

Monitoring Windows servers in production requires careful installation and configuration. With windows_exporter, Prometheus, Grafana, and Alertmanager, you can build a reliable stack that alerts you effectively. Follow these best practices to make your Windows servers measurable and observable.

Monitor Windows Servers with Prometheus and Grafana

Houssem MAKHLOUF
