Introduction
Monitoring Windows servers in production requires a rigorous, methodical approach. Tools like Prometheus and Grafana are powerful, but they only stay reliable over the long term if they are configured precisely.
In this article, we describe how to set up a complete monitoring stack for Windows Server, including windows_exporter, Prometheus, Grafana, and Alertmanager. We will explore each layer to ensure clear and functional monitoring.
Production architecture for monitoring Windows servers
Understanding component topology
Before you start, map out the architecture of the monitoring system. A typical setup includes the following components:
- windows_exporter: Exposes Windows server metrics via HTTP.
- Prometheus: Scrapes the exporters, stores the time series, and evaluates alert rules.
- Grafana: Displays metrics through visual dashboards.
- Alertmanager: Deduplicates, groups, and routes the alerts sent by Prometheus as notifications.
[IMAGE:index:images/windows-monitoring-architecture.svg:Monitoring architecture for Windows]
Good to know
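Validate each layer before building the next one. Confirm that windows_exporter serves metrics before configuring Prometheus, that Prometheus scrapes them before building dashboards, and that the dashboards show the expected data before writing alert rules.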
Prerequisites
To follow this guide, you will need:
- Windows Server 2019 or a later version on the monitored machines.
- A working Prometheus deployment that can reach the Windows servers over the network.
- A Grafana environment with administrative or provisioning rights.
- Network connectivity between Prometheus and TCP port 9182 on Windows servers.
- Alertmanager for notification management if you want to use alerts.
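Before you install anything, you can check the TCP prerequisite from the Prometheus host. The Python sketch below does roughly what Test-NetConnection does later in this guide. It only opens and closes a TCP connection, so it tells you whether port 9182 is reachable, not whether windows_exporter is answering. The function name port_open is our own.

```python
import socket


def port_open(host: str, port: int = 9182, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a rough equivalent of Test-NetConnection -Port for the port 9182
    prerequisite. It checks reachability only, not the HTTP response.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, run port_open("win01.contoso.local") from the Prometheus server for each host you plan to scrape.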
Install windows_exporter on Windows servers
Initial configuration
The windows_exporter tool converts Windows system and performance data into metrics that Prometheus can scrape. Start by installing the MSI package with a configuration adapted to your environment.
Installation command
Use the following script to install the exporter with a minimal set of collectors:
```powershell
$msi = "C:\Temp\windows_exporter.msi"
$collectors = "cpu,memory,logical_disk,net,os,physical_disk,service,system"

Start-Process msiexec.exe -Wait -ArgumentList @(
    "/i", $msi,
    "/qn",
    "ENABLED_COLLECTORS=$collectors",
    "LISTEN_PORT=9182",
    "ADDLOCAL=FirewallException"
)
```

This installs windows_exporter as a Windows service that listens on TCP port 9182, with only the listed collectors enabled. If the server hosts specific roles, such as IIS or SQL Server, add the corresponding collectors (for example iis or mssql) to ENABLED_COLLECTORS.
Warning
Never expose port 9182 to public networks. Restrict access to the port through firewall rules.
Secure access to the metrics endpoint
Add firewall exceptions
If you already know the IP addresses of your Prometheus servers, restrict the firewall exception to them at installation time. Use the REMOTE_ADDR property, which accepts a comma-separated list of addresses:
```powershell
Start-Process msiexec.exe -Wait -ArgumentList @(
    "/i", "C:\Temp\windows_exporter.msi",
    "/qn",
    "ENABLED_COLLECTORS=cpu,memory,logical_disk,net,os,physical_disk,service,system",
    "LISTEN_PORT=9182",
    "REMOTE_ADDR=10.20.0.15",
    "ADDLOCAL=FirewallException"
)
```

Then verify the metrics endpoint from the Prometheus server. In this example that is 10.20.0.15, the only address the firewall exception now allows:
```powershell
Test-NetConnection -ComputerName win01.contoso.local -Port 9182
curl.exe -s http://win01.contoso.local:9182/metrics | Select-String "windows_exporter_build_info"
```

Together, these two commands confirm that:
- Port 9182 is accessible.
- Metrics actually come from windows_exporter.
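With many servers, this verification is easier to script. The Python sketch below reproduces the Select-String check. It also reports whether each collector in a list you supply is working, which covers the Missing metrics case in the troubleshooting table.

The sketch rests on one assumption: the exporter publishes one windows_exporter_collector_success sample per enabled collector, with value 1 on success, as recent windows_exporter releases do. Confirm that metric name against your own /metrics output. The function names are ours.

```python
import re
import urllib.request

# Matches lines such as: windows_exporter_collector_success{collector="cpu"} 1
# (assumption: metric name as exposed by recent windows_exporter releases).
COLLECTOR_RE = re.compile(
    r'^windows_exporter_collector_success\{[^}]*collector="([^"]+)"[^}]*\}\s+(\S+)')


def fetch_metrics(host: str, port: int = 9182, timeout: float = 5.0) -> str:
    """Download the /metrics page of a windows_exporter instance (network call)."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_windows_exporter(exposition: str) -> bool:
    """Same test as Select-String: a windows_exporter_build_info sample exists."""
    # HELP/TYPE comment lines start with '#', so they never match here.
    return any(line.startswith("windows_exporter_build_info")
               for line in exposition.splitlines())


def collector_report(exposition: str, expected: list[str]) -> dict[str, str]:
    """Classify each expected collector as 'ok', 'failing', or 'missing'."""
    status = {}
    for line in exposition.splitlines():
        match = COLLECTOR_RE.match(line)
        if match:
            name, value = match.groups()
            status[name] = "ok" if float(value) == 1 else "failing"
    # A collector with no sample at all is most likely not enabled.
    return {name: status.get(name, "missing") for name in expected}
```

For example, collector_report(fetch_metrics("win01.contoso.local"), ["cpu", "memory", "service"]) shows at a glance which collectors need attention on that server.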
Configure Prometheus to scrape Windows targets
Create a stable configuration
The job configuration in Prometheus determines which targets are scraped and which labels are attached to their series. Here is an example:
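```yaml
global:
  scrape_interval: 30s

# Load the alert rule files created later in this guide.
rule_files:
  - 'C:\Prometheus\rules\*.yml'

# Send firing alerts to Alertmanager (hypothetical host name; 9093 is its default port).
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.contoso.local:9093

scrape_configs:
  - job_name: windows-server
    scrape_timeout: 10s
    static_configs:
      - targets:
          - win01.contoso.local:9182
          - win02.contoso.local:9182
        labels:
          environment: prod
          role: app
          site: denver
```

Before restarting Prometheus, validate the configuration file: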
```powershell
promtool check config C:\Prometheus\prometheus.yml
```

Tip
Attach labels such as environment, role, and site at scrape time. A single dashboard or alert rule can then filter on them instead of being duplicated for each environment or site.
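Once Prometheus is running with this job, its HTTP API gives you a quick way to list targets that fail to scrape. The Python sketch below sends the instant query up{job="windows-server"} to /api/v1/query and returns the instances whose up value is 0. It assumes Prometheus is reachable at http://prometheus.contoso.local:9090, the URL used in the Grafana data source example. The helper names are our own.

```python
import json
import urllib.parse
import urllib.request

# Same URL as in the Grafana data source example; adjust to your deployment.
PROMETHEUS = "http://prometheus.contoso.local:9090"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(payload: dict) -> list[str]:
    """Return the instances whose `up` sample is 0 in an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # sample values arrive as strings
        if float(value) == 0:
            down.append(series["metric"].get("instance", "?"))
    return sorted(down)


def fetch(url: str) -> dict:
    """GET a Prometheus API URL and decode the JSON body (network call)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, down_targets(fetch(query_url(PROMETHEUS, 'up{job="windows-server"}'))) returns an empty list when every Windows target is being scraped.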
Add Grafana and import dashboards
Provision the data source
You can provision the Grafana data source from a YAML file placed in the provisioning/datasources directory of your Grafana configuration:
```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.contoso.local:9090
    isDefault: true
```

Create custom dashboards
To start, focus on these three main views:
- Server health: CPU, memory, disk, and network.
- Service status: State of critical services only.
- Fleet overview: All instances, grouped by label, with their overall status.
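The Server health view usually starts with CPU utilization. windows_exporter exposes CPU time as a counter, windows_cpu_time_total, in seconds per core and per mode. A panel therefore typically plots an expression such as 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100).

To make that arithmetic concrete, the Python sketch below computes the same figure from two raw samples of the idle counter. The function name is ours, and the core label values ("0,0", "0,1") are illustrative. In a real dashboard, let PromQL do this work.

```python
def cpu_busy_percent(idle_before: dict[str, float],
                     idle_after: dict[str, float],
                     interval_s: float) -> float:
    """Approximate CPU utilization (0-100) from two samples of per-core idle seconds.

    Mirrors 100 - avg(rate(windows_cpu_time_total{mode="idle"}[...])) * 100,
    where each core's idle rate is (increase in idle seconds) / (elapsed seconds).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    idle_rates = []
    for core, after in idle_after.items():
        before = idle_before.get(core)
        if before is None:
            continue  # core appeared between samples; nothing to compare
        # Like rate(), treat a counter that went down (reset after a reboot or
        # exporter restart) as having restarted from zero.
        increase = after - before if after >= before else after
        idle_rates.append(increase / interval_s)
    if not idle_rates:
        raise ValueError("no core present in both samples")
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100.0 - avg_idle * 100.0
```

Averaging the per-core idle rates is what makes the result independent of the core count. A server whose two cores were idle 50% and 90% of a 10-second window reports 30% utilization.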
We recommend storing dashboards as JSON in a version control system.
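You can export dashboards edited in the Grafana UI for version control through Grafana's HTTP API. The /api/search?type=dash-db endpoint lists dashboards, and /api/dashboards/uid/<uid> returns the JSON of a single dashboard.

The sketch below assumes a hypothetical Grafana URL (http://grafana.contoso.local:3000) and a service account token with read access. The helper names are our own. It drops the numeric id and version fields, which are specific to one Grafana instance and change on every save, so that diffs only show real edits.

```python
import json
import os
import re
import urllib.request

GRAFANA = "http://grafana.contoso.local:3000"  # hypothetical Grafana URL


def grafana_get(path: str, token: str):
    """GET a Grafana HTTP API path using a service account token (network call)."""
    req = urllib.request.Request(f"{GRAFANA}{path}",
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def to_versionable(payload: dict) -> str:
    """Turn a /api/dashboards/uid/<uid> response into stable JSON for Git."""
    dashboard = dict(payload["dashboard"])  # shallow copy; the input is left intact
    # Instance-specific fields that change on every save and only add diff noise.
    dashboard.pop("id", None)
    dashboard.pop("version", None)
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


def filename_for(title: str, uid: str) -> str:
    """Build a filesystem-safe name such as 'server-health_abc123.json'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}_{uid}.json"


def export_all(token: str, out_dir: str = "dashboards") -> None:
    """Write every dashboard to out_dir as one JSON file."""
    os.makedirs(out_dir, exist_ok=True)
    for item in grafana_get("/api/search?type=dash-db", token):
        payload = grafana_get(f"/api/dashboards/uid/{item['uid']}", token)
        path = os.path.join(out_dir, filename_for(item["title"], item["uid"]))
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(to_versionable(payload))
```

Run export_all(token) from a Git working copy, then review and commit the changed files.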
Configure alert rules in production
Example rules
The following rule raises a critical alert when Prometheus can no longer scrape windows_exporter on a server:
```yaml
groups:
  - name: windows-server.rules
    rules:
      - alert: WindowsExporterDown
        expr: up{job="windows-server"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "windows_exporter is down on {{ $labels.instance }}"
          description: "Prometheus has not been able to scrape {{ $labels.instance }} for five minutes."
```

Routing in Alertmanager
Configure Alertmanager to route critical alerts to the right team:
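```yaml
global:
  # Email receivers need an SMTP relay (hypothetical values; replace with yours).
  smtp_smarthost: smtp.contoso.local:25
  smtp_from: alertmanager@contoso.local

route:
  receiver: operations
  group_by: ["alertname", "instance", "job"]
  routes:
    - matchers:
        - severity="critical"
      receiver: pager

receivers:
  - name: operations
    email_configs:
      - to: [email protected]
  - name: pager
    pagerduty_configs:
      - routing_key: REDACTED
```

Before reloading Prometheus and Alertmanager, validate the rule file and the Alertmanager configuration:

```powershell
promtool check rules C:\Prometheus\rules\windows-server.rules.yml
# Example path; point amtool at your own alertmanager.yml.
amtool check-config C:\Alertmanager\alertmanager.yml
```

Troubleshooting common issues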
Use the table below to identify and correct the most frequent failures:
| Symptom | Probable cause | Solution |
|---|---|---|
| Target unreachable | Port 9182 blocked or incorrect hostname | Fix firewall rules or DNS records |
| Empty dashboard | Incorrect data source or wrong label | Check data source and labels |
| Missing metrics | Collector not enabled | Reinstall with the correct collector set |
| Noisy alerts | Tight thresholds or missing grouping | Relax thresholds, lengthen the for duration, or add grouping keys |
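The Noisy alerts row usually comes down to how alerts are grouped. The Python sketch below is a simplified model, not Alertmanager itself, of the routing configured earlier. Alerts with severity="critical" go to pager and everything else goes to operations. Each notification groups the alerts that share the same alertname, instance, and job.

The model ignores features the example does not use, such as continue, nested routes, regex matchers, timers, and inhibition. The function names are ours.

```python
from collections import defaultdict

# Simplified copy of the example route tree (assumption: only the features it uses).
ROOT_RECEIVER = "operations"
CHILD_ROUTES = [({"severity": "critical"}, "pager")]
GROUP_BY = ("alertname", "instance", "job")


def route(labels: dict[str, str]) -> str:
    """Return the receiver for an alert: the first matching child route wins."""
    for matchers, receiver in CHILD_ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return ROOT_RECEIVER


def notification_groups(alerts: list[dict[str, str]],
                        group_by=GROUP_BY) -> dict[tuple, list[dict[str, str]]]:
    """Bucket alerts into notifications keyed by receiver and group_by label values."""
    groups = defaultdict(list)
    for labels in alerts:
        key = (route(labels),) + tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

With the example keys, two servers losing their exporter at the same time produce two separate pages. Grouping on alertname alone would merge them into one. Fewer grouping keys mean fewer notifications, but less detail in each one.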
Production operations checklist
To ensure the continued reliability of your monitoring stack, follow these steps:
- Pin the windows_exporter version to avoid unexpected changes on upgrade.
- Review the enabled collectors whenever a server's role changes.
- Standardize labels in Prometheus for analysis consistency.
- Monitor retention and cardinality of metrics in Prometheus.
- Review alert rules every month and remove those that only generate noise.
- Keep dashboard JSON, Prometheus configuration, and alert rules in version control.
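To keep an eye on cardinality, Prometheus reports series counts per metric name through its TSDB status endpoint, /api/v1/status/tsdb, in the seriesCountByMetricName field. The endpoint returns only the top entries, not every metric. The Python sketch below filters that list down to windows_exporter metrics. The helper names are our own. A collector such as service, when left unfiltered, can be a large source of series.

```python
import json
import urllib.request


def fetch_tsdb_status(base: str) -> dict:
    """GET /api/v1/status/tsdb from Prometheus (network call)."""
    with urllib.request.urlopen(f"{base}/api/v1/status/tsdb", timeout=10) as resp:
        return json.load(resp)


def top_series(payload: dict, prefix: str = "windows_", limit: int = 5) -> list[tuple[str, int]]:
    """Metric names with the most series, filtered by name prefix, largest first."""
    entries = payload["data"]["seriesCountByMetricName"]
    rows = [(e["name"], int(e["value"])) for e in entries if e["name"].startswith(prefix)]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]
```

For example, top_series(fetch_tsdb_status("http://prometheus.contoso.local:9090")) is a quick check to run after enabling a new collector.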
Important
A monitoring stack is only as reliable as your ability to detect and fix its own failures quickly. Consistent labels, validated configuration changes, and version-controlled files make that possible.
Conclusion
Monitoring Windows servers in production requires rigor in how the tools are installed and configured. With windows_exporter, Prometheus, Grafana, and Alertmanager, you can build a reliable stack that alerts the right people effectively. Follow these best practices to make your Windows servers measurable and transparent systems.



