You can monitor hundreds of host metrics, but a small set reliably precedes most incidents. Track those first.
The Core Four
CPU saturation, memory pressure, disk usage, and network errors cover the majority of host-level failures.
CPU
64%
p95
Memory
3.2/8 GB
Disk
71%
Net errors
0
/min
Watch Trends, Not Just Thresholds
A disk climbing steadily toward full is more actionable than a momentary CPU spike. Trends predict; thresholds react.
Tie Hosts to Services
Host metrics matter most in the context of the services running on them. Correlate the two so a noisy neighbor is obvious.