Lessons learned

Losing metrics with the Prometheus Pushgateway

How to properly push metrics to the Prometheus Pushgateway


At work, we use Prometheus together with its Pushgateway to monitor and alert on backup job execution. The other day we noticed a failing backup job that did not trigger an alert. Debugging quickly revealed that the Pushgateway was losing metrics.

Our metrics look like this:

    backup_last_success_unixtime{instance_name="some_hostname", job="backup_job"}

We have several jobs with the same job label running on different hosts, so I assumed they would be recorded as separate time series.
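To make that assumption concrete, here is a minimal Python sketch of how a push target is addressed. The Pushgateway stores metrics in groups identified by the grouping key in the push URL path (/metrics/job/<job>, optionally followed by /<label>/<value> pairs), not by labels that appear only inside the pushed body. The base URL, the host names and the helper name pushgateway_url are made up for illustration, and the sketch only builds URLs; it does not send anything.

```python
from urllib.parse import quote


def pushgateway_url(base_url, job, grouping=None):
    """Build a push URL of the form /metrics/job/<job>{/<label>/<value>}.

    Hypothetical helper for illustration only. Values containing "/" need
    the Pushgateway's base64 form, which this sketch does not implement.
    """
    parts = [base_url.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for label, value in (grouping or {}).items():
        if "/" in value:
            raise ValueError("values containing '/' are not handled in this sketch")
        parts += [quote(label, safe=""), quote(value, safe="")]
    return "/".join(parts)


BASE = "http://pushgateway.example:9091"  # assumed address

# Two hosts pushing with only the job in the URL address the same group,
# so one push can replace what the other host pushed.
url_host_a = pushgateway_url(BASE, "backup_job")
url_host_b = pushgateway_url(BASE, "backup_job")
print(url_host_a == url_host_b)

# Adding the host to the grouping key gives each host its own group.
print(pushgateway_url(BASE, "backup_job", {"instance_name": "host_a"}))
print(pushgateway_url(BASE, "backup_job", {"instance_name": "host_b"}))
```

With only the job in the path, both hosts write to the same group, which would be consistent with metrics going missing; putting instance_name into the path keeps them apart.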