SREcon Europe 2017 -- Day 1

I am currently in Dublin, Ireland, for SREcon Europe/Middle East/Africa 2017. The day started with a surprise at the registration desk: I wasn’t on the list of participants. Luckily, the issue was resolved quickly and I could enter the conference area.

Care and Feeding of SRE

The first keynote was delivered by Narayan Desai, an SRE manager at Google. As SRE enters the ops zeitgeist, much of the focus has been placed on tactics: techniques that individual operations teams can adopt to improve their effectiveness.

Losing metrics with the Prometheus pushgateway

How to properly push metrics to the Prometheus pushgateway

At work, we use Prometheus together with its pushgateway to monitor and alert on backup job execution. The other day we noticed a failing backup job that did not trigger an alert. Debugging quickly revealed that the pushgateway was losing metrics. Our metrics look like this:

    backup_last_success_unixtime{instance_name="some_hostname", job="backup_job"}

We have several backup jobs with the same job label running on different hosts, so I assumed they would be recorded as separate time series.
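This assumption breaks because of how the pushgateway stores pushed metrics. Every push lands in a group identified by its grouping key (the job name plus any extra labels in the push URL). Within that group, a POST replaces any metric with the same name, whatever its other labels are; a PUT replaces the whole group. Below is a minimal, simplified Python model of that behavior, not the real server. The class FakePushgateway, the helpers push_url and demo, and the host names and timestamps are all hypothetical. With only the job in the grouping key, the second host's push overwrites the first. Adding an instance label to the grouping key keeps one series per host.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model of pushgateway storage (not the real server).
GroupKey = Tuple[Tuple[str, str], ...]
Sample = Tuple[Dict[str, str], float]  # (labels from the pushed body, value)


class FakePushgateway:
    def __init__(self) -> None:
        # One dict of metrics per grouping key, indexed by metric name only.
        self.groups: Dict[GroupKey, Dict[str, Sample]] = {}

    def post(self, group: Dict[str, str], name: str,
             labels: Dict[str, str], value: float) -> None:
        key = tuple(sorted(group.items()))
        # POST semantics: a metric with the same name in the same group is
        # replaced, even if its other labels (e.g. instance_name) differ.
        self.groups.setdefault(key, {})[name] = (labels, value)

    def series(self, name: str) -> List[Sample]:
        """All stored samples of `name`, with grouping labels merged in."""
        out: List[Sample] = []
        for key, metrics in self.groups.items():
            if name in metrics:
                labels, value = metrics[name]
                out.append(({**labels, **dict(key)}, value))
        return out


def push_url(gateway: str, job: str,
             extra: Optional[Dict[str, str]] = None) -> str:
    """Build <gateway>/metrics/job/<job>{/<label>/<value>}.

    Assumes job and label values contain no '/' and need no escaping.
    """
    path = "/metrics/job/" + job
    for label, value in (extra or {}).items():
        path += "/" + label + "/" + value
    return gateway.rstrip("/") + path


def demo(group_by_instance: bool) -> int:
    """Push one timestamp per host and count the surviving series."""
    gw = FakePushgateway()
    for host, ts in [("host-a", 1500000000.0), ("host-b", 1500000100.0)]:
        group = {"job": "backup_job"}
        if group_by_instance:
            group["instance"] = host  # per-host grouping key
        gw.post(group, "backup_last_success_unixtime",
                {"instance_name": host}, ts)
    return len(gw.series("backup_last_success_unixtime"))


if __name__ == "__main__":
    print(demo(group_by_instance=False))  # prints 1: host-a was overwritten
    print(demo(group_by_instance=True))   # prints 2: one group per host
    print(push_url("http://pushgateway:9091", "backup_job",
                   {"instance": "host-a"}))
```

With the official Python client, `prometheus_client`, the extra labels can be passed as the `grouping_key` argument of `push_to_gateway` (PUT) or `pushadd_to_gateway` (POST). That gives each host its own group, so one host's push no longer overwrites another's.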

After more than three years, I am reviving my old blog. As my older posts are hardly relevant these days and I don’t have many readers anyway, I am starting from scratch. If anyone is interested in one of my older posts, drop me a line and I will republish it.