HACKER Q&A
📣 social_quotient

How do you have the zero/non event alerts?


I've been thinking about system monitoring and metrics. I find it more and more challenging to get alerted when something doesn't happen. I call it the zero alert. I've seen this on a lot of the reporting systems we use, ranging from uptime alerts that never fired, to emails from a cron job that just never ran, to, most recently, DataDog on an AWS Lambda function that just didn't run on object creation. Generally, we have lots of alerts for errors and exceptions, but when an underlying system just isn't running, there don't seem to be any obvious alerts covering what I call the "zero alert" scenario.

How do you get alerted for things like:
- No click events on XYZ
- No search results
- Lambda or serverless function didn't run
- Email didn't get triggered or cron didn't run
- Overnight reports didn't execute
- XYZ document didn't get downloaded

It seems there are reports for everything on the opposite side: tons of click events, a bunch of Lambdas got invoked or ran too long, the report is in my inbox, top 10 document downloads. I'm curious how other people handle these sorts of things across a wide variety of systems and scenarios.
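
To make the "zero alert" idea concrete, here is a minimal Python sketch of one way to frame it: record the last time each expected event was seen, and flag anything that has been silent longer than its allowed window. Everything in it is an assumption for illustration: the `Expectation` class, the `find_silent` function, the event names, and the time windows are hypothetical and not tied to any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Expectation:
    """An event we expect to see at least once per max_silence (hypothetical)."""
    name: str
    max_silence: timedelta


def find_silent(expectations, last_seen, now):
    """Return the names of expected events not seen within their window.

    last_seen maps event name -> datetime of the most recent occurrence.
    An event that has never been recorded counts as silent.
    """
    silent = []
    for exp in expectations:
        seen = last_seen.get(exp.name)
        if seen is None or now - seen > exp.max_silence:
            silent.append(exp.name)
    return silent


if __name__ == "__main__":
    # Illustrative values only; real timestamps would come from logs or metrics.
    now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
    expectations = [
        Expectation("nightly_report", timedelta(hours=26)),
        Expectation("s3_lambda_invoked", timedelta(hours=1)),
        Expectation("search_results_returned", timedelta(minutes=30)),
    ]
    last_seen = {
        "nightly_report": now - timedelta(hours=2),
        "s3_lambda_invoked": now - timedelta(hours=3),
        # "search_results_returned" has never been recorded
    }
    for name in find_silent(expectations, last_seen, now):
        print(f"ZERO ALERT: no '{name}' event within its expected window")
```

The check itself has to run somewhere independent of the systems it watches; otherwise the checker failing becomes another silent non-event.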


  👤 social_quotient Accepted Answer ✓
Title correction: "How do you handle the zero/non event alerts?"