Better Polling and Alerting

Site24x7 currently offers a polling system with one primary location and up to x number of secondary pollers. Many companies offer services in several areas, and with the advent of cloud computing your services are now fed from multiple regions. It would be great to move away from the idea that customers are only in a certain area and start treating monitoring as it should be treated: "Is my service available from any of the locations that my customers are in, at any time?"

The weak point of polling a site frequently from only one location is that customers in one of your service areas may be unable to reach your service, and you won't know for up to 30 minutes if that area is not the primary location, or ever if the site only goes down from fewer locations than your threshold. For example, let's say I have a US business that offers services to customers in a few areas (California, Chicago, Texas, New York). To monitor my service from those locations, I have to decide where most of my customers currently are and make that location, or at least one close to it, my primary poller. In this example, let's say my main customer base is in Chicago; I'd set Chicago as my primary location and set polling to 1 minute.

I currently only have the following option:

  • Chicago polling: every 1 minute
  • California, Texas, New York polling: every 30 minutes

Using the recommended threshold setting (down from 3 locations):

If my site is not accessible from Chicago, it automatically triggers a check from all locations to verify that it is actually down. I will know about it within a minute or two if it is also down from two other locations. But if, say, my site is not accessible from New York only at the 30-minute mark, I will not know, because New York is not the primary and the outage will only show after the next poll. I won't even get a trouble alert saying "Your site is not accessible from New York." so that I can look into it.

My alerting options are either to trigger an alert whenever the site goes down from any 1 location, which goes against the false-positive recommendation, or to get no alert at all when the outage doesn't meet the location criteria.

Would be nice to have the following:

  • Round-robin the locations on a one-minute cycle, or whatever the minimum interval is for your selected monitor type. This way all locations act as primary, but only one polls at a time, keeping the load on Site24x7 infrastructure low.
  • Keep the intermittent polling from all pollers at the same time, but maybe drop it to 15 minutes instead of 30 minutes and run it on a schedule.
  • Ability to select trouble alerting if a monitor is down from 1 or more locations but fewer than 3 locations.
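
To illustrate the round-robin idea from the first bullet, here is a rough Python sketch. It is not how Site24x7 schedules polls; the function names and the 12-minute window are made up for the example. It rotates through my four example locations at a 1-minute interval and measures how long each location goes between polls.

```python
from itertools import cycle

def round_robin_schedule(locations, interval_minutes, total_minutes):
    """Return (minute, location) pairs: one poll per interval, rotating locations."""
    rotation = cycle(locations)
    return [(minute, next(rotation))
            for minute in range(0, total_minutes, interval_minutes)]

def max_gap_per_location(schedule):
    """Longest wait (in minutes) between consecutive polls from the same location."""
    last_seen = {}
    gaps = {}
    for minute, location in schedule:
        if location in last_seen:
            gaps[location] = max(gaps.get(location, 0), minute - last_seen[location])
        last_seen[location] = minute
    return gaps

# Hypothetical setup matching the example in this post.
locations = ["Chicago", "California", "Texas", "New York"]
schedule = round_robin_schedule(locations, interval_minutes=1, total_minutes=12)
gaps = max_gap_per_location(schedule)

print(schedule[:5])
print(gaps)
```

Under these assumptions each location gets polled every 4 minutes (number of locations times the interval) instead of every 30, while still only one poll goes out per minute.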

Maybe allow a series of multiple alerting rules (similar to the new bulk changes setup):

Alert (Down, Trouble) if Down From (# of locations) like:
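
As a rough sketch of how such a series of rules might be evaluated (the `classify` function and the rule tuples are hypothetical, not actual Site24x7 settings), each rule pairs a status with a minimum number of down locations, and the rule with the highest minimum that is met wins:

```python
def classify(down_locations, rules):
    """Return the status of the strictest rule that is met, else "Up".

    rules: list of (status, min_down_locations) pairs (hypothetical format).
    """
    down_count = len(down_locations)
    # Check rules from the highest location threshold to the lowest.
    for status, min_down in sorted(rules, key=lambda rule: rule[1], reverse=True):
        if down_count >= min_down:
            return status
    return "Up"

# Down if 3 or more locations fail, Trouble if 1 or 2 fail.
rules = [("Down", 3), ("Trouble", 1)]

print(classify({"New York"}, rules))
print(classify({"Chicago", "Texas", "New York"}, rules))
print(classify(set(), rules))
```

With these example rules, a New York-only outage would come back as Trouble instead of going unnoticed, and the 3-location threshold for Down stays in place to guard against false positives.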

If I think of anything else, I'll add it in the comments. If anyone has ideas to expand on this, please feel free to add to the discussion.

