
Feature request - Dependency Monitors

I would really like an option for configuring dependencies between monitors. Consider this scenario: let's imagine 50 web applications (or whatever number) running at a given data center or on a given server. If the network or server goes down, I would receive alerts/notifications for all of those web sites. It would be better if we could configure a network map (such as in Icinga) or, on a simpler level, define monitor dependencies. In the scenario described, those 50 web site monitors would depend on the gateway device or server that the sites themselves rely on.

Without this feature, there is potential with a catastrophic type of failure to 1) burn through all notification credits very quickly 2) overload on call folks with an unmanageable deluge of SMS and phone calls.

In my example, those 50 web sites might be dependent upon the server being reachable (ping), and that server might be dependent upon a gateway device (or switch). I am thinking this is going to be a pretty important feature for anyone monitoring more than a couple services at a single location.
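To make the request concrete, here is a minimal sketch of the suppression rule being asked for: a monitor only alerts if it is down and nothing upstream of it is down. The monitor names, the "UP"/"DOWN" status strings and the `DEPENDS_ON` table are all made up for illustration and are not Site24x7 identifiers or its API.

```python
from typing import Optional

# Hypothetical dependency table: each monitor maps to the monitor it
# depends on (None for the root of the chain).
DEPENDS_ON: dict[str, Optional[str]] = {
    "gateway": None,
    "server": "gateway",
    "site-01": "server",
    "site-02": "server",
}


def should_alert(monitor: str, status: dict[str, str]) -> bool:
    """Alert only if the monitor is down and every ancestor is up."""
    if status[monitor] != "DOWN":
        return False
    parent = DEPENDS_ON[monitor]
    while parent is not None:
        if status[parent] == "DOWN":
            # An upstream failure already explains this outage.
            return False
        parent = DEPENDS_ON[parent]
    return True


def monitors_to_alert(status: dict[str, str]) -> list[str]:
    """Return the monitors that should actually notify someone."""
    return [m for m in DEPENDS_ON if should_alert(m, status)]


if __name__ == "__main__":
    everything_down = {name: "DOWN" for name in DEPENDS_ON}
    print(monitors_to_alert(everything_down))  # ['gateway']
```

With a rule like this, a gateway failure produces one notification instead of one per web site, while a single site failing on a healthy server still alerts as usual.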
Like (4) Reply
Replies (33)

The request has been forwarded to the development team to check the feasibility of implementing this feature and adding it to our product roadmap.

Thanks.
Rafee
Like (0) Reply

We acknowledge the importance of this feature. 

We currently do have the ability to monitor the status of gateway servers and other hardware present inside the data center via the Site24x7 On-Premise Poller. I guess we just need to give a way to "associate the status".

BTW, I would like to hear your comments on whether you would like to see the "dependent monitor" associated on a per-monitor basis (i.e., every website) or on a Monitor Group basis. Associating on a Monitor Group basis makes it easier to manage when you have hundreds of monitors.
Like (0) Reply

I think associating at a monitor group basis or individual monitor basis both make sense. Having the ability for associating groups certainly makes sense in my use case.

Jamie
Like (0) Reply

I would take this association a few steps further. 

If you have a server monitor and then all kinds of related monitors, I would love to get a full report view that includes data from all monitors in case of an outage.

For example, let's say I have an IIS server that works with an MSSQL database. I would have server monitoring, IIS monitoring and MSSQL monitoring. Additionally, I might have a Website monitor and a web page analyzer.

If there is an outage or a performance-related incident, today I have to manually gather data from all monitors to see what's happening. If these monitors were correlated, you could send a full view of all the data or flag any parameter that is behaving out of the average.

Another feature that could empower the above idea is to sync the polling of related monitors. That way, the information gathered from different sources would be from the same time frame.



Like (0) Reply

I agree it is important to see monitors that are closely related in a more intuitive manner. Currently there is no automatic modelling of monitors on the same physical host, hence this has not been done. This is an important enhancement we have in mind. It will allow us to at least list the monitors and their status in the Monitor Details view, probably inline in the "Summary View".

>>  Today i have to manually gather data from all monitors to see whats happening.  

That said, we do have plans to provide a customizable dashboard. But that is some time away as we are working on getting our new client done.

We have the Monitor Group Concept for a bit more advanced use case. Say you have a Web App that uses 2 DBs, 10 App Servers, 10 physical servers. In order to see the status of these infrastructure apps in one place when there is an outage or for general reporting needs we have to group these "Monitors" under a "Monitor Group". The Monitor Groups can ideally serve the purpose of representing status of a "Business Application" in this context. This is something we will see more in the new reports that we are working on. The new reports would come with the new web client we are working on. 


>> Without this feature, there is potential with a catastrophic type of failure to 1) burn through all notification credits very quickly 2) overload on call folks with an unmanageable deluge of SMS and phone calls.

I re-read your original post and found that we should have extra protection against this overload of alerts. For voice calls, we do have some safeguards in place: after a few calls we begin to "consolidate and make a call". This is not present for SMS and email alerts. However, I guess we could look at providing the following new features:
1) A web link in the SMS or email to "Turn Off Alerts for the next hour" for that "Contact".
2) An option in the Web Client > Operations Tab to "Turn Off Alerts for all users for the next hour" at an account level.
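As a rough illustration of how a time-boxed "turn off alerts for the next hour" could behave for both options, here is a small sketch. The `AlertMute` class, the `ALL_CONTACTS` sentinel and the contact names are hypothetical and only show the idea of a mute that expires on its own; they do not describe how the product implements it.

```python
from datetime import datetime, timedelta

# Hypothetical sentinel standing in for an account-level mute (option 2).
ALL_CONTACTS = "*"


class AlertMute:
    """Tracks temporary mutes that expire without manual cleanup."""

    def __init__(self) -> None:
        self._muted_until: dict[str, datetime] = {}

    def mute(self, contact: str, now: datetime,
             duration: timedelta = timedelta(hours=1)) -> None:
        """Mute one contact, or ALL_CONTACTS, until now + duration."""
        self._muted_until[contact] = now + duration

    def is_muted(self, contact: str, now: datetime) -> bool:
        """True while a mute for this contact or the whole account is active."""
        for key in (contact, ALL_CONTACTS):
            until = self._muted_until.get(key)
            if until is not None and now < until:
                return True
        return False


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    mutes = AlertMute()
    mutes.mute("oncall-jamie", start)
    print(mutes.is_muted("oncall-jamie", start + timedelta(minutes=30)))
    print(mutes.is_muted("oncall-jamie", start + timedelta(hours=2)))
```

Because the mute carries its own end time, nobody has to remember to switch alerts back on once the incident is over.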

Thank you for your comments.

Like (0) Reply

I did experience that the phone call alerts do start to consolidate, the safeguard you mentioned, where multiple alerts are delivered in one phone call. So that is very good. I think the same thing makes sense for SMS.

Also, an ability to acknowledge problems would go a long way, because I don't really like the idea of suspending monitors: we then don't get real uptime/downtime data, and there is a risk of forgetting to un-suspend. Second, just disabling alerts per user has the same risk of forgetting to re-enable alerts and getting the configuration back to where it should be after a problem. I think your options 1 and 2 go a long way towards addressing that issue; however, I also think the real need is an ability to acknowledge problems in bulk.

Hope that helps!

Jamie
Like (0) Reply

The ability to acknowledge problems in bulk will come in to the product later this year. We hope to include it in the product by means of introducing an Alarms Tab. I can share some screenshots of a prototype that we have internally.

If you are interested in looking at it, please send an email to "support@site24x7.com Attn : Alarms Tab Prototype".

Thanks
Gibu      
Like (0) Reply

The dependency association should also be present for agent-based server monitors. For example, I have my firewall monitored externally. It went down, basically taking the complete internet connection down, and all the servers behind it went offline in the monitoring dashboard. The system sent multiple alerts and SMS messages, although the servers were up and running; the only problem was that the internet connection was down. So it would make sense if those monitors could depend on another monitor so that false alarms do not come up.
Like (0) Reply

Thank you Syed for this use case. 

@Sabarinathan, if you have any comments on how we handle and prevent lots of alerts being generated for server monitors when the firewall or some edge device at the customer end has a problem, please post here.
Like (0) Reply

Thanks for bringing up the feature request. Currently we trigger alerts for all the Server Agents and their dependent monitors. We will take up this feature immediately and will let you know as soon as it is updated.
Like (0) Reply

Any updates to share on this feature? It's been under review for a while. It continues to be a high priority for my use case. We burn through SMS credits like no other without this functionality.
Like (0) Reply

Sorry to say we have not started on this yet. However, I would like to reiterate that we understand the importance of this capability.




Like (0) Reply

Well, it happened today. A network outage not overcome by multiple redundancies caused hundreds of alerts, SMS messages, and calls to go out to multiple personnel. This continues to be a hot item for our use case.

If only I could have each of my host groups have a parent dependency of the cluster IP (ping monitor) this problem would be alleviated. Glad to see that this is under review, and just wanted to give it a friendly *bump*.

The good news? We definitely knew there was a problem thanks to Site 24x7.
Like (0) Reply

I also added Sub Grouping as a request which would play well with what you are looking for. Thanks for bringing this up.

https://forums.site24x7.com/topic/sub-grouping
Like (0) Reply

Jamie, as for SMS here is what we do to limit the crazy SMS usage.. We only have Site24x7 SMS alert the highly critical people and checks and we use PagerDuty for all other alerts and Site24x7 is integrated with them.

I do understand the crazy issues if a switch goes down and it sets off a chain reaction for everything under it.

https://www.pagerduty.com/pricing/
Like (0) Reply

Thanks framirez. I had considered pagerduty when I originally set about to have a workaround. Thanks for confirming the integration works for you. I was hoping to avoid subscribing to yet another service, but it may be my only option right now.


Like (0) Reply

Perhaps a faster way to see this implemented would be to allow monitors in multiple groups and to add sub groups.

After that, it is necessary to include the logic of checking whether a group has already been alerted before sending an alarm.
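A minimal sketch of that "has this group already alerted?" check could look like the following. The `GroupAlertGate` class and the group names are hypothetical, and the rule used here (suppress if any of the monitor's groups has already alerted) is just one possible interpretation of the idea.

```python
class GroupAlertGate:
    """Lets through only the first alarm for each group during an outage."""

    def __init__(self) -> None:
        self._alerted_groups: set[str] = set()

    def should_send(self, monitor_groups: list[str]) -> bool:
        """Decide whether a failing monitor in these groups should alarm."""
        if any(group in self._alerted_groups for group in monitor_groups):
            return False  # someone was already notified for one of its groups
        self._alerted_groups.update(monitor_groups)
        return True

    def clear(self, group: str) -> None:
        """Call when the group recovers so future outages alert again."""
        self._alerted_groups.discard(group)


if __name__ == "__main__":
    gate = GroupAlertGate()
    print(gate.should_send(["dc-east"]))  # first failure in the group
    print(gate.should_send(["dc-east"]))  # second failure in the same group
```

Since a monitor may belong to several groups, the gate takes a list of group names rather than a single one.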


Like (0) Reply

+1 for dependency-based alerting rules feature. 
Like (0) Reply

+1 for Dependencies
Like (0) Reply

This has been under review for a while. Just wondering if there might be an update from Site24x7 team on implementation time-line or any related news.
Like (0) Reply

Hi Jamie,

Thanks for your patience. We are starting to work on this feature now. Will keep you updated on possible availability date once we get to that point.

Like (0) Reply

I agree with everyone here.  Dependency monitors is a must. 

Any update from Site24x7 on the progress and release date?

Thank you
Like (0) Reply

Hello,

Sorry, the feature was delayed earlier due to other priorities. We have now started working on it and will have it released in 6 to 8 weeks' time.

Will update this post once it is live.

Thanks for your patience on this.

Raghavan
Like (0) Reply

Thank you for the update! We are a new customer in the first stages of roll-out. Without this feature I don't see us implementing this solution across the organization. Hoping this stays a high priority.

Much appreciated,
Kamil
Like (0) Reply

Hi Kamil,

Thanks for the update on your requirement. Here is what is planned for this feature:

  • Support for allowing monitors under multiple groups.
  • Support for creating sub groups under a monitor group, up to 5 levels.
  • Support for setting up a dependency monitor / group for a Monitor or a Monitor Group.
  • Support for configuring a threshold for group failure.
  • Suppressing alerts for a monitor when its dependency monitor / group is Down.

Let me know if you have any further requirements on this feature.
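To show how sub groups and a group failure threshold might fit together, here is a small sketch. It assumes the threshold means "the percentage of monitors in the group, including its sub groups, that must be Down before the group counts as Down", and that the top-level group counts as level 1 of 5. The post does not define either, so both are guesses, and the `Group` class is hypothetical.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 5  # assumed reading of "up to 5 levels"; top group is level 1


@dataclass
class Group:
    name: str
    monitor_status: dict[str, str] = field(default_factory=dict)
    subgroups: list["Group"] = field(default_factory=list)
    down_threshold_percent: float = 100.0  # assumed meaning, see lead-in


def all_statuses(group: Group, depth: int = 1) -> list[str]:
    """Collect monitor statuses from a group and all of its sub groups."""
    if depth > MAX_DEPTH:
        raise ValueError(f"group nesting deeper than {MAX_DEPTH} levels")
    statuses = list(group.monitor_status.values())
    for sub in group.subgroups:
        statuses.extend(all_statuses(sub, depth + 1))
    return statuses


def group_is_down(group: Group) -> bool:
    """A group is Down once enough of its monitors are Down."""
    statuses = all_statuses(group)
    if not statuses:
        return False
    down = sum(1 for s in statuses if s == "DOWN")
    return 100.0 * down / len(statuses) >= group.down_threshold_percent


if __name__ == "__main__":
    web = Group("web", {"site-01": "DOWN", "site-02": "DOWN", "site-03": "UP"})
    dc = Group("dc", {"gateway": "UP"}, [web], down_threshold_percent=50.0)
    print(group_is_down(dc))  # 2 of 4 monitors down against a 50% threshold
```

A group evaluated this way could then serve as the dependency whose Down state suppresses alerts for the monitors that depend on it.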

Regards,
Raghavan
Like (0) Reply

Raghavan,

I think this list looks good. I'd just like some clarification on alerts. Will group-based alerting be coming along for the ride with this change (associating a monitor group with a user group, as that's where alerting is specified)? Currently it's only possible to specify an alert policy at the monitor level. I think having a group-level alerting profile would be useful and perhaps even required with this new functionality.

Much appreciated,
K
Like (0) Reply

Going through old forums as a new customer. Monitoring Group alerts still isn't implemented and is something we're in need of. I created a new forum/idea specifically for this https://forums.site24x7.com/topic/alert-on-group  Please upvote so it gains visibility! :) 
Like (0) Reply

What's the ETA on releasing the dependency feature?  For us this is the requirement before we deploy this to our entire production environment.  Thanks for the info!

Best,
Julian
Like (0) Reply

Hello All,

The much awaited alert suppression feature is now live.

Users can now go ahead and set up a dependency monitor. It is mandatory to associate the monitor with a monitor group and then map it to a dependency monitor. You should also enable the suppress alert option in the tab. Please take a look at the screenshot given below for more clarity.




Regarding the other enhancements as listed above by Raghavan, we are working on the same and shall update this thread once done.

Cheers!
Sushma.
Like (0) Reply

Hi

Love the dependency monitor feature, but had occasion this weekend to see the need for a possible change to it (or maybe this exists already?):

I needed to reboot a VM host running many VMs, all of which have the host as their dependency. I put the host's monitor into 'maintenance' on 24x7 thinking the individual VM monitors would take this into account and not alert me as I shut them down cleanly. No such luck.

What I am guessing is the dependency monitor is only looking for a DOWN condition on the target.

Could you add 'maintenance' into that consideration? Since I can see how this might not be wanted as a ubiquitous feature, can you perhaps add a toggle on the maintenance scheduler...something like "let dependents know"...or something so that I don't have to put all of the dependent monitors into maintenance mode individually?

TIA
-Brad
Like (0) Reply

The ability to select more than one dependent monitor would be awesome.

For example:

we have a page ( https://www.site.com/page1 ) that is dependent on ( https://www.site.com ) to be up but it's also dependent on a server ( server001 ) to also have port 443 open.

Or if we are working on site ( https://www.site.com )  then ( https://www.site.com/page1 ) and ( server001 )  should not send alerts because we already know the site is being worked on.
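Here is a sketch of what "more than one dependency" could mean in practice: a monitor stays quiet if any of its dependencies is down or being worked on. The `DEPENDENCIES` table, the `server001:443` name for the port check and the "UP"/"DOWN"/"MAINTENANCE" status strings are made up for illustration.

```python
# States of a dependency that should silence the monitors depending on it.
SUPPRESSING_STATES = frozenset({"DOWN", "MAINTENANCE"})

# Hypothetical table: one monitor can list several dependencies.
DEPENDENCIES: dict[str, list[str]] = {
    "https://www.site.com/page1": ["https://www.site.com", "server001:443"],
}


def should_alert(monitor: str, status: dict[str, str]) -> bool:
    """Alert only if the monitor is down and no dependency explains it."""
    if status[monitor] != "DOWN":
        return False
    return not any(
        status[dep] in SUPPRESSING_STATES
        for dep in DEPENDENCIES.get(monitor, [])
    )


if __name__ == "__main__":
    status = {
        "https://www.site.com/page1": "DOWN",
        "https://www.site.com": "MAINTENANCE",
        "server001:443": "UP",
    }
    print(should_alert("https://www.site.com/page1", status))  # False
```

Treating maintenance the same as down covers the second case above, where the site is known to be under work and its dependents should stay silent.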

Like (0) Reply

Hi,

Thanks for raising the request with your use case.

Do you think it would be good to support configuring a Monitor Group as the dependency for a monitor, so that any number of dependencies can be configured?

 

Raghavan

Like (0) Reply

Hi Brad,

Alert suppression will be done for the dependency monitor's maintenance state as well. Apart from that, in the case of VMs, even without configuring an explicit dependency, if the host monitor goes into maintenance state, the VM monitors will also go into maintenance state.

It looks like you are facing some issue. Please send a mail to support at site24x7 dot com so we can analyze the issue further.

Thillai.
Like (0) Reply
