
Announcing Critical status in Site24x7: Define multiple threshold alerts for a single monitored attribute

Site24x7 offers powerful and ingenious ways to define alerting thresholds and mitigate false alerts for your monitors. When a threshold breach occurs, Site24x7 sends out a real-time trouble alert with a detailed summary of issues to facilitate incident remediation. There can be situations, however, when you want to receive separate alerts for different threshold-level breaches of a single monitor attribute.

For example, you may want your on-call team to be alerted when disk usage reaches 85 percent of hard drive capacity so the team can free up some disk space. You may also want to send a high-priority alert if the issue persists and disk usage reaches 95 percent of capacity.

Today, we're introducing an all-new alert status in Site24x7: Critical. This new alert status enables you to define dual thresholds for a single monitored attribute. When the respective threshold values are breached, separate Trouble and Critical alerts are triggered.

Working with critical alerts

Critical alerts are displayed in orange. All your critical alerts can be easily sorted from the summary widget. In the Alarms view, you will have the additional option to filter alarms based on critical status.

 


Site24x7 allows you to set thresholds that offer flexibility when managing service outage alerts. Every monitor has a unique set of attributes for which the threshold values can be applied. With the introduction of critical alerts, we've revamped the current threshold profile form to include a dual-threshold alerting capability for single attributes. You can program the alarms engine to sense the breach and trigger a specific kind of alert, e.g. Trouble or Critical.

Once you save a threshold profile, you can associate it with a monitor of the matching type. During monitoring, configured user alert groups will receive both Trouble and Critical status alerts for your monitored attribute, based on the severity of the threshold breach. In addition to the dual thresholds and alerts, Site24x7 also supports dual IT automation actions for a single attribute during threshold breaches. Here are two business cases that show how Critical alerts work best.

Use case 1:

Say you're monitoring your server's CPU utilization to ensure it performs optimally. In order to promptly fix any issues that arise, you need to receive unique alerts each time a threshold breach occurs. But how?

First, create a server threshold profile by defining the threshold values, and set unique alerts under the CPU Utilization threshold section. Then, associate the threshold profile with your server monitor. For example, when the reported CPU utilization is greater than or equal to 70 percent, the alarms engine automatically triggers a Trouble alert to the monitor's user alert group. If the reported CPU utilization is greater than 85 percent, it triggers a Critical alert to the same user alert group. You can also receive Down, Trouble, or Critical status alerts when server processes go down.
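
The dual-threshold logic in this example can be sketched in a few lines of Python. This is only an illustration of the comparison rules described above, not Site24x7's actual alarms engine. The `Status` enum, the `classify_cpu` function, and the `Up` status used for readings that breach nothing are all assumptions made for the sketch.

```python
from enum import Enum


class Status(Enum):
    UP = "Up"              # assumed label for "no threshold breached"
    TROUBLE = "Trouble"
    CRITICAL = "Critical"


def classify_cpu(cpu_percent, trouble_at=70.0, critical_above=85.0):
    """Map a CPU utilization reading to an alert status.

    Hypothetical helper, not Site24x7 code: Trouble when the reading is
    >= trouble_at, Critical when it is > critical_above.
    """
    # Check the more severe threshold first so it takes precedence.
    if cpu_percent > critical_above:
        return Status.CRITICAL
    if cpu_percent >= trouble_at:
        return Status.TROUBLE
    return Status.UP


for reading in (42.0, 70.0, 85.0, 91.5):
    print(reading, classify_cpu(reading).value)
```

Note that a reading of exactly 85 percent still maps to Trouble here, because the Critical condition in the example is "greater than" rather than "greater than or equal to".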
 


Use case 2:

This use case shows the importance of setting up dual thresholds for the response time of your web application from the primary or secondary monitoring locations that you've configured.

As shown in the image below, you can configure a threshold profile for the monitor, and specify unique baseline values and alerts. Attach the threshold profile to a web transaction monitor. Once the thresholds are breached, unique alerts are instantly delivered to the monitor's user alert group.

For example, when the reported response time for the primary location is greater than or equal to 30 ms, the alarms engine sends a Trouble alert. Similarly, when the reported response time is greater than 50 ms, it automatically triggers a Critical alert to the monitor's user alert group. You can also define unique thresholds and alerts for all the attributes of a particular type of monitor.
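
To illustrate how one profile can carry dual thresholds for several attributes, here is a rough Python sketch. It does not reflect how Site24x7 stores threshold profiles. The `DualThreshold` class, the `evaluate` function, and the attribute names are hypothetical, and the secondary-location values are placeholders chosen only for illustration (the primary-location values come from the example above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualThreshold:
    trouble_at: float      # Trouble when value >= trouble_at
    critical_above: float  # Critical when value > critical_above

    def status(self, value):
        # The more severe condition is checked first.
        if value > self.critical_above:
            return "Critical"
        if value >= self.trouble_at:
            return "Trouble"
        return "Up"  # assumed label for "no threshold breached"


# Hypothetical profile: attribute names are invented for this sketch.
profile = {
    "response_time_primary_ms": DualThreshold(30, 50),
    "response_time_secondary_ms": DualThreshold(40, 60),  # placeholder values
}


def evaluate(profile, readings):
    """Return the status of every profiled attribute that has a reading."""
    return {
        name: threshold.status(readings[name])
        for name, threshold in profile.items()
        if name in readings
    }


print(evaluate(profile, {"response_time_primary_ms": 35,
                         "response_time_secondary_ms": 75}))
```

Keeping both thresholds on a single object per attribute mirrors the idea of one profile producing either a Trouble or a Critical alert for the same attribute.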


A few more things to note

With the addition of Critical status, you can now label alerts as Trouble, Critical, or Down. Restricting the number of statuses helps maintain a clean and simple client-based configuration setting that simplifies alerting. You can also use our APIs to receive the details about critical alerts.

Keep in mind that the Critical alert status only augments the overall alerting capability in Site24x7; none of your existing monitor statuses will be impacted by this release.

For users who do not wish to set static threshold values for alerting and want to proactively track incidents well in advance, check out our newly released AI-powered Anomaly Dashboard. In the meantime, we’d love to hear your feedback on the Critical status, so please let us know what you think. If you have any questions, you can get in touch with us at support@site24x7.com.
