EC2 System and Instance Checks

21-Mar-2017 04:59 PM

In CloudWatch, you have system and instance status checks. The system check covers the health of the physical host your EC2 instance is running on; the instance check covers the health of the EC2 instance itself:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html
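For anyone who wants to look at both checks outside the console, here is a minimal sketch using boto3's `describe_instance_status`. The helper names (`summarize_status`, `fetch_status`) and the default region are my own assumptions for illustration; only the EC2 API call and the response fields come from AWS.

```python
from typing import Dict, Tuple


def summarize_status(response: dict) -> Dict[str, Tuple[str, str]]:
    """Map each instance ID to (system check status, instance check status)."""
    summary = {}
    for item in response.get("InstanceStatuses", []):
        summary[item["InstanceId"]] = (
            item["SystemStatus"]["Status"],    # health of the underlying host
            item["InstanceStatus"]["Status"],  # health of the instance itself
        )
    return summary


def fetch_status(instance_ids, region="us-east-1"):
    """Query EC2 for the current status checks (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarize_status works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instance_status(
        InstanceIds=list(instance_ids),
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return summarize_status(response)
```

Calling `fetch_status(["i-..."])` with your own instance ID should return something like `("ok", "impaired")` for an instance whose host is healthy but whose instance check is failing.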

Last night we had an instance that failed the instance checks, and we were alerted by CloudWatch itself. Here we did get an alert that the agent was disconnected, but nothing about the instance check failure. Are these metrics being monitored? They really are the basis of EC2 monitoring.

Replies (8)

by Anonymous User

21-Mar-2017 05:38 PM

Hi Tom,

Both of these metrics are already on our feature roadmap. We will keep you updated.

Thanks,

Yamini

by Tom

21-Mar-2017 05:40 PM

Thanks for the quick reply, Yamini!

Is the roadmap public? I would understand if it's not, but it might prevent feature requests that are already on the map. And it might allow us to upvote ;-)

by KS AnanthKumar

21-Mar-2017 08:15 PM

Hi Tom,

Good that you were alerted via the agent disconnection; that's one problem an integrated monitor (EC2 + Server) can solve for you.

If you rely on EC2 instance monitoring alone, our poll frequency is every 5 minutes, so you may not learn of a failure for up to 5 minutes, even if you prefer status-based instance monitoring. That's where an agent installed on the server helps a lot.

In our case, if the instance is shut down or in the process of shutting down, we currently report it as down after 5 minutes.

To get near-instant alerts of this kind, we suggest going for the integrated monitor. For the integrated EC2 version, we decided to alert after one minute to avoid false alarms from network flaps. We may bring this down based on user requests.

Regards,

Ananthkumar K S

by Tom

22-Mar-2017 12:45 AM

Hi Ananthkumar,

That would indeed be OK for when we integrate with the agent. In most cases we do, but we shouldn't be forced to. An EC2 instance without an agent that is in a running state (and thus not flagged as having an issue by your current checks) but has an instance check failure would not be reported as DOWN or TROUBLE. That is basic monitoring, and it should be reported regardless of whether an agent is used.

System checks, on the other hand, can't be checked with an agent. They are an AWS-level metric that is also vital.

So while the presence of the agent solves things for some cases, for others it doesn't.
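To make the case concrete, here is a rough sketch of the mapping I have in mind. The function name and the exact rules are my assumption, not how the product works today; the status values (`ok`, `impaired`, and so on) are the ones the EC2 API reports.

```python
def classify(state: str, system_status: str, instance_status: str) -> str:
    """Hypothetical mapping from EC2 state and status checks to a monitor status."""
    if state != "running":
        # Assumption: anything that is not running counts as DOWN.
        return "DOWN"
    if "impaired" in (system_status, instance_status):
        # Running, but one of the AWS checks fails: the case missed today.
        return "TROUBLE"
    return "UP"


# A running instance with a failed instance check should not look healthy.
print(classify("running", "ok", "impaired"))
```

With this rule, a running instance only shows as UP when neither check is impaired, with or without an agent.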

by KS AnanthKumar

25-Mar-2017 05:09 PM

Hi Tom,

Thanks for giving us more details on your requirement.

We will take it up ASAP.

In the meantime, we would like to understand which system checks you currently monitor through the AWS-level metrics. It would be great if we could find a way to support those for you.

Regards,

Ananthkumar K S

by Tom

26-Mar-2017 03:43 PM

Hi Ananthkumar,

There really is nothing more you can do in CloudWatch than set alerts on the AWS-provided metrics (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html#types-of-instance-status-checks).

AWS will alert if either of these checks fails.
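As an illustration, this is roughly what such alerts look like when created with boto3. The metric names `StatusCheckFailed_System` and `StatusCheckFailed_Instance` in the `AWS/EC2` namespace are the AWS-provided ones; the helper names, alarm naming scheme, evaluation settings, and SNS topic are assumptions for the sketch, not our exact setup.

```python
STATUS_METRICS = ("StatusCheckFailed_System", "StatusCheckFailed_Instance")


def build_alarm_params(instance_id: str, metric_name: str, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{instance_id}-{metric_name}",  # assumed naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # status check metrics are reported every minute
        "EvaluationPeriods": 2,  # assumed: two failing minutes in a row
        "Threshold": 1.0,        # the metric is 1 when the check fails, 0 otherwise
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_status_alarms(instance_id: str, sns_topic_arn: str, region: str = "us-east-1"):
    """Create one alarm per status check (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_alarm_params works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    for metric_name in STATUS_METRICS:
        cloudwatch.put_metric_alarm(
            **build_alarm_params(instance_id, metric_name, sns_topic_arn)
        )
```

One alarm per check keeps host problems and instance problems apart in the notifications.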

Kind regards,

Tom

KS AnanthKumar