Website Fiascos & Lessons Learned: Part 2 - Datacenter Monitoring
37% of IT teams learn of issues too late - when business users tell them (TRAC Research). To help you identify Web performance issues much faster, we focused our first blog post on proactive Web performance monitoring from your users’ perspective as your first line of defense.
Let’s continue the discussion this week by reviewing some recent Website fiascos and their root causes.
- April - Government websites down. The problem? A server glitch
- Sept 20 - Glitch on bank's trading website CommSec. The problem? A server issue
- Nov 3 - Huge outage takes down 16 Gov. websites in Singapore. The problem? An infrastructure failure
As you can see, an underperforming back-end infrastructure is often the direct cause of costly outages and Website glitches for your organization. Therefore, besides proactively monitoring your Website from your end-users’ perspective, you need to continuously oversee your datacenter infrastructure to prevent outages like these. Here are some quick tips to help you get started.
1. Identify which servers to oversee
Your monitoring scope (what to monitor and how) should include all servers involved in your Web application delivery chain - basically every server involved in the process of presenting content to your users. Here are some quick pointers to get you thinking:
- Mail servers
- Web servers / application servers / middleware servers
- DNS servers
- VMware resources
2. Define monitoring metrics and corresponding thresholds
For each monitored server, you must define the metrics to track and the corresponding thresholds that will trigger real-time alerts. For example, some organizations like to get notifications earlier (Disk space - 80% full), while others prefer to be alerted only as they get closer to a catastrophic event (Disk space - 95% full). Besides monitoring critical server metrics such as CPU, Disk, Memory, Process, Services or Network Utilization, you should also oversee connectivity and response time to ensure everything is functioning at peak performance - not just up and running. For example:
- Continuously test and validate that your DNS look-ups are working and your DNS server is resolving domain names correctly.
- Track the complete mail server round-trip time.
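The threshold and DNS checks described above can be sketched in a few lines of Python. This is only an illustration, not how Site24x7 or any other monitoring product implements its checks: the function names (`classify`, `disk_used_pct`, `check_dns`) are hypothetical, the 80% and 95% levels are the example values from the text, and the hostname passed to `check_dns` is whatever name you want to validate.

```python
import shutil
import socket
import time

# Example levels from the text: warn early at 80% full, escalate at 95% full.
WARNING_PCT = 80.0
CRITICAL_PCT = 95.0


def classify(value_pct, warning=WARNING_PCT, critical=CRITICAL_PCT):
    """Return 'ok', 'warning' or 'critical' for a utilization percentage."""
    if value_pct >= critical:
        return "critical"
    if value_pct >= warning:
        return "warning"
    return "ok"


def disk_used_pct(path="."):
    """Percentage of disk space used on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_dns(hostname):
    """Try to resolve `hostname`; return (resolved_ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
        resolved = True
    except socket.gaierror:
        resolved = False
    return resolved, time.monotonic() - start


if __name__ == "__main__":
    print("disk:", classify(disk_used_pct(".")))
    resolved, elapsed = check_dns("localhost")
    print("dns resolved:", resolved, "in", round(elapsed, 3), "s")
```

Keeping the warning and critical levels as parameters lets each team choose how early it wants to be notified. Timing the lookup covers response time as well as plain availability.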
If you are relying on virtualization technology, you should oversee the health, availability and performance of these resources as well. For example, monitor the performance of vSphere hosts and VMs to quickly identify ESX/ESXi servers running short of resources. That way you can identify capacity constraints and bottlenecks much earlier, before your Web performance suffers and user experience is impacted.
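As a rough sketch of the capacity check described above, the snippet below flags hosts that are running short of CPU or memory. It does not call the vSphere API: the `HostStats` record, the host names and the 85% / 90% limits are all hypothetical, and in practice the figures would come from your virtualization monitoring data.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical snapshot of one virtualization host's utilization."""
    name: str
    cpu_used_pct: float
    mem_used_pct: float


def hosts_short_of_resources(hosts, cpu_limit=85.0, mem_limit=90.0):
    """Return the names of hosts at or above either (assumed) limit."""
    return [
        h.name
        for h in hosts
        if h.cpu_used_pct >= cpu_limit or h.mem_used_pct >= mem_limit
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        HostStats("esx-01", cpu_used_pct=42.0, mem_used_pct=61.0),
        HostStats("esx-02", cpu_used_pct=91.0, mem_used_pct=70.0),
        HostStats("esx-03", cpu_used_pct=55.0, mem_used_pct=93.5),
    ]
    print(hosts_short_of_resources(sample))
```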
3. Define your alerting policies
Take the time to map out your response procedures and escalation teams. For example, if an Exchange server metric has breached its threshold, you should route this alert to your Mail Administrators. Similarly, problems in your databases should be sent to your DBAs, VMware-related issues should be sent to IT, and so on. Besides using real-time alerting, you should periodically track and compare how all metrics evolve over time (for servers and virtual resources) to identify performance deviations and potential datacenter bottlenecks much earlier. Remember, a single server struggling to keep up with demand can knock out your Website and impact your visitors.
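The routing and trend comparison described above might look something like the sketch below. The team identifiers, the fallback to IT for unknown categories, and the 20% tolerance are assumptions made for illustration, so adapt them to your own escalation policy.

```python
from statistics import mean

# Hypothetical mapping of alert categories to the teams named in the text.
ROUTES = {
    "mail": "mail-admins",
    "database": "dbas",
    "vmware": "it",
}
DEFAULT_TEAM = "it"  # assumption: unrecognized categories fall back to IT


def route_alert(category):
    """Return the team that should receive an alert of this category."""
    return ROUTES.get(category, DEFAULT_TEAM)


def deviates(history, recent, tolerance_pct=20.0):
    """True if the recent average differs from the historical average
    by more than tolerance_pct percent."""
    baseline = mean(history)
    current = mean(recent)
    if baseline == 0:
        return current != 0
    change_pct = abs(current - baseline) / abs(baseline) * 100.0
    return change_pct > tolerance_pct


if __name__ == "__main__":
    print(route_alert("mail"))
    # Response times in milliseconds: last month versus this week (made up).
    print(deviates([100, 105, 95, 100], [130, 135]))
```

Comparing a recent average against a longer baseline is a simple way to spot gradual degradation that never crosses a fixed threshold.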
Ready to ensure your back-end infrastructure is performing at its best, so you can avoid Website outages and a flurry of negative comments on social media?
Proactively monitor your Website from your users’ perspective and oversee your datacenter infrastructure. Sign up for a free Site24x7 Web Performance Monitoring trial!
Site24x7 is delivered as Software-as-a-Service, which means that you can get started in minutes. There are no contracts and no long-term commitment, and you can cancel anytime.