How to Diagnose Why Your Linux Server Crashed
Although Linux servers are known for their reliability, they do occasionally fail. An occasional crash might not seem like much, but in an enterprise setting accurately diagnosing the cause is essential, because a failure can indicate significant underlying software or hardware problems. Troubleshooting can be difficult, but the tools covered in this guide will help you gather the information you need to work through even tough cases.
Since every server is different, this guide isn’t intended to pinpoint the exact cause of your crash. Rather, the focus is on the tools you can use to gather the information needed to diagnose the issue.
Linux Process Management
After a server crash, your first step should be to examine the processes running on your system and confirm that nothing is consuming an unusual share of resources. The Top command, included with Linux, shows CPU usage, memory usage, swap usage, cache size, buffer size, process ID (PID), user, command and much more. Its simplicity makes it ideal for initial triage, since it gives you a wealth of system information almost instantly.
To use this command, simply type top into the console.
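For a record you can save or compare later, Top can also be run non-interactively. The sketch below assumes the procps version of top found on most Linux distributions, where -b selects batch mode and -n sets the number of iterations. The function name snapshot_top and the output path are only illustrative.

```shell
# Print the first N lines (default 15) of a single, non-interactive top snapshot.
# Assumes procps top: -b = batch mode, -n 1 = one iteration, then exit.
snapshot_top() {
    local lines="${1:-15}"
    # sed prints lines 1..N and quietly discards the rest of top's output
    top -b -n 1 | sed -n "1,${lines}p"
}

# Example (hypothetical output path):
#   snapshot_top 20 > /tmp/top-after-crash.txt
```

The first few lines of the snapshot hold the load average, task counts, and memory summary, followed by the process list.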
Htop – A More Powerful Solution
To make your job as a server administrator easier, consider installing Htop, which presents process information more intuitively than Top. Key features of Htop include a color-coded interface, shortcut keys, horizontal and vertical scrolling, and much more.
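Htop isn’t installed on most Linux systems by default; however, it can be added to CentOS and Red Hat-based systems with the following command (run as root or with sudo): yum install htop. On some releases the package comes from the EPEL repository, which may need to be enabled first.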
On Ubuntu systems, Htop can be installed with the following command: sudo apt-get install htop
Analyze Network Traffic
Occasionally a server crash will be triggered by issues with network traffic. Effective packet analysis is crucial for determining whether the problem originates in the datacenter, on the client’s system, or directly on the server.
Tcpdump is one of the most widely used command-line packet analysis tools available for Linux systems. It is vital for server administrators because it lets them capture and filter the TCP/IP packets sent or received on a specific network interface. The program can also save the captured data to a file for further analysis. While it is impractical to fully cover Tcpdump usage in this guide, Tecmint.com has a quick start guide that server administrators can consult.
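As a starting point, the capture-and-save workflow described above might look like the following sketch. The interface name, packet count, and file path are placeholders, and the function names are illustrative. Capturing normally requires root privileges.

```shell
# Capture a fixed number of packets on one interface and save them to a file.
# -i = interface, -c = stop after this many packets, -w = write raw packets to a file.
capture_packets() {
    local iface="$1"
    local count="$2"
    local outfile="$3"
    tcpdump -i "$iface" -c "$count" -w "$outfile"
}

# Read a saved capture back; -nn skips host name and port name lookups.
read_capture() {
    local infile="$1"
    tcpdump -nn -r "$infile"
}

# Example (placeholder values, run as root):
#   capture_packets eth0 1000 /tmp/crash.pcap
#   read_capture /tmp/crash.pcap
```

Limiting the capture with -c keeps the file small enough to inspect by hand or to open later in a graphical analyzer.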
When you’re looking for simple network statistics, Netstat is an ideal tool for the task. It can be used to monitor incoming and outgoing network connections and to view interface statistics for each network device. As with Tcpdump, the full set of options is impractical to list here, but Tecmint.com has a helpful guide.
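One quick check is to count TCP connections by state, since an unusually large number in a single state can hint at a network-related problem. This is a minimal sketch that assumes the Linux net-tools version of netstat, where -ant lists all TCP sockets numerically and the state appears in the sixth column. The function name count_tcp_states is illustrative.

```shell
# Count TCP connections by state (ESTABLISHED, TIME_WAIT, LISTEN, ...).
# Assumes net-tools netstat output: two header lines, state in column 6.
count_tcp_states() {
    netstat -ant |
        awk 'NR > 2 && $1 ~ /^tcp/ { counts[$6]++ }
             END { for (s in counts) print counts[s], s }' |
        sort -rn
}

# Example:
#   count_tcp_states
```

Interface-level counters, including errors and drops, are available separately with netstat -i.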
Wireshark is one of the most robust and widely supported packet analysis tools available to server administrators. Key features include VoIP analysis, support for hundreds of communications protocols, the ability to save captures in many different formats, and much more. For most packet analysis tasks, Wireshark is the only tool a server administrator needs.
Check the Logs
When all else fails, sifting through your server logs is one of the best ways to troubleshoot errors. Log files usually live in the /var/log/ directory; the main system log is the /var/log/syslog file on Ubuntu and Debian systems and /var/log/messages on CentOS and Red Hat-based systems. Unfortunately, raw logs are often hard to use directly because they can contain thousands of entries, and making sense of that much data is difficult without log analysis tools.
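Before reaching for a dedicated tool, a keyword search can narrow thousands of entries down to a handful. The sketch below is only a starting point: the function name is illustrative, and the keyword list is an assumption based on messages the Linux kernel commonly logs around crashes, so adjust it for your own systems.

```shell
# Print numbered log lines that mention common crash indicators.
# The keyword list is an assumption, not an exhaustive set.
find_crash_hints() {
    local logfile="${1:-/var/log/syslog}"
    # -i = ignore case, -n = show line numbers, -E = extended regular expressions
    grep -i -n -E 'out of memory|oom-killer|kernel panic|segfault|call trace' "$logfile"
}

# Example:
#   find_crash_hints /var/log/syslog
```

The line numbers in the output make it easy to jump back to the surrounding entries and see what happened just before each match.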
Ways to Simplify Log Analysis
If you are trying to analyze general server traffic logs, a quality analysis tool is crucial to making the task manageable. For web server traffic analysis, AWStats is the tool of choice for many server administrators because it is a free application that turns log data into graphical reports. While AWStats doesn’t focus on low-level information like the previously mentioned tools, it reports several metrics that can point to possible triggers of a server crash: HTTP errors, cluster reports for load-balanced servers, hourly traffic along with rush-hour reports, and the IP addresses of visitors.
For users who need high-performance log management, ManageEngine offers EventLog Analyzer, which provides a variety of tools suited to enterprise clients who need to get to the bottom of server issues rapidly. In addition to helping with PCI, FISMA, HIPAA, SOX and GLBA compliance, EventLog Analyzer gives you access to a wealth of log analysis tools that make sifting through data much easier. Beyond supporting a variety of system logs, it also includes file integrity monitoring, allowing you to take a more active approach to security.
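If a full AWStats report isn’t available yet, a rough count of HTTP status codes can be pulled straight from an access log. This is not how AWStats itself works, only a quick approximation that assumes the common or combined log format used by Apache and Nginx, where the status code is the ninth whitespace-separated field. The function name and log path are placeholders.

```shell
# Count responses per HTTP status code, most frequent first.
# Assumes common/combined log format: status code is field 9.
count_status_codes() {
    local logfile="$1"
    awk '{ counts[$9]++ } END { for (s in counts) print counts[s], s }' "$logfile" |
        sort -rn
}

# Example (placeholder path):
#   count_status_codes /var/log/apache2/access.log
```

A spike in 5xx codes shortly before the crash suggests looking more closely at the web application or its backend services.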