Help Thread Dump/Heap Dump

Execute Thread Dump and Heap Dump to Automate Actions

Automate Thread Dump and Heap Dump execution to minimize manual intervention and improve application and server performance.

Use case

When an application's response time exceeds its normal value (say, 400 ms), the cause could be a deadlock or a memory leak. Generating a Thread Dump or Heap Dump while the application is responding slowly can help determine the actual root cause.

A Thread Dump is used to check whether the application's threads are stuck in a deadlock. A Heap Dump is used to detect memory leaks.
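To make the idea of a thread dump concrete, here is a minimal sketch that produces one from inside a JVM using the standard java.lang.management API and reports how many threads are deadlocked. The class and method names (ThreadDumpSketch, threadDump, deadlockedThreadCount) are hypothetical and chosen for illustration; this is not necessarily how the Site24x7 server agent collects its dumps.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class, for illustration only.
class ThreadDumpSketch {

    /** Returns a plain-text dump (name, state, stack) of every live thread in this JVM. */
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Returns the number of threads that are currently deadlocked (0 if none). */
    static int deadlockedThreadCount() {
        // findDeadlockedThreads() returns null when no deadlock is found.
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.print(threadDump());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

A non-zero deadlock count, or many threads in the BLOCKED state waiting on the same lock, is the kind of signal to look for in a thread dump taken during a slow response.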

Problem

After adding a background task to the application, the application's overall response time increases. We need to know the actual status of the objects that were created at that time.

Solution

Add thread dump and heap dump automations and associate them with the response time attribute so that the dumps are created when the threshold is breached.
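The trigger logic behind this solution can be sketched in a few lines: time an operation, compare the elapsed time with a threshold, and run an action only on a breach. This is an illustrative sketch only. In Site24x7 the monitoring and triggering are done by the service and the agent, and the names used here (ThresholdTriggerSketch, runTimed) are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of "run an action when response time breaches a threshold".
class ThresholdTriggerSketch {

    /**
     * Runs the request, then invokes onBreach if it took longer than thresholdMs.
     * Returns true when the threshold was breached.
     */
    static boolean runTimed(Runnable request, long thresholdMs, Runnable onBreach) {
        long start = System.nanoTime();
        request.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        boolean breached = elapsedMs > thresholdMs;
        if (breached) {
            onBreach.run();
        }
        return breached;
    }

    public static void main(String[] args) {
        // Simulated slow request (assumption: 500 ms of work).
        Runnable slowRequest = () -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // 400 ms mirrors the example threshold from the use case above.
        boolean breached = runTimed(slowRequest, 400,
                () -> System.out.println("Threshold breached: capture thread and heap dumps now"));
        System.out.println("breached = " + breached);
    }
}
```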

Add an automation

Supported server agent versions: 16.6.0 and above for Linux

  1. Log in to Site24x7 and go to Admin > IT Automation Templates (+). You can also navigate to Server > IT Automation Templates (+).
  2. Select Thread Dump or Heap Dump as the Type of Automation.
  3. Provide a Display Name for identification purposes.
    E.g., for the above case, the Display Name could be "Dump files for memory leak."
  4. Provide the absolute File Path to store the dump files.
    • If you save the dump files in the server agent's folder (/opt/site24x7/monagent), they will be erased when the agent is deleted. If you save them outside the server agent's folder, they will remain even if the agent is deleted.
    • The heap dump files are saved in HPROF format whereas the thread dump files are saved in TXT format.
  5. Select Hosts for executing the Thread Dump or Heap Dump. You can select multiple hosts for parallel execution.
      You can choose $LOCALHOST to execute the automation on any host where there is a threshold or status change violation. When automation is enabled at the APM application level, it will run on all infringing servers in the application when a threshold or status change violation occurs.
  6. Enter a Time-out period (in seconds), which is the maximum time the agent waits for the command execution to complete. If the execution takes longer, a time-out error occurs. The error is captured in the email report if the Send root cause analysis report by email when monitor is down option is set to Yes.
      The time-out is set to 15 seconds by default. You can define a time-out between 1 and 90 seconds.
  7. You can choose to send an email of the automation result to the user group(s) configured in the notification profile. By default, it is set to No. This email will contain parameters including the automation name, type of automation, incident reason, destination hosts, and more.
      If you have multiple automations executed in one data collection, a consolidated email will be sent.
  8. Save the changes.
    Once an automation is added, schedule these automations to be executed one after the other.
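For reference, the following sketch shows what writing a heap dump in HPROF format to an absolute file path (step 4) can look like from inside a HotSpot-based JVM, using the com.sun.management.HotSpotDiagnosticMXBean API. It is an assumption-laden illustration, not the agent's implementation: the class name HeapDumpSketch, the method dumpHeap, and the file naming scheme are all hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only (requires a HotSpot-based JVM).
class HeapDumpSketch {

    /** Writes an HPROF heap dump of this JVM into the given directory and returns its absolute path. */
    static Path dumpHeap(Path directory) throws IOException {
        Files.createDirectories(directory);
        // The target file must not exist yet; a timestamp keeps names unique (hypothetical naming scheme).
        Path target = directory.toAbsolutePath()
                .resolve("heap-" + System.currentTimeMillis() + ".hprof");
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true: dump only live (reachable) objects, which keeps the file smaller.
        diagnostics.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the output directory is the first argument, else the system temp directory.
        Path directory = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println("Heap dump written to " + dumpHeap(directory));
    }
}
```

Heap dump files can be large, so it is worth checking the free disk space at the chosen path before relying on automated dumps.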

Notification Profile settings

Configure the following settings in the Notification Profile:

  • Notify as Down/Trouble after executing associated IT automation(s): When set to Yes, if your monitor still faces an outage even after executing the specified action, you'll be immediately alerted about the Down or Trouble status.
  • Suppress IT Automation of Dependent Monitors: When the status of the dependent resource is Down, the IT automation is not executed.

Test the automation

Once you add an automation, go to the IT Automation Summary page (Server > IT Automation Templates) and use the test run icon to carry out a test run.

The test run is applied to all the hosts selected for command execution, except when $LOCALHOST is the only host selected.
Click IT Automation Logs to view the list of automations executed by date.

Map the automation

For an automation to be executed, map it to one or more monitors or attributes. This can be done in two ways:
