How to choose a garbage collector for your application

Garbage collection (GC) is one of the core concepts of the Java Virtual Machine (JVM), but many developers tend to ignore it because it's automated. In most scenarios, we start looking into GC only when we have over-provisioned resources for our applications or want to optimize our infrastructure spending. If we instead monitor our applications regularly and tune their GC usage, we reduce technical debt.

What is a garbage collector?

A garbage collector is a process run by the JVM to reclaim memory that the application no longer uses. Say you create an object. After you finish using it, the object still stays in memory until something removes it. To alleviate the pain of manual memory management, the JVM automates this process with garbage collectors.

How does GC work?

Live objects are tracked continuously, and the rest are considered eligible for the next GC cycle. Say an application flow creates 10 objects. If five of them are still referenced in the flow, they are considered live, and the other five are marked as eligible for the next GC.

To determine which objects are no longer used, the JVM periodically runs the mark-and-sweep algorithm (https://en.wikipedia.org/wiki/Tracing_garbage_collection). Reviewing the basics of GC (https://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html) is important for understanding how a garbage collector is chosen.
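The 10-object scenario above can be sketched as a toy mark-and-sweep pass. This is only an illustration of the idea, not how the JVM implements it: the Node class, the object names, and the reference layout are all assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of mark-and-sweep over a hypothetical object graph.
// It illustrates the idea only; it is not how HotSpot implements GC.
class ToyMarkSweep {

    // Hypothetical stand-in for a heap object that may reference other objects.
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: collect every object reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (live.add(current)) {
                pending.addAll(current.references);
            }
        }
        return live;
    }

    // Sweep phase: every heap object that was not marked is eligible for GC.
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> garbage = new ArrayList<>();
        for (Node node : heap) {
            if (!live.contains(node)) {
                garbage.add(node);
            }
        }
        return garbage;
    }

    // Builds the scenario from the text and returns {live count, eligible count}.
    static int[] demoCounts() {
        List<Node> heap = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            heap.add(new Node("obj" + i));
        }
        // obj0 is the only root; obj1..obj4 are reachable from it.
        heap.get(0).references.add(heap.get(1));
        heap.get(1).references.add(heap.get(2));
        heap.get(0).references.add(heap.get(3));
        heap.get(3).references.add(heap.get(4));
        // obj5 and obj6 reference each other, but no root reaches them,
        // so the cycle is still garbage. obj7..obj9 are unreferenced.
        heap.get(5).references.add(heap.get(6));
        heap.get(6).references.add(heap.get(5));

        Set<Node> live = mark(Collections.singletonList(heap.get(0)));
        List<Node> garbage = sweep(heap, live);
        return new int[] {live.size(), garbage.size()};
    }

    public static void main(String[] args) {
        int[] counts = demoCounts();
        System.out.println("live=" + counts[0] + ", eligible for GC=" + counts[1]);
    }
}
```

Note that obj5 and obj6 are swept even though they reference each other: what matters is reachability from a root, not whether any reference to the object exists.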

Important factors for choosing a garbage collector

There are multiple factors to consider, based on the type of environment, before choosing the appropriate garbage collector. But the most important factor stays the same: how to reduce stop-the-world pauses.

To get an idea of the available garbage collectors, let's look at the various options.

Serial garbage collector

One of the earliest garbage collectors, it runs in single-threaded mode. It's still a good choice when you have a single-core VM.

Important flags

  • -XX:+UseSerialGC — Used to enable Serial GC.
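After passing a flag such as -XX:+UseSerialGC, it can help to confirm which collector the JVM actually picked up. A minimal sketch using the standard java.lang.management API is shown here; the class name ActiveCollectors is just an illustrative choice, and the collector names printed vary by JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the collectors registered in the running JVM and the JVM options
// it was started with. The class name is hypothetical.
class ActiveCollectors {

    // Names of the collectors as reported by the JVM itself.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // JVM options passed at startup, for example -XX:+UseSerialGC.
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("JVM flags:  " + jvmFlags());
    }
}
```

Running the same class with different flags, for example java -XX:+UseSerialGC ActiveCollectors.java on a JDK that supports single-file launch, should show different collector names for each choice.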

Parallel garbage collector

With the advent of multi-core processors, a multi-threaded garbage collector was released. It's a good fit for batch processing applications, such as processing a large file by splitting it into smaller chunks, where lengthy GC pauses are not a real cause for concern. Since multiple threads perform GC here, choosing the right number of threads is important to get the best throughput out of this collector.

Important flags

  • -XX:+UseParallelGC — Used to enable parallel GC.
  • -XX:ParallelGCThreads — Configures the number of threads that run in parallel for GC.
  • -XX:MaxGCPauseMillis — Sets the target for the maximum GC pause time. It's a goal, not a guarantee.
  • -XX:GCTimeRatio — Sets the throughput goal as the ratio of time spent outside GC to time spent in GC. A value of N targets at most 1/(1+N) of the total time in GC, so higher values favor application throughput.
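The arithmetic behind -XX:GCTimeRatio is easy to get backwards, so here is a small sketch of it. It assumes the HotSpot convention that a value of N sets a goal of 1/(1+N) of total time spent in GC; the class and method names are made up for illustration.

```java
// Illustrative helper for the -XX:GCTimeRatio arithmetic. Names are hypothetical.
class GcTimeRatio {

    // Fraction of total time the JVM aims to spend in GC for a given ratio N,
    // using the convention gcFraction = 1 / (1 + N).
    static double gcTimeFraction(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("GCTimeRatio must not be negative");
        }
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        int[] ratios = {9, 19, 99};
        for (int ratio : ratios) {
            double percent = gcTimeFraction(ratio) * 100.0;
            System.out.println("-XX:GCTimeRatio=" + ratio
                    + " targets about " + percent + "% of time in GC");
        }
    }
}
```

For example, a ratio of 19 corresponds to a goal of 5% of time in GC, and a ratio of 99 corresponds to 1%.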

Concurrent Mark and Sweep garbage collector (CMS)

CMS runs concurrently with the application threads. Although it has lower throughput than Parallel GC, it offers low-latency pause times. In a nutshell, the application may see more stop-the-world events, but each GC pause is shorter. Because it runs alongside the application threads, CMS is expected to have a larger memory footprint. It's a preferred garbage collector for most general-purpose applications.

Important flags

  • -XX:+UseConcMarkSweepGC — Used to enable CMS.
  • -XX:+UseCMSInitiatingOccupancyOnly — Used to make CMS start a collection cycle only when tenured-space occupancy reaches the configured threshold (set with -XX:CMSInitiatingOccupancyFraction), rather than relying on the JVM's own heuristics. Delaying the full GC may result in longer full GC pauses, but with smaller heaps, configuring this will improve the intervals between full GCs.
  • -XX:+CMSScavengeBeforeRemark — Used to run a young-generation collection before the remark phase. This leaves fewer young objects to check during remark and improves GC pause times.
  • -XX:+ScavengeBeforeFullGC — Used to run a young-generation collection before a full GC. This leaves less work for the full GC and improves GC pause times.
  • -XX:+CMSParallelRemarkEnabled — Used to run the remark phase with multiple threads rather than a single thread.

G1 Collector

G1 divides the heap into regions and cleans up region by region, which leads to less memory fragmentation. Since region-based information is maintained internally, the memory footprint is expected to be larger than that of CMS, but GC pauses are expected to be predictable. A user can configure the maximum pause time to control that aspect, but must then also keep an eye on how efficient the cleanup is.

Important flags

  • -XX:+UseG1GC — Used to enable G1 GC.
  • -XX:InitiatingHeapOccupancyPercent — Sets the heap occupancy percentage at which G1 starts a concurrent marking cycle. Until that threshold is reached, objects are promoted to the old generation over successive young GC cycles. Tuning it matters because the marking cycle includes a remark phase, which results in a pause.
  • -XX:G1PeriodicGCSystemLoadThreshold — Sets the system load above which G1 skips its periodic collection, so that periodic GC does not compete with a busy system.
  • -XX:G1NewSizePercent, -XX:G1MaxNewSizePercent — Set the minimum and maximum size of the young generation as a percentage of the heap. Used to improve the young-only phase times.
  • -XX:MaxGCPauseMillis — Sets the target for the maximum GC pause time.

Apart from these garbage collectors, there is one in experimental mode called the Z garbage collector (ZGC). It's expected to address low-latency requirements and to perform better than G1 GC. Moving to a new garbage collector should not be a quick decision; it should result from a measured evaluation and be carried out incrementally.

The following are choices of garbage collectors:

Collector     Advantages                            Use cases
Serial GC     Very low memory footprint             Single-threaded applications
Parallel GC   High throughput                       Batch processing
CMS           Low GC pauses                         General applications
G1 GC         Predictable / controlled GC pauses    Applications working with larger heaps and uncontrolled GC delays

The three most important factors to consider before choosing a garbage collector are:

  1. Throughput - The share of time the application spends running rather than waiting on the GC.
  2. Memory - The amount of memory the application requires to run in an optimal manner.
  3. Latency - The time taken for the application to respond (inclusive of the expected GC pauses).
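The throughput factor can be estimated from inside a running application. The sketch below uses the standard GarbageCollectorMXBean counters and treats throughput as 1 minus GC time divided by JVM uptime. That formula and the class name are assumptions for illustration, and the reported collection time is approximate; for concurrent collectors it may include work that did not pause the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Rough throughput estimate from the JVM's own GC counters. Names are hypothetical.
class GcThroughput {

    // Sum of the approximate collection time, in milliseconds, over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = bean.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Share of uptime not attributed to GC, assuming throughput = 1 - gc / uptime.
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 1.0;
        }
        return 1.0 - (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.println("uptime(ms)=" + uptime + ", gc(ms)=" + gc
                + ", estimated throughput=" + throughput(gc, uptime));
    }
}
```

Sampling these counters at intervals under a realistic load, once per candidate collector, gives a like-for-like basis for comparing them on throughput; latency still needs to be measured separately, for example from GC logs or response-time metrics.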