Garbage collection is a key component of many modern programming languages, including C#. It’s hard to imagine what programming in C#, Java, Ruby, and many other modern languages would look like without it.
Despite being a valuable asset that makes for a better programming experience, garbage collection can still give you a hard time, specifically with performance.
With that in mind, what can a C# developer do to ensure that C# garbage collection acts as a friend instead of a foe? How can you write code in such a way that you reap all the benefits of this tool without suffering from any of the issues it can cause?
That’s what today’s post is all about.
What is garbage? It’s something that was once useful but isn’t anymore (like a broken device). Or it might be the residue of some activity (vegetable peels, for instance). In short, garbage is anything you want to get rid of because it wastes space or can potentially cause harm.
Guess what? Our programs also generate garbage. Think about objects that were created, performed their jobs and are now useless, but are still there occupying valuable space in memory. Shouldn’t we get rid of them? We should indeed, and this process is sometimes called “memory management.”
Older languages required developers to manage memory manually. They had to explicitly free objects that were no longer in use, so the memory those objects occupied became available to the program again.
The problem with this is that manually freeing objects could be an extremely hard and error-prone process. Failing to free obsolete objects often resulted in memory leaks. Freeing objects that were still in use, on the other hand, could result in runtime errors and inconsistent behavior. In short: a real pain in the neck. Manual memory management prevented developers from fully focusing on the business logic of whatever applications they were writing. Instead, it put them in a constant state of worry. Talk about cognitive load!
In response to those problems, garbage collection was created. So here’s our definition for garbage collection:
Garbage collection is an automated process that is able to figure out which objects are no longer needed and get rid of them, freeing space in memory.
Types of garbage collection
With the definition bit out of the way, we can move on to learn more about garbage collection in C#. But to fully appreciate the traits and properties of C# GC, we have to first dive a little deeper into the different types of garbage collection that exist.
The following two types—reference counting and tracing—are by no means an exhaustive list. They’re simply meant to give you an overview of the main types of GC, so later you can understand how C# garbage collection fits into the bigger picture.
It’s also important to keep in mind that these are overall strategies for GC. Each of them can be implemented through a variety of algorithms, which can vary wildly in terms of performance and other characteristics.
Reference counting, as the name suggests, is the process of counting all the references that point to a given object. Every object in the program has a field that holds the number of references pointing to it. Every time a new reference is created, the count is increased. The inverse is also true—every time a reference ceases to exist, the count is decreased. When the count for a given object reaches zero, that means the object is inaccessible. In other words, it’s garbage, and thus can be reclaimed by the collector.
Reference counting has advantages and disadvantages. Its primary advantage is that objects can be reclaimed as soon as their count reaches zero. This way, each object has a defined lifetime, and collection can occur without long pauses, which can make for better responsiveness in the application.
Now, for the disadvantages. Reference counting requires frequent updates, since every operation that creates or drops a reference has to adjust a counter. Reclaiming a single object can also trigger a chain reaction: the objects it referenced each lose a reference, may reach zero themselves, and so on through a large part of the application. In addition, this approach requires extra space to store the reference count for each object in the application.
Finally, reference counting has trouble dealing with reference cycles. Think about an object that references its children, which, in turn, reference the parent back. Such objects will have a ref count greater than zero, preventing them from being collected, even if they’re unreachable from the rest of the program. There are approaches that can handle this issue, but at the cost of adding more overhead and complexity.
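To make the counting and the cycle problem concrete, here’s a minimal sketch of a toy reference counter. It is not how the .NET runtime manages memory; `Node`, `AddRef`, `Release`, and `PointTo` are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model of reference counting, for illustration only.
// All names here are hypothetical; .NET does not work this way.
public class Node
{
    public string Name { get; }
    public int RefCount { get; private set; }
    public bool Reclaimed { get; private set; }
    private readonly List<Node> _references = new List<Node>();

    public Node(string name) { Name = name; }

    // A new reference to this object was created.
    public void AddRef() { RefCount++; }

    // A reference to this object went away; reclaim it when the count hits zero.
    public void Release()
    {
        RefCount--;
        if (RefCount == 0)
        {
            Reclaimed = true;
            // Reclaiming an object drops the references it held,
            // which is where the chain reaction comes from.
            foreach (var target in _references) target.Release();
            _references.Clear();
        }
    }

    // Store a reference from this object to another one.
    public void PointTo(Node target)
    {
        target.AddRef();
        _references.Add(target);
    }
}

public static class RefCountDemo
{
    // Acyclic case: a -> b, then the only outside reference to a is dropped.
    public static (bool First, bool Second) Chain()
    {
        var a = new Node("a");
        a.AddRef();          // outside reference to a
        var b = new Node("b");
        a.PointTo(b);        // b is referenced only by a
        a.Release();         // a reaches zero, which then releases b
        return (a.Reclaimed, b.Reclaimed);
    }

    // Cyclic case: parent and child keep each other's count above zero.
    public static (bool First, bool Second) Cycle()
    {
        var parent = new Node("parent");
        parent.AddRef();     // outside reference to parent
        var child = new Node("child");
        parent.PointTo(child);
        child.PointTo(parent); // back-reference closes the cycle
        parent.Release();      // outside reference dropped, count is still 1
        return (parent.Reclaimed, child.Reclaimed);
    }

    public static void Main()
    {
        Console.WriteLine(Chain()); // (True, True)
        Console.WriteLine(Cycle()); // (False, False)
    }
}
```

In the cyclic case nothing outside the pair can reach either object anymore, yet neither count ever drops to zero, so this naive scheme never reclaims them.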
The other main overall strategy for garbage collection is tracing. This approach basically consists of determining which objects are reachable, following a path of references that begin with certain root objects.
To greatly simplify the process, we could say it works like this: the collector visits a root object and marks it as active. It then visits the objects that root points to, marking those as active as well. It repeats these steps until all the reachable objects have been visited. Every object the collector couldn’t get to is considered unreachable and is reclaimed.
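The simplified description above can be sketched as a toy mark-and-sweep pass over a hand-built object graph. `HeapObject` and `ToyHeap` are hypothetical types made up for this example, and the real .NET collector is far more sophisticated than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy mark-and-sweep, for illustration only. All types are hypothetical.
public class HeapObject
{
    public string Name { get; }
    public bool Marked { get; set; }
    public List<HeapObject> References { get; } = new List<HeapObject>();
    public HeapObject(string name) { Name = name; }
}

public class ToyHeap
{
    public List<HeapObject> Objects { get; } = new List<HeapObject>();
    public List<HeapObject> Roots { get; } = new List<HeapObject>();

    public HeapObject Allocate(string name)
    {
        var obj = new HeapObject(name);
        Objects.Add(obj);
        return obj;
    }

    // Runs one collection and returns the names of the reclaimed objects.
    public List<string> Collect()
    {
        // Mark: follow references from the roots, marking everything reachable.
        var pending = new Stack<HeapObject>(Roots);
        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (current.Marked) continue; // already visited (also handles cycles)
            current.Marked = true;
            foreach (var next in current.References) pending.Push(next);
        }

        // Sweep: anything left unmarked is unreachable and gets reclaimed.
        var dead = Objects.Where(o => !o.Marked).Select(o => o.Name).ToList();
        Objects.RemoveAll(o => !o.Marked);

        // Reset the marks so the next collection starts clean.
        foreach (var survivor in Objects) survivor.Marked = false;
        return dead;
    }
}

public static class TracingDemo
{
    public static List<string> Run()
    {
        var heap = new ToyHeap();
        var root = heap.Allocate("root");
        var child = heap.Allocate("child");
        var orphanParent = heap.Allocate("orphanParent");
        var orphanChild = heap.Allocate("orphanChild");

        heap.Roots.Add(root);
        root.References.Add(child);

        // A reference cycle that no root can reach.
        orphanParent.References.Add(orphanChild);
        orphanChild.References.Add(orphanParent);

        return heap.Collect();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // orphanParent, orphanChild
    }
}
```

Note that the unreachable cycle is reclaimed here: a tracing collector only cares about what it can reach from the roots, so cycles need no special handling.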
How GC works in C#
C# garbage collection belongs to the tracing variety. It’s often called a generational approach since it employs the concept of generations to figure out which objects are eligible for collection.
Memory is divided into spaces called generations. The collector starts by reclaiming objects in the youngest generation. Then it promotes the survivors to the next generation. The .NET garbage collector, which C# relies on, uses three generations in total:
- Generation 0: This generation holds short-lived objects. Here’s where the collection process happens most often. When you instantiate a new object, it goes in this generation by default. The exceptions are objects whose sizes are equal to or greater than 85,000 bytes. Those objects go on the large object heap (LOH), which is collected together with generation 2.
- Generation 1: This is an intermediate space between the short-lived and long-lived layers.
- Generation 2: Finally, this is the generation that holds objects that live the longest in the application, sometimes as long as the whole duration of the app. GC takes place here less frequently.
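If you want to watch generations at work, the sketch below asks the runtime which generation an object currently belongs to, using `GC.GetGeneration`. It calls `GC.Collect` purely for demonstration. The exact numbers depend on the runtime and its GC settings, so treat the output as something to observe rather than a guarantee; `GenerationDemo` and its method names are just names picked for this example.

```csharp
using System;

public static class GenerationDemo
{
    // Generation reported for a freshly allocated small object.
    public static int GenerationOfNewSmallObject() => GC.GetGeneration(new object());

    // Generation reported for an array whose size exceeds the 85,000-byte threshold.
    public static int GenerationOfLargeArray() => GC.GetGeneration(new byte[85_000]);

    public static void Main()
    {
        var survivor = new object();
        Console.WriteLine($"new small object: gen {GC.GetGeneration(survivor)}");

        // Forced collections, for demonstration only. An object that survives
        // a collection is normally promoted to the next generation.
        GC.Collect();
        Console.WriteLine($"after one collection: gen {GC.GetGeneration(survivor)}");
        GC.Collect();
        Console.WriteLine($"after two collections: gen {GC.GetGeneration(survivor)}");

        Console.WriteLine($"large array: gen {GenerationOfLargeArray()}");
        Console.WriteLine($"highest generation: {GC.MaxGeneration}");

        // Keep the object reachable until this point so it can't be collected early.
        GC.KeepAlive(survivor);
    }
}
```

On a typical desktop .NET runtime you’d expect to see the small object move up one generation per collection, while the large array is reported in the highest generation right away.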
According to the Microsoft docs, the following information is what GC uses to determine if an object is live:
- Stack roots. Stack variables provided by the just-in-time (JIT) compiler and stack walker. Note that JIT optimizations can lengthen or shorten regions of code within which stack variables are reported to the garbage collector.
- Garbage collection handles. Handles that point to managed objects and that can be allocated by user code or by the common language runtime.
- Static data. Static objects in application domains that could be referencing other objects. Each application domain keeps track of its static objects.
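Here’s a rough sketch of how roots decide an object’s fate. It allocates an object, optionally roots it through static data or a strong `GCHandle`, and then uses a `WeakReference` (which does not keep its target alive) to check whether the object survived a forced collection. `RootsDemo`, `RootKind`, and the helper methods are hypothetical names for this example, and the unrooted case depends on JIT and runtime behavior, so what it prints is typical rather than guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public enum RootKind { None, StaticData, GcHandle }

public static class RootsDemo
{
    // Static data: anything reachable from here is considered live.
    private static readonly List<object> StaticRoots = new List<object>();

    // A strong GC handle allocated by user code.
    private static GCHandle _handle;

    // The object is created in a separate, non-inlined method so that no
    // stack variable in the caller keeps it alive by accident.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Allocate(RootKind kind)
    {
        var obj = new object();
        if (kind == RootKind.StaticData) StaticRoots.Add(obj);
        if (kind == RootKind.GcHandle) _handle = GCHandle.Alloc(obj);
        return new WeakReference(obj); // tracks the object without rooting it
    }

    public static bool SurvivesCollection(RootKind kind)
    {
        var weak = Allocate(kind);

        // Forced collection, for demonstration only.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = weak.IsAlive;

        // Remove the roots again so the demo leaves nothing behind.
        StaticRoots.Clear();
        if (_handle.IsAllocated) _handle.Free();
        return alive;
    }

    public static void Main()
    {
        Console.WriteLine($"rooted by static data: {SurvivesCollection(RootKind.StaticData)}");
        Console.WriteLine($"rooted by a GC handle: {SurvivesCollection(RootKind.GcHandle)}");
        Console.WriteLine($"no roots left:         {SurvivesCollection(RootKind.None)}");
    }
}
```

The two rooted objects always survive. The unrooted one is usually gone after the collection, although a debug build or a different runtime may keep it around longer.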
Before a collection starts, the collector suspends all managed threads, except for the one that triggered the collection. Then, the collection happens, following these steps:
- The collector fetches all live objects, starting with the root objects cited above.
- The references to the objects that will be compacted are updated.
- Finally, the collector reclaims the space occupied by dead objects and compacts the surviving objects, which are promoted to the next generation.
Use C# GC to your advantage
Developers new to GC will sometimes ask, “When is it appropriate to force the collection to occur?” And the answer is: (almost) never. Think about it. The whole point of this GC thing is to free you from having to manually manage memory. Wouldn’t it be self-defeating to make you manually trigger the collection process?
“Fine,” you might say. “But is there anything I can do about all that? Is there some way to write code so I don’t put unnecessary stress on GC?”
Sure, there are some things you can do. You’ve just learned that generation 0 is the place where collection happens most often. And which kinds of objects live there? You guessed it: short-lived ones. So, one way to avoid putting additional pressure on the GC is to avoid excessive memory allocations, especially of objects you know will be short-lived.
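As one example of cutting down on short-lived allocations, the sketch below builds the same string twice: once with repeated concatenation, which creates a brand-new string on every iteration, and once with a `StringBuilder`. It measures the difference with `GC.GetAllocatedBytesForCurrentThread`, which assumes .NET Core 3.0 or later (or .NET Framework 4.8). `AllocationDemo` and its methods are hypothetical names, and the exact byte counts will vary from machine to machine.

```csharp
using System;
using System.Text;

public static class AllocationDemo
{
    // Each += creates a new string, leaving the previous one as garbage.
    public static string Concatenate(int count)
    {
        var result = "";
        for (int i = 0; i < count; i++) result += i;
        return result;
    }

    // StringBuilder appends into a growing buffer instead.
    public static string Build(int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++) builder.Append(i);
        return builder.ToString();
    }

    // Bytes allocated on the current thread while running the given work.
    public static long AllocatedBy(Action work)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        work();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        long concatenated = AllocatedBy(() => Concatenate(2000));
        long built = AllocatedBy(() => Build(2000));
        Console.WriteLine($"concatenation: {concatenated:N0} bytes allocated");
        Console.WriteLine($"StringBuilder: {built:N0} bytes allocated");
    }
}
```

Both methods return the same text, but the concatenation version leaves thousands of intermediate strings behind for generation 0 to clean up.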
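You can also use structs. These are value types: a struct held in a local variable typically doesn’t need a heap allocation of its own, and a struct stored in a field or an array element is embedded inline in its container instead of being allocated as a separate object. (Boxing a struct, for example by casting it to object, does allocate on the heap.) By using structs when it makes sense to, you avoid extra allocations that put more pressure on the GC.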
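To get a feel for the difference, the following sketch fills one array with structs and another with class instances, then compares how many bytes each approach allocated. The struct elements are stored inline in the array, so the whole thing is a single allocation, while every class instance is a separate heap object. `PointStruct`, `PointClass`, and `StructDemo` are hypothetical names, and `GC.GetAllocatedBytesForCurrentThread` assumes .NET Core 3.0 or later (or .NET Framework 4.8).

```csharp
using System;

public struct PointStruct { public int X; public int Y; }
public class PointClass { public int X; public int Y; }

public static class StructDemo
{
    // One allocation: the array itself, with the structs stored inline.
    public static PointStruct[] CreateStructs(int count)
    {
        var points = new PointStruct[count];
        for (int i = 0; i < count; i++) points[i] = new PointStruct { X = i, Y = i };
        return points;
    }

    // count + 1 allocations: the array of references plus one object per element.
    public static PointClass[] CreateClasses(int count)
    {
        var points = new PointClass[count];
        for (int i = 0; i < count; i++) points[i] = new PointClass { X = i, Y = i };
        return points;
    }

    // Bytes allocated on the current thread while running the given work.
    public static long AllocatedBy(Action work)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        work();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        long structBytes = AllocatedBy(() => CreateStructs(10_000));
        long classBytes = AllocatedBy(() => CreateClasses(10_000));
        Console.WriteLine($"struct array: {structBytes:N0} bytes allocated");
        Console.WriteLine($"class array:  {classBytes:N0} bytes allocated");
    }
}
```

Structs aren’t a free win, though. Large structs are expensive to copy, so this works best for small, simple values.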
Tools at your disposal
A word of caution about optimization first. The way I see it, it comes down to this: don’t just go out changing things because you think you’ll get a performance gain. In the best-case scenario, the gains will be negligible. In the worst case, you might inadvertently harm performance and decrease code quality.
Instead, learn to measure. There are tools—such as Stackify’s Retrace—that let you easily track your application’s performance. When, and if, you notice something shady going on, that’s the time to act.
Performance is far from being the only thing you should monitor, though. You’d be wise to also track the garbage collection process itself. When your app throws an OutOfMemoryException, for instance, this could be a sign of a memory leak.
Such errors deserve investigation, and Retrace can help you with that: it can show you how many collections take place in a single minute, as well as the average duration of each collection.
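If you just want a quick look at collection activity from inside your own code, the runtime exposes a few basic counters. The sketch below uses only the built-in `GC.CollectionCount` and `GC.GetTotalMemory` methods and has nothing to do with Retrace’s own API. `GcStats` and the simulated workload are made up for this example, and the numbers you get will differ from run to run.

```csharp
using System;
using System.Collections.Generic;

public static class GcStats
{
    // Number of collections so far, indexed by generation.
    public static int[] CollectionCounts()
    {
        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            counts[gen] = GC.CollectionCount(gen);
        return counts;
    }

    public static void Main()
    {
        var before = CollectionCounts();

        // Simulated workload: lots of short-lived buffers.
        var buffers = new List<byte[]>();
        for (int i = 0; i < 100_000; i++)
        {
            buffers.Add(new byte[1024]);
            if (buffers.Count > 100) buffers.Clear();
        }

        var after = CollectionCounts();
        for (int gen = 0; gen < after.Length; gen++)
            Console.WriteLine($"gen {gen}: {after[gen] - before[gen]} collections");

        // Approximate size of the managed heap, without forcing a collection.
        Console.WriteLine($"managed heap: {GC.GetTotalMemory(false):N0} bytes");
    }
}
```

Counters like these tell you how often collections happen, but not how long they take. That’s where a dedicated monitoring tool earns its keep.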
- C# Garbage Collection Tutorial - February 1, 2019