2: Memory Management
Last Updated: 9/16/2020
Note: This is NOT engine documentation. I write these articles as I learn. As such, information present here may be incorrect or out-of-date. I'm leaving these pages online for historical purposes only.
Memory management is an essential part of game development. In engines like Unity and Unreal, this is taken care of for you by means of a garbage collector. In a garbage-collected system, you simply leak everything; periodically, your process halts so that the collector can analyze its memory, figure out what is no longer in use, and free it.
Garbage collector pros and cons

Pros
- The programmer doesn't need to manually free dynamically allocated memory
- The garbage collector can spread the work of deallocation over many frames to reduce the per-frame cost

Cons
- The programmer cannot deallocate memory when they want to. Instead, all they can do is set references to null and hope that the garbage collector takes care of it in a timely manner
- Garbage collectors are known to cause micro-stuttering, and in some cases, large lag spikes
- Garbage-collected systems require on average 30-40% more RAM than manually managed ones. This is especially a problem on mobile, where RAM is scarce and the OS will terminate your process if you use too much.
When writing a typical application, the benefits of a garbage collector outweigh the drawbacks, because fine control over memory is rarely required. However, in game development, smart use of resources is paramount, so the programmer can never escape manual memory management. In my experience, garbage collectors make memory management in games more difficult because instead of having control, the programmer must dance around an external system by providing it suggestions.
Unlike Unity and Unreal, the Godot game engine makes use of a simpler yet effective system: Automatic Reference Counting. In this system, dynamically allocated objects are informed of all their strong references in real time. When an object's reference count reaches 0, it knows that it is no longer needed and frees itself on the spot.
Reference counting addresses many of the problems garbage collectors have. In particular:
- The programmer has control over when objects are freed, and can intelligently design their game to deallocate when most appropriate.
- A deallocation-related performance problem will show up in a profiler, making optimization easy.
- Only RAM that is currently in use is held, which is especially important for mobile devices.
However, it is still possible to leak in a reference-counted system with a reference cycle. If two objects, A and B, both have strong references to each other, their reference counts will never reach 0 and the objects will leak. To solve this, the reference counting system must provide a non-owning (weak) reference, and the programmer must know when to use them.
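To make the cycle concrete, here is a minimal sketch using the standard library's reference-counted pointer (introduced in the next paragraph); the classes A and B are hypothetical:
#include <memory>

struct B; // forward declaration

struct A {
    std::shared_ptr<B> b;   // strong reference to B
};

struct B {
    std::shared_ptr<A> a;   // strong reference back to A, forming a cycle
    // std::weak_ptr<A> a;  // a non-owning (weak) reference would break the cycle
};

int main() {
    auto a = std::make_shared<A>();
    auto b = std::make_shared<B>();
    a->b = b;
    b->a = a;
    // When a and b go out of scope, each object still holds a strong reference
    // to the other, so neither count reaches 0 and both objects leak.
}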
As of C++11, the STL provides smart pointers for a variety of ownership models, including reference counting. However, using them is clunky and error-prone. For example:
std::shared_ptr<Object> obj = std::make_shared<Object>(param1, param2);
That's a lot of code to write for every allocation!
In addition, because std::shared_ptr<T> can apply to any type, it does not store the reference count inside the object itself. This means you can easily and accidentally end up with two separate counts for the same object and prematurely delete it. On top of that, casting between types requires even more unpleasant, verbose syntax.
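For example, constructing two shared_ptrs from the same raw pointer creates two independent control blocks, each believing it owns the object (Object and its parameters are hypothetical here):
Object* raw = new Object(param1, param2);
std::shared_ptr<Object> a(raw);   // first control block, count = 1
std::shared_ptr<Object> b(raw);   // second control block, also count = 1
// When a and b are destroyed, each control block independently deletes
// the same object, causing a double free.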
I wanted the experience to feel like using a language like Java:
ObjectRef obj = new Object(param1, param2);
and I also wanted to make it impossible to accidentally create two counts for the same object. In addition, casting should be automatic wherever possible. Solving the double-counting problem is easy with a base class: RavEngine::SharedObject. This class exposes retain() and release() methods that manage an atomic count, and any reference-countable object simply derives from RavEngine::SharedObject. But retain() and release() won't call themselves, so I needed to implement my own smart pointer class.
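A minimal sketch of the idea behind SharedObject (the real class is more involved; everything here beyond retain(), release(), and the class name is my own illustration) could look like this:
#include <atomic>

class SharedObject {
    std::atomic<int> refcount{0};
public:
    virtual ~SharedObject() = default;

    // called by a smart pointer when it takes a strong reference
    void retain() {
        ++refcount;
    }

    // called by a smart pointer when it drops a strong reference;
    // when the last strong reference goes away, the object frees itself
    void release() {
        if (--refcount == 0) {
            delete this;
        }
    }
};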
The smart pointer class must have a few features:
- Properly overload the -> operator to access members on the object for the appropriate derivation level
- Retain on assignment or construction, and release on destruction
- Casting up or down the inheritance hierarchy should be automatic
- Implement std::hash so it can be stored in hashing data structures like std::unordered_set and std::unordered_map
//this macro creates a NameRef class given a class Name and its base
//also defines its == operator and hash function
#define MakeSharedRef(name,base)\
class name ## Ref : public base ## Ref { \
public:\
	name ## Ref() : base ## Ref(){}\
	name ## Ref(name* ptr) : base ## Ref(ptr){assert(dynamic_cast<name*>(ptr) != nullptr);}\
	name ## Ref(SharedObject* ptr) : name ## Ref(static_cast<name*>(ptr)){assert(dynamic_cast<name*>(ptr) != nullptr);}\
	name* operator->(){ return static_cast<name*>(get()); } \
	name* operator*(){ return static_cast<name*>(get()); } \
	bool operator==(const name ## Ref other) const{ return other.get() == get();}\
};\
namespace std {template<> struct hash<name ## Ref> {\
	std::size_t operator()(const name ## Ref& k) const{return k.get()->Hash();}\
};}
The client programmer would then simply invoke the macro to generate the appropriate class:
class A : public RavEngine::SharedObject{
	// ...
};
class B : public A{
	// ...
};
MakeSharedRef(A, RavEngine::SharedObject);
MakeSharedRef(B, A);
Horrible, isn't it? While in the short term this provided what I needed, maintaining and debugging a macro like this quickly became a nightmare.
I realized that I had implemented a really terrible template system using macros. The real answer is to implement it with templates. Shown below is part of my Ref class, with trivial parts such as copy constructors omitted:
template<typename T>
class Ref{
T* ptr = nullptr;
// ...
// conversion operator
template<typename U>
operator Ref<U>() const {
static_assert(std::is_base_of<RavEngine::SharedObject, U>::value, "U does not derive from RavEngine::SharedObject");
static_assert(std::is_base_of<U, T>::value || std::is_base_of<T,U>::value,"This conversion is not an upcast or downcast");
return static_cast<U*>(ptr);
}
// less (for std::set / std::multiset), requires T and U to have < operator defined
template<typename U>
bool operator<(const Ref<U>& other) const {
return *get() < *other.get();
}
};
//hash function for ref
namespace std {
template<typename T>
struct hash<Ref<T>> {
//Strong pointer hash uses the object it is pointing at
std::size_t operator()(const Ref<T>& k) const
{
return k->Hash();
}
};
}
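Hypothetical usage, assuming the omitted constructors accept a raw pointer and call retain() (the classes A and B here are placeholders, with B deriving from A and A deriving from SharedObject):
Ref<B> b = new B();
Ref<A> a = b;      // upcast is handled automatically by the conversion operator
Ref<B> b2 = a;     // downcast also converts; the static_asserts reject unrelated types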
This provides everything the macro did, without any extra code from client programmers, and it is debuggable. In addition, I built a WeakRef<T>, which is similar to Ref except that it does not call retain or release. Since WeakRef has to deal with dangling pointers, SharedObjects store their weak references inside themselves using a fast thread-safe hashset, and on destruction, all current WeakRefs to the object are set to nullptr. This technique is known as Zeroing Weak References. In this way, no smart pointer ever dangles, and testing for nullptr is a valid way to check whether a smart pointer's target is still allocated. One can easily create a shared pointer from a weak pointer and vice versa, using the relevant constructors.
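A rough sketch of the zeroing idea, as a variant of the earlier SharedObject sketch focused only on weak-reference tracking (the member and method names are placeholders, the retain/release count is omitted, and a mutex-guarded std::unordered_set stands in for the engine's faster thread-safe hashset):
#include <unordered_set>
#include <mutex>

class WeakRefBase {
public:
    virtual void zero() = 0;   // set the stored pointer to nullptr
    virtual ~WeakRefBase() = default;
};

class SharedObject {
    std::unordered_set<WeakRefBase*> weakRefs;
    std::mutex weakMutex;
public:
    void trackWeak(WeakRefBase* w)   { std::lock_guard<std::mutex> g(weakMutex); weakRefs.insert(w); }
    void untrackWeak(WeakRefBase* w) { std::lock_guard<std::mutex> g(weakMutex); weakRefs.erase(w); }

    virtual ~SharedObject() {
        // zeroing weak references: every live WeakRef to this object becomes nullptr
        std::lock_guard<std::mutex> g(weakMutex);
        for (auto w : weakRefs) {
            w->zero();
        }
    }
};

template<typename T>
class WeakRef : public WeakRefBase {
    T* ptr = nullptr;
public:
    WeakRef(T* p) : ptr(p) { if (ptr) ptr->trackWeak(this); }
    ~WeakRef() { if (ptr) ptr->untrackWeak(this); }
    // copy and move operations omitted for brevity (they would also need to track/untrack)
    void zero() override { ptr = nullptr; }            // called by ~SharedObject
    operator bool() const { return ptr != nullptr; }   // nullptr test: is the target still alive?
};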
Next up: Physics in ECS