2: Memory Management
Last Updated: 9/16/2020
Note: This is NOT engine documentation. I write these articles as I learn. As such, information present here may be incorrect or out-of-date. I'm leaving these pages online for historical purposes only.
Memory management is an essential part of game development. In engines like Unity and Unreal, this is taken care of for you by means of a garbage collector. In a garbage-collected system, you simply leak everything; periodically, your process halts so that the collector can analyze its memory, figure out what is no longer in use, and free it.
Garbage collector pros and cons

Pros
- The programmer doesn't need to manually free dynamically allocated memory
- The garbage collector can spread the work of deallocation over many frames to reduce the per-frame cost

Cons
- The programmer cannot deallocate memory when they want to. Instead, all they can do is set references to null and hope that the garbage collector takes care of it in a timely manner
- Garbage collectors are known to cause micro-stuttering, and in some cases, large lag spikes
- Garbage-collected systems require on average 30-40% more RAM than manually managed ones. This is especially a problem on mobile, where RAM is scarce and the OS will terminate your process if you use too much.
When writing a typical application, the benefits of a garbage collector outweigh the drawbacks, because fine control over memory is rarely required. However, in game development, smart use of resources is paramount, so the programmer can never escape manual memory management. In my experience, garbage collectors make memory management in games more difficult because instead of having control, the programmer must dance around an external system by providing it suggestions.
Unlike Unity and Unreal, the Godot game engine makes use of a simpler yet effective system: Automatic Reference Counting. In this system, dynamically allocated objects are informed of all their strong references in real time. When an object's reference count reaches 0, it knows that it is no longer needed and frees itself on the spot.
Reference counting addresses many of the problems garbage collectors have. In particular:
- The programmer has control over when objects are freed, and can intelligently design their game to deallocate when most appropriate.
- A deallocation-related performance problem will show up in a profiler, making optimization easy.
- Only RAM that is currently in use is held, which is especially important for mobile devices.
However, it is still possible to leak in a reference-counted system with a reference cycle. If two objects, A and B, both have strong references to each other, their reference counts will never reach 0 and the objects will leak. To solve this, the reference counting system must provide a non-owning (weak) reference, and the programmer must know when to use them.
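To make the cycle concrete, here is a minimal sketch using the standard library's reference-counted pointer (introduced in the next paragraph); the classes A and B are hypothetical:
#include <memory>

struct B; // forward declaration

struct A {
    std::shared_ptr<B> b;   // strong reference to B
};

struct B {
    std::shared_ptr<A> a;   // strong reference back to A, forming a cycle
    // std::weak_ptr<A> a;  // a non-owning (weak) reference would break the cycle
};

int main() {
    auto a = std::make_shared<A>();
    auto b = std::make_shared<B>();
    a->b = b;
    b->a = a;
    // When a and b go out of scope, each object still holds a strong reference
    // to the other, so neither count reaches 0 and both objects leak.
}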
As of C++11, the STL provides smart pointers for a variety of ownership models, including reference counting. However, using them is clunky and error-prone. For example:
std::shared_ptr<Object> obj = std::make_shared<Object>(param1, param2);
That's a lot of code to write for every allocation!
In addition, because std::shared_ptr<T> can apply to any type, it does not store the reference count inside the object itself. This means you can easily and accidentally end up with two separate counts for the same object and prematurely delete it. On top of that, casting between types requires even more unpleasant, verbose syntax.
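For example, constructing two shared_ptrs from the same raw pointer creates two independent control blocks, each believing it owns the object (Object and its parameters are hypothetical here):
Object* raw = new Object(param1, param2);
std::shared_ptr<Object> a(raw);   // first control block, count = 1
std::shared_ptr<Object> b(raw);   // second control block, also count = 1
// When a and b are destroyed, each control block independently deletes
// the same object, causing a double free.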
I wanted the experience to feel like using a language like Java:
ObjectRef obj = new Object(param1, param2);
and I also wanted to make it impossible to accidentally create two counts for the same object. In addition, casting should be automatic wherever possible. Solving the double-counting problem is easy with a base class: RavEngine::SharedObject. This class exposes retain() and release() methods that manage an atomic count, and any reference-countable object simply derives from RavEngine::SharedObject. But retain() and release() won't call themselves, so I needed to implement my own smart pointer class.
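A minimal sketch of the idea behind SharedObject (the real class is more involved; everything here beyond retain(), release(), and the class name is my own illustration) could look like this:
#include <atomic>

class SharedObject {
    std::atomic<int> refcount{0};
public:
    virtual ~SharedObject() = default;

    // called by a smart pointer when it takes a strong reference
    void retain() {
        ++refcount;
    }

    // called by a smart pointer when it drops a strong reference;
    // when the last strong reference goes away, the object frees itself
    void release() {
        if (--refcount == 0) {
            delete this;
        }
    }
};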
The smart pointer class must have a few features:
- Properly overload the -> operator to access members on the object for the appropriate derivation level
- Retain on assignment or construction, and release on destruction
- Casting up or down the inheritance hierarchy should be automatic
- Implement std::hash so it can be stored in hashing data structures like std::unordered_set and std::unordered_map
//this macro creates a NameRef class given a class Name and its base
//also defines its == operator and hash function
#define MakeSharedRef(name,base)\
class name ## Ref : public base ## Ref { \
public:\
	name ## Ref() : base ## Ref(){}\
	name ## Ref(name* ptr) : base ## Ref(ptr){assert(dynamic_cast<name*>(ptr) != nullptr);}\
	name ## Ref(SharedObject* ptr) : name ## Ref(static_cast<name*>(ptr)){assert(dynamic_cast<name*>(ptr) != nullptr);}\
	name* operator->(){ return static_cast<name*>(get()); } \
	name* operator*(){ return static_cast<name*>(get()); } \
	bool operator==(const name ## Ref other) const{ return other.get() == get();}\
};\
namespace std {template<> struct hash<name ## Ref> {\
	std::size_t operator()(const name ## Ref& k) const{return k.get()->Hash();}\
};}
The client programmer would then simply invoke the macro to generate the appropriate class:
class A : public RavEngine::SharedObject{
	// ...
};
class B : public A{
	// ...
};
MakeSharedRef(A, RavEngine::SharedObject);
MakeSharedRef(B, A);
Horrible, isn't it? While in the short term this provided what I needed, maintaining and debugging a macro like this quickly became a nightmare.
I realized that I had implemented a really terrible template system using macros. The real answer is to implement it with templates. Shown below is part of my Ref class, with trivial parts such as copy constructors omitted:
template<typename T>
class Ref{
T* ptr = nullptr;
// ...
// conversion operator
template<typename U>
operator Ref<U>() const {
static_assert(std::is_base_of<RavEngine::SharedObject, U>::value, "U does not derive from RavEngine::SharedObject");
static_assert(std::is_base_of<U, T>::value || std::is_base_of<T,U>::value,"This conversion is not an upcast or downcast");
return static_cast<U*>(ptr);
}
// less (for std::set / std::multiset), requires T and U to have < operator defined
template<typename U>
bool operator<(const Ref<U>& other) const {
return *get() < *other.get();
}
};
//hash function for ref
namespace std {
template<typename T>
struct hash<Ref<T>> {
//Strong pointer hash uses the object it is pointing at
std::size_t operator()(const Ref<T>& k) const
{
return k->Hash();
}
};
}
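Hypothetical usage, assuming the omitted constructors accept a raw pointer and call retain() (the classes A and B here are placeholders, with B deriving from A and A deriving from SharedObject):
Ref<B> b = new B();
Ref<A> a = b;      // upcast is handled automatically by the conversion operator
Ref<B> b2 = a;     // downcast also converts; the static_asserts reject unrelated types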
This provides everything the macro did, without any extra code from client programmers, and it is debuggable. In addition, I built a WeakRef<T>, which is similar to Ref except that it does not call retain or release. Since WeakRef has to deal with dangling pointers, SharedObjects store their weak references inside themselves using a fast thread-safe hashset, and on destruction, all current WeakRefs to the object are set to nullptr. This technique is known as Zeroing Weak References. In this way, no smart pointer ever dangles, and testing for nullptr is a valid way to check whether a smart pointer's target is still allocated. One can easily create a shared pointer from a weak pointer and vice versa, using the relevant constructors.
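A rough sketch of the zeroing idea, as a variant of the earlier SharedObject sketch focused only on weak-reference tracking (the member and method names are placeholders, the retain/release count is omitted, and a mutex-guarded std::unordered_set stands in for the engine's faster thread-safe hashset):
#include <unordered_set>
#include <mutex>

class WeakRefBase {
public:
    virtual void zero() = 0;   // set the stored pointer to nullptr
    virtual ~WeakRefBase() = default;
};

class SharedObject {
    std::unordered_set<WeakRefBase*> weakRefs;
    std::mutex weakMutex;
public:
    void trackWeak(WeakRefBase* w)   { std::lock_guard<std::mutex> g(weakMutex); weakRefs.insert(w); }
    void untrackWeak(WeakRefBase* w) { std::lock_guard<std::mutex> g(weakMutex); weakRefs.erase(w); }

    virtual ~SharedObject() {
        // zeroing weak references: every live WeakRef to this object becomes nullptr
        std::lock_guard<std::mutex> g(weakMutex);
        for (auto w : weakRefs) {
            w->zero();
        }
    }
};

template<typename T>
class WeakRef : public WeakRefBase {
    T* ptr = nullptr;
public:
    WeakRef(T* p) : ptr(p) { if (ptr) ptr->trackWeak(this); }
    ~WeakRef() { if (ptr) ptr->untrackWeak(this); }
    // copy and move operations omitted for brevity (they would also need to track/untrack)
    void zero() override { ptr = nullptr; }            // called by ~SharedObject
    operator bool() const { return ptr != nullptr; }   // nullptr test: is the target still alive?
};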
Next up: Physics in ECS