This is necessary because we will always deadlock if a thread takes on multiple pump tasks since pump tasks never return.
This means when using separate threads for certain systems (like physics or rendering), we need to be sure that there are enough threads to have at least one per system (to ensure forward progress).
In order to make CommandQueueMT more maintainable this PR changes the
previous macro hell with variadic templates instead. This makes the
class far more explicit and will allow us to more easily change the way
the class functions in the future.
Furthermore this refactoring has allowed for some optimizations. In
particular by using std::forward to delay the decision of decaying the
type to as late as possible we are able to move the data from the
callsite into our Command buffer and later move it to the call.
In practice what this means is that compared to the old version instead
of copying values 3 times, we can now get away with 1 copy, and 1 move
for lvalues, and just 2 moves for rvalues. This saves quite a few
operations in a hot codepath.
We also now test to make sure that the amount of copies and moves are
what we expect. This way we can spot performance regressions in this
code easily.
Somewhat unscientifically, running TPS-demo by pressing enter and not
touching the controls average mspf, repeatable across many runs:
before: 6.467
after : 6.202
• `modernize-use-default-member-init` and `readability-redundant-member-init`
• Minor adjustments to `.clang-tidy` to improve syntax & remove redundancies