The function update_perf_report() is expensive and is called every
frame.
Most of that work is unnecessary unless the user calls get_perf_report().
Affects #102173
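A minimal sketch of the idea, assuming a hypothetical `PerfTracker` class. The names `PerfTracker`, `end_frame`, and `get_expensive_update_count`, and the averaging logic, are illustrative and not taken from the engine source. The per-frame path only records a cheap sample and marks the report stale; the expensive aggregation runs only when a caller asks for the report.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-frame performance tracker.
class PerfTracker {
	std::vector<uint64_t> frame_times_usec; // Raw samples, cheap to append.
	bool report_dirty = false;
	double cached_average_usec = 0.0;
	int expensive_updates = 0; // Counts how often the costly path ran.

	// The costly part: aggregate all samples collected so far.
	void update_perf_report() {
		assert(report_dirty); // Only expected to run when the cache is stale.
		uint64_t total = 0;
		for (uint64_t t : frame_times_usec) {
			total += t;
		}
		cached_average_usec = frame_times_usec.empty() ? 0.0 : double(total) / double(frame_times_usec.size());
		report_dirty = false;
		expensive_updates++;
	}

public:
	// Called every frame: records the sample and marks the report stale.
	void end_frame(uint64_t p_frame_time_usec) {
		frame_times_usec.push_back(p_frame_time_usec);
		report_dirty = true;
	}

	// Called rarely: recomputes the report only if it is stale.
	double get_perf_report() {
		if (report_dirty) {
			update_perf_report();
		}
		return cached_average_usec;
	}

	int get_expensive_update_count() const { return expensive_updates; }
};
```

In this sketch, any number of `end_frame()` calls costs no aggregation work until `get_perf_report()` is first called, and repeated calls without new frames reuse the cached value.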
Building with the following params (note: scu_build is enabled):
```
platform=linuxbsd builtin_embree=yes builtin_enet=no
builtin_freetype=yes builtin_graphite=yes builtin_harfbuzz=yes
builtin_libogg=no builtin_libpng=yes builtin_libtheora=no
builtin_libvorbis=no builtin_libwebp=no builtin_miniupnpc=no
builtin_pcre2=no builtin_zlib=yes builtin_zstd=no linker=mold
optimize=none debug_symbols=True tests=True dev_mode=True dev_build=True
use_llvm=yes use_lld=yes opengl3=no openxr=no disable_xr=yes -j 24
scu_build=yes scu_limit=256
```
Results in compiler errors:
```
In file included from servers/register_server_types.cpp:99:
servers/xr/xr_interface.h:52:7: error: redefinition of 'RefCounted'
52 | class XRInterface : public RefCounted {
| ^
./servers/rendering/rendering_method.h:40:21: note: expanded from macro 'XRInterface'
40 | #define XRInterface RefCounted
| ^
./core/object/ref_counted.h:37:7: note: previous definition is here
37 | class RefCounted : public Object {
| ^
```
This happens because of:
```
#ifdef XR_DISABLED
// RendererSceneCull::render_camera is empty when 3D is disabled, but
// it and RenderingMethod::render_camera have a parameter for XRInterface.
#define XRInterface RefCounted
#else
#include "servers/xr/xr_interface.h"
#endif // XR_DISABLED
```
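in servers/rendering/rendering_method.h. With scu_build enabled, that header and servers/xr/xr_interface.h end up in the same translation unit, so the macro rewrites the real class definition.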
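One way to avoid this kind of collision is to forward-declare the class instead of aliasing it with a macro. This is a sketch of that option, not necessarily the fix that was applied. The types below are simplified stand-ins rather than the engine's real declarations, and whether a forward declaration is enough for the engine's actual `render_camera` signatures has not been verified here.

```cpp
#include <cassert>

// Simplified stand-in for the engine's base class.
class RefCounted {
public:
	virtual ~RefCounted() {}
};

// Forward declaration instead of `#define XRInterface RefCounted`.
// The name stays a distinct type, so a later full definition cannot collide.
class XRInterface;

// Hypothetical signature: the declaration only needs a pointer to the
// incomplete type, so no header include is required at this point.
inline bool render_camera(XRInterface *p_xr_interface) {
	// Nothing is rendered in this sketch; report whether an interface was passed.
	return p_xr_interface != nullptr;
}

// Emulates a single compilation unit build that pulls in the real header later.
// With the macro in place, this line would expand to
// `class RefCounted : public RefCounted` and fail to compile.
class XRInterface : public RefCounted {
};

// Usage: both calls compile because XRInterface is a real, distinct type.
inline void example_usage() {
	XRInterface xr;
	assert(render_camera(&xr));
	assert(!render_camera(nullptr));
}
```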
This batches together a couple of micro-optimizations that were discovered by profiling and analyzing the disassembly.
Importantly, this reduces the number of instructions in a heavy loop. The impact is biggest when there are lots of objects and lights in the scene, because the function is called once per object plus once per light that touches the object.
Clustered performs the following shadow rendering steps:
1. Process objects [0; 10) for cascade 0.
2. Process objects [10; 30) for cascade 1.
3. Process objects [30; 100) for cascade 2.
4. Upload objects [0; 100) to GPU.
5. Draw all cascades.
Mobile was supposed to be doing the same, but instead was doing:
1. Process objects [0; 10) for cascade 0.
2. Upload objects [0; 10) to GPU.
3. Process objects [10; 30) for cascade 1.
4. Upload objects [0; 30) to GPU.
5. Process objects [30; 100) for cascade 2.
6. Upload objects [0; 100) to GPU.
7. Draw all cascades.
That is, it always reuploaded everything from scratch.
It therefore wasted bandwidth pointlessly, and the waste grew with every additional cascade.
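The difference between the two sequences above can be counted with a short sketch. The function names are hypothetical, and the model only counts objects uploaded; it makes no assumption about actual buffer sizes or the renderer's real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Objects uploaded when all cascades are processed first and the shared
// buffer is uploaded once at the end.
inline std::size_t uploads_single_pass(const std::vector<std::size_t> &p_objects_per_cascade) {
	std::size_t total = 0;
	for (std::size_t count : p_objects_per_cascade) {
		total += count;
	}
	return total; // One upload of [0; total).
}

// Objects uploaded when the whole buffer so far is reuploaded after each cascade.
inline std::size_t uploads_per_cascade(const std::vector<std::size_t> &p_objects_per_cascade) {
	std::size_t processed = 0;
	std::size_t uploaded = 0;
	for (std::size_t count : p_objects_per_cascade) {
		processed += count;
		uploaded += processed; // Reuploads [0; processed).
	}
	return uploaded;
}

// Usage with the cascade sizes from the example: 10, 20 and 70 objects.
inline void example_usage() {
	const std::vector<std::size_t> cascades = { 10, 20, 70 };
	assert(uploads_single_pass(cascades) == 100);
	assert(uploads_per_cascade(cascades) == 140); // 10 + 30 + 100
}
```

For the three cascades in the example, the per-cascade reupload moves 140 objects instead of 100, and the gap widens as more cascades are added.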