This work is a heavily refactored and rewritten from TheForge's initial
code.
TheForge's original code had too many race conditions and was
fundamentally flawed as it was too easy to incur into those data races
by accident.
However they identified the proper places that needed changes, and the
idea was sound. I used their work as a blueprint to design this work.
This PR implements:
- Introduction of UMA buffers used by a few buffers
(most notably the ones filled by _fill_instance_data).
Ironically this change seems to positively affect PC more than it does
on Mobile.
Updates D3D12 Memory Allocator to get GPU_UPLOAD heap support.
Metal implementation by Stuart Carnie.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: TheForge team
Introduce a specialised `texture_get_data` for `RenderDeviceDriver`,
which can retrieve the texture data from the GPU driver for shared
textures (`TEXTURE_USAGE_CPU_READ_BIT`).
Closes#108115
This change introduces a new protected type, `ReflectedShaderStage` to
`RenderingShaderContainer` that derived types use to access SPIR-V and
the reflected module, `SpvReflectShaderModule` allowing implementations
to use the reflection information to compile their platform-specific
module.
* Fixes memory leak in `reflect_spirv` that would not deallocate the
`SpvReflectShaderModule` if an error occurred.
* Removes unnecessary allocation when creating `SpvReflectShaderModule`
by passing `NO_COPY` flag to `spvReflectCreateShaderModule2`
constructor function.
* Replaces `VectorView` with `Span` for consistency
* Fixes unnecessary allocations in D3D12 shader container in
`_convert_spirv_to_nir` and `_convert_spirv_to_dxil` which implicitly
converted the old `VectorView` to a `Vector`
If the allocation is small enough that it enters the
if (p_size <= SMALL_ALLOCATION_MAX_SIZE) {} block, Godot would call
vmaFindMemoryTypeIndexForBufferInfo with the wrong parameters.
This can cause vmaFindMemoryTypeIndexForBufferInfo to potentially
misbehave on some cards or drivers.
Fixes regression introduced in #102830
Might potentially reopen#101850 (I doubt it, but it's possible)
Must be backported to 4.4
VMA handles memory allocation on certain devices better than our custom VK code, so we might as well use it
Co-authored-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This avoids crashing on devices when a number of varyings greater than the device limit is used.
For now this accurately prints an error when compiling the shader, but the error text only pops up in the editor if the number of user varyings is above the limit.
PR #90993 needed to get rid of VMA_MEMORY_USAGE_AUTO_PREFER_HOST because
we no longer used vmaCreateBuffer so we could specify the allocation
callbacks.
This however resulted in the wrong memory pool being chosen, causing
signficant performance slowdown.
Indicate additional preferred flags to help VMA select the proper pool.
Fixes#101905
These misaligned accesses are shown in all of our CI hooks. It turned
out to not be difficult to fix.
It is likely that this will improve performance for aarch64.