5261 commits

Author SHA1 Message Date
Rodrigo Locatti
bca641cf8a astc_decoder: Reimplement Layers
Reimplements the approach to decoding layers in the compute shader. Fixes multilayer ASTC decoding when using Vulkan.
2021-03-13 12:16:03 -05:00
ameerj
1a1076f154 astc_decoder: Fix out of bounds memory access
Resolves a crash with some anomalous textures found in Astral Chain.
2021-03-13 12:16:03 -05:00
ameerj
cac341dbc7 renderer_vulkan: Accelerate ASTC decoding
Co-Authored-By: Rodrigo Locatti <reinuseslisp@airmail.cc>
2021-03-13 12:16:03 -05:00
ameerj
bdec905c4a host_shaders: Modify shader cmake integration to allow for larger shaders
Using a raw string to encapsulate the entire shader code limits us to shaders smaller than 2KB. This change overcomes that limitation.
2021-03-13 12:16:03 -05:00
ameerj
113734e488 renderer_opengl: Accelerate ASTC texture decoding with a compute shader
ASTC texture decoding is currently handled by a CPU decoder for GPUs without native ASTC decoding support (most desktop GPUs). This causes noticeable performance degradation in titles that use the format extensively.

This commit adds support to accelerate ASTC decoding using a compute shader on OpenGL for GPUs without native support.
2021-03-13 12:16:03 -05:00
bunnei
5922788d3e Merge pull request #6028 from bunnei/raster-cache
video_core: rasterizer_accelerated: Use a flat array instead of interval_map for cached pages.
2021-03-12 21:57:27 -08:00
bunnei
044f6a53c9 video_core: rasterizer_accelerated: Fix un/signed mismatch. 2021-03-12 21:52:49 -08:00
Rodrigo Locatti
1f0cb72f40 Merge pull request #5891 from ameerj/bgra-ogl
renderer_opengl: Use compute shaders to swizzle BGR textures on copy
2021-03-09 02:47:51 -03:00
bunnei
04c1eff0e4 Merge pull request #6021 from ReinUsesLisp/skip-cache-heuristic
buffer_cache: Heuristically decide to skip cache on uniform buffers
2021-03-08 17:48:55 -08:00
ameerj
fb79ff4cab texture_cache: Blacklist BGRA8 copies and views on OpenGL
In order to force the BGRA8 conversion on Nvidia using OpenGL, we need to forbid texture copies and views with other formats.

This commit also adds a boolean to control this, since it must be done only for the OpenGL API; Vulkan must remain unchanged.
2021-03-04 14:14:49 -05:00
ameerj
594860b216 renderer_opengl: Swizzle BGR textures on copy
OpenGL does not natively support BGR internal formats, which causes many BGR textures to render incorrectly, with Red and Blue channels swapped.

This commit aims to address this by swizzling the blue and red channels on texture copies when a BGR format is encountered.
2021-03-04 14:14:19 -05:00
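The red/blue swap described in 594860b216 can be illustrated on the CPU. This is only a sketch of the channel swizzle itself, assuming tightly packed 4-byte pixels. The function name `SwapRedBlue` is hypothetical, and the commit performs the swap during texture copies in the renderer rather than in a loop like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: swaps byte 0 and byte 2 of every 4-byte pixel,
// turning BGRA data into RGBA data (and vice versa).
void SwapRedBlue(std::vector<std::uint8_t>& pixels) {
    // Assumes the buffer holds whole 4-byte pixels.
    assert(pixels.size() % 4 == 0);
    for (std::size_t i = 0; i < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

Applying the function twice restores the original data, since the swap is its own inverse.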
bunnei
688e937d8f Merge pull request #5989 from ReinUsesLisp/cmdpool
vk_command_pool: Reduce the command pool size from 4096 to 4
2021-03-04 11:07:31 -08:00
bunnei
1f1170eb3d video_core: rasterizer_accelerated: Fix delta check ordering. 2021-03-02 17:48:02 -08:00
bunnei
02870daa16 video_core: rasterizer_accelerated: Improve error handling & fix implicit conversion. 2021-03-02 17:44:02 -08:00
bunnei
489b5cca7c video_core: rasterizer_accelerated: Use a flat array instead of interval_map for cached pages.
- Uses a fixed 64MB for the cache instead of an ever growing map.
- Slightly faster by using atomics instead of a single mutex for access.
- Thanks to Rodrigo for the idea.
2021-03-02 16:57:53 -08:00
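The idea in 489b5cca7c, a fixed-size table of per-page counters updated with atomics instead of a growing map behind a mutex, might look roughly like the following. This is a minimal sketch, not the project's code: the class name `CachedPageTable`, the 4 KiB page size, and the small table size are all assumptions chosen to keep the example short.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: one atomic reference counter per page in a flat array.
class CachedPageTable {
public:
    static constexpr std::uint64_t PAGE_BITS = 12;   // assumed 4 KiB pages
    static constexpr std::size_t NUM_PAGES = 4096;   // illustrative table size

    // Adds delta (+1 or -1) to every page overlapping [addr, addr + size).
    void Update(std::uint64_t addr, std::uint64_t size, int delta) {
        assert(size > 0);
        const std::uint64_t first = addr >> PAGE_BITS;
        const std::uint64_t last = (addr + size - 1) >> PAGE_BITS;
        assert(last < NUM_PAGES);
        for (std::uint64_t page = first; page <= last; ++page) {
            // Each counter is updated independently, so no global lock is needed.
            pages[page].fetch_add(delta, std::memory_order_relaxed);
        }
    }

    // Returns true if at least one cached region covers the page of addr.
    bool IsCached(std::uint64_t addr) const {
        const std::uint64_t page = addr >> PAGE_BITS;
        assert(page < NUM_PAGES);
        return pages[page].load(std::memory_order_relaxed) > 0;
    }

private:
    std::array<std::atomic<int>, NUM_PAGES> pages{};  // zero-initialized
};
```

The trade-off is a constant memory cost up front in exchange for lookups that are a shift and an array index.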
ReinUsesLisp
06028cda0c buffer_cache: Heuristically decide to skip cache on uniform buffers
Some games benefit from skipping caches (Pokémon Sword), and others
don't (Animal Crossing: New Horizons). Add a heuristic to decide this
at runtime.

The cache hit ratio has to be ~98% or better to not skip the cache.
There are 16 frames of buffer.
2021-03-02 02:44:19 -03:00
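One way to read the heuristic in 06028cda0c is as a sliding window of per-frame hit counts. The sketch below assumes that reading: it keeps 16 frames of counters and skips the cache when the combined hit ratio drops below 98%. The class and method names are hypothetical, and the real implementation may track and compare its statistics differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Hypothetical sketch of a hit-ratio heuristic over a 16-frame window.
class SkipCacheHeuristic {
public:
    static constexpr std::size_t NUM_FRAMES = 16;

    // Records one uniform buffer access in the current frame.
    void RecordAccess(bool hit) {
        ++accesses[frame];
        if (hit) {
            ++hits[frame];
        }
    }

    // Advances to the next frame slot, discarding the oldest frame's counts.
    void EndFrame() {
        frame = (frame + 1) % NUM_FRAMES;
        hits[frame] = 0;
        accesses[frame] = 0;
    }

    // Skips the cache when fewer than 98% of accesses in the window were hits.
    bool ShouldSkipCache() const {
        const std::uint64_t total_hits =
            std::accumulate(hits.begin(), hits.end(), std::uint64_t{0});
        const std::uint64_t total_accesses =
            std::accumulate(accesses.begin(), accesses.end(), std::uint64_t{0});
        if (total_accesses == 0) {
            return false;  // no data yet, keep using the cache
        }
        return total_hits * 100 < total_accesses * 98;
    }

private:
    std::array<std::uint32_t, NUM_FRAMES> hits{};
    std::array<std::uint32_t, NUM_FRAMES> accesses{};
    std::size_t frame = 0;
};
```

Using integer arithmetic for the comparison avoids floating-point rounding at the 98% boundary.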
ameerj
37d4ac1f6e gpu_thread: Remove Async NVDEC placeholders
This commit removes early placeholders for an implementation of async nvdec. With recent changes to the source code, the placeholders are no longer accurate, and can cause a nullptr dereference due to the nature of the cdma_pusher lifetime.
2021-02-28 22:03:00 -05:00
bunnei
5fa255b19c Merge pull request #5984 from jbeich/gcc-freebsd
common,video-core: unbreak GCC 11 build on FreeBSD 13
2021-02-27 14:15:00 -07:00
bunnei
cfe967f1ac Merge pull request #5953 from bunnei/memory-refactor-1
Kernel Rework: Memory updates and refactoring (Part 1)
2021-02-27 12:48:35 -07:00
Kelebek1
f924b0efce Implement glDepthRangeIndexeddNV 2021-02-24 22:26:53 +00:00
ReinUsesLisp
da2876ad7b vk_command_pool: Reduce the command pool size from 4096 to 4
This allows drivers to reuse memory more easily and preallocate less.
The optimal number has been measured booting Pokémon Sword.
2021-02-23 19:08:24 -03:00
Jan Beich
7936e43cc1 video_core: add missing header after a4e811af27
src/video_core/shader_notify.cpp: In member function 'void VideoCore::ShaderNotify::MarkShaderComplete()':
src/video_core/shader_notify.cpp:33:10: error: 'unique_lock' is not a member of 'std'
   33 |     std::unique_lock lock{mutex};
      |          ^~~~~~~~~~~
src/video_core/shader_notify.cpp:6:1: note: 'std::unique_lock' is defined in header '<mutex>'; did you forget to '#include <mutex>'?
    5 | #include "video_core/shader_notify.h"
  +++ |+#include <mutex>
    6 |
src/video_core/shader_notify.cpp: In member function 'void VideoCore::ShaderNotify::MarkSharderBuilding()':
src/video_core/shader_notify.cpp:38:10: error: 'unique_lock' is not a member of 'std'
   38 |     std::unique_lock lock{mutex};
      |          ^~~~~~~~~~~
src/video_core/shader_notify.cpp:38:10: note: 'std::unique_lock' is defined in header '<mutex>'; did you forget to '#include <mutex>'?
2021-02-23 00:04:36 +00:00
bunnei
adc9097952 Merge pull request #5936 from Kelebek1/Offsets
Offsets for TexelFetch and TextureGather in Vulkan
2021-02-21 21:23:45 -07:00
Morph
f542011e0c gl_disk_shader_cache: Log total shader entries count on game load 2021-02-20 11:08:19 -05:00
bunnei
c9770f92d8 Merge pull request #5924 from ReinUsesLisp/inline-bindings
vk_update_descriptor: Inline and improve code for binding buffers
2021-02-19 12:27:10 -08:00
bunnei
5dbcaa2970 hle: kernel: Migrate PageHeap/PageTable to KPageHeap/KPageTable. 2021-02-18 16:16:25 -08:00
bunnei
0872ba7130 Merge pull request #4973 from ameerj/nvdec-opt
nvdec: Reuse allocated buffers and general cleanup
2021-02-18 15:12:07 -08:00
ReinUsesLisp
76e2d40963 vk_rasterizer: Fix loading shader addresses twice
This was recently introduced by an incorrectly rebased commit.
2021-02-15 21:34:13 -03:00
bunnei
0b63701ebf Merge pull request #5923 from ReinUsesLisp/vk-dirty-pipeline
fixed_pipeline_cache: Use dirty flags to lazily update key
2021-02-15 13:17:27 -08:00
Kelebek1
16a5c56b7c Review 1 2021-02-15 05:26:28 +00:00
Kelebek1
4e04e95a8e Implement texture offset support for TexelFetch and TextureGather and add offsets for Tlds
Formatting
2021-02-15 00:36:37 +00:00
bunnei
fddde225c5 yuzu: Various frontend improvements to avoid crashes and improve experience on Linux. 2021-02-14 00:20:41 -08:00
ReinUsesLisp
ec1854363e vk_resource_pool: Load GPU tick once and compare with it
Other minor style improvements. Rename free_iterator to hint_iterator,
to describe better what it does.
2021-02-13 17:53:58 -03:00
ReinUsesLisp
7fa30ea272 vk_update_descriptor: Inline and improve code for binding buffers
Allow compilers with our settings to inline hot code.
2021-02-13 17:46:24 -03:00
ReinUsesLisp
261380d2b6 fixed_pipeline_cache: Use dirty flags to lazily update key
Use dirty flags to avoid building pipeline key from scratch on each draw
call. This saves a bit of unnecessary work on each draw call.
2021-02-13 17:44:47 -03:00
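The dirty-flag pattern named in 261380d2b6 can be sketched in isolation. The example below is an assumption-laden illustration: `PipelineKey` has only two made-up fields, and `LazyPipelineKey` with its `RebuildCount` counter is hypothetical, included only to make the saved work observable.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified pipeline key.
struct PipelineKey {
    std::uint32_t topology = 0;
    std::uint32_t blend_mode = 0;
};

// Rebuilds the key only when some tracked state changed since the last draw.
class LazyPipelineKey {
public:
    void SetTopology(std::uint32_t value) {
        if (topology != value) {
            topology = value;
            dirty = true;
        }
    }

    void SetBlendMode(std::uint32_t value) {
        if (blend_mode != value) {
            blend_mode = value;
            dirty = true;
        }
    }

    // Called once per draw; does no work when nothing changed.
    const PipelineKey& Get() {
        if (dirty) {
            key.topology = topology;
            key.blend_mode = blend_mode;
            dirty = false;
            ++rebuild_count;
        }
        return key;
    }

    int RebuildCount() const { return rebuild_count; }

private:
    std::uint32_t topology = 0;
    std::uint32_t blend_mode = 0;
    PipelineKey key{};
    bool dirty = true;  // forces the first build
    int rebuild_count = 0;
};
```

Consecutive draws with unchanged state return the cached key, so the rebuild cost is paid only on state changes.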
ameerj
c18cef2a9b gl_texture_cache: Lazily create non-sRGB texture views for sRGB formats
This creates non-sRGB texture views for sRGB texture formats to allow for interfacing with these views in compute shaders using imageLoad and imageStore.

Co-Authored-By: Rodrigo Locatti <reinuseslisp@airmail.cc>
2021-02-13 13:27:50 -05:00
ameerj
01dec35df3 rebase, fix name shadowing, more const 2021-02-13 13:07:56 -05:00
ameerj
c0ccf9eac5 Address PR feedback
Co-Authored-By: LC <712067+lioncash@users.noreply.github.com>
2021-02-13 13:07:56 -05:00
ameerj
427eca063d streamline cdma_pusher/command_classes 2021-02-13 13:07:56 -05:00
ameerj
e97cd00753 streamline cdma_pusher/command_classes 2021-02-13 13:07:53 -05:00
ameerj
be6c487b4e nvdec cleanup 2021-02-13 13:07:31 -05:00
Morph
8c2e076292 Merge pull request #5919 from ReinUsesLisp/stream-buffer-tragic
gl_stream_buffer/vk_staging_buffer_pool: Fix size check
2021-02-13 21:25:45 +08:00
ReinUsesLisp
898de871a9 vk_master_semaphore: Mark gpu_tick atomic operations with relaxed order 2021-02-13 05:57:28 -03:00
ReinUsesLisp
6f5d45aecc vk_staging_buffer_pool: Inline tick tests
Load the current tick to a local variable, moving it out of an atomic
and allowing us to compare the value without going through a pointer
each time. This should make the loop more optimizable.
2021-02-13 05:14:11 -03:00
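The change described in 6f5d45aecc, reading the atomic tick once before the loop rather than on every iteration, follows a common pattern. A minimal sketch, assuming a hypothetical `Entry` type and `FindFreeEntry` function, and assuming a relaxed load is acceptable for this check:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pool entry: tick at which the GPU is done with it.
struct Entry {
    std::uint64_t tick;
};

// Returns the index of the first entry the GPU has finished with,
// or entries.size() if none is free.
std::size_t FindFreeEntry(const std::vector<Entry>& entries,
                          const std::atomic<std::uint64_t>& gpu_tick) {
    // Load once into a local so the loop compares against a plain value
    // instead of re-reading the atomic on each iteration.
    const std::uint64_t known_tick = gpu_tick.load(std::memory_order_relaxed);
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (entries[i].tick <= known_tick) {
            return i;
        }
    }
    return entries.size();
}
```

A tick that advances during the loop is simply observed on the next call, so the stale local value can only make the search more conservative.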
ReinUsesLisp
2f40ef90c5 gl_stream_buffer/vk_staging_buffer_pool: Fix size check
Fix a tragic off-by-one condition that causes Vulkan's stream buffer to
think it's always full, using fallback memory. The OpenGL stream buffer was also
affected by this bug to a lesser extent.
2021-02-13 05:11:48 -03:00
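The exact condition fixed in 2f40ef90c5 is not shown in this log, so the following only illustrates the general class of off-by-one size check in a linear allocator. `StreamBuffer` and `Reserve` here are hypothetical and are not the project's classes.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical linear allocator over a fixed-capacity buffer.
class StreamBuffer {
public:
    explicit StreamBuffer(std::size_t capacity_) : capacity{capacity_} {}

    // Returns the start offset of the reserved region, or nullopt if full.
    std::optional<std::size_t> Reserve(std::size_t size) {
        // A request fits when it ends at or before the end of the buffer.
        // Using `>=` here instead of `>` would wrongly reject a request
        // that ends exactly at the last byte.
        if (offset + size > capacity) {
            return std::nullopt;
        }
        const std::size_t start = offset;
        offset += size;
        return start;
    }

private:
    std::size_t capacity;
    std::size_t offset = 0;
};
```

With a 16-byte buffer, two 8-byte reservations both succeed and only a third request is refused.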
LC
113cd18847 Merge pull request #5916 from ameerj/maxwell-gl-unused
maxwell_to_gl: Remove unused code
2021-02-13 02:55:59 -05:00
ReinUsesLisp
bf10ce380b vulkan_device: Require VK_EXT_robustness2
We are already using robustness2 features without requiring it
explicitly, causing potential crashes on drivers without the extension.
Requiring this at boot allows better diagnostics for it and formalizes
our usage of the extension.
2021-02-13 03:31:50 -03:00
ReinUsesLisp
3f190b946c video_core: Fix clang build issues 2021-02-13 02:26:47 -03:00
ReinUsesLisp
50d8c1eb35 vk_staging_buffer_pool: Fix softlock when stream buffer overflows
There was still a code path that could wait on a timeline semaphore tick
that would never be signalled.

While we are at it, make use of more STL algorithms.
2021-02-13 02:18:38 -03:00
ReinUsesLisp
0cc70777ca vk_buffer_cache: Add support for null index buffers
Games can bind a null index buffer (size=0) where all indices are
evaluated as zero. VK_EXT_robustness2 doesn't support this and all
drivers segfault when a null index buffer is passed to
vkCmdBindIndexBuffer.

Work around this by creating a 4-byte buffer and filling it with zeroes.
If it's read out of bounds, robustness takes care of returning zeroes as
indices.
2021-02-13 02:18:38 -03:00