Commit graph

930 commits

Author SHA1 Message Date
bunnei
bba996075e Merge pull request #4953 from lioncash/shader-shadow
shader_bytecode: Eliminate variable shadowing
2020-11-20 16:58:14 -08:00
Lioncash
b482ef0f06 shader_bytecode: Make use of [[nodiscard]] where applicable
Ensures that all queried values are made use of.
2020-11-20 02:20:37 -05:00
Lioncash
b0172c1888 shader_bytecode: Eliminate variable shadowing 2020-11-20 02:13:45 -05:00
ReinUsesLisp
bdd8e12504 maxwell_3d: Use insert instead of loop push_back
This reduces the overhead of bounds checking on each element.
It won't reduce the cost of allocation because usually this vector's
capacity is usually large enough to hold whatever we push to it.
2020-11-11 19:52:19 -03:00
ReinUsesLisp
45cfd67ae6 maxwell_3d: Move code to separate functions
Deduplicate some code and put it in separate functions so it's easier to
understand and profile.
2020-11-11 19:52:19 -03:00
ReinUsesLisp
016fb60ff5 shader/arithmetic: Implement FCMP immediate + register variant
Trivially add the encoding for this.
2020-10-28 17:05:41 -03:00
ReinUsesLisp
31c5fbd15d video_core: Enforce -Wclass-memaccess 2020-10-09 16:46:11 -03:00
ReinUsesLisp
ca90a52bea video_core: Enforce -Wunused-variable and -Wunused-but-set-variable 2020-10-02 21:19:35 -03:00
Lioncash
e457001dce General: Make use of std::nullopt where applicable
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.

This also makes it consistent how we return empty optionals.
2020-09-22 17:32:33 -04:00
Lioncash
e486fc26a8 fermi_2d: Make use of designated initializers
Same behavior, less repetition. We can also ensure all members of Config
are initialized.
2020-09-18 13:55:21 -04:00
ReinUsesLisp
1c61cf29b6 video_core: Initialize renderer with a GPU
Add an extra step in GPU initialization to be able to initialize render
backends with a valid GPU instance.
2020-08-22 01:51:45 -03:00
bunnei
e6984ce40f Merge pull request #4519 from lioncash/semi
maxwell_3d: Resolve -Wextra-semi warning
2020-08-16 00:55:15 -04:00
Lioncash
6dd94f94e6 maxwell_3d: Resolve -Wextra-semi warning
Semicolons after a function definition aren't necessary.
2020-08-14 08:13:41 -04:00
ReinUsesLisp
9a83f8794b textures/decoders: Fix block linear to pitch copies
There were two issues with block linear copies. First the swizzling was
wrong and this commit reimplements them.

The other issue was that these copies are generally used to download
render targets from the GPU and yuzu was not downloading them from
host GPU memory unless the extreme GPU accuracy setting was selected.
This commit enables cached memory reads for all accuracy levels.

- Fixes level thumbnails in Super Mario Maker 2.
2020-08-10 20:45:03 -03:00
ReinUsesLisp
2691f5e6b9 video_core/textures: Add and use SwizzleSliceToVoxel, and minor style changes
Change GOB sizes from free-functions to constexpr constants.

Add SwizzleSliceToVoxel, a function that swizzles a 2D array of pixels
into a 3D texture and use it for 3D copies.
2020-07-10 04:09:32 -03:00
ReinUsesLisp
ef1ba82f42 maxwell_dma: Rename registers to match official docs and reorder
Rename registers in the MaxwellDMA class to match Nvidia's official
documentation. This one can be found here:

https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/dma-copy/clb0b5.h

While we are at it, reorganize the code in MaxwellDMA to be separated in
different functions.
2020-07-07 19:19:33 -03:00
bunnei
7048f18d87 Merge pull request #4147 from ReinUsesLisp/hset2-imm
shader/half_set: Implement HSET2_IMM
2020-06-26 23:14:56 -04:00
David Marcec
da2cbc76fc Addressed issues 2020-06-24 12:09:03 +10:00
David Marcec
74e48d2a98 Macro HLE support 2020-06-24 12:09:01 +10:00
ReinUsesLisp
43e5214413 shader/half_set: Implement HSET2_IMM
Add HSET2_IMM. Due to the complexity of the encoding avoid using
BitField unions and read the relevant bits from the code itself.
This is less error prone.
2020-06-22 20:51:18 -03:00
bunnei
8d47f26be9 Merge pull request #4049 from ReinUsesLisp/separate-samplers
shader/texture: Join separate image and sampler pairs offline
2020-06-13 13:48:27 -04:00
ReinUsesLisp
77fbdd7e1d texture_cache: Implement rendering to 3D textures
This allows rendering to 3D textures with more than one slice.
Applications are allowed to render to more than one slice of a texture
using gl_Layer from a VTG shader.

This also requires reworking how 3D texture collisions are handled, for
now, this commit allows rendering to slices but not to miplevels. When a
render target attempts to write to a mipmap, we fallback to the previous
implementation (copying or flushing as needed).

- Fixes color correction 3D textures on UE4 games (rainbow effects).
- Allows Xenoblade games to render to 3D textures directly.
2020-06-08 05:01:00 -03:00
ReinUsesLisp
2f28ac0ada shader/texture: Join separate image and sampler pairs offline
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.

One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.

The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.

This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.

This invalidates OpenGL's disk shader cache :)

- Used mostly by D3D ports to Switch
2020-06-05 00:24:51 -03:00
bunnei
8bf40a4abc Merge pull request #4009 from ogniK5377/macro-jit-prod
video_core: Implement Macro JIT
2020-06-04 11:40:52 -04:00
David Marcec
9eb0c2c15e Default init labels and use initializer list for macro engine 2020-06-04 22:23:07 +10:00
David Marcec
cb42f51dc1 Mark parameters as const 2020-06-03 16:33:38 +10:00
David Marcec
d9082de7ea Pass by reference instead of copying parameters 2020-06-02 16:37:06 +10:00
bunnei
25e850e83c Merge pull request #3998 from ReinUsesLisp/init-3d
maxwell_3d: Initialize more registers to their expected value
2020-06-01 16:11:56 -04:00
David Marcec
05eeb7de3d Implement macro JIT 2020-05-30 11:40:04 +10:00
ReinUsesLisp
d1e0f2095c maxwell_3d: Reduce severity of logs that can be spammed
These logs were killing performance on some games when they were
spammed. Reduce them to Debug severity.
2020-05-28 18:23:25 -03:00
ReinUsesLisp
de665d6485 maxwell_3d: Initialize line widths
Initialize line widths to avoid setting a line width of zero.
2020-05-27 16:53:43 -03:00
ReinUsesLisp
2dba4bc34f maxwell_3d: Initialize polygon modes
NVN expects this to be initialized as Fill, otherwise games that never
bind a rasterizer state will log an invalid polygon mode.
2020-05-27 16:52:52 -03:00
bunnei
b0d4fdb600 Merge pull request #3899 from ReinUsesLisp/float-comparisons
shader_ir: Add separate instructions for ordered and unordered comparisons and fix NE on GLSL
2020-05-13 09:51:14 -04:00
ReinUsesLisp
d8598a8717 shader_ir: Separate float-point comparisons in ordered and unordered
This allows us to use native SPIR-V instructions without having to
manually check for NAN.
2020-05-09 04:55:15 -03:00
bunnei
4fb531a576 Merge pull request #3885 from ReinUsesLisp/viewport-swizzles
video_core: Implement viewport swizzles with NV_viewport_swizzle
2020-05-08 15:16:53 -04:00
bunnei
a1d3586e60 Merge pull request #3815 from FernandoS27/command-list-2
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
2020-05-05 17:12:42 -04:00
ReinUsesLisp
1f5d41ac05 vk_graphics_pipeline: Implement viewport swizzles with NV_viewport_swizzle 2020-05-04 18:31:17 -03:00
ReinUsesLisp
ff7b0d7329 maxwell_3d: Add viewport swizzles 2020-05-04 17:50:59 -03:00
bunnei
99075400e3 Merge pull request #3808 from ReinUsesLisp/wait-for-idle
{maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
2020-05-03 02:43:18 -04:00
bunnei
6a57e5fc49 Merge pull request #3807 from ReinUsesLisp/fix-depth-clamp
maxwell_3d: Fix depth clamping register
2020-04-30 13:07:31 -04:00
bunnei
e043a955f1 Merge pull request #3799 from ReinUsesLisp/iadd-cc
shader: Implement P2R CC, IADD Rd.CC and IADD.X
2020-04-30 12:56:36 -04:00
Fernando Sahmkow
9f9e662f1f Clang Format and Documentation. 2020-04-28 14:02:51 -04:00
Fernando Sahmkow
4d2b23d3c5 MaxwellDMA: Optimize micro copies. 2020-04-28 13:44:14 -04:00
ReinUsesLisp
8835d40024 {maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.

To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.

Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
2020-04-28 02:18:12 -03:00
Fernando Sahmkow
b916b58702 VideoCore/Engines: Refactor Engines CallMethod. 2020-04-27 21:47:58 -04:00
ReinUsesLisp
b82e61dff0 maxwell_3d: Fix depth clamping register
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)

We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:

state->depthClampEnable ? 0x101A : 0x181D

The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):

(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c

There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.

- Fixes depth issues on Super Mario Odyssey's intro.
2020-04-27 20:50:14 -03:00
bunnei
d952b2e1da Merge pull request #3742 from FernandoS27/command-list
Optimize GPU Command Lists and Introduce Fast GPU Time Option
2020-04-27 00:18:46 -04:00
Rodrigo Locatti
0f7a89c2ef Merge pull request #3753 from ReinUsesLisp/ac-vulkan
{gl,vk}_rasterizer: Add lazy default buffer maker and use it for empty buffers
2020-04-26 01:55:43 -03:00
ReinUsesLisp
6404cd824b shader/arithmetic_integer: Implement IADD.X
IADD.X takes the carry flag and adds it to the result. This is generally
used to emulate 64-bit operations with 32-bit registers.
2020-04-25 22:56:11 -03:00
bunnei
05a54192f8 Merge pull request #3734 from ReinUsesLisp/half-float-mods
decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
2020-04-25 00:41:43 -04:00